[4529] in Release_7.7_team
Disconnected operation testing results
daemon@ATHENA.MIT.EDU (Greg Hudson)
Thu May 13 14:56:00 2004
Date: Thu, 13 May 2004 14:55:53 -0400
Message-Id: <200405131855.i4DItrAd027570@equal-rites.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: release-team@MIT.EDU

I borrowed Andrew's test laptop and did some disconnected operation
testing.

First, I reinstalled the machine, updated it to 9.3, and set
PUBLIC=false and DISCONNECTABLE=true. The machine has both wireless
and wired interfaces; for a first testing run, I configured it to use
only the wired interface. I ran "offlinehome systest" to create a
local homedir for systest.

Then, I attempted booting the machine without the network cable
plugged in, and logging in as systest with and without network.

I found the following issues:
* There is a syntax error in offlinehome. (Fix already submitted.)
* ifplugd beeps when link beat is detected or lost (two different
frequencies). We will want to document how to turn that off,
although it's useful default behavior.
* Booting without the network cable plugged in, athstatusd came up
with the network device still configured, so the machine hung
trying to get the time. This is expected under 9.2 since I'm
using a static IP address, but not under 9.3 with ifplugd running.
ifplugd apparently does not bring down a network interface if it
initially detects no link beat. Its documentation says that
ifplugd-managed interfaces should not be configured automatically
by the native OS, which I took to mean that they should be set
ONBOOT=false.
I locally fixed this problem on the test laptop by setting
ONBOOT=false in ifcfg.eth0 and turning off SYNCCONFIG. We should
make syncconf do this on disconnectable machines. (I'm not sure
how this will interact with PC-card network adapters, since those
normally come up during /etc/init.d/pcmcia even with
ONBOOT=false.)
* Booting without network, the machine took a long time to come up.
Each AFS operation would try all the VL servers and take a couple
of seconds to time out on each one; this wouldn't be a problem
except that it happened over and over again. My guess is that
OpenAFS 1.2.11 has regressed, perhaps by not caching failures for
as long. (Derek claims that under Athena 9.2, a system comes
up much faster without network, without retrying the VL servers
for each AFS operation.)
* Logging in as systest without network (after the machine
eventually came up):
- I got "Login Incomplete" dialogs about Kerberos tickets and
about the group list.
- There was a substantial delay (maybe 30 seconds) at the
beginning of the login, but no message to indicate what was
going on.
- zwgc complained about being unable to contact zhm (no big
surprise; it isn't running).
- mailquota and from complained.
- authwatch immediately came up and bitched about not having
credentials.
- The panel icons didn't come up for several minutes (plugging in
the network cable also makes them come up). This is almost
certainly because /var/athena/menus was pointed into AFS space
after the panel menus synchronization failed.
* Upon plugging in the network cable, the console told me that my
address had changed and it was getting new krb4 tickets. (Should
this be silent, since it's meaningless to most users?) Of course,
my address had not changed, and I had no krb5 credentials to get
krb4 tickets with, so it then spit out an error about that.
At this point, if I wanted to access network services from that
login, I would have had to know to renew authentication by hand.
There is an argument for popping up a grenew window at this point
(and an argument against, particularly for wireless machines, but
it's not clear that it's a compelling one).
* I noticed that if I plugged in the network cable just before
logging in, the network hadn't come up (even though ifplugd had
beeped and we have it configured for a zero-second delay).
ifplugd beeped again a few seconds later and I had network.
* If I log in with network, start Evolution, and then disconnect the
network cable, Evolution predictably times out trying to retrieve
the next message. However, if I configure the inbox folder for
disconnected operation and click the disconnect button at the
lower left before unplugging, it works great. This is all
expected, and is just something we should document.
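
For the ifplugd item above, the change we would want syncconf to make
on disconnectable machines can be sketched roughly as follows. This is
only an illustration in Python, not how syncconf is actually written;
the function names are hypothetical, and it assumes the interface
config file consists of simple KEY=value lines.

```python
def set_onboot_false(ifcfg_text):
    """Return ifcfg_text with ONBOOT forced to false.

    Assumes simple KEY=value lines; comments and other lines are
    passed through unchanged.
    """
    out = []
    seen = False
    for line in ifcfg_text.splitlines():
        if line.strip().startswith("ONBOOT="):
            # Replace any existing setting so the OS does not
            # configure the interface on its own at boot.
            out.append("ONBOOT=false")
            seen = True
        else:
            out.append(line)
    if not seen:
        # No ONBOOT line at all: add one explicitly.
        out.append("ONBOOT=false")
    return "\n".join(out) + "\n"


def sync_ifcfg(ifcfg_text, disconnectable):
    """Hypothetical syncconf step: only touch disconnectable machines."""
    if not disconnectable:
        return ifcfg_text
    return set_onboot_false(ifcfg_text)
```

This leaves the PC-card question open, since those adapters would
still come up during /etc/init.d/pcmcia regardless of the setting.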
I also ran into some apparently deeper problems with RHEL3 on the
machine: it was pretty sluggish with only 128MB of RAM, it once hung
at startup trying to configure the USB device, and once after
logging out it gave me a blank screen and didn't respond to input
(although it did fiddle with the disk on and off for a little while
after that, according to the LEDs).

We need to fix the offlinehome syntax error (of course), the
booting-without-network error, and probably the new OpenAFS slowness.
We can document around the various inconveniences of logging in with
no network, but we should explore fixing them using the athneteventd
framework.
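
As a rough illustration of what "caching failures" would buy us for
the VL server retries: if a server that just timed out is remembered
as down for a while, later operations skip it instead of waiting out
the timeout again. The Python below is only a sketch of that idea. It
is not OpenAFS code and makes no claim about how the cache manager
actually tracks servers; FailureCache, first_reachable, the probe
callback, and the cache lifetime are all made up for the example.

```python
import time


class FailureCache:
    """Remember which servers failed recently (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.failed_at = {}  # server name -> time of last failure

    def mark_failed(self, server):
        self.failed_at[server] = self.clock()

    def mark_ok(self, server):
        self.failed_at.pop(server, None)

    def should_try(self, server):
        # Try a server if it has never failed, or if its last
        # failure is older than the cache lifetime.
        t = self.failed_at.get(server)
        return t is None or self.clock() - t >= self.ttl


def first_reachable(servers, probe, cache):
    """Return the first server for which probe(server) is true.

    Servers that failed within the cache lifetime are skipped
    without being probed. Returns None if nothing answers.
    """
    for server in servers:
        if not cache.should_try(server):
            continue
        if probe(server):
            cache.mark_ok(server)
            return server
        cache.mark_failed(server)
    return None
```

With something like this, the first AFS operation on a disconnected
machine pays the timeout for every server, and operations within the
cache lifetime after that fail immediately, which matches what Derek
describes seeing under 9.2.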