[4529] in Release_7.7_team

Disconnected operation testing results

daemon@ATHENA.MIT.EDU (Greg Hudson)
Thu May 13 14:56:00 2004

Date: Thu, 13 May 2004 14:55:53 -0400
Message-Id: <200405131855.i4DItrAd027570@equal-rites.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: release-team@MIT.EDU

I borrowed Andrew's test laptop and did some disconnected operation
testing.

First, I reinstalled the machine, updated it to 9.3, and set
PUBLIC=false and DISCONNECTABLE=true.  The machine has both wireless
and wired interfaces; for a first testing run, I configured it to use
only the wired interface.  I ran "offlinehome systest" to create a
local homedir for systest.

Then, I attempted booting the machine without the network cable
plugged in, and logging in as systest with and without network.

I found the following issues:

  * There is a syntax error in offlinehome.  (Fix already submitted.)

  * ifplugd beeps when link beat is detected or lost (two different
    frequencies).  We will want to document how to turn that off,
    although it's useful default behavior.

  * Booting without the network cable plugged in, athstatusd came up
    with the network device still configured, so the machine hung
    trying to get the time.  This is expected under 9.2 since I'm
    using a static IP address, but not under 9.3 with ifplugd running.
    ifplugd apparently does not bring down a network interface if it
    initially detects no link beat.  Its documentation says that
    ifplugd-managed interfaces should not be configured automatically
    by the native OS, which I took to mean that they should be set
    ONBOOT=false.

    I locally fixed this problem on the test laptop by setting
    ONBOOT=false in ifcfg-eth0 and turning off SYNCCONFIG.  We should
    make syncconf do this on disconnectable machines.  (I'm not sure
    how this will interact with PC-card network adapters, since those
    normally come up during /etc/init.d/pcmcia even with
    ONBOOT=false.)

  * Booting without network, the machine took a long time to come up.
    Each AFS operation would try all the VL servers and take a couple
    of seconds to time out on each one; this wouldn't be a problem
    except that it happened over and over again.  My guess is that
    OpenAFS 1.2.11 has regressed by not caching failures for as long,
    or something.  (Derek claims that under Athena 9.2, a system comes
    up much faster without network, without retrying the VL servers
    for each AFS operation.)

  * Logging in as systest without network (after the machine
    eventually came up):

    - I got "Login Incomplete" dialogs about Kerberos tickets and
      about the group list.

    - There was a substantial delay (maybe 30 seconds) at the
      beginning of the login, but no message to indicate what was
      going on.

    - zwgc complained about being unable to contact zhm (no big
      surprise; it isn't running).

    - mailquota and from complain.

    - authwatch immediately came up and bitched about not having
      credentials.

    - The panel icons didn't come up for several minutes (plugging in
      the network cable also makes them come up).  This is almost
      certainly because /var/athena/menus was pointed into AFS space
      after the panel menus synchronization failed.

  * Upon plugging in the network cable, the console told me that my
    address had changed and it was getting new krb4 tickets.  (Should
    this be silent, since it's meaningful to no users?)  Of course,
    my address had not changed, and I had no krb5 credentials to get
    krb4 tickets with, so it then spit out an error about that.

    At this point, if I wanted to access network services from that
    login, I would have had to know to renew authentication by hand.
    There is an argument for popping up a grenew window at this point
    (and an argument against, particularly for wireless machines, but
    it's not clear that it's a compelling one).

  * I noticed that if I plugged in the network cable just before
    logging in, the network hadn't come up (even though ifplugd had
    beeped and we have it configured for a zero-second delay).
    ifplugd beeped again a few seconds later and I had network.

  * If I log in with network, start Evolution, and then disconnect the
    network cable, Evolution predictably times out trying to retrieve
    the next message.  However, if I configure the inbox folder for
    disconnected operation and click the disconnect button at the
    lower left before unplugging, it works great.  This is all
    expected, and is just something we should document.
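
As a rough illustration of the ONBOOT change described in the
booting-without-network item above, the following sketch rewrites the
ONBOOT line in the text of an ifcfg-style file.  The helper name
set_onboot is hypothetical; this is not the actual syncconf code, and
the value "false" simply follows the spelling used in this message.

```python
def set_onboot(ifcfg_text, value="false"):
    """Return ifcfg-style text with ONBOOT set to `value`.

    Hypothetical helper for illustration only; not part of syncconf.
    """
    out = []
    found = False
    for line in ifcfg_text.splitlines():
        if line.strip().startswith("ONBOOT="):
            # Replace the existing setting rather than adding a second one.
            out.append("ONBOOT=" + value)
            found = True
        else:
            out.append(line)
    if not found:
        # No ONBOOT line was present, so append one.
        out.append("ONBOOT=" + value)
    return "\n".join(out) + "\n"
```

A real change would presumably apply this only on machines marked as
disconnectable, and would still need to account for the PC-card case
mentioned above.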

I also ran into some apparently deeper problems with RHEL3 on the
machine: it was pretty sluggish with only 128MB of RAM, it hung at
startup trying to configure the USB device once, and once after
logging out it gave me a blank screen and didn't respond to input
(although it did fiddle with the disk on and off for a little while
after that, according to the LEDs).

We need to fix the offlinehome syntax error (of course), the
booting-without-network error, and probably the new OpenAFS slowness.
We can document around the various inconveniences of logging in with
no network, but we should explore fixing them using the athneteventd
framework.
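
To make the guess about failure caching concrete, here is a minimal
sketch of the general idea: remember when a server last failed and
skip it until a timeout has passed, so that each operation does not
wait out every server again.  This is not OpenAFS code and does not
claim to reflect how OpenAFS tracks server state; the class name
FailureCache and the timeout value are assumptions for illustration.

```python
import time


class FailureCache:
    """Remember recent failures so callers can skip dead servers.

    Illustrative only; names and behavior are assumptions.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested
        self.failed_at = {}         # server name -> time of last failure

    def should_try(self, server):
        # Try the server if it has never failed or the failure has expired.
        t = self.failed_at.get(server)
        return t is None or self.clock() - t >= self.ttl

    def record_failure(self, server):
        self.failed_at[server] = self.clock()

    def record_success(self, server):
        # A success clears any remembered failure.
        self.failed_at.pop(server, None)
```

With a cache like this, a caller would check should_try() before
contacting each server, so only the first operation after a failure
pays the full timeout; a shorter lifetime for remembered failures
would produce the repeated delays described above.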
