[3302] in Release_7.7_team


Re: Athena Disconnected Operation White Paper Draft 2.

daemon@ATHENA.MIT.EDU (Greg Hudson)
Sat May 25 12:33:22 2002

From: Greg Hudson <ghudson@MIT.EDU>
To: Bill Cattey <wdc@mit.edu>
Cc: source-developers@mit.edu, release-team@mit.edu
In-Reply-To: <1022277980.1310.80.camel@tokata.mit.edu>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: 25 May 2002 12:33:18 -0400
Message-Id: <1022344398.3992.57.camel@error-messages.mit.edu>
Mime-Version: 1.0

On Fri, 2002-05-24 at 18:06, Bill Cattey wrote:
> Question 1: Do users expect to produce a file to print, and expect
> instant feedback on success of the job, or would they prefer a
> persistent queue like the outgoing Email is done?

I think the "fail on no network" model will be fairly acceptable to most
users.  Implementing a queue would probably be too hard for the amount
of benefit it would bring.

So rather than "queue the job in version 2," I would say, "don't plan to
implement local print job queueing until we see an easy opportunity to
do so."

> This is a variation on the old "kerberometer" theme where a window
> would pop up warning the user

Accuracy nit: Kerberometer displayed a fuel gauge showing your remaining
ticket lifetime.  It was cute, but poorly written, and nobody used it
anyway.  Popping up a window was handled by dash in releases up to 8.4,
and will be handled by mwhitson's "authwatch" in releases 9.1 on.

You may have been confused by the name "gkerberometer" being tossed
around as a project name.  People used that name either because they
thought it was cute, or because they envisioned that the program would
work as a panel applet which displayed the fuel gauge when it wasn't
popping up annoying windows.  (We can't add applets to people's panels
except for new users, though, so we didn't go with that idea.)

> Question 2: Should users be prompted for a password to renew their
> tickets, or should they be prompted to run the kerberos ticket
> fetching utility?

dash and authwatch behave the way they do because we didn't want to
teach users to type in their Kerberos passwords whenever a window came
up asking them to do so.  (Not that they probably won't anyway, but
there's a difference between accepting bad habits and encouraging
them.)  "Only type your Kerberos password at login time" was a nice,
simple rule.  Now, unfortunately, it's "only type your Kerberos password
at login time and when getting certificates."  I'm guessing "only type
your Kerberos password at login time and when getting certificates and
when you go on the network" isn't too bad, but I suppose it's worth
seeing what the integration team thinks.

Question: when we talk about network state changes, are we restricting
ourselves to explicit state changes (user unplugs network cable or
card), or do we also mean implicit state changes (wireless laptop detects
loss/restoration of signal)?  If there are implicit state changes, then
popping up a window asking for a Kerberos password in response to such a
state change is just as bad as popping up such a window in response to
ticket expiry.  If we're only talking about explicit state changes, then
at least the window comes up in response to a specific user action.

> The Windows and Mac Leash user interface should probably set the
> standard of usability here.

Maybe.  My experience with the Winzephyr usability testing taught me
that you should only make these statements after determining that the
Windows user interface is at least marginally usable.

> Recommendation 5: Enable time synchronization.
> 
> This too is a case where appropriate daemons can be created that act
> appropriately on network events.

That's a pretty vague recommendation.  I'm guessing that you aren't
saying something more specific because of lack of knowledge.  I'll fill
in what I know.

What we do now:

  * We run ntpd, but we only listen to broadcast time sync packets.  So
you only get time synchronization this way if you're on a network which
gets broadcast time sync packets (most or all of MITnet, but probably
not on some private MIT networks or outside-MIT networks).

  * We let AFS do time synchronization.  It has a two-second granularity
and, at least in the past, has been known to simply stop working in some
cases.

  * Every twelve hours, we run "gettime -s time", which will forcibly
set the clock to the right time even if that means going backwards. 
This is a hedge against the case where there are no broadcast time sync
packets and AFS time synchronization isn't operating.

Some things we could do:

  * We could make ntpd peer with time.mit.edu, like MIT MacOS machines
already do.  This would create additional network traffic but would work
on all networks.  It's not clear that the additional traffic would
be above the noise level.

  * We could turn off AFS and gettime synchronization, since we would be
more confident in ntpd doing the right thing.

  * When the network comes back up, we could try to convince ntpd to at
least start synchronizing the time right then.

One issue is that, if the time is ahead by half an hour, we don't really
want to be nice and slowly step the time, because the user can't get
Kerberos tickets if we're off by more than five minutes.  Frequently,
coming back from a suspend or hibernation yields a wildly inaccurate
clock, so I suspect we'd see this case fairly often.
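
To make the step-versus-slew trade-off concrete, here is a minimal
sketch of the decision.  The five-minute limit is the Kerberos tolerance
mentioned above; the function name and the two return values are made up
for illustration and are not anything ntpd or gettime actually exposes.

```python
from datetime import timedelta

# Kerberos will not issue tickets once the clock is off by more than this.
KERBEROS_MAX_SKEW = timedelta(minutes=5)


def choose_correction(offset: timedelta) -> str:
    """Pick how to correct a measured clock offset (local minus true time).

    Hypothetical policy: jump the clock when the user would otherwise be
    locked out of Kerberos, and adjust gradually in every other case.
    """
    if abs(offset) > KERBEROS_MAX_SKEW:
        # Tickets are unobtainable until this is fixed, so set the clock
        # right away, even if that means going backwards.
        return "step"
    # Tickets still work, so a slow, monotonic adjustment is acceptable.
    return "slew"
```

A machine waking from hibernation half an hour fast would get "step"
here, while one that has drifted a couple of seconds would get "slew".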
 
> Recommendation 7: Pop-up announcement of new versions.
> 
> Although pop-ups are annoying to users, one way to create an incentive
> to update is with annoying pop-ups every time the network comes back
> alerting the user that an update is available and that running the
> update_ws utility is strongly recommended.  Producing this is low
> effort.

One of the persistent issues here is that the user who sees the popup
isn't necessarily the machine administrator.  Less likely for laptops,
certainly, but we do have to worry about existing AUTOUPDATE=false
private workstations when we make our changes.

Your "hot-fix tier of updates" idea made me realize that, with some
effort, we could probably make the Linux update system never suffer due
to network failures during updates, at the expense of requiring some
extra free disk space during updates.  It would work like this:

  1. Determine what RPMs need to be installed.
  2. Copy them locally, checking carefully for errors.
  3. Begin the update using the local RPMs.

If network is lost during steps 1 or 2, no damage is done because the
machine hasn't been altered.  If network is lost during step 3, it
doesn't matter.
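
Step 2 is the part that needs care, so here is a rough sketch of it.
Everything in it is an assumption for illustration: the function names,
the staging directory, and the use of SHA-256 to compare each copy with
its source.  A real implementation would more likely check against
checksums published with the release, and the actual install in step 3
is not shown.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_packages(needed: list[Path], staging_dir: Path) -> list[Path]:
    """Copy every needed package into staging_dir and verify each copy.

    On any failure the staging directory is removed and the error is
    re-raised, so the caller never starts an update from a partial set.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    try:
        for src in needed:
            dst = staging_dir / src.name
            shutil.copyfile(src, dst)
            # Re-read both sides; a mismatch means the copy went wrong.
            if sha256_of(src) != sha256_of(dst):
                raise ValueError(f"copy of {src.name} does not match source")
            staged.append(dst)
    except (OSError, ValueError):
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise
    return staged
```

The update itself would then run only against the paths this returns,
which is why losing the network during step 3 no longer matters.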

On Zephyr:
> Auto-renewing of
> subscriptions should be possible without user intervention without a
> lot of effort.

I'm not convinced.  Right now zwgc doesn't know what subscriptions you
have ("zctl sub" goes directly to the zephyr server, without zwgc ever
finding out about it), so it can't renew subscriptions unless it has
advance notification of the network going down.

We could do a half-assed job by making zwgc periodically pull down a
list of subscriptions.  That would generate a lot of traffic and load on
the servers (perhaps not enough to really matter, I suppose), and would
fail if you "zctl sub" shortly before taking down the network.
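
The failure mode of the periodic-pull approach is easy to show with a
toy model.  None of this is real Zephyr code: FakeServer and
SubscriptionCache are hypothetical stand-ins for the zephyr server and
for a cache that zwgc does not currently have, and subscriptions are
modeled as plain (class, instance, recipient) tuples.

```python
class FakeServer:
    """Hypothetical stand-in for the zephyr server's subscription list."""

    def __init__(self):
        self.subs = set()

    def subscribe(self, triple):
        # Models "zctl sub": goes straight to the server, bypassing zwgc.
        self.subs.add(triple)

    def drop_all(self):
        # Models the server forgetting the client during a network outage.
        self.subs.clear()


class SubscriptionCache:
    """Hypothetical cache that zwgc would refresh by polling the server."""

    def __init__(self, server):
        self.server = server
        self.known = set()

    def poll(self):
        # Periodic pull: snapshot whatever the server has right now.
        self.known = set(self.server.subs)

    def restore(self):
        # On network return, resubscribe to everything in the last snapshot.
        for triple in self.known:
            self.server.subscribe(triple)
```

If you subscribe to one triple, poll, subscribe to a second triple, and
then call drop_all() followed by restore(), only the first triple comes
back.  The second was never seen by a poll, which is exactly the "zctl
sub shortly before taking down the network" case.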

Also, remember that the Zephyr source base isn't very tractable in
general.

> The use of AFS in Athena has been based on the always connected
> assumption.  More so than Zephyr, the AFS internals assume the network
> is always connected.  For a while, AFS did not gracefully handle
> shutdowns as part of the usual system shutdown process, but that
> problem has finally been fixed.

I'm not sure what that last sentence is doing here.  It has nothing to
do with disconnected operation.

> Recommendation 15: Write a decent tool to synchronize a local
> file hierarchy with an AFS file hierarchy.

There is an existing tool called "unison"
(http://www.cis.upenn.edu/~bcpierce/unison/) which probably does
everything we could hope for except have a GUI.

But I'm not certain what this recommendation accomplishes.  Naive users
won't want to synchronize their home directories as a collection of
files; they won't understand what it means to run into a conflict in
~/.mozilla/username/garbagestring/history.dat.  They may want to
synchronize higher-level concepts, like documents and preferences. 
Unfortunately, doing that well requires application-specific knowledge.

I think the only thing we can do here is set user expectations: they
have a separate home directory on their laptop, they cannot synchronize
preferences, and they must manually copy around documents between home
directories.  Fortunately, this is exactly what users of non-Athena
systems are used to.

