[3272] in Release_7.7_team


Re: Draft of Disconnected Operation White Paper.

daemon@ATHENA.MIT.EDU (John Hawkinson)
Thu May 9 11:35:37 2002

Date: Thu, 9 May 2002 11:35:34 -0400 (EDT)
Message-Id: <200205091535.LAA14998@multics.mit.edu>
To: Bill Cattey <wdc@mit.edu>
CC: release-team@mit.edu, warlord@mit.edu
In-reply-to: "[3271] in Release_7.7_team"
From: John Hawkinson <jhawk@MIT.EDU>

Hi, Bill,

  I hope you don't mind my chiming in here. I think this looks good,
but I have a fair amount of feedback, also.

| ----- operation.txt ----
| 	  Issues in Disconnected Operation
| 	  For UNIX Athena

It is worth noting that some of the things you label as
requirements for "disconnected" operation are really just things we
want for the desktop environment anyway, and really should have had
long ago.

It's not OK for desktop workstations to do most of these bad things
(e.g. go south when an AFS server falls off the planet). It's much
worse for laptops, but the requirements shouldn't be anything new.

| was restored.  The networking infrastructure evolved to a level of
| robustness where that error was infrequent enough that no real effort
| was expended in coping with disconnected operation.

So, I look at this and say, "while this is true, we really should have
cared then, too."

| Nowadays, nomadic computing, using laptop systems, is growing in

I have to say, "nomadic computing" feels like gratuitous buzzword
usage, to the point of making one want to stop reading entirely, print
your email, and run it through the cross-cut shredder. I suspect most
people would just stop reading ;-)

| Alan Kay, in describing the ideal scenarios for use of an ideal
| nomadic computing platform envisioned a computer, that could recover
| from being dropped off a cliff by noticing the sudden change of
| altitude, and radioing to have a replacement system parachuted over to
| the user.

Umm, this sentence does not parse. I think you need a comma after "platform,"
and need to remove the comma after "computer." Then there's a minor verb
agreement issue.

I mention this not to be anal and pedantic (as usual), but to point
out that it's really hard to figure out what exactly you were trying
to convey.

| The overarching principle guiding the development of Athena UNIX
| Disconnected operation is:
| 
|     Attempts to do ordinary things will not hang the system.

Maybe we should strive for this on the desktop, too (ls -l /afs). But
yes, you're quite right.

| One overarching recommendation to guide all services:
| 
| Recommendation 1:
| 
| Shift explicitly from the traditional Athena paradigm of utilizing
| network-based services by default or preferentially to local services 
| to a nomadic paradigm of utilizing local system capabilities first,
| and not going to the network unless specifically asked by the user to
| do so.

I like this, but I think it could be summarized in one line after
the words "Recommendation 1:", and that sort of summary has value.
e.g. "Local is better than Remote."

| Email delivery:
| 
| Email delivery is actually already a solved case.  The sendmail
| program already detects when mail cannot be immediately delivered.
| In that case outgoing mail is enqueued, and periodic attempts are
| automatically made to deliver what is in the queue.

I wouldn't call it solved. I want the queue to be run the instant
I plug in the network. This isn't hard, so I think it's worth doing,
rather than letting it go by the wayside.

Users want and expect (not unreasonably, in my opinion) instantaneous
email delivery, and are frequently frustrated with the current system
(mostly due to things outside release-team's control). A laptop user
wants to be able to compose 100 email messages while offline,
and then go plug their laptop in for 5 seconds to send them all.

I think that means that not only should the queue be run automatically
when the network comes up, but that the user should be able to force
the queue to be run, and also to verify that the queue run has
completed successfully, so they can know they can unplug their laptop
and move on.
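A minimal sketch of that behavior, assuming sendmail's usual `-q` (process the queue once) and `-v` (verbose) flags, a binary at `/usr/sbin/sendmail`, and the classic queue directory `/var/spool/mqueue` where each queued message has one `qf*` control file. The queue location is configurable (newer sendmail versions also keep a separate client queue) and reading it may need privileges, so treat the paths as assumptions. `on_link_change` is a hypothetical hook that a network-up script would call.

```python
import os
import subprocess
from typing import Callable, List

SENDMAIL = "/usr/sbin/sendmail"   # assumed install path
QUEUE_DIR = "/var/spool/mqueue"   # assumed classic sendmail queue directory


def queue_run_command() -> List[str]:
    # -q: process the mail queue once; -v: report per-message progress
    return [SENDMAIL, "-q", "-v"]


def on_link_change(was_up: bool, is_up: bool,
                   run: Callable[[List[str]], int] = subprocess.call) -> bool:
    """Hypothetical hook: start a queue run only on a down -> up transition."""
    if is_up and not was_up:
        run(queue_run_command())
        return True
    return False


def queued_messages(queue_dir: str = QUEUE_DIR) -> int:
    """Count messages still waiting; each has exactly one qf* control file."""
    return sum(1 for name in os.listdir(queue_dir) if name.startswith("qf"))


def safe_to_unplug(queue_dir: str = QUEUE_DIR) -> bool:
    # The user can move on once nothing is left in the queue.
    return queued_messages(queue_dir) == 0
```

A user-forced run is the same command; a button in a GUI tool could call `subprocess.call(queue_run_command())` and then report `safe_to_unplug()`.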

| Network printing:
| 
| Athena enhanced printing to utilize a network-based service, Hesiod,
| to suplement the hard-coded file of printers and their capabilities.
| The lprng subsystem that implements printing on Athena will look first
| in the local file before attempting a Hesiod query, but if the user
| misspells the printer name, the recovery is less than perfectly
| graceful: Seeing no entry in the local printcap file, lprng will then
| make a Hesiod query.  The DNS service, upon which Hesiod is based,
| will take 30 seconds or so to time out.

I think it's probably important to break this out separately...there
are a lot of other things that depend on the DNS, and it is annoying
if/when they hang. 

Of course, I haven't used Athena Linux on a laptop, so I don't know,
but on my laptop, when the network is down, DNS returns pretty much
instantly; this is because I've accepted a trade-off and don't run
named, so there are no routes to functional nameservers. I suspect the
technical solution for Athena Linux isn't so trivial, but perhaps
worth thinking about.
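One way to keep a misspelled printer name from hanging is to apply "local first" strictly and bound the network fallback. In this sketch `remote_lookup` stands in for a Hesiod or DNS query, `network_up` for however the machine learns its link state, and the 2-second deadline is an arbitrary choice; it is an application-level guard, not a description of how lprng or the resolver actually behave.

```python
import concurrent.futures
from typing import Callable, Dict, Optional

# Shared worker pool so a hung lookup does not block the caller.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def lookup_printer(name: str,
                   local_printcap: Dict[str, str],
                   network_up: bool,
                   remote_lookup: Callable[[str], Optional[str]],
                   timeout: float = 2.0) -> Optional[str]:
    """Local first; ask the network only when it is up, and never wait long."""
    # 1. The local file wins, so a correct name never touches the network.
    if name in local_printcap:
        return local_printcap[name]
    # 2. Offline: fail immediately instead of waiting for a DNS timeout.
    if not network_up:
        return None
    # 3. Online: run the (possibly slow) remote query with a hard deadline.
    future = _POOL.submit(remote_lookup, name)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The worker thread may still finish later; its result is discarded.
        return None
```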

| In theory, it would be possible to make printing act in analagous
| fashion to Email delivery, holding onto the job until the network comes
| up and send it out then.  This would create problems when
| communicating with authenticated print servers because the user's
| tickets may have expired by the time the network is back up, and
| communicating that fact back to a user might be tricky.

It seems like requiring the user to re-authenticate at network start
time is a reasonable thing (presuming it only bothers to do so if your
tickets are close to expiring), especially once we have a GNOME applet
to deal with this sort of thing.
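The "only if close to expiring" condition is a small check. This sketch assumes the ticket expiry time has already been read from the credential cache (not shown) and uses a hypothetical 30-minute margin.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical margin: how close to expiry counts as "close to expiring".
RENEW_MARGIN = timedelta(minutes=30)


def should_prompt_at_network_up(expires_at: Optional[datetime],
                                now: datetime,
                                margin: timedelta = RENEW_MARGIN) -> bool:
    """Ask the user to re-authenticate only if tickets are missing or nearly expired."""
    if expires_at is None:  # no tickets at all
        return True
    # Already-expired tickets give a negative remaining time, so they also prompt.
    return expires_at - now <= margin
```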

| Recommendation 3: Implement a print model of, "Fail the job if we
| cannot print it now, and if we cannot make contact with the remote
| spooler now.

Failing the jobs can really suck if it is highly nontrivial to regenerate
them, which is often the case. I think a better solution is desirable...
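For comparison with failing the job, here is a rough sketch of the hold-until-the-network-is-up model the draft mentions, where jobs stay on the laptop instead of being discarded. `Sender` is a hypothetical callable that hands a job to the remote spooler and raises `ConnectionError` when it is unreachable; the sketch does not address the expired-ticket problem the draft raises, which would have to be handled before `flush` runs.

```python
from collections import deque
from typing import Callable, Deque

# Hypothetical: delivers one job, raising ConnectionError if the spooler is unreachable.
Sender = Callable[[bytes], None]


class HeldPrintJobs:
    """Keep jobs locally until the remote spooler accepts them, instead of failing them."""

    def __init__(self) -> None:
        self.pending: Deque[bytes] = deque()

    def submit(self, job: bytes, send: Sender) -> bool:
        """Try to send now; hold the job on a network error. Returns True if sent."""
        if self.pending:
            # Older jobs are still held; queue behind them to keep order.
            self.pending.append(job)
            return False
        try:
            send(job)
        except ConnectionError:
            self.pending.append(job)
            return False
        return True

    def flush(self, send: Sender) -> int:
        """Call when the network comes up; send held jobs in order, stop at first failure."""
        sent = 0
        while self.pending:
            try:
                send(self.pending[0])
            except ConnectionError:
                break
            self.pending.popleft()
            sent += 1
        return sent
```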

| Kerberized login:
|
| Athena users normally expect that when they log on, that kerberos
| tickets are acquired for them automatically, and that no further
| action need be taken to access secure services.  Trying to do this by
| default on a sometimes disconnected system is probably the wrong
| design.  It would result in long pauses if the network was
| disconnected appearing to the user as a login hang.
| 
| Recommendation 4: Set explicit expectations that the default login
| mode is to NOT fetch Kerberos tickets at login time.

I think this is the wrong model. I think you want:

	Get tickets if the user logs in and the network is up, otherwise
	don't.

	When the network comes up, prompt the user to get tickets if they
	don't have nonexpired tickets. This is made easier with a GUI
	renew tool.
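The model above amounts to two event handlers. This sketch names the events and actions hypothetically; how the login and network-up events actually reach the code is left open.

```python
from enum import Enum


class Action(Enum):
    FETCH = "fetch tickets now"
    SKIP = "log in without tickets"
    PROMPT = "ask the user to get tickets"
    NOTHING = "do nothing"


def on_login(network_up: bool) -> Action:
    # Get tickets if the network is up at login; otherwise just log in.
    return Action.FETCH if network_up else Action.SKIP


def on_network_up(has_unexpired_tickets: bool) -> Action:
    # When connectivity returns, prompt only if the user lacks valid tickets.
    return Action.NOTHING if has_unexpired_tickets else Action.PROMPT
```

Keeping the two decisions separate means login never blocks on the network, which is the point of the draft's overarching principle.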

| time synchronization:
| 
| Having a background task keep the clock in synch with an external
| source is a real convenience feature.  Kerberos will not authenticate
| if the client host is more than five minutes out of time synch with
| the kerberos host. But if there is no network, then there is no way to
| communicate with the time synchronization service.
| 
| Recommendation 5: Enable time synchronization, but remember to
| properly bring it down and up when the network goes down and up.

Err. I don't think this is what you want to do with ntpd, at least;
it does good things to your clock even when you're not connected to the
network. That said, making sure it can be poked to update the clock
when the network comes back up would be good.
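Why the poke matters after a long disconnected stretch: NTP estimates the clock offset from four timestamps as θ = ((t1 − t0) + (t2 − t3)) / 2, and once that exceeds the five-minute tolerance the draft cites, Kerberos authentication fails until the clock is corrected. ntpd does this computation itself; the sketch below only illustrates the arithmetic, and the timestamps in any usage are made up.

```python
# Kerberos clock-skew tolerance cited in the draft: five minutes.
MAX_SKEW_SECONDS = 300.0


def ntp_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Standard NTP clock-offset estimate, in seconds.

    t0: client send time, t1: server receive time,
    t2: server send time, t3: client receive time.
    A positive result means the local clock is behind the server.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def kerberos_would_fail(offset_seconds: float,
                        max_skew: float = MAX_SKEW_SECONDS) -> bool:
    # Authentication is refused once the clocks differ by more than the tolerance.
    return abs(offset_seconds) > max_skew
```

For example, `ntp_offset(0, 401, 402, 3)` gives 400.0 seconds, past the 300-second limit.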

I assume you're going to skip over silly issues like, "What if the user
moves timezones." Because only some people (e.g. me) try to hurt themselves
that way...

| Auto update:
| 
| Recommendation 6: Disable auto update by default on laptops.
...
| Issue 2: Consider providing ways to notify nomadic users of the
| availability of updates that go beyond the current Athena
| implementations.

Sounds right to me. Again, a gui application that prompts you to
update your machine "if it's a good time" might be the way to go.
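What counts as "a good time" is not specified above. The criteria in this sketch (network up, on AC power or a reasonably charged battery, user has not deferred) and the 50% battery threshold are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    network_up: bool
    on_ac_power: bool
    battery_percent: int
    user_deferred: bool


def good_time_to_offer_update(s: MachineState, min_battery: int = 50) -> bool:
    """Hypothetical policy: only prompt when an update is unlikely to be cut short."""
    if not s.network_up or s.user_deferred:
        return False
    # Prefer AC power; otherwise require a reasonably full battery.
    return s.on_ac_power or s.battery_percent >= min_battery
```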

| Zephyr instant messaging:
| 
| On the fase of it, it seems pretty silly to expect instant messaging

typo.

I think I must be getting tired of reading, because most of the rest
looks good to me and I don't have any little surgical comments. ;-)
But really, it does look good.

--jhawk
