[1548] in Release_7.7_team


Release workaround for SGI AFS lossage?

daemon@ATHENA.MIT.EDU (Greg Hudson)
Wed Nov 25 10:34:28 1998

Date: Wed, 25 Nov 1998 10:34:22 -0500
From: Greg Hudson <ghudson@MIT.EDU>
To: release-team@MIT.EDU, ops@MIT.EDU

I should have thought of this months ago.  It's pretty evil, but the
situation right now with random hours-long outages of 72GB AFS servers
is intolerable.

We could probably decrease the incidence of SGI AFS lossage to zero
by having something in the release which destroys user AFS tokens one
minute before they would have expired anyway.  The simplest thing to
do is probably to hack the functionality into elmer.  It wouldn't add
any resource consumption (apart from a slightly larger elmer code
segment) for the normal login case.
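
A rough sketch of the timing logic only, in Python for brevity. It is not
elmer code: the names get_expiry and destroy_tokens are hypothetical
callbacks standing in for however the real program would read the token
expiration time and discard the tokens, and the injectable now/sleep
parameters exist only so the loop can be exercised without waiting.

```python
import time

# "one minute before they would have expired anyway"
DESTROY_MARGIN_SECONDS = 60


def seconds_until_destroy(expiry_time, now, margin=DESTROY_MARGIN_SECONDS):
    """Return how long to wait before destroying tokens (0 if already due)."""
    return max(0, expiry_time - margin - now)


def watch_tokens(get_expiry, destroy_tokens, now=time.time, sleep=time.sleep):
    """Sleep until one margin before expiry, then destroy the tokens.

    get_expiry and destroy_tokens are hypothetical callbacks.  The expiry
    is re-read after every sleep, so a renewal pushes the deadline out.
    """
    while True:
        wait = seconds_until_destroy(get_expiry(), now())
        if wait == 0:
            destroy_tokens()
            return
        sleep(wait)
```

The process spends nearly all of its time asleep, which fits the point
above about not adding resource consumption in the normal login case.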

Disadvantages:

	* It's not totally transparent.  Users with sufficient clue
	  might get confused when their tokens disappear instead of
	  expiring.

	* We can't be 100% sure that destroying the tokens will
	  prevent the problem.  There are ways to escalate,
	  e.g. threatening at 9h30m to log the user out at 9h55m if
	  they don't renew their tickets, but they are much less
	  transparent and involve much more hair.

	* Eliminating occurrences of a problem we can't reproduce would
	  make it essentially impossible to debug the problem, so we'd
	  be stuck with that bit of hair forever.
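
For illustration, the escalation mentioned in the second item above
could be expressed as a simple schedule.  This is a sketch under
assumptions: the 9h30m and 9h55m figures come from the text, but
measuring them from the moment the tickets were obtained, and the
function name and return values, are hypothetical.

```python
# Assumption: both thresholds are measured from when tickets were obtained.
WARN_AT = 9 * 3600 + 30 * 60     # 9h30m: threaten to log the user out
LOGOUT_AT = 9 * 3600 + 55 * 60   # 9h55m: carry out the threat


def escalation_action(elapsed_seconds, renewed):
    """Return "none", "warn", or "logout" for the given elapsed time.

    A user who has renewed their tickets is left alone.
    """
    if renewed:
        return "none"
    if elapsed_seconds >= LOGOUT_AT:
        return "logout"
    if elapsed_seconds >= WARN_AT:
        return "warn"
    return "none"
```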

If we do go this route, there is a question of "when."  There is an
argument for doing it ASAP instead of waiting until the end of finals
(better the evil we don't know when the evil we know is this bad), but
of course it would take some time to develop the hack.

Comments?
