[478] in athena10
Re: Simulating AFS outages with pyhesiodfs
daemon@ATHENA.MIT.EDU (Jonathan D Reed)
Wed Sep 3 14:36:37 2008
Date: Wed, 3 Sep 2008 14:35:51 -0400 (EDT)
From: Jonathan D Reed <jdreed@MIT.EDU>
To: Quentin Smith <quentin@MIT.EDU>
cc: athena10@MIT.EDU
In-Reply-To: <Pine.LNX.4.64L.0809031431540.11949@vinegar-pot.mit.edu>
Message-ID: <Pine.LNX.4.64L.0809031434110.555@infinite-loop.mit.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
I mean AFS outages, where some portion of AFS (usually the FS) has fallen
over.
I was not implying pyHesiodFS was at fault, but in a previous release
team meeting, it was suggested that my earlier tests were out of date, as
they were using afuse. As I said, pyHesiodFS recovers much faster from an
outage, so presumably it is doing something different from afuse.
-Jon
On Wed, 3 Sep 2008, Quentin Smith wrote:
> Hi Jonathan,
>
> One question about this - when you say an "AFS outage", do you mean a
> full network outage, as in the Hesiod servers are also unreachable, or do
> you just mean that the AFS server containing the user's locker is
> unreachable?
>
> If it's the latter, I don't expect pyHesiodFS to contribute to any problems.
>
> --Quentin
>
> On Wed, 3 Sep 2008, Jonathan D Reed wrote:
>
>> pyhesiodfs seems to behave slightly better in an AFS outage than afuse did.
>> Here is what I encountered:
>>
>> -If the locker had already been accessed (say, by a successful login), then
>> the Failsafe Terminal option will work, but attempts to access other
>> directories will fail with "Timed out" errors. Presumably this is due to
>> AFS caching, and might eventually fail if enough time elapses.
>>
>> -Failsafe GNOME just plain doesn't work (it attempts to create the Nautilus
>> directories, presumably because it can't read them, and fails)
>>
>> -If the locker has not already been accessed (say, after a reboot), it
>> hangs at the login screen, and eventually asks the user if they want to log
>> in with / as their home directory. If you say yes, it eventually presents
>> the GNOME background, but just sits there (I gave up after 10 minutes).
>> If you attempt to Ctrl-Alt-Bksp out of the situation, X does not respawn.
>> ps shows that X and gdmgreeter are defunct processes, but nothing I tried
>> was able to cause them to completely die or to respawn X.
>>
>> However, pyhesiodfs seems to recover from server outages much faster than
>> afuse did.
>>
>> The situation is still pretty much unusable, though. Rather than try to
>> let Ubuntu figure out what's up with the user's homedir, is it worth adding
>> code to the login sequence that explicitly checks if a user's homedir is
>> available and punts if it isn't?
>>
>
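
For illustration, the explicit homedir check suggested in the quoted message
could look something like the sketch below. This is only a rough idea, not
code from the actual login sequence: the function name homedir_available and
the 10-second default timeout are hypothetical. It assumes that a read of an
AFS directory can block for a long time during an outage, so the probe runs
in a separate thread and the caller gives up after the timeout.

```python
import os
import threading


def homedir_available(path, timeout=10.0):
    """Return True if `path` can be listed within `timeout` seconds.

    Hypothetical helper: the name and the default timeout are
    illustrative assumptions, not part of any existing login code.
    """
    result = {"ok": False}

    def probe():
        # Listing the directory forces a real lookup; on an unreachable
        # fileserver this call may block or raise OSError.
        try:
            os.listdir(path)
            result["ok"] = True
        except OSError:
            pass

    # Daemon thread, so a probe stuck on a dead server does not keep
    # the calling process alive.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    return result["ok"]
```

A login script could call homedir_available(os.path.expanduser("~")) and punt
with an error message when it returns False, rather than leaving the session
to hang.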