[478] in athena10

Re: Simulating AFS outages with pyhesiodfs

daemon@ATHENA.MIT.EDU (Jonathan D Reed)
Wed Sep 3 14:36:37 2008

Date: Wed, 3 Sep 2008 14:35:51 -0400 (EDT)
From: Jonathan D Reed <jdreed@MIT.EDU>
To: Quentin Smith <quentin@MIT.EDU>
cc: athena10@MIT.EDU
In-Reply-To: <Pine.LNX.4.64L.0809031431540.11949@vinegar-pot.mit.edu>
Message-ID: <Pine.LNX.4.64L.0809031434110.555@infinite-loop.mit.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

I mean AFS outages, where some portion of AFS (usually the fileserver) has 
fallen over.

I was not implying pyHesiodFS was at fault, but at a previous release 
team meeting, it was suggested that my earlier tests were out of date, as 
they were using afuse. As I said, pyHesiodFS recovers much faster from an 
outage, so presumably it must be doing something differently from afuse.

-Jon

On Wed, 3 Sep 2008, Quentin Smith wrote:

> Hi Jonathan,
>
> One question about this - when you say an "AFS outage", do you mean a 
> full network outage, as in the Hesiod servers are also unreachable, or do 
> you just mean that the AFS server containing the user's locker is 
> unreachable?
>
> If it's the latter, I don't expect pyHesiodFS to contribute to any problems.
>
> --Quentin
>
> On Wed, 3 Sep 2008, Jonathan D Reed wrote:
>
>> pyhesiodfs seems to behave slightly better in an AFS outage than afuse did. 
>> Here is what I encountered:
>> 
>> -If the locker had already been accessed (say, by a successful login), then 
>> the Failsafe Terminal option will work, but attempts to access other 
>> directories will fail with "Timed out" errors.  Presumably this is due to 
>> AFS caching, and might eventually fail if enough time elapses.
>> 
>> -Failsafe GNOME just plain doesn't work (it attempts to create the Nautilus 
>> directories, presumably because it can't read them, and fails)
>> 
>> -If the locker has not already been accessed (say, after a reboot), it 
>> hangs at the login screen, and eventually asks the user if they want to log 
>> in with / as their home directory.  If you say yes, it eventually presents 
>> the GNOME background, but just sits there (I gave up after 10 minutes). 
>> If you attempt to Ctrl-Alt-Bksp out of the situation, X does not respawn. 
>> ps shows that X and gdmgreeter are defunct processes, but nothing I tried 
>> was able to cause them to completely die or to respawn X.
>> 
>> However, pyhesiodfs seems to recover from server outages much faster than 
>> afuse did.
>> 
>> The situation is still pretty much unusable, though.  Rather than try to 
>> let Ubuntu figure out what's up with the user's homedir, is it worth adding 
>> code to the login sequence that explicitly checks if a user's homedir is 
>> available and punts if it isn't?
>> 
>
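
For illustration only, the explicit homedir check proposed above might 
look something like the sketch below. It is not taken from the Athena 
login code: the function name homedir_available and the 5-second default 
timeout are assumptions. The probe runs in a child process so that a hung 
AFS access blocks the child rather than the login sequence itself.

```python
import subprocess
import sys

# One-line probe run in a child interpreter: exit 0 if argv[1] is a directory.
_PROBE = "import os, sys; sys.exit(0 if os.path.isdir(sys.argv[1]) else 1)"


def homedir_available(path, timeout=5.0):
    """Return True if `path` is a readable directory within `timeout` seconds.

    Hypothetical helper: the name and default timeout are illustrative.
    """
    probe = subprocess.Popen([sys.executable, "-c", _PROBE, path])
    try:
        return probe.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort: ask the kernel to kill the probe, but do not wait
        # for it, since a process stuck on a dead fileserver may not exit.
        probe.kill()
        return False
```

A login wrapper could call homedir_available() on the user's home 
directory before starting the session and fall back (or refuse the login 
with a clear message) when it returns False.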
