[227] in arla-drinkers
Re: linux/SMP/arla difficulties
daemon@ATHENA.MIT.EDU (Magnus Ahltorp)
Mon Aug 24 05:13:55 1998
To: Dave Morrison <dave@bnl.gov>
Cc: arla-drinkers <arla-drinkers@stacken.kth.se>
Subject: Re: linux/SMP/arla difficulties
References: <35DC5C49.79E6D3C0@bnl.gov>
From: Magnus Ahltorp <map@stacken.kth.se>
Date: 24 Aug 1998 11:08:07 +0200
In-Reply-To: Dave Morrison's message of Thu, 20 Aug 1998 13:26:33 -0400
Message-ID: <lv1iujirbm0.fsf@sundance.stacken.kth.se>
> o once arla is up and running, the system crashes after anywhere from a few
> minutes to a few hours - no obvious correlation with AFS activity.
I can think of two things that may cause this to happen in such a situation:
* Callbacks from the server
* A sudden need for the VFS to clean some dentries out.
> o before the machine finally hangs, tons of messages appear of the
> form "sending wakeup: ..." (which are apparently generated in xfs).
This is probably the callbacks or the dentry deletes, but it's hard to
tell without log messages.
> o during this same time, arla goes nonlinear and chews up all the CPU
Does arlad use all the CPU for a short time, or does it keep doing so
indefinitely?
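One way to answer that is to sample arlad's accumulated CPU time at
fixed intervals and see whether the busy periods end. The following is
only a rough sketch, assuming a Linux /proc filesystem; the function
names, the 90% "busy" threshold and the run length of six samples are
arbitrary choices made for illustration, not anything arla provides.

```python
import os
import time


def read_cpu_ticks(pid):
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split only what follows the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    # fields[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_fractions(ticks, interval, hz):
    """Turn successive tick counts into the CPU fraction used per interval."""
    return [(b - a) / (hz * interval) for a, b in zip(ticks, ticks[1:])]


def classify(fractions, busy=0.9, sustained_run=6):
    """Label a series of CPU fractions as 'idle', 'burst' or 'sustained'."""
    longest = run = 0
    for frac in fractions:
        run = run + 1 if frac >= busy else 0
        longest = max(longest, run)
    if longest == 0:
        return "idle"
    return "sustained" if longest >= sustained_run else "burst"


def watch(pid, count=12, interval=5.0):
    """Sample a process count+1 times and classify its CPU usage."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    ticks = [read_cpu_ticks(pid)]
    for _ in range(count):
        time.sleep(interval)
        ticks.append(read_cpu_ticks(pid))
    return classify(cpu_fractions(ticks, interval, hz))
```

Calling watch() with arlad's pid while the "sending wakeup" messages
are scrolling past would then give a crude "burst" or "sustained"
answer to report back.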
> o when arlad is first started (using the startarla script), the
> sysname can't be changed successfully for at least a minute.
This may be caused by the precreation of cache nodes, but that should
happen in the background.
> Is there some additional info that I could gather that could help
> diagnose things?
The xfs logs are very useful, especially when combined with the
arlad logs (--debug="all,-cleaner").