[227] in arla-drinkers


Re: linux/SMP/arla difficulties

daemon@ATHENA.MIT.EDU (Magnus Ahltorp)
Mon Aug 24 05:13:55 1998

From owner-arla-drinkers@stacken.kth.se Mon Aug 24 09:13:54 1998
Return-Path: <owner-arla-drinkers@stacken.kth.se>
Delivered-To: arla-drinkers-mtg@bloom-picayune.mit.edu
Received: (qmail 9090 invoked from network); 24 Aug 1998 09:13:53 -0000
Received: from unknown (HELO sundance.stacken.kth.se) (130.237.234.41)
  by bloom-picayune.mit.edu with SMTP; 24 Aug 1998 09:13:53 -0000
Received: (from majordom@localhost)
	by sundance.stacken.kth.se (8.8.8/8.8.8) id LAA08101
	for arla-drinkers-list; Mon, 24 Aug 1998 11:08:18 +0200 (MET DST)
Received: (from map@localhost)
	by sundance.stacken.kth.se (8.8.8/8.8.8) id LAA08095;
	Mon, 24 Aug 1998 11:08:08 +0200 (MET DST)
To: Dave Morrison <dave@bnl.gov>
Cc: arla-drinkers <arla-drinkers@stacken.kth.se>
Subject: Re: linux/SMP/arla difficulties
References: <35DC5C49.79E6D3C0@bnl.gov>
From: Magnus Ahltorp <map@stacken.kth.se>
Date: 24 Aug 1998 11:08:07 +0200
In-Reply-To: Dave Morrison's message of Thu, 20 Aug 1998 13:26:33 -0400
Message-ID: <lv1iujirbm0.fsf@sundance.stacken.kth.se>
Lines: 28
X-Mailer: Gnus v5.3/Emacs 19.34
Sender: owner-arla-drinkers@stacken.kth.se
Precedence: bulk

> o once arla is up and running, the system crashes after anywhere from a few
> minutes to a few hours - no obvious correlation with AFS activity. 

I can think of two things that may cause this:
* Callbacks from the server
* A sudden need for the VFS to clean some dentries out.

> o before the machine finally hangs, tons of messages appear of the
> form "sending wakeup: ..." (which are apparently generated in xfs).

This is probably the callbacks or the dentry deletes, but it's hard to
tell without log messages.

> o during this same time, arla goes nonlinear and chews up all the CPU

Does arlad chew up all the CPU for a short time, or does it do so indefinitely?

> o when arlad is first started (using the startarla script), the
> sysname can't be changed successfully for at least a minute.

This may be caused by the precreation of cache nodes, but that should
happen in the background.

> Is there some additional info that I could gather that could help
> diagnose things?

The xfs logs are very useful, especially when combined with the
arlad logs (--debug="all,-cleaner").
