[293] in arla-drinkers


Machine hanging with arla-0.12

daemon@ATHENA.MIT.EDU (Dr A V Le Blanc)
Thu Oct 1 11:27:02 1998

Message-ID: <19981001162019.A10086@afs.mcc.ac.uk>
Date: Thu, 1 Oct 1998 16:20:20 +0100
From: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
To: arla-drinkers@stacken.kth.se
Subject: Machine hanging with arla-0.12
Reply-To: Dr A V Le Blanc <LeBlanc@mcc.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.1i
Sender: owner-arla-drinkers@stacken.kth.se
Precedence: bulk

I've been able to look more closely at the hangs on the machine running
arla-0.12.  (The machine has linux-2.0.34 and libc 6, i.e. glibc-2.0.7.)
The problem occurs when I transfer a lot of files at once; in this
case about 600 MB.  (Note that each file is small enough to
fit in the arla cache on its own.)

The machine begins to run slower, and to run out of memory.
Note that neither disk space nor swap space is ever used up,
and that the buffer cache always has space, at least according
to the Shift-Scroll Lock memory display.  Soon (within a few minutes) the machine is
running slowly enough that even shell internal commands may
take minutes or even hours to complete.  Then processes begin
to die; console error messages report that kerneld, cron, etc.,
have run out of memory.  The process which is trying to read
these files dies.  Arlad, however, does not die.  I can't tell
much about the machine while it congeals, since, for
example, top won't display; I can only get the Shift-Scroll Lock
and Ctrl-Scroll Lock console messages, though these continue to be
available.

I have had this problem for a long time; i.e., at least as
far back as arla-0.6.  I presume it is not arlad but the
xfs module itself which is grabbing so much kernel memory that
things become paralysed?

     -- Owen
     LeBlanc@mcc.ac.uk
