[598] in arla-drinkers
full debug log from arla lockup/non-responsiveness
daemon@ATHENA.MIT.EDU (Neulinger, Nathan R.)
Thu Feb 11 12:31:44 1999
Message-ID: <9DA8D24B915BD1118911006094516EAF019C7EFB@umr-mail02.cc.umr.edu>
From: "Neulinger, Nathan R." <nneul@umr.edu>
To: "'arla-drinkers@stacken.kth.se'" <arla-drinkers@stacken.kth.se>
Subject: full debug log from arla lockup/non-responsiveness
Date: Thu, 11 Feb 1999 11:24:33 -0600
I set both the xfs and arla debugging to all.
The lockup seems to affect particular directories. Also, once a directory
gets into this state (where it locks up), a flushv on the directory
completely freezes the system: it still answers pings but does nothing
else.
The full debug log is here:
http://www.umr.edu/~nneul/debug-traces/arla-webindex-19990211-ls
The first part is the output from running 'ls directory', waiting a while,
and then pressing control-C.
The flushv only got the following before locking up the machine (no oops):
Feb 11 10:59:15 webindex arla[361]: probe (192.65.97.1)
Feb 11 10:59:20 webindex arla[361]: worker 0: processing
Feb 11 10:59:20 webindex arla[361]: Rec message: opcode = 22 (pioctl), size = 2096
Feb 11 10:59:20 webindex arla[361]: sending wakeup: seq = 1273148, error = 22
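For anyone reading the trace, here is a minimal sketch that pulls the sequence number and error value out of the wakeup line and looks the error up as an errno name. It assumes arla reports standard errno values here, which the log itself does not confirm, and the regular expression is a hypothetical pattern based only on the lines shown above.

```python
import errno
import re

# Wakeup line copied from the syslog excerpt, rejoined onto one line.
line = ("Feb 11 10:59:20 webindex arla[361]: "
        "sending wakeup: seq = 1273148, error = 22")

# Hypothetical pattern derived from this excerpt only; other arla
# messages may be formatted differently.
WAKEUP = re.compile(r"sending wakeup: seq = (\d+), error = (\d+)")

def decode_wakeup(text):
    """Return (seq, error, errno_name) or None if the line does not match."""
    m = WAKEUP.search(text)
    if m is None:
        return None
    seq, err = int(m.group(1)), int(m.group(2))
    # Assumption: the error field is a plain errno value.
    name = errno.errorcode.get(err, "unknown")
    return seq, err, name

print(decode_wakeup(line))
```

On Linux, errno 22 is EINVAL, so if the assumption holds the pioctl was answered with "invalid argument" just before the machine froze.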
The arla config is:
high_vnodes 4000
low_vnodes 3000
numcreds 100
numconns 100
numvols 100
fpriority 100
high_bytes 150M
low_bytes 140M
The cache status shortly after the time of the hangup:
Arla is using 2929 of the cache's available 153600 1K byte blocks
(and 805 of the cache's available 4000 vnodes)
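As a sanity check on those numbers, here is a small sketch that converts the configured sizes to 1K blocks and works out how full the cache was. It assumes the "M" suffix in the config means 1024 KB; the helper name parse_size is made up for this example.

```python
def parse_size(text):
    """Convert a size such as '150M' to 1K blocks, assuming M = 1024 KB."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    return int(text[:-1]) * units[text[-1].upper()]

high_blocks = parse_size("150M")   # high_bytes from the config
low_blocks = parse_size("140M")    # low_bytes from the config

# Figures from the cache status output.
used_blocks, used_vnodes, high_vnodes = 2929, 805, 4000

print(high_blocks)                                   # 153600
print(round(100 * used_blocks / high_blocks, 1))     # 1.9
print(round(100 * used_vnodes / high_vnodes, 1))     # 20.1
```

The 153600 blocks match the reported cache size, and both the block usage (about 1.9%) and the vnode usage (about 20.1%) are far below the configured limits, so the cache was not close to full at the time of the hang.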
Is there anything else you want me to get?
-- Nathan
------------------------------------------------------------
Nathan Neulinger EMail: nneul@umr.edu
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216