[452] in arla-drinkers
another arlad crash on netbsd
daemon@ATHENA.MIT.EDU (Ken Raeburn)
Sun Jan 3 03:28:20 1999
Date: Sun, 3 Jan 1999 03:22:06 -0500 (EST)
From: Ken Raeburn <raeburn@raeburn.org>
To: arla-drinkers@stacken.kth.se
Subject: another arlad crash on netbsd
I was running a "du" across a modem line (ppp) that was probably
carrying a bunch of other traffic as well (mail & news downloads,
X11). When I went to look at the output, after the sizes for the
first several directories, I saw a lot of "network is down" messages
for individual files, then:
du: ./.mh/save/1610: Network is down
du: ./.mh/save/1616: Network is down
du: ./.mh/save/.mh_sequences: Network is down
751 ./.mh/save
du: ./.mh/Zephyr: Operation not supported by device
du: ./.mh/ANSI_C: Not a directory
du: ./.mh/tcl: Not a directory
The "not a directory" errors seem to come up when arlad isn't running,
so I suspect that's the point at which it crashed, and that the
"network is down" errors came from the heavy load on the ppp link, but
I'm just guessing.
The crash was different this time:
#0 0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
455 ret = RXAFS_GiveUpCallBacks (conn->connection, &fids, &cbs);
(gdb) p conn
$1 = (ConnCacheEntry *) 0x0
(gdb) bt
#0 0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
#1 0x9c51 in cleaner (arg=0x0) at ../../arlad/fcache.c:567
#2 0x3a2fd in Create_Process_Part2 () at ../../lwp/lwp.c:629
#3 0xfffefdfc in ?? ()
#4 0x1 in ?? ()
#5 0x3a09a in LWP_MwaitProcess (wcount=1, evlist=0xefbfce6c)
at ../../lwp/lwp.c:567
#6 0x3a140 in LWP_WaitProcess (event=0x4c60) at ../../lwp/lwp.c:585
#7 0x5345 in main (argc=0, argv=0xefbfd8b8) at ../../arlad/arla.c:910
(gdb)
Looks like conn_get returned NULL, which means that either
internal_get returned NULL, or e->parent was null and
e->flags.alivep was zero. A NULL return from internal_get should only
happen if connected_mode is DISCONNECTED, but gdb shows it as being
CONNECTED. I'm guessing that the "network is down" messages imply that
the connection's alivep flag may have been zero.
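To illustrate the failure mode, here is a minimal sketch in C of
checking the lookup result before using it. It is not the arlad source:
the struct layout, sketch_conn_get, and sketch_give_up_callback are
hypothetical stand-ins, and returning ENETDOWN is just one plausible
way to report the condition.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in; the real arlad ConnCacheEntry has more fields. */
typedef struct {
    void *connection;   /* would be the rx connection in arlad */
    int   alivep;       /* nonzero while the server is believed reachable */
} ConnCacheEntry;

/*
 * Illustrative lookup: returns NULL when there is no entry or the
 * host is marked dead, mirroring the NULL seen in the gdb session.
 */
ConnCacheEntry *
sketch_conn_get(ConnCacheEntry *e)
{
    if (e == NULL || !e->alivep)
        return NULL;
    return e;
}

/*
 * Returns 0 when the callback could be given up, or ENETDOWN when no
 * usable connection exists. The point is the NULL check before the
 * dereference.
 */
int
sketch_give_up_callback(ConnCacheEntry *e)
{
    ConnCacheEntry *conn = sketch_conn_get(e);

    if (conn == NULL)
        return ENETDOWN;    /* skip the RPC instead of crashing */

    /* Real code would call RXAFS_GiveUpCallBacks(conn->connection, ...) here. */
    (void)conn->connection;
    return 0;
}
```

With a guard like this, an entry whose host is marked dead yields an
error return rather than a segfault in the cleaner thread. Whether
the callback should be dropped silently or retried later is a separate
design question.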
And entry->host does correspond to the host holding the volume I was
examining. (However, using Transarc "fs whereis" on the volume after
restarting arlad, I get a backwards IP address printed out,
"30.0.185.18" when it should presumably be "18.185.0.30" or
"cronos.mit.edu". Perhaps AFS and Arla are using different byte
orders for that datum.)
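The reversed address is what you'd expect if a 32-bit address were
byte-swapped once too often (or not at all) somewhere along the way.
A small C sketch, with hypothetical helper names, shows that swapping
the bytes of 18.185.0.30 gives exactly 30.0.185.18:

```c
#include <stdint.h>
#include <stdio.h>

/* Reverse the four bytes of a 32-bit value. */
uint32_t
sketch_swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Format addr as a dotted quad, most significant byte first. */
void
sketch_dotted_quad(uint32_t addr, char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u",
             (unsigned)((addr >> 24) & 0xffu),
             (unsigned)((addr >> 16) & 0xffu),
             (unsigned)((addr >> 8) & 0xffu),
             (unsigned)(addr & 0xffu));
}
```

Here 18.185.0.30 is 0x12B9001E as a 32-bit value; passing
sketch_swap32(0x12B9001E) to sketch_dotted_quad produces
"30.0.185.18". This only demonstrates the symptom; it doesn't say
which side is doing the extra (or missing) swap.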
Unfortunately, debug logging wasn't turned on, since I had recently
rebooted my system.
Ken