[452] in arla-drinkers

another arlad crash on netbsd

daemon@ATHENA.MIT.EDU (Ken Raeburn)
Sun Jan 3 03:28:20 1999

From owner-arla-drinkers@stacken.kth.se Sun Jan 03 08:28:19 1999
Return-Path: <owner-arla-drinkers@stacken.kth.se>
Delivered-To: arla-drinkers-mtg@bloom-picayune.mit.edu
Received: (qmail 11288 invoked from network); 3 Jan 1999 08:28:18 -0000
Received: from unknown (HELO sundance.stacken.kth.se) (130.237.234.41)
  by bloom-picayune.mit.edu with SMTP; 3 Jan 1999 08:28:18 -0000
Received: (from majordom@localhost)
	by sundance.stacken.kth.se (8.8.8/8.8.8) id JAA09153
	for arla-drinkers-list; Sun, 3 Jan 1999 09:22:32 +0100 (MET)
Received: from tweedledumb.cygnus.com (tweedledumb.cygnus.com [192.80.44.1])
	by sundance.stacken.kth.se (8.8.8/8.8.8) with ESMTP id JAA09147
	for <arla-drinkers@stacken.kth.se>; Sun, 3 Jan 1999 09:22:27 +0100 (MET)
Received: from kr-pc.cygnus.com (kr-pc.cygnus.com [192.80.44.193])
	by tweedledumb.cygnus.com (8.8.5/8.8.5) with ESMTP id DAA05991
	for <arla-drinkers@stacken.kth.se>; Sun, 3 Jan 1999 03:22:20 -0500 (EST)
Received: (from raeburn@localhost) by kr-pc.cygnus.com (8.8.8/8.6.9) id DAA14282; Sun, 3 Jan 1999 03:22:06 -0500 (EST)
Date: Sun, 3 Jan 1999 03:22:06 -0500 (EST)
Message-Id: <199901030822.DAA14282@kr-pc.cygnus.com>
X-Authentication-Warning: kr-pc.cygnus.com: raeburn set sender to raeburn@raeburn.org using -f
From: Ken Raeburn <raeburn@raeburn.org>
To: arla-drinkers@stacken.kth.se
Subject: another arlad crash on netbsd
Sender: owner-arla-drinkers@stacken.kth.se
Precedence: bulk

I was running a "du" over a modem line (ppp) that was probably carrying
a bunch of other traffic as well (mail & news downloads, X11). When I
went to look at the output, after the sizes for a good many
directories, I saw a lot of "Network is down" messages for individual
files, then:

    du: ./.mh/save/1610: Network is down
    du: ./.mh/save/1616: Network is down
    du: ./.mh/save/.mh_sequences: Network is down
    751     ./.mh/save
    du: ./.mh/Zephyr: Operation not supported by device
    du: ./.mh/ANSI_C: Not a directory
    du: ./.mh/tcl: Not a directory

The "Not a directory" errors seem to come up when arlad isn't running,
so that's probably the point at which it crashed. The "Network is
down" errors presumably came from the heavy load on the ppp link, but
I'm just guessing.

The crash was different this time:

#0  0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
455             ret = RXAFS_GiveUpCallBacks (conn->connection, &fids, &cbs);
(gdb) p conn
$1 = (ConnCacheEntry *) 0x0

(gdb) bt
#0  0x9734 in throw_entry (entry=0x118ae0) at ../../arlad/fcache.c:455
#1  0x9c51 in cleaner (arg=0x0) at ../../arlad/fcache.c:567
#2  0x3a2fd in Create_Process_Part2 () at ../../lwp/lwp.c:629
#3  0xfffefdfc in ?? ()
#4  0x1 in ?? ()
#5  0x3a09a in LWP_MwaitProcess (wcount=1, evlist=0xefbfce6c)
    at ../../lwp/lwp.c:567
#6  0x3a140 in LWP_WaitProcess (event=0x4c60) at ../../lwp/lwp.c:585
#7  0x5345 in main (argc=0, argv=0xefbfd8b8) at ../../arlad/arla.c:910
(gdb) 

Looks like conn_get returned NULL, which means that either
internal_get returned NULL, or e->parent was null and e->flags.alivep
was zero.  A NULL return from internal_get should only happen if
connected_mode is DISCONNECTED, but gdb shows it as being CONNECTED.
I'm guessing that the "Network is down" messages imply that the
connection's alivep flag may have been zero....
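
To make the failure mode concrete, here is a minimal sketch of the kind of check that would avoid dereferencing a NULL connection. Everything in it is a hypothetical stand-in (struct conn_entry, struct cache_entry, lookup_conn, give_up_callback, throw_entry_checked); none of it is arla's real code, and returning ENETDOWN is just one plausible choice, not necessarily the right fix for fcache.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT arla's real definitions. */
struct conn_entry {
    int   alive;        /* plays the role of flags.alivep */
    void *connection;   /* plays the role of the rx connection */
};

struct cache_entry {
    struct conn_entry *conn;
    int callback_held;
};

/* Stand-in lookup: returns NULL when the connection is missing or
 * marked dead, which is how conn_get appears to have behaved here. */
struct conn_entry *lookup_conn(struct conn_entry *e)
{
    if (e == NULL || !e->alive)
        return NULL;
    return e;
}

/* Stand-in for the RPC that tells the server we drop a callback. */
int give_up_callback(void *connection)
{
    (void)connection;
    return 0;
}

/* Check the lookup result before touching conn->connection. */
int throw_entry_checked(struct cache_entry *entry)
{
    struct conn_entry *conn = lookup_conn(entry->conn);
    int ret;

    if (conn == NULL) {
        /* Cannot reach the server: drop local state only. */
        entry->callback_held = 0;
        return ENETDOWN;
    }
    ret = give_up_callback(conn->connection);
    entry->callback_held = 0;
    return ret;
}
```

With a connection marked dead, throw_entry_checked returns ENETDOWN instead of crashing; with a live one it makes the (stand-in) call and returns its result.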

And entry->host does correspond to the host holding the volume I was
examining.  (However, using Transarc "fs whereis" on the volume after
restarting arlad, I get a backwards IP address printed out,
"30.0.185.18" when it should presumably be "18.185.0.30" or
"cronos.mit.edu".  Perhaps AFS and Arla are using different byte
orders for that datum.)
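
The reversed address is what you'd expect if a 32-bit address is byte-swapped somewhere before being printed. A small sketch of that effect follows; make_ipv4, swap32 and format_ipv4 are hypothetical helpers written for this illustration, and it doesn't show where in Arla or "fs" the swap actually happens.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack four octets into a 32-bit value, first octet most significant. */
uint32_t make_ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v >> 8) & 0x0000ff00u)  |
            (v >> 24);
}

/* Print a 32-bit address as dotted quad, most significant byte first. */
void format_ipv4(uint32_t addr, char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u",
             (unsigned)((addr >> 24) & 0xff),
             (unsigned)((addr >> 16) & 0xff),
             (unsigned)((addr >> 8) & 0xff),
             (unsigned)(addr & 0xff));
}
```

Formatting make_ipv4(18, 185, 0, 30) gives "18.185.0.30"; formatting swap32 of the same value gives "30.0.185.18", matching what "fs whereis" printed.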

Unfortunately, I didn't have debug logging turned on after recently
rebooting my system.

Ken