[3015] in testers
8.0C portnoy: panic: vn_rele vnode ref count 0
daemon@ATHENA.MIT.EDU (John Hawkinson)
Sun Jul 21 01:00:05 1996
Date: Sun, 21 Jul 1996 00:59:35 -0400
To: testers@MIT.EDU
Cc: sipb-staff@MIT.EDU, afsdev@MIT.EDU
From: John Hawkinson <jhawk@MIT.EDU>
kretch was logged in on portnoy
this evening when it crashed with
panic: vn_rele: vnode ref count 0
Here's his description of what he was doing:
} Before the machine paniced I had a tcl/tk zephyr client running that
} appeared to run out of swap space (although the error message was unclear
} about this) I killed the process and reloaded the zwgc.desc (which executes
} the tcl/tk client.) Later the same error occured and when I ran a script
} to kill the process and restart the zwgc desc the machine paniced.
} of possible relevance:
}
} portnoy:29521% limit
} cputime unlimited
} filesize unlimited
} datasize 2097148 kbytes
} stacksize 8192 kbytes
} coredumpsize 0 kbytes
} vmemoryuse unlimited
} descriptors 64
It's not clear to me how much this is relevent, as the panic appears
filesystem -related. Nevertheless, per-process limits may have been
relevent.
This was the panic message from the message buffer:
panic: vn_rele: vnode ref count 0
syncing file systems... [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] done
3160 static and sysmap kernel pages
64 dynamic kernel data pages
110 kernel-pageable pages
0 segkmap kernel pages
0 segvn kernel pages
1769 current user process pages
5103 total pages (5103 chunks)
dumping to vp fc1e0e1c, offset 57712
WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@3,0 (sd3):
SCSI transport failed: reason 'reset': retrying command
5103 total pages, dump succeeded
Note that the dump may be partially invalid due to the SCSI failure during
the dump, as well as the weirdo savecore contortions I went through
to obtain a "valid" dump (see previous mail).
Using both of the two dumps I had (both without and with
the bogus kernel module symbols in, unix/vmcore .4 and .5 respectively),
I get the same stack trace. Both are in /var/crash/portnoy on portnoy
(an SS5 in the sipb office). Ask someone if you'd like to see them;
we'll save them for a while, there's plenty of space.
[ Incidently, it appears to be a bug that said
dumps are world readable. That should be reported to Sun, I think. ]
Here's the stack trace:
[portnoy!jhawk] /var/crash/portnoy# adb -k unix.5 vmcore.5
$c
physmem 1e6b
complete_panic(0xf0048bb0,0xf0152b78,0xf05ae920,0x0,0x0,0x1) + f8
do_panic(?) + 1c
vcmn_err(0xf0165be0,0xf05aea94,0xf05aea94,0xa,0x4,0x3)
cmn_err(0x3,0xf0165be0,0x0,0x267c6,0xfc3c0c00,0x1) + 1c
vn_rele(0xfc47e960,0x0,0xe13664ff,0xfc3c3400,0x57,0x57) + 18
afs_PutVCache(0xfc47e960,0xfc3c3800,0xfc227678,0xfc55a8e8,0x209,0xfc3c3800) + a0
afsremove(0xfc47eba0,0xfc4a97b0,0xfc47e960,0xfc57eed0,0xfc55a5e8,0x2) + 5c8
afs_remunlink(0xfc47e960,0x1,0xfc23f440,0xfc57eed0,0xfffaffff,0xfc4a97b0) + 214
afs_inactive(0xfc47e960,0x10000,0x56,0xfc39f800,0x0,0x0) + 1d4
gafs_inactive(?) + 28
vn_rele(0xfc47e960,0xfc0e9260,0xe13664ff,0x0,0x104,0x0)
closef(0xfc25e658,0x0,0xe13664ff,0x1,0x0,0xfc47e960) + 94
closeall(0x1,0x0,0xfc227c08,0xfc25e658,0x7,0xfc4dbc38) + 4c
exit(?) + 248
psig(0x2,0xf,0xfc61dac8,0xfc26cc80,0xfc22768c,0xfc47a700)
psig_shared(0xf05aefb4,0xfc2276ec,0xfc26cc80,0xe,0x0,0xefffd478) + 4
post_syscall(0x0) + 430
The most important thing to note is that vn_rele() seems to have been
called twice with the same argument. Please note that adb does not
know how many arguments a function actually uses, generally assumes 6,
and prints 6 or fewer. In particular, vn_rele apparently
takes only one argument. Per ~afsdev/src/3.4a.patches.1/src/afs/afs_cache.c
around line 4749:
AFS_RELE((struct vnode *)filevp);
And per line 254 of osi.h:
#define AFS_RELE(vp) AFS_GUNLOCK(); VN_RELE(vp); AFS_GLOCK();
So it seems reasonable to assume that since VN_RELE() takes one
argument, vn_rele does as well. If so, please note that vn_rele() was
called twice with the same argument, recursively (I assume that it's
reentrant). This is presumably the source of the problem.
I can't speak to what in particular is the case.
Attached is some more debugging output, including decoded
values of the vnode pointer (possibly no longer meaningful since
it has been freed?), and a more complete stack trace
with the arguments given as relative addresses rather than
absolute noes. It didn't seem like crash(1m) could supply
much useful information, but I've included the output of it's
vnode command.
I hope you can make some sense of this. For my reference, does it
make sense for me to go into this level of detail in this type of bug
report, or is it sufficient to just tell you it lost, and provide the
core dump?
Thanks.
--jhawk
ps: I would suggest that if this bug is forwarded to Transarc, it
probably makes sense to forward this message, removing some sections
such as this one, rather than paraphrasing it, as good pertinent
useful details are often lost that way.
0xf0165be0: vn_rele: vnode ref count 0
0xfc47e960$<vnode
0xfc47e960: flag refcnt vfsmnt vop
0 0 0 afs_vnodeops
0xfc47e978: vfsp stream pages
fc433198 0 0
0xfc47e984: type rdev data
5 0 0
0xfc47e990: filocks
0
> vnode -l 0xfc47e960
VCNT VFSMNTED VFSP STREAMP VTYPE RDEV VDATA VFILOCKS VFLAG
0 0 fc433198 0 l - 0 0 -
mutex v_lock: waiters 0 lock 0
type: MUTEX_ADAPTIVE owner 0
Condition variable v_cv: 0
fc433198$<vfs
0xfc433198: next op vnodecvd flag
fc24d760 afs_vfsops fc4cf0a8 0
0xfc4331a8: bsize fstype fsid
8192 8 1234 63
0xfc4331b8: data maj min bcount nsubmnts
0 de05 e24c -402267572 0
0xfc4331c8: list hash
808d2002 0
0xfc4331d0: reflock (mutex)
0xfc4331d0: owner
e0000000
0xfc4331d0: lock type waiters
0 0 0
<sp$<stacktrace
l0 l1 l2 l3
l4 l5 l6 l7
i0 i1 i2 i3
i4 i5 i6 i7
0xf05ae920: 0 0 1 f0152400
fc26cc80 fc26cca4 f0172f14 f0179f60
f0048bb0 f0152b78 f05ae920 0
0 1 f05ae988 f0048790
0xf05ae920: 0 0 PR_BASE+1 backfs+0x44
0xfc26cc80 0xfc26cca4 panic_regs cpu_nodeid
complete_panic+0xf8 cpu 0xf05ae920
0 0 PR_BASE+1 0xf05ae988
do_panic+0x1c
0xf05ae988: 49010c5 fc380bc8 fc36f13c 76
76 76 7 0
f0165be0 f05aea94 f05aea94 a
4 3 f05ae9e8 f0080428
0xf05ae988: 0x49010c5 xdrrx_getint32+0x60 xdr_AFSFetchStatus+0x24
LEDPATCNT+0x1e LEDPATCNT+0x1e LEDPATCNT+0x1e PR_SIZE+3
0 0xf0165be0 0xf05aea94 0xf05aea94
VIK_NWINDOWS+2 PR_SIZE PR_BASE+3 0xf05ae9e8
cmn_err+0x1c
0xf05ae9e8: 44010c6 fc36f580 fc371f74 1
2 2 7 fc577040
3 f0165be0 0 267c6
fc3c0c00 1 f05aea48 f00e3f9c
0xf05ae9e8: 0x44010c6 xdr_AFSFetchStatus+0x468 RXAFS_RemoveFile+0xdc
PR_BASE+1 PR_BASE+2 PR_BASE+2 PR_SIZE+3
0xfc577040 PR_BASE+3 0xf0165be0 0
0x267c6 afs_cmstats+0x460 PR_BASE+1
0xf05aea48 vn_rele+0x18
0xf05aea48: 49010c7 f00743a4 f00743a8 95c9ab00
e32c4 4 2e094 fc3c0c00
fc47e960 0 e13664ff fc3c3400
57 57 f05aeaa8 fc33a50c
0xf05aea48: 0x49010c7 gethrestime+4 gethrestime+8 0x95c9ab00
0xe32c4 PR_SIZE 0x2e094 afs_cmstats+0x460
0xfc47e960 0 0xe13664ff afs_vhashT+0xda4
LEDPTR+0x2a LEDPTR+0x2a 0xf05aeaa8 afs_PutVCache+0xa0
0xf05aeaa8: 2 0 fc3c3400 4b54e
fc3c0800 0 0 1
fc47e960 fc3c3800 fc227678 fc55a8e8
209 fc3c3800 f05aeb10 fc3915b0
0xf05aeaa8: PR_BASE+2 0 afs_vhashT+0xda4
0x4b54e afs_cmstats+0x60 0
0 PR_BASE+1 0xfc47e960 cbHashT+0x14c
0xfc227678 hwc_debug+0xfac VIK_WINMASK+0x10a
cbHashT+0x14c 0xf05aeb10 afsremove+0x5c8
0xf05aeb10: fc3bfb88 fc3bfb88 2a5 f05aec70
2 fc227678 0 fc4a97b0
fc47eba0 fc4a97b0 fc47e960 fc57eed0
fc55a5e8 2 f05aec08 fc3975bc
0xf05aeb10: afs_stats_cmfullperf+0x370 afs_stats_cmfullperf+0x370
VIK_WINMASK+0x1a6 0xf05aec70 PR_BASE+2
0xfc227678 0 0xfc4a97b0 0xfc47eba0
0xfc4a97b0 0xfc47e960 0xfc57eed0 hwc_debug+0xcac
PR_BASE+2 0xf05aec08 afs_remunlink+0x214
0xf05aec08: fc227678 fc55a8e8 209 fc47eba0
fc47eba0 9f fc47eba0 1
fc47e960 1 fc23f440 fc57eed0
fffaffff fc4a97b0 f05aeca0 fc397830
0xf05aec08: 0xfc227678 hwc_debug+0xfac VIK_WINMASK+0x10a
0xfc47eba0 0xfc47eba0 LEDPATCNT+0x47 0xfc47eba0
PR_BASE+1 0xfc47e960 PR_BASE+1 0xfc23f440
0xfc57eed0 0xfffaffff 0xfc4a97b0 0xf05aeca0
afs_inactive+0x1d4
0xf05aeca0: 18a3fe fc3c0c00 14081 ffffff7f
14001 ffffffdf 14001 14001
fc47e960 10000 56 fc39f800
0 0 f05aed08 fc39c4e0
0xf05aeca0: 0x18a3fe afs_cmstats+0x460 0x14081
0xffffff7f 0x14001 0xffffffdf 0x14001
0x14001 0xfc47e960 0x10000 LEDPTR+0x29
sbox2+0xe8 0 0 0xf05aed08
gafs_inactive+0x28
0xf05aed08: 44010c1 f008b070 f008af70 8
10 10 0 104
fc47e960 fc0e9260 e13664ff 0
104 0 f05aed68 f008b030
0xf05aed08: 0x44010c1 closef+0xd4 closeall+0x54 PR_OFF
PGSHIFT_DEBUG+4 PGSHIFT_DEBUG+4 0 VIK_WINMASK+5
0xfc47e960 scsi_hba_mutex+0x67e0 0xe13664ff
0 VIK_WINMASK+5 0 0xf05aed68
closef+0x94
0xf05aed68: 0 1 f006a230 e0000000
40000 fc39be3c 4000 f0155498
fc25e658 0 e13664ff 1
0 fc47e960 f05aedc8 f008af68
0xf05aed68: 0 PR_BASE+1 cv_wait_sig_swap+0x160
0xe0000000 0x40000 gafs_close Syssize+0x300
stopdefault 0xfc25e658 0 0xe13664ff
PR_BASE+1 0 0xfc47e960 0xf05aedc8
closeall+0x4c
0xf05aedc8: 1342 f00e8a24 0 0
6 f015ccb0 fc227830 fc227c10
1 0 fc227c08 fc25e658
7 fc4dbc38 f05aee28 f0089e4c
0xf05aedc8: VIK_WINMASK+0x1243 nomod_zero 0
0 PR_SIZE+2 prexit_info 0xfc227830
0xfc227c10 PR_BASE+1 0 0xfc227c08
0xfc25e658 PR_SIZE+3 0xfc4dbc38 0xf05aee28
exit+0x248
0xf05aee28: 0 fc47a700 fffffeff 40
0 f018809c fc227678 fc22768c
2 f fc61dac8 fc26cc80
fc22768c fc47a700 f05aee88 f0077604
0xf05aee28: 0 0xfc47a700 0xfffffeff LEDPTR+0x13
0 pidlock 0xfc227678 0xfc22768c
PR_BASE+2 PGSHIFT_DEBUG+3 0xfc61dac8 0xfc26cc80
0xfc22768c 0xfc47a700 0xf05aee88 psig_shared+4
0xf05aee88: fc26cd04 fc22770c fc61dea7 80
0 4000000 fc22774c fc227714
f05aefb4 fc2276ec fc26cc80 e
0 efffd478 f05aeee8 f0077160
0xf05aee88: 0xfc26cd04 0xfc22770c 0xfc61dea7 LEDPATCNT+0x28
0 0x4000000 0xfc22774c 0xfc227714
0xf05aefb4 0xfc2276ec 0xfc26cc80 PGSHIFT_DEBUG+2
0 0xefffd478 0xf05aeee8 post_syscall+0x430
0xf05aeee8: 0 1 0 0
57 4 fc26cd60 0
0 0 0 f05aefb4
fc61dac8 fc26cc80 f05aef58 f007789c
0xf05aeee8: 0 PR_BASE+1 0 0
LEDPTR+0x2a PR_SIZE 0xfc26cd60 0
0 0 0 0xf05aefb4
0xfc61dac8 0xfc26cc80 0xf05aef58 _syscall_post+0x14
0xf05aef58: 101b24 ef6bf444 0 ffffffff
20 101b28 101aa8 ef6bba14
4 101a28 1 1
ffffffff 8 effff4e0 66954
0xf05aef58: 0x101b24 0xef6bf444 0 VADDR_MASK_DEBUG
PGSHIFT_DEBUG+0x14 0x101b28 0x101aa8
0xef6bba14 PR_SIZE 0x101a28 PR_BASE+1
PR_BASE+1 VADDR_MASK_DEBUG PR_OFF
0xeffff4e0 0x66954
data address not found