[3015] in testers

home help back first fref pref prev next nref lref last post

8.0C portnoy: panic: vn_rele vnode ref count 0

daemon@ATHENA.MIT.EDU (John Hawkinson)
Sun Jul 21 01:00:05 1996

Date: Sun, 21 Jul 1996 00:59:35 -0400
To: testers@MIT.EDU
Cc: sipb-staff@MIT.EDU, afsdev@MIT.EDU
From: John Hawkinson <jhawk@MIT.EDU>


kretch was logged in on portnoy
this evening when it crashed with

	panic: vn_rele: vnode ref count 0

Here's his description of what he was doing:

} Before the machine paniced I had a tcl/tk zephyr client running that
} appeared to run out of swap space (although the error message was unclear
} about this)  I killed the process and reloaded the zwgc.desc (which executes
} the tcl/tk client.)  Later the same error occured and when I ran a script
} to kill the process and restart the zwgc desc the machine paniced.
} of possible relevance:
}
} portnoy:29521% limit
} cputime         unlimited
} filesize        unlimited
} datasize        2097148 kbytes
} stacksize       8192 kbytes
} coredumpsize    0 kbytes
} vmemoryuse      unlimited
} descriptors     64 

It's not clear to me how much this is relevent, as the panic appears
filesystem -related. Nevertheless, per-process limits may have been
relevent.

This was the panic message from the message buffer:

panic: vn_rele: vnode ref count 0
syncing file systems... [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] [14] done
 3160 static and sysmap kernel pages
   64 dynamic kernel data pages
  110 kernel-pageable pages
    0 segkmap kernel pages
    0 segvn kernel pages
 1769 current user process pages
 5103 total pages (5103 chunks)

dumping to vp fc1e0e1c, offset 57712
WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@3,0 (sd3):
        SCSI transport failed: reason 'reset': retrying command

5103 total pages, dump succeeded

Note that the dump may be partially invalid due to the SCSI failure during
the dump, as well as the weirdo savecore contortions I went through
to obtain a "valid" dump (see previous mail).

Using both of the two dumps I had (both without and with
the bogus kernel module symbols in, unix/vmcore .4 and .5 respectively),
I get the same stack trace. Both are in /var/crash/portnoy on portnoy
(an SS5 in the sipb office). Ask someone if you'd like to see them; 
we'll save them for a while, there's plenty of space.

[ Incidently, it appears to be a bug that said
dumps are world readable. That should be reported to Sun, I think. ]

Here's the stack trace:

[portnoy!jhawk] /var/crash/portnoy# adb -k unix.5 vmcore.5
$c
physmem 1e6b
complete_panic(0xf0048bb0,0xf0152b78,0xf05ae920,0x0,0x0,0x1) + f8
do_panic(?) + 1c
vcmn_err(0xf0165be0,0xf05aea94,0xf05aea94,0xa,0x4,0x3)
cmn_err(0x3,0xf0165be0,0x0,0x267c6,0xfc3c0c00,0x1) + 1c
vn_rele(0xfc47e960,0x0,0xe13664ff,0xfc3c3400,0x57,0x57) + 18
afs_PutVCache(0xfc47e960,0xfc3c3800,0xfc227678,0xfc55a8e8,0x209,0xfc3c3800) + a0
afsremove(0xfc47eba0,0xfc4a97b0,0xfc47e960,0xfc57eed0,0xfc55a5e8,0x2) + 5c8
afs_remunlink(0xfc47e960,0x1,0xfc23f440,0xfc57eed0,0xfffaffff,0xfc4a97b0) + 214
afs_inactive(0xfc47e960,0x10000,0x56,0xfc39f800,0x0,0x0) + 1d4
gafs_inactive(?) + 28
vn_rele(0xfc47e960,0xfc0e9260,0xe13664ff,0x0,0x104,0x0)
closef(0xfc25e658,0x0,0xe13664ff,0x1,0x0,0xfc47e960) + 94
closeall(0x1,0x0,0xfc227c08,0xfc25e658,0x7,0xfc4dbc38) + 4c
exit(?) + 248
psig(0x2,0xf,0xfc61dac8,0xfc26cc80,0xfc22768c,0xfc47a700)
psig_shared(0xf05aefb4,0xfc2276ec,0xfc26cc80,0xe,0x0,0xefffd478) + 4
post_syscall(0x0) + 430

The most important thing to note is that vn_rele() seems to have been
called twice with the same argument. Please note that adb does not
know how many arguments a function actually uses, generally assumes 6,
and prints 6 or fewer. In particular, vn_rele apparently
takes only one argument. Per ~afsdev/src/3.4a.patches.1/src/afs/afs_cache.c
around line 4749:

    AFS_RELE((struct vnode *)filevp);

And per line 254 of osi.h:

#define AFS_RELE(vp)     AFS_GUNLOCK(); VN_RELE(vp); AFS_GLOCK();       

So it seems reasonable to assume that since VN_RELE() takes one
argument, vn_rele does as well. If so, please note that vn_rele() was
called twice with the same argument, recursively (I assume that it's
reentrant). This is presumably the source of the problem.
I can't speak to what in particular is the case.


Attached is some more debugging output, including decoded
values of the vnode pointer (possibly no longer meaningful since
it has been freed?), and a more complete stack trace
with the arguments given as relative addresses rather than
absolute noes. It didn't seem like crash(1m) could supply
much useful information, but I've included the output of it's
vnode command.

I hope you can make some sense of this.  For my reference, does it
make sense for me to go into this level of detail in this type of bug
report, or is it sufficient to just tell you it lost, and provide the
core dump?

Thanks.

--jhawk

ps: I would suggest that if this bug is forwarded to Transarc, it
probably makes sense to forward this message, removing some sections
such as this one, rather than paraphrasing it, as good pertinent
useful details are often lost that way.


0xf0165be0:     vn_rele: vnode ref count 0

0xfc47e960$<vnode
0xfc47e960:     flag    refcnt          vfsmnt           vop
                0       0               0               afs_vnodeops
0xfc47e978:     vfsp            stream          pages
                fc433198        0               0
0xfc47e984:     type            rdev    data
                5               0               0
0xfc47e990:     filocks
                0

> vnode -l 0xfc47e960
VCNT  VFSMNTED   VFSP   STREAMP VTYPE   RDEV   VDATA   VFILOCKS VFLAG     
  0        0 fc433198    0      l       -     0        0   -
mutex v_lock: waiters 0   lock 0
        type: MUTEX_ADAPTIVE    owner 0
Condition variable v_cv: 0


fc433198$<vfs    
0xfc433198:     next            op              vnodecvd        flag
                fc24d760        afs_vfsops      fc4cf0a8        0
0xfc4331a8:     bsize           fstype          fsid
                8192            8               1234            63
0xfc4331b8:     data            maj     min     bcount          nsubmnts
                0               de05    e24c    -402267572      0
0xfc4331c8:     list            hash
                808d2002        0
0xfc4331d0:     reflock (mutex)
0xfc4331d0:     owner
                e0000000
0xfc4331d0:     lock    type    waiters
                0       0       0

<sp$<stacktrace
                l0              l1              l2              l3
                l4              l5              l6              l7
                i0              i1              i2              i3
                i4              i5              i6              i7

0xf05ae920:     0               0               1               f0152400
                fc26cc80        fc26cca4        f0172f14        f0179f60
                f0048bb0        f0152b78        f05ae920        0
                0               1               f05ae988        f0048790

0xf05ae920:     0               0               PR_BASE+1       backfs+0x44
                0xfc26cc80      0xfc26cca4      panic_regs      cpu_nodeid
                complete_panic+0xf8             cpu             0xf05ae920
                0               0               PR_BASE+1       0xf05ae988
                do_panic+0x1c

0xf05ae988:     49010c5         fc380bc8        fc36f13c        76
                76              76              7               0
                f0165be0        f05aea94        f05aea94        a
                4               3               f05ae9e8        f0080428

0xf05ae988:     0x49010c5       xdrrx_getint32+0x60             xdr_AFSFetchStatus+0x24
                LEDPATCNT+0x1e  LEDPATCNT+0x1e  LEDPATCNT+0x1e  PR_SIZE+3
                0               0xf0165be0      0xf05aea94      0xf05aea94
                VIK_NWINDOWS+2  PR_SIZE         PR_BASE+3       0xf05ae9e8
                cmn_err+0x1c

0xf05ae9e8:     44010c6         fc36f580        fc371f74        1
                2               2               7               fc577040
                3               f0165be0        0               267c6
                fc3c0c00        1               f05aea48        f00e3f9c

0xf05ae9e8:     0x44010c6       xdr_AFSFetchStatus+0x468        RXAFS_RemoveFile+0xdc
                PR_BASE+1       PR_BASE+2       PR_BASE+2       PR_SIZE+3
                0xfc577040      PR_BASE+3       0xf0165be0      0
                0x267c6         afs_cmstats+0x460               PR_BASE+1
                0xf05aea48      vn_rele+0x18

0xf05aea48:     49010c7         f00743a4        f00743a8        95c9ab00
                e32c4           4               2e094           fc3c0c00
                fc47e960        0               e13664ff        fc3c3400
                57              57              f05aeaa8        fc33a50c

0xf05aea48:     0x49010c7       gethrestime+4   gethrestime+8   0x95c9ab00
                0xe32c4         PR_SIZE         0x2e094         afs_cmstats+0x460
                0xfc47e960      0               0xe13664ff      afs_vhashT+0xda4
                LEDPTR+0x2a     LEDPTR+0x2a     0xf05aeaa8      afs_PutVCache+0xa0


0xf05aeaa8:     2               0               fc3c3400        4b54e
                fc3c0800        0               0               1
                fc47e960        fc3c3800        fc227678        fc55a8e8
                209             fc3c3800        f05aeb10        fc3915b0

0xf05aeaa8:     PR_BASE+2       0               afs_vhashT+0xda4
                0x4b54e         afs_cmstats+0x60                0
                0               PR_BASE+1       0xfc47e960      cbHashT+0x14c
                0xfc227678      hwc_debug+0xfac VIK_WINMASK+0x10a
                cbHashT+0x14c   0xf05aeb10      afsremove+0x5c8

0xf05aeb10:     fc3bfb88        fc3bfb88        2a5             f05aec70
                2               fc227678        0               fc4a97b0
                fc47eba0        fc4a97b0        fc47e960        fc57eed0
                fc55a5e8        2               f05aec08        fc3975bc

0xf05aeb10:     afs_stats_cmfullperf+0x370      afs_stats_cmfullperf+0x370
                VIK_WINMASK+0x1a6               0xf05aec70      PR_BASE+2
                0xfc227678      0               0xfc4a97b0      0xfc47eba0
                0xfc4a97b0      0xfc47e960      0xfc57eed0      hwc_debug+0xcac
                PR_BASE+2       0xf05aec08      afs_remunlink+0x214


0xf05aec08:     fc227678        fc55a8e8        209             fc47eba0
                fc47eba0        9f              fc47eba0        1
                fc47e960        1               fc23f440        fc57eed0
                fffaffff        fc4a97b0        f05aeca0        fc397830

0xf05aec08:     0xfc227678      hwc_debug+0xfac VIK_WINMASK+0x10a
                0xfc47eba0      0xfc47eba0      LEDPATCNT+0x47  0xfc47eba0
                PR_BASE+1       0xfc47e960      PR_BASE+1       0xfc23f440
                0xfc57eed0      0xfffaffff      0xfc4a97b0      0xf05aeca0
                afs_inactive+0x1d4

0xf05aeca0:     18a3fe          fc3c0c00        14081           ffffff7f
                14001           ffffffdf        14001           14001
                fc47e960        10000           56              fc39f800
                0               0               f05aed08        fc39c4e0

0xf05aeca0:     0x18a3fe        afs_cmstats+0x460               0x14081
                0xffffff7f      0x14001         0xffffffdf      0x14001
                0x14001         0xfc47e960      0x10000         LEDPTR+0x29
                sbox2+0xe8      0               0               0xf05aed08
                gafs_inactive+0x28

0xf05aed08:     44010c1         f008b070        f008af70        8
                10              10              0               104
                fc47e960        fc0e9260        e13664ff        0
                104             0               f05aed68        f008b030

0xf05aed08:     0x44010c1       closef+0xd4     closeall+0x54   PR_OFF
                PGSHIFT_DEBUG+4 PGSHIFT_DEBUG+4 0               VIK_WINMASK+5
                0xfc47e960      scsi_hba_mutex+0x67e0           0xe13664ff
                0               VIK_WINMASK+5   0               0xf05aed68
                closef+0x94

0xf05aed68:     0               1               f006a230        e0000000
                40000           fc39be3c        4000            f0155498
                fc25e658        0               e13664ff        1
                0               fc47e960        f05aedc8        f008af68

0xf05aed68:     0               PR_BASE+1       cv_wait_sig_swap+0x160
                0xe0000000      0x40000         gafs_close      Syssize+0x300
                stopdefault     0xfc25e658      0               0xe13664ff
                PR_BASE+1       0               0xfc47e960      0xf05aedc8
                closeall+0x4c

0xf05aedc8:     1342            f00e8a24        0               0
                6               f015ccb0        fc227830        fc227c10
                1               0               fc227c08        fc25e658
                7               fc4dbc38        f05aee28        f0089e4c

0xf05aedc8:     VIK_WINMASK+0x1243              nomod_zero      0
                0               PR_SIZE+2       prexit_info     0xfc227830
                0xfc227c10      PR_BASE+1       0               0xfc227c08
                0xfc25e658      PR_SIZE+3       0xfc4dbc38      0xf05aee28
                exit+0x248

0xf05aee28:     0               fc47a700        fffffeff        40
                0               f018809c        fc227678        fc22768c
                2               f               fc61dac8        fc26cc80
                fc22768c        fc47a700        f05aee88        f0077604

0xf05aee28:     0               0xfc47a700      0xfffffeff      LEDPTR+0x13
                0               pidlock         0xfc227678      0xfc22768c
                PR_BASE+2       PGSHIFT_DEBUG+3 0xfc61dac8      0xfc26cc80
                0xfc22768c      0xfc47a700      0xf05aee88      psig_shared+4


0xf05aee88:     fc26cd04        fc22770c        fc61dea7        80
                0               4000000         fc22774c        fc227714
                f05aefb4        fc2276ec        fc26cc80        e
                0               efffd478        f05aeee8        f0077160

0xf05aee88:     0xfc26cd04      0xfc22770c      0xfc61dea7      LEDPATCNT+0x28
                0               0x4000000       0xfc22774c      0xfc227714
                0xf05aefb4      0xfc2276ec      0xfc26cc80      PGSHIFT_DEBUG+2
                0               0xefffd478      0xf05aeee8      post_syscall+0x430


0xf05aeee8:     0               1               0               0
                57              4               fc26cd60        0
                0               0               0               f05aefb4
                fc61dac8        fc26cc80        f05aef58        f007789c

0xf05aeee8:     0               PR_BASE+1       0               0
                LEDPTR+0x2a     PR_SIZE         0xfc26cd60      0
                0               0               0               0xf05aefb4
                0xfc61dac8      0xfc26cc80      0xf05aef58      _syscall_post+0x14


0xf05aef58:     101b24          ef6bf444        0               ffffffff
                20              101b28          101aa8          ef6bba14
                4               101a28          1               1
                ffffffff        8               effff4e0        66954

0xf05aef58:     0x101b24        0xef6bf444      0               VADDR_MASK_DEBUG
                PGSHIFT_DEBUG+0x14              0x101b28        0x101aa8
                0xef6bba14      PR_SIZE         0x101a28        PR_BASE+1
                PR_BASE+1       VADDR_MASK_DEBUG                PR_OFF
                0xeffff4e0      0x66954


data address not found


home help back first fref pref prev next nref lref last post