[7223] in testers


Yet another kernel oops in 9.4.9, AFS issues

daemon@ATHENA.MIT.EDU (You (Yoyo) Zhou)
Tue Jul 5 01:25:42 2005

Date: Tue, 5 Jul 2005 01:25:22 -0400 (EDT)
From: "You (Yoyo) Zhou" <yoz@MIT.EDU>
To: testers@MIT.EDU
Message-ID: <Pine.GSO.4.58L.0507050030300.8326@nyy-avtug-gbby.zvg.rqh>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

I accidentally caused a kernel oops on character.mit.edu (Athena 9.4.9),
though the command that triggered it was deliberate.

What I did: I resumed a screen session and reacquired tickets and tokens.
I noticed that my .anyone was not readable, an error consistent with
expired tokens, so I renewed them again.

'ls -l' in my home directory listed every file except .anyone normally;
.anyone appeared with ? in its fields (similar to what was reported in
testers[7179]). 'ls -l .anyone' responded with 'No such file or
directory'. 'fs flushv .' had no effect.
Finally, I tried 'touch .anyone', which printed 'Segmentation fault'.
After this, my screen session's terminals hung whenever I tried to
execute a command, and trying to kill those bash processes hung screen
itself. I believe this command caused the kernel oops; the oops below
names touch as the faulting process.

character.mit.edu was left in a state where it still accepted
connections. However, logins both on a local tty and via ssh failed:
logging in as yoz hung before the password prompt, and logging in as
root hung after the correct password was entered. When the kernel oops
happened, only 10 MB of swap was in use.

The relevant lines from /var/log/messages are those beginning with
Jul  5 00:15:35 character kernel:
They are reproduced below with that prefix removed.

Unable to handle kernel NULL pointer dereference at virtual address 0000001a
 printing eip:
c01bf732
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: md5 ipv6 libafs(U) i2c_dev i2c_core sunrpc dm_mod ohci_hcd snd_cmipci snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore sis900 floppy ext3 jbd
CPU:    0
EIP:    0060:[<c01bf732>]    Tainted: PF     VLI
EFLAGS: 00010286   (2.6.9-5.EL)
EIP is at inode_doinit_with_dentry+0x26/0x63f
eax: 00000000   ebx: c13e3800   ecx: c4c88ed8   edx: c4c88ed8
esi: c4c88ed8   edi: c4c88ed8   ebp: c82b8680   esp: c60c3e60
ds: 007b   es: 007b   ss: 0068
Process touch (pid: 3672, threadinfo=c60c3000 task=c32f5380)
Stack: 00000000 00000000 00000000 c13e3800 00008180 c13e3800 c4c88ed8 c60c3ea4
       c82b8680 c017ca68 00000000 c4c88ed8 c829022b c4c88f5c c0d80454 c1f1a000
       c13e3800 00008000 000002f6 00000000 00000002 00001000 00000000 00008180
Call Trace:
 [<c017ca68>] d_instantiate+0x12e/0x131
 [<c829022b>] afs_linux_create+0x1ff/0x2d5 [libafs]
 [<c0172887>] vfs_create+0xb8/0xef
 [<c0172c58>] open_namei+0x181/0x57e
 [<c0161412>] filp_open+0x23/0x3c
 [<c03003b2>] __cond_resched+0x14/0x3b
 [<c01d8e46>] direct_strncpy_from_user+0x3e/0x5d
 [<c01618e9>] sys_open+0x31/0x7d
 [<c0301bfb>] syscall_call+0x7/0xb
Code: 5b 5e 5f 5d c3 55 57 89 d7 56 53 83 ec 14 89 44 24 0c 8b 80 ac 01 00 00 c7 44 24 04 00 00 00 00 c7 04 24 00 00 00 00 89 44 24 08 <80> 78 1a 00 0f 85 85 04 00 00 89 c3 31 c9 ba 6b 00 00 00 b8 a4
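The numbers in the oops are consistent with each other, which a short check makes concrete. In the Code: line, the bytes marked <80> 78 1a 00 encode the x86 instruction cmpb $0x0,0x1a(%eax); with eax = 00000000 that reads the byte at 0x1a, exactly the faulting virtual address. Counting from the 55 (push %ebp) that opens the function to the marked byte gives 0x26 bytes, matching "inode_doinit_with_dentry+0x26". The NULL in eax was loaded just before by 8b 80 ac 01 00 00 (mov 0x1ac(%eax),%eax), i.e. a pointer-sized field at offset 0x1ac of the structure eax pointed to on entry was zero. The Python sketch below recomputes these values. The helper names are hypothetical, and it decodes only this one instruction form. It is not a general disassembler.

```python
"""Cross-check fields of the oops report above (illustrative sketch only).

Assumption: only the single instruction form seen at the faulting EIP is
handled (opcode 0x80, ModRM mod=01, 8-bit displacement, no SIB byte).
"""

# 32-bit x86 register names, indexed by the ModRM r/m (or reg) field.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# Operation selected by the ModRM reg field for opcode 0x80 ("group 1").
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def function_start(eip: int, offset: int) -> int:
    """'EIP is at func+offset/size' means func begins at eip - offset."""
    return eip - offset


def bytes_before_marker(code_line: str) -> int:
    """Count bytes from the last 'c3' (ret) up to the '<..>' marked byte,
    i.e. the marked byte's offset from the start of the next function."""
    tokens = code_line.split()
    marker = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    last_ret = max(i for i in range(marker) if tokens[i] == "c3")
    return marker - (last_ret + 1)


def decode_group1_disp8(code: bytes) -> tuple:
    """Decode '80 /r ib' with mod=01: OP byte ptr [reg + disp8], imm8."""
    if code[0] != 0x80:
        raise ValueError("only opcode 0x80 is handled")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 4:  # rm == 4 would require a SIB byte
        raise ValueError("only mod=01 without SIB is handled")
    disp8 = code[2] if code[2] < 0x80 else code[2] - 0x100  # sign-extend
    imm8 = code[3]
    return GROUP1[reg], REGS[rm], disp8, imm8


if __name__ == "__main__":
    # Values copied from the oops above.
    code_line = ("5b 5e 5f 5d c3 55 57 89 d7 56 53 83 ec 14 89 44 24 0c "
                 "8b 80 ac 01 00 00 c7 44 24 04 00 00 00 00 c7 04 24 00 "
                 "00 00 00 89 44 24 08 <80> 78 1a 00")
    offset = bytes_before_marker(code_line)
    start = function_start(0xC01BF732, offset)
    print(f"marked byte is at +{offset:#x}; function starts at {start:#010x}")

    op, base, disp, imm = decode_group1_disp8(bytes.fromhex("80781a00"))
    regs = {"eax": 0x00000000}  # eax as printed in the register dump
    fault = (regs[base] + disp) & 0xFFFFFFFF
    print(f"{op}b ${imm:#x},{disp:#x}(%{base}) reads byte at {fault:#010x}")
```

Run as a script, it prints the recomputed function start (0xc01bf70c) and a faulting address of 0x0000001a, in agreement with the kernel's own report.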

Further testing from another machine did not reproduce the problem, so
my .anyone file does not seem to be broken in any special way. However,
since owl reads it often (every 3 minutes or so, I think), there may be
AFS cache issues.



Other potentially interesting messages:

Because I could not log in, I tried to reboot the machine, and then
noticed these messages on the console:
INIT: Pid 2476 [id 2] seems to hang
INIT: Pid 2477 [id 3] seems to hang
Stopping HAL daemon:

It seemed to hang there for several minutes, so I had to hard-reset the
machine. (Unfortunately, I cannot tell which processes these PIDs
correspond to.)

-- 
Catastrophically,
Yoyo Zhou
