[18448] in Athena Bugs
sun4 8.4.15: apparent xcb machine hang
daemon@ATHENA.MIT.EDU (John Hawkinson)
Fri Oct 20 13:16:44 2000
Message-Id: <200010201716.NAA00674@x15-cruise-basselope.mit.edu>
To: bugs@MIT.EDU
Cc: ocschwar@MIT.EDU, alex_C@MIT.EDU
Date: Fri, 20 Oct 2000 13:16:40 -0400
From: John Hawkinson <jhawk@MIT.EDU>
System name: x15-cruise-basselope.mit.edu
Type and version: Ultra-5_10 8.4.15 (with mkserv)
Display type: afb
Shell: /bin/sh (/bin/athena/tcsh?)
Window manager: unknown
What were you trying to do?
Login.
What's wrong:
I arrived in front of xcb and found it with a white screen
and no X server. Bystanders indicated it had been that way
for some time. Characters echoed in text mode, but ^C, ^\, etc.
didn't do anything interesting.
There was no current ps implementation in the nvramrc, and I've
gotten lazier over the years, so I didn't feel like typing it
in.
"last" doesn't show anything too useful:
[x15-cruise-basselope!jhawk] ~> last -20
jhawk pts/0 :0.0 Fri Oct 20 12:40 still logged in
reboot system boot Fri Oct 20 12:35
alex_c pts/1 :0.0 Fri Oct 20 10:00 - 12:08 (02:07)
I forced a crash dump with "0 set-pc go". Here's the "proc"
listing from crash:
> proc
PROC TABLE SIZE = 1914
SLOT ST PID PPID PGID SID UID PRI NAME FLAGS
0 t 0 0 0 0 0 96 sched load sys lock
1 s 1 0 0 0 0 58 init load
2 s 2 0 0 0 0 98 pageout load sys lock nowait
3 s 3 0 0 0 0 60 fsflush load sys lock nowait
4 r 26954 1 26954 26954 0 99 dm load jctl
5 s 128 1 0 0 0 59 afsd load sys lock nowait
7 s 54 1 54 54 0 55 devfseventd load
8 s 56 1 56 56 0 43 devfsadm load
9 s 145 1 145 145 0 58 rpcbind load
10 s 226 1 226 226 0 58 nscd load
11 s 129 1 0 0 0 58 afsd load sys
12 s 130 1 0 0 0 59 afsd load sys
13 s 131 1 0 0 0 60 afsd load sys
14 s 132 1 0 0 0 58 afsd load sys
15 s 133 1 0 0 0 59 afsd load sys
16 s 134 1 0 0 0 60 afsd load sys
17 s 135 1 0 0 0 59 afsd load sys
18 s 136 1 0 0 0 60 afsd load sys
19 s 137 1 0 0 0 60 afsd load sys
20 s 194 1 194 194 0 58 inetd load
21 s 159 1 159 159 0 58 named load jctl
22 s 195 1 195 195 0 58 automountd load
23 s 211 1 211 211 0 58 syslogd load
25 s 232 1 232 232 0 59 utmpd load
26 s 191 1 191 191 0 50 lockd load
27 s 193 1 193 193 1 51 statd load
28 s 220 1 220 220 0 58 cron load
29 s 345 1 345 345 0 58 zhm load
30 s 339 1 339 339 0 51 afbdaemon load
32 s 341 1 341 341 0 58 vold load jctl
33 s 324 1 324 324 0 48 inetd load jctl
34 s 333 1 333 333 0 100 xntpd load
35 s 355 1 355 355 0 58 sshd load jctl
53 s 14209 1 14209 13919 15090 59 zlogoutd load
I note the presence of the zlogoutd process from uid 15090, ocschwar.
Perhaps there is some sort of a problem with his session gate and
8.4.15?
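For anyone who wants to repeat that check on a longer listing, here is a minimal sketch that picks the non-system processes out of "proc" output. The rows are copied from the listing above; the parser itself and the uid cutoff of 99 are my own assumptions, not anything crash provides.

```python
# A few rows copied from the "proc" listing above. The parsing logic and the
# system_uid_max cutoff are illustrative assumptions, not part of crash.
PROC_ROWS = """\
1 s 1 0 0 0 0 58 init load
4 r 26954 1 26954 26954 0 99 dm load jctl
27 s 193 1 193 193 1 51 statd load
53 s 14209 1 14209 13919 15090 59 zlogoutd load
"""

# Column order as printed in the listing header; FLAGS is whatever follows NAME.
COLUMNS = ("slot", "st", "pid", "ppid", "pgid", "sid", "uid", "pri", "name")
NUMERIC = ("slot", "pid", "ppid", "pgid", "sid", "uid", "pri")


def parse_proc_rows(text):
    """Turn whitespace-separated proc rows into a list of dicts."""
    procs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < len(COLUMNS):
            continue  # skip blank or truncated lines
        row = dict(zip(COLUMNS, fields))
        for key in NUMERIC:
            row[key] = int(row[key])
        row["flags"] = fields[len(COLUMNS):]
        procs.append(row)
    return procs


def user_owned(procs, system_uid_max=99):
    """Keep processes whose uid is above the (assumed) system uid range."""
    return [p for p in procs if p["uid"] > system_uid_max]


procs = parse_proc_rows(PROC_ROWS)
for p in user_owned(procs):
    print(p["slot"], p["pid"], p["uid"], p["name"])
```

With the sample rows this leaves only slot 53, the zlogoutd process.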
On the other hand, this seems a little unlikely because it's been running
since Wednesday evening (note "start"):
> u 53
PER PROCESS USER AREA FOR PROCESS 53
PROCESS MISC:
command: zlogoutd, psargs: zlogoutd
start: Wed Oct 18 21:32:08 2000
mem: 353, type: exec
vnode of current directory: 70669b48
OPEN FILES, POFILE FLAGS, AND THREAD REFCNT:
[0]: F 700b8410, 0, 0 [1]: F 700b8320, 0, 0
[2]: F 700b8320, 0, 0 [3]: F 700b8d70, 1, 0
[4]: F 709abf40, 0, 1 [17]: F 700b8320, 0, 0
[18]: F 700b8320, 0, 0 [19]: F 709abf68, 0, 0
cmask: 0077
RESOURCE LIMITS:
cpu time: 18446744073709551613/18446744073709551613
file size: 18446744073709551613/18446744073709551613
swap size: 18446744073709551613/18446744073709551613
stack size: 8388608/18446744073709551613
coredump size: 0/18446744073709551613
file descriptors: 64/1024
address space: 18446744073709551613/18446744073709551613
SIGNAL DISPOSITION:
1: 4281430764 2: 4281430764 3: ignore 4: default
5: default 6: default 7: default 8: default
9: default 10: default 11: default 12: default
13: default 14: default 15: 4281430764 16: ignore
17: default 18: default 19: default 20: default
21: default 22: default 23: default 24: ignore
25: default 26: ignore 27: ignore 28: default
29: default 30: ignore 31: ignore 32: default
33: default 34: default 35: default 36: default
37: default 38: default 39: default 40: default
41: default 42: default 43: default 44: default
45: default
Consulting the syslogs, we see:
Oct 20 01:34:10 x15-cruise-basselope.mit.edu root: Non-empty session record /var/athena/sessions/kcr
Oct 20 10:10:14 x15-cruise-basselope.mit.edu su: 'su root' succeeded for alex_c on /dev/pts/5
Oct 20 12:08:07 x15-cruise-basselope.mit.edu unix: afs: failed to store file (13)
Oct 20 12:34:49 x15-cruise-basselope.mit.edu unix: BAD TRAP: cpu=0 type=0x9 rp=0x4002b928 addr=0x0 mmu_fsr=0x0
So the only potentially interesting thing is the failed-to-store-file
message, which seems consistent with alex_c's logout time. The kernel message
buffer in the crash dump doesn't show any more messages, either.
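If the "(13)" in the afs message is an ordinary errno value (an assumption on my part; the message itself doesn't say), it can be looked up like this. The decode_errno helper is just a hypothetical wrapper around the standard errno tables.

```python
import errno
import os


def decode_errno(code):
    """Return (symbolic name, message) for a POSIX errno value."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


# 13 is the code printed in "afs: failed to store file (13)".
# Assumption: AFS is reporting a plain errno here.
name, message = decode_errno(13)
print(name, "-", message)
```

On the usual POSIX numbering 13 is EACCES, which would fit a store attempted after the user's tokens were gone at logout, but that is a guess.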
Oh, perhaps it's useful to look at the state of dm:
> u 4
PER PROCESS USER AREA FOR PROCESS 4
PROCESS MISC:
command: dm, psargs: /etc/athena/dm /etc/athena/login/config ttyp0 console
start: Fri Oct 20 01:27:03 2000
mem: 226, type: exec su-user
vnode of current directory: 7032de90
OPEN FILES, POFILE FLAGS, AND THREAD REFCNT:
cmask: 0000
RESOURCE LIMITS:
cpu time: 18446744073709551613/18446744073709551613
file size: 18446744073709551613/18446744073709551613
swap size: 18446744073709551613/18446744073709551613
stack size: 8388608/18446744073709551613
coredump size: 18446744073709551613/18446744073709551613
file descriptors: 64/1024
address space: 18446744073709551613/18446744073709551613
SIGNAL DISPOSITION:
1: 4279857900 2: 4279857900 3: default 4: default
5: default 6: default 7: default 8: 4279857900
9: default 10: default 11: default 12: default
13: ignore 14: 4279857900 15: 4279857900 16: ignore
17: default 18: 4279857900 19: default 20: default
21: default 22: default 23: default 24: ignore
25: default 26: ignore 27: ignore 28: default
29: default 30: ignore 31: ignore 32: default
33: default 34: default 35: default 36: default
37: default 38: default 39: default 40: default
41: default 42: default 43: default 44: default
45: default
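The large decimal numbers in the two user-area dumps are easier to read once converted. A minimal sketch, assuming that 18446744073709551613 is Solaris's RLIM64_INFINITY (which I believe is defined as (rlim64_t)-3) and that the SIGNAL DISPOSITION values are 32-bit handler addresses printed in decimal:

```python
# Assumption: Solaris defines RLIM64_INFINITY as (rlim64_t)-3, i.e. 2**64 - 3.
RLIM64_INFINITY = 2**64 - 3


def describe_limit(value):
    """Render a resource limit from the dump, naming the 'unlimited' sentinel."""
    return "unlimited" if value == RLIM64_INFINITY else str(value)


def handler_address(value):
    """Render a signal disposition value as a 32-bit hex address."""
    return "0x%08x" % value


print(describe_limit(18446744073709551613))  # cpu time, file size, etc.
print(describe_limit(8388608))               # stack size soft limit (8 MB)
print(handler_address(4281430764))           # zlogoutd handler for signals 1, 2, 15
print(handler_address(4279857900))           # dm handler for signals 1, 2, 8, 14, 15, 18
```

The two handler values come out as 0xff3172ec and 0xff1972ec, so each process routes all of its caught signals through a single address.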
That's about all I'm prepared to offer.
The crash dump will go away at some point, but
if someone wants access, just pipe up.
--jhawk