[18448] in Athena Bugs

sun4 8.4.15: apparent xcb machine hang

daemon@ATHENA.MIT.EDU (John Hawkinson)
Fri Oct 20 13:16:44 2000

Message-Id: <200010201716.NAA00674@x15-cruise-basselope.mit.edu>
To: bugs@MIT.EDU
Cc: ocschwar@MIT.EDU, alex_C@MIT.EDU
Date: Fri, 20 Oct 2000 13:16:40 -0400
From: John Hawkinson <jhawk@MIT.EDU>

System name:		x15-cruise-basselope.mit.edu
Type and version:	Ultra-5_10 8.4.15 (with mkserv)
Display type:		afb

Shell:			/bin/sh (/bin/athena/tcsh?)
Window manager:		unknown

What were you trying to do?
	Login.


What's wrong:
	I arrived in front of xcb and found it with a white screen
	and no X server. Bystanders indicated it had been that way
	for some time. Characters echoed in text mode, but ^C, ^\, etc.
	didn't do anything interesting.

	There was no current ps implementation in the nvramrc, and I've
	gotten lazier over the years, so I didn't feel like typing it
	in.

	"last" doesn't show anything too useful:
[x15-cruise-basselope!jhawk] ~> last -20
jhawk     pts/0        :0.0             Fri Oct 20 12:40   still logged in
reboot    system boot                   Fri Oct 20 12:35 
alex_c    pts/1        :0.0             Fri Oct 20 10:00 - 12:08  (02:07)


	I forced a crash dump with "0 set-pc go". Here's the "proc"
	listing from crash:

> proc
PROC TABLE SIZE = 1914
SLOT ST  PID  PPID  PGID   SID   UID PRI   NAME        FLAGS
   0 t     0     0     0     0     0  96 sched          load sys lock
   1 s     1     0     0     0     0  58 init           load
   2 s     2     0     0     0     0  98 pageout        load sys lock nowait
   3 s     3     0     0     0     0  60 fsflush        load sys lock nowait
   4 r 26954     1 26954 26954     0  99 dm             load jctl
   5 s   128     1     0     0     0  59 afsd           load sys lock nowait
   7 s    54     1    54    54     0  55 devfseventd    load
   8 s    56     1    56    56     0  43 devfsadm       load
   9 s   145     1   145   145     0  58 rpcbind        load
  10 s   226     1   226   226     0  58 nscd           load
  11 s   129     1     0     0     0  58 afsd           load sys
  12 s   130     1     0     0     0  59 afsd           load sys
  13 s   131     1     0     0     0  60 afsd           load sys
  14 s   132     1     0     0     0  58 afsd           load sys
  15 s   133     1     0     0     0  59 afsd           load sys
  16 s   134     1     0     0     0  60 afsd           load sys
  17 s   135     1     0     0     0  59 afsd           load sys
  18 s   136     1     0     0     0  60 afsd           load sys
  19 s   137     1     0     0     0  60 afsd           load sys
  20 s   194     1   194   194     0  58 inetd          load
  21 s   159     1   159   159     0  58 named          load jctl
  22 s   195     1   195   195     0  58 automountd     load
  23 s   211     1   211   211     0  58 syslogd        load
  25 s   232     1   232   232     0  59 utmpd          load
  26 s   191     1   191   191     0  50 lockd          load
  27 s   193     1   193   193     1  51 statd          load
  28 s   220     1   220   220     0  58 cron           load
  29 s   345     1   345   345     0  58 zhm            load
  30 s   339     1   339   339     0  51 afbdaemon      load
  32 s   341     1   341   341     0  58 vold           load jctl
  33 s   324     1   324   324     0  48 inetd          load jctl
  34 s   333     1   333   333     0  100 xntpd          load
  35 s   355     1   355   355     0  58 sshd           load jctl
  53 s 14209     1 14209 13919 15090  59 zlogoutd       load

I note the presence of the zlogoutd process from uid 15090, ocschwar.
Perhaps there is some sort of a problem with his session gate and
8.4.15?
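
For anyone who wants to scan the listing mechanically rather than by eye,
here is a minimal sketch. It assumes the columns split cleanly on whitespace
in the order given by the crash header, and that state "r" means runnable;
the rows are a subset copied from the listing above, and the function name
is made up for illustration.

```python
# A subset of rows copied verbatim from the "proc" listing above.
PROC_ROWS = """\
   1 s     1     0     0     0     0  58 init           load
   4 r 26954     1 26954 26954     0  99 dm             load jctl
  27 s   193     1   193   193     1  51 statd          load
  34 s   333     1   333   333     0  100 xntpd          load
  53 s 14209     1 14209 13919 15090  59 zlogoutd       load
"""

def parse_proc(text):
    """Parse rows laid out as: SLOT ST PID PPID PGID SID UID PRI NAME FLAGS..."""
    procs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        procs.append({
            "slot": int(fields[0]),
            "state": fields[1],
            "pid": int(fields[2]),
            "uid": int(fields[6]),
            "pri": int(fields[7]),
            "name": fields[8],
            "flags": fields[9:],
        })
    return procs

procs = parse_proc(PROC_ROWS)

# Processes not owned by root, and processes in state "r"
# (assumed here to mean runnable).
non_root = [(p["name"], p["uid"]) for p in procs if p["uid"] != 0]
state_r = [p["name"] for p in procs if p["state"] == "r"]

print(non_root)  # [('statd', 1), ('zlogoutd', 15090)]
print(state_r)   # ['dm']
```

Splitting on whitespace rather than fixed columns keeps the xntpd row
usable even though its three-digit priority pushes the NAME column over.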

On the other hand, this seems a little unlikely because it's been running for
more than a day and a half (note "start"):

> u 53
PER PROCESS USER AREA FOR PROCESS 53
PROCESS MISC:
        command: zlogoutd, psargs: zlogoutd
        start: Wed Oct 18 21:32:08 2000
        mem: 353, type: exec
        vnode of current directory: 70669b48
OPEN FILES, POFILE FLAGS, AND THREAD REFCNT:
        [0]: F 700b8410, 0, 0   [1]: F 700b8320, 0, 0
        [2]: F 700b8320, 0, 0   [3]: F 700b8d70, 1, 0
        [4]: F 709abf40, 0, 1   [17]: F 700b8320, 0, 0
        [18]: F 700b8320, 0, 0  [19]: F 709abf68, 0, 0
 cmask: 0077
RESOURCE LIMITS:
        cpu time: 18446744073709551613/18446744073709551613
        file size: 18446744073709551613/18446744073709551613
        swap size: 18446744073709551613/18446744073709551613
        stack size: 8388608/18446744073709551613
        coredump size: 0/18446744073709551613
        file descriptors: 64/1024
        address space: 18446744073709551613/18446744073709551613
SIGNAL DISPOSITION:
           1: 4281430764   2: 4281430764   3:   ignore   4:  default
           5:  default   6:  default   7:  default   8:  default
           9:  default  10:  default  11:  default  12:  default
          13:  default  14:  default  15: 4281430764  16:   ignore
          17:  default  18:  default  19:  default  20:  default
          21:  default  22:  default  23:  default  24:   ignore
          25:  default  26:   ignore  27:   ignore  28:  default
          29:  default  30:   ignore  31:   ignore  32:  default
          33:  default  34:  default  35:  default  36:  default
          37:  default  38:  default  39:  default  40:  default
          41:  default  42:  default  43:  default  44:  default
          45:  default

Consulting the syslogs, we see:

Oct 20 01:34:10 x15-cruise-basselope.mit.edu root: Non-empty session record /var/athena/sessions/kcr
Oct 20 10:10:14 x15-cruise-basselope.mit.edu su: 'su root' succeeded for alex_c on /dev/pts/5
Oct 20 12:08:07 x15-cruise-basselope.mit.edu unix: afs: failed to store file (13)
Oct 20 12:34:49 x15-cruise-basselope.mit.edu unix: BAD TRAP: cpu=0 type=0x9 rp=0x4002b928 addr=0x0 mmu_fsr=0x0

So the only potentially interesting thing is the failed-to-store-file message,
which seems consistent with alex_c's logout time. The kernel message buffer
in the crash dump doesn't show any more messages, either.
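
To make the timing concrete, here is a rough sketch that computes the gaps
between the timestamps quoted above. The year is taken from the message's
Date: header because syslog lines omit it; the helper name is made up, and
the parsing assumes English month abbreviations.

```python
from datetime import datetime

YEAR = 2000  # syslog lines omit the year; assumed from the Date: header

def ts(stamp):
    """Parse a syslog-style 'Mon DD HH:MM:SS' stamp (English month names)."""
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")

zlogoutd_start = ts("Oct 18 21:32:08")  # "start:" in the u 53 listing
afs_store_fail = ts("Oct 20 12:08:07")  # "afs: failed to store file (13)"
bad_trap = ts("Oct 20 12:34:49")        # the BAD TRAP line

gap = bad_trap - afs_store_fail
age = bad_trap - zlogoutd_start

print("AFS error to BAD TRAP:", gap)     # AFS error to BAD TRAP: 0:26:42
print("zlogoutd age at BAD TRAP:", age)  # zlogoutd age at BAD TRAP: 1 day, 15:02:41
```

The AFS error at 12:08:07 falls within the minute "last" reports for
alex_c's logout (12:08), and the BAD TRAP lands just before the 12:35
reboot entry.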

Oh, perhaps it's useful to look at the state of dm:

> u 4
PER PROCESS USER AREA FOR PROCESS 4
PROCESS MISC:
        command: dm, psargs: /etc/athena/dm /etc/athena/login/config ttyp0 console
        start: Fri Oct 20 01:27:03 2000
        mem: 226, type: exec su-user
        vnode of current directory: 7032de90
OPEN FILES, POFILE FLAGS, AND THREAD REFCNT:
 cmask: 0000
RESOURCE LIMITS:
        cpu time: 18446744073709551613/18446744073709551613
        file size: 18446744073709551613/18446744073709551613
        swap size: 18446744073709551613/18446744073709551613
        stack size: 8388608/18446744073709551613
        coredump size: 18446744073709551613/18446744073709551613
        file descriptors: 64/1024
        address space: 18446744073709551613/18446744073709551613
SIGNAL DISPOSITION:
           1: 4279857900   2: 4279857900   3:  default   4:  default
           5:  default   6:  default   7:  default   8: 4279857900
           9:  default  10:  default  11:  default  12:  default
          13:   ignore  14: 4279857900  15: 4279857900  16:   ignore
          17:  default  18: 4279857900  19:  default  20:  default
          21:  default  22:  default  23:  default  24:   ignore
          25:  default  26:   ignore  27:   ignore  28:  default
          29:  default  30:   ignore  31:   ignore  32:  default
          33:  default  34:  default  35:  default  36:  default
          37:  default  38:  default  39:  default  40:  default
          41:  default  42:  default  43:  default  44:  default
          45:  default
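
The large decimal numbers in the two u listings are easier to read once
converted. A minimal sketch, on the assumption that the resource limits are
unsigned 64-bit values and the signal dispositions are handler addresses
printed in decimal:

```python
RLIM_FIELD = 18446744073709551613  # shown for most limits in both listings
HANDLERS = {"zlogoutd": 4281430764, "dm": 4279857900}  # SIGNAL DISPOSITION entries

def as_signed64(n):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    return n - (1 << 64) if n >= (1 << 63) else n

rlim_signed = as_signed64(RLIM_FIELD)
handler_hex = {name: hex(addr) for name, addr in HANDLERS.items()}

print(rlim_signed)  # -3
print(handler_hex)  # {'zlogoutd': '0xff3172ec', 'dm': '0xff1972ec'}
```

The limit value is 2^64 - 3, which appears to match Solaris's
RLIM64_INFINITY, i.e. "unlimited". The two handler addresses differ by
exactly 0x180000 and share their low bits, which might mean the same
library routine mapped at different base addresses in the two processes;
that reading is a guess, not something the dump confirms.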


That's about all I'm prepared to offer.
The crash dump will go away at some point, but
if someone wants access, just pipe up.

--jhawk
