[16147] in Athena Bugs


8.2.9 ultra 10: x15-cruise-basselope

daemon@ATHENA.MIT.EDU (John Hawkinson)
Wed Aug 12 23:26:13 1998

Date: Wed, 12 Aug 1998 23:26:08 -0400
To: bugs@MIT.EDU
From: John Hawkinson <jhawk@MIT.EDU>


A couple of things:

  1) It'd be really nice if SIPB got our Ultra10 before they showed
     up in clusters. We finally did (yay!), but it seemed like we were kinda
     lagging behind...

  2) The machine repeatedly gets into a state where logging in as root
     from xlogin causes the X server to quit immediately: a brief flash of
     white screen, and then xlogin comes back. Running reactivate seemed to
     fix this, but for no discernible reason. I think I may have seen this
     before on other 8.2 machines.

  3) When the machine arrived, the NVRAM variable "boot-device" was set to
     "disk:b", yet there was nothing on the b partition of the disk. I
     dunno if the machine came this way, if hotline did this (as part
     of the install??), or something weird. I changed it to just
     'disk', and all functioned normally. If no one else mentions
     this, it should be discounted...

  4) After rebooting the machine (to play with PROM stuff -- note the
     existence of "probe-ide"!), and logging in as root, an xterm and
     the console window come up, but no prompt is displayed in the xterm,
     and characters do not echo. I can log in remotely and su; here's a
     nominal amount of data:

[x15-cruise-basselope!jhawk] ~# ps -fp 314
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root   314   306 97 23:02:56 ttyp0    6:29 xterm -ls -geometry 80x56+0+30
[x15-cruise-basselope!jhawk] ~# ps -lp 314
 F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME CMD
 8 R     0   314   306 98  89 20 60b188a0    397          ttyp0    6:34 xterm

xterm is spinning:

[x15-cruise-basselope!jhawk] ~# truss  -p 314  |& head
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
[x15-cruise-basselope!jhawk] ~# 
[x15-cruise-basselope!jhawk] ~# truss -w all -p 314 |& head
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, 0x00065238, 54)                        Err#22 EINVAL
   s r v d15 t w m &\r t a i l   / s r v d / . r v d i n f o\r\r\r
  \r\r        030303\r\r\r\r s g h s g h s g h
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, 0x00065238, 54)                        Err#22 EINVAL
   s r v d15 t w m &\r t a i l   / s r v d / . r v d i n f o\r\r\r
  \r\r        030303\r\r\r\r s g h s g h s g h
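
For a longer trace, a tally of calls and results would make the loop easier to see. A minimal sketch, assuming truss lines in the format shown above; summarize_truss is a hypothetical helper of my own naming, not part of truss or Athena:

```python
import re
from collections import Counter

# One truss result line: name(args) followed by "= value" or "Err#N NAME".
TRUSS_LINE = re.compile(r'^(\w+)\(.*\)\s+(?:=\s*(\S+)|Err#(\d+)\s+(\w+))\s*$')

# The first ten lines of the trace quoted above, copied verbatim.
SAMPLE = r'''
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
read(5, 0x000501E8, 4096)                       Err#22 EINVAL
poll(0xEFFFDA68, 2, -1)                         = 1
write(5, " s r v d15 t w m &\r t a".., 54)      Err#22 EINVAL
'''


def summarize_truss(text):
    """Count (syscall, outcome) pairs; outcome is the errno name or "ok"."""
    tally = Counter()
    for line in text.splitlines():
        match = TRUSS_LINE.match(line.strip())
        if match is None:
            continue  # blank lines and "-w all" data dumps are skipped
        name, _ret, _errno, err_name = match.groups()
        tally[(name, err_name or "ok")] += 1
    return tally


if __name__ == "__main__":
    for (name, outcome), count in sorted(summarize_truss(SAMPLE).items()):
        print(name, outcome, count)
```

In this sample every read and write on fd 5 fails with EINVAL while poll keeps returning 1, which is consistent with xterm spinning.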

But perhaps the spinning xterm is irrelevant, because there's a tcsh running as its child:

[x15-cruise-basselope!jhawk] /bin/athena# ps -fp 315
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root   315   314  0 23:02:56 pts/0    0:00 -tcsh
[x15-cruise-basselope!jhawk] /bin/athena# ps -lp 315
 F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME CMD
 8 S     0   315   314  0  40 20 60b181e0    244 60b0cd86 pts/0    0:00 tcsh
[x15-cruise-basselope!jhawk] /bin/athena# /usr/ucb/ps lxw 315
 F   UID   PID  PPID %C PRI NI   SZ  RSS    WCHAN S TT        TIME COMMAND
 8     0   315   314  0  59 20 1952 1536 pm_state S pts/0     0:00 -tcsh

(ucb ps used for wchan decoding.) /usr/include/sac.h suggests this has something
to do with SAC, which is weird:

/*
 * message to SAC (header only).  This header is forever fixed.  The
 * size field (pm_size) defines the size of the data portion of the
 * message, which follows the header.  The form of this optional
 * data portion is defined strictly by the message type (pm_type).
 */

struct  pmmsg {
        char    pm_type;                /* type of message */
        unchar  pm_state;               /* current state of port monitor */
        char    pm_maxclass;            /* max message class this PM */
                                        /* understands */
        char    pm_tag[PMTAGSIZE + 1];  /* port monitor's tag */
        int     pm_size;                /* size of optional data portion */
};

But maybe it's a different pm_state? 
Running gdb and attaching:

(gdb) where
#0  0xef5b8510 in _libc_read ()
#1  0x46940 in GetNextChar ()
#2  0x4678c in GetNextCommand ()
#3  0x45b6c in Inputl ()
#4  0x2e5e0 in bgetc ()
#5  0x2e1e0 in readc ()
#6  0x2b2d0 in lex ()
#7  0x1e834 in process ()
#8  0x1da68 in main ()

And "finish" continues to run for quite a while, suggesting we're
stuck reading. "pfiles" doesn't give any clue as to what stdin and stdout
are for this tcsh process.

[x15-cruise-basselope!jhawk] ~# /usr/proc/bin/pfiles 315
315:    -tcsh
  Current rlimit: 64 file descriptors
  15: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
      O_RDWR FD_CLOEXEC
  16: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
      O_RDWR FD_CLOEXEC
  17: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
      O_RDWR FD_CLOEXEC
  18: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
      O_RDWR FD_CLOEXEC
  19: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
      O_RDWR FD_CLOEXEC
[x15-cruise-basselope!jhawk] ~# find / -xdev -inum 17981
/devices/pseudo/pts@0:0

This looks about the same as the working tcsh process.
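
The find lookup above maps the inode that pfiles reports back to a path. A rough equivalent, as a sketch only: find_by_inode and demo are hypothetical names, and -xdev is approximated by skipping anything whose device number differs from the starting directory's:

```python
import os
import tempfile


def find_by_inode(root, inode):
    """Return paths at or under root, on root's filesystem, with this inode."""
    root_stat = os.lstat(root)
    root_dev = root_stat.st_dev
    matches = [root] if root_stat.st_ino == inode else []
    for dirpath, dirnames, filenames in os.walk(root):
        subdirs = set(dirnames)
        same_fs_dirs = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable
            if st.st_dev != root_dev:
                continue  # other filesystem: neither report nor descend
            if st.st_ino == inode:
                matches.append(path)
            if name in subdirs:
                same_fs_dirs.append(name)
        dirnames[:] = same_fs_dirs  # prune the walk in place
    return matches


def demo():
    """Create a scratch tree and look one file up by its inode number."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        target = os.path.join(root, "sub", "target")
        open(target, "w").close()
        return find_by_inode(root, os.lstat(target).st_ino) == [target]
```

Called as find_by_inode("/", 17981) on the machine in question, this would be expected to walk the root filesystem the same way the find command does, though I have only tried the idea on a scratch directory.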

In short, I can find no satisfactory explanation for this behavior.
Logging out and logging back in, everything is hunky-dory and this
is inexplicable...

That's all for now.

--jhawk
