[16147] in Athena Bugs
8.2.9 ultra 10: x15-cruise-basselope
daemon@ATHENA.MIT.EDU (John Hawkinson)
Wed Aug 12 23:26:13 1998
Date: Wed, 12 Aug 1998 23:26:08 -0400
To: bugs@MIT.EDU
From: John Hawkinson <jhawk@MIT.EDU>
A couple of things:
1) It'd be really nice if SIPB got our Ultra10 before they showed
up in clusters. We finally did (yay!), but it seemed like we were kinda
lagging behind...
2) The machine repeatedly gets into a state where logging in as root
from xlogin causes the X server to quit immediately: there is a brief
flash of white screen, and then xlogin comes back. Running reactivate
seemed to fix this, but for no discernible reason. I think I may have
seen this before on other 8.2 machines.
3) When the machine arrived, the NVRAM variable "boot-device" was set to
"disk:b", yet there was nothing on the b partition of the disk. I
dunno if the machine came this way, if hotline did this (as part
of the install??), or something weird. I changed it to just
'disk', and all functioned normally. If no one else mentions
this, it should be discounted...
4) After rebooting the machine (to play with PROM stuff -- note the
existence of "probe-ide"!), and logging in as root, an xterm and
the console window come up, but no prompt is displayed in the xterm,
and characters do not echo. I can log in remotely and su; here's a
nominal amount of data:
[x15-cruise-basselope!jhawk] ~# ps -fp 314
UID PID PPID C STIME TTY TIME CMD
root 314 306 97 23:02:56 ttyp0 6:29 xterm -ls -geometry 80x56+0+30
[x15-cruise-basselope!jhawk] ~# ps -lp 314
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
8 R 0 314 306 98 89 20 60b188a0 397 ttyp0 6:34 xterm
xterm is spinning:
[x15-cruise-basselope!jhawk] ~# truss -p 314 |& head
write(5, " s r v d15 t w m &\r t a".., 54) Err#22 EINVAL
read(5, 0x000501E8, 4096) Err#22 EINVAL
poll(0xEFFFDA68, 2, -1) = 1
write(5, " s r v d15 t w m &\r t a".., 54) Err#22 EINVAL
read(5, 0x000501E8, 4096) Err#22 EINVAL
poll(0xEFFFDA68, 2, -1) = 1
write(5, " s r v d15 t w m &\r t a".., 54) Err#22 EINVAL
read(5, 0x000501E8, 4096) Err#22 EINVAL
poll(0xEFFFDA68, 2, -1) = 1
write(5, " s r v d15 t w m &\r t a".., 54) Err#22 EINVAL
[x15-cruise-basselope!jhawk] ~#
[x15-cruise-basselope!jhawk] ~# truss -w all -p 314 |& head
read(5, 0x000501E8, 4096) Err#22 EINVAL
poll(0xEFFFDA68, 2, -1) = 1
write(5, 0x00065238, 54) Err#22 EINVAL
s r v d15 t w m &\r t a i l / s r v d / . r v d i n f o\r\r\r
\r\r 030303\r\r\r\r s g h s g h s g h
read(5, 0x000501E8, 4096) Err#22 EINVAL
poll(0xEFFFDA68, 2, -1) = 1
write(5, 0x00065238, 54) Err#22 EINVAL
s r v d15 t w m &\r t a i l / s r v d / . r v d i n f o\r\r\r
\r\r 030303\r\r\r\r s g h s g h s g h
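For illustration only: the repeating poll/write/read pattern in the truss output above is what a poll-driven loop looks like when a descriptor always reports ready but every I/O call on it fails. The following is a minimal Python sketch of that failure mode, not xterm's actual code. As an assumption, it uses a pipe whose write end is closed as a stand-in for the broken descriptor (the real fd 5 returned EINVAL, not end-of-file), and the names pump, stop_on_failure, and demo are hypothetical.

```python
import os
import select


def pump(fd, stop_on_failure, max_iterations=1000):
    """Poll fd and try to read it; return the number of loop iterations."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        if not poller.poll(1000):  # wait up to 1 s for readiness
            continue
        try:
            data = os.read(fd, 4096)
        except OSError:
            data = b""  # a failure such as EINVAL lands here
        if data:
            continue  # normal case: real input was consumed
        if stop_on_failure:
            break  # give up on a descriptor that can no longer be read
        # Otherwise loop again; poll() returns at once, so the loop spins.
    return iterations


def demo():
    """Run both loop variants against a pipe with a closed write end."""
    read_end, write_end = os.pipe()
    os.close(write_end)  # the read end now reports "ready" forever
    try:
        naive = pump(read_end, stop_on_failure=False)
        careful = pump(read_end, stop_on_failure=True)
    finally:
        os.close(read_end)
    return naive, careful
```

On a system where poll() reports the hung-up pipe as ready, demo() should show the naive variant running until its iteration cap while the careful variant stops after one pass. The cap exists only so the sketch terminates; a real loop without it would keep the CPU busy, much like the C column of 97 in the ps output.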
But perhaps it's irrelevant because there's a tcsh running as its child:
[x15-cruise-basselope!jhawk] /bin/athena# ps -fp 315
UID PID PPID C STIME TTY TIME CMD
root 315 314 0 23:02:56 pts/0 0:00 -tcsh
[x15-cruise-basselope!jhawk] /bin/athena# ps -lp 315
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
8 S 0 315 314 0 40 20 60b181e0 244 60b0cd86 pts/0 0:00 tcsh
[x15-cruise-basselope!jhawk] /bin/athena# /usr/ucb/ps lxw 315
F UID PID PPID %C PRI NI SZ RSS WCHAN S TT TIME COMMAND
8 0 315 314 0 59 20 1952 1536 pm_state S pts/0 0:00 -tcsh
(ucb ps for wchan decoding). /usr/include/sac.h suggests this has something
to do with SAC, which is weird:
/*
* message to SAC (header only). This header is forever fixed. The
* size field (pm_size) defines the size of the data portion of the
* message, which follows the header. The form of this optional
* data portion is defined strictly by the message type (pm_type).
*/
struct pmmsg {
char pm_type; /* type of message */
unchar pm_state; /* current state of port monitor */
char pm_maxclass; /* max message class this PM */
/* understands */
char pm_tag[PMTAGSIZE + 1]; /* port monitor's tag */
int pm_size; /* size of optional data portion */
};
But maybe it's a different pm_state?
Running gdb and attaching:
(gdb) where
#0 0xef5b8510 in _libc_read ()
#1 0x46940 in GetNextChar ()
#2 0x4678c in GetNextCommand ()
#3 0x45b6c in Inputl ()
#4 0x2e5e0 in bgetc ()
#5 0x2e1e0 in readc ()
#6 0x2b2d0 in lex ()
#7 0x1e834 in process ()
#8 0x1da68 in main ()
And "finish" continues to run for quite a while, suggesting we're
stuck reading. "pfiles" doesn't give any clue as to what stdin and stdout
are for this tcsh process.
[x15-cruise-basselope!jhawk] ~# /usr/proc/bin/pfiles 315
315: -tcsh
Current rlimit: 64 file descriptors
15: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
O_RDWR FD_CLOEXEC
16: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
O_RDWR FD_CLOEXEC
17: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
O_RDWR FD_CLOEXEC
18: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
O_RDWR FD_CLOEXEC
19: S_IFCHR mode:0620 dev:134,0 ino:17981 uid:0 gid:7 rdev:24,0
O_RDWR FD_CLOEXEC
[x15-cruise-basselope!jhawk] ~# find / -xdev -inum 17981
/devices/pseudo/pts@0:0
This looks about the same as the working tcsh process.
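The inode lookup above (taking the ino value from pfiles and searching one filesystem for it) can be sketched in Python as follows. This is a hedged illustration rather than a replacement for find: the function name paths_for_inode is hypothetical, and unlike "find -xdev -inum" it matches on the device number as well as the inode number, using the values that os.fstat() reports for an open descriptor.

```python
import os


def paths_for_inode(root, dev, ino):
    """Return paths under root whose device and inode numbers match."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        entries = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            entries.append((name, st))
            if st.st_dev == dev and st.st_ino == ino:
                matches.append(path)
        # Like find -xdev: do not descend into other filesystems.
        same_fs = {name for name, st in entries if st.st_dev == dev}
        dirnames[:] = [d for d in dirnames if d in same_fs]
    return matches
```

A caller would typically obtain dev and ino from os.fstat(fd) on the descriptor in question and pass a starting directory such as "/" as root; whether the result is a pseudo-terminal node like the one found above depends on the system being inspected.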
In short, I can find no satisfactory explanation for this behavior.
Logging out and logging back in, everything is hunky-dory and this
is inexplicable...
That's all for now.
--jhawk