[3017] in testers

8.0C: Solaris boxes don't syslog panics

daemon@ATHENA.MIT.EDU (John Hawkinson)
Sun Jul 21 14:51:59 1996

Date: Sun, 21 Jul 1996 14:51:51 -0400
To: testers@MIT.EDU
In-Reply-To: "[3016] in testers" entitled "8.0C sparc: Not taking dumps by default a mistake"
From: John Hawkinson <jhawk@MIT.EDU>


And perhaps 7.7 machines didn't either.

Last night I wrote:

} In particular, I wonder how many similar crashes have occurred that
} people have ignored or not known how to deal with, or will occur.  A
} cursory inspection of the dslogger discuss meeting shows that very few
} have panicked under 2.3; pessimistically I think things will worsen
} under 2.4 (8.0).

Sure enough, in this morning's dslogger report, portnoy is not
mentioned.

The problem seems to be that the panic detection mechanism (savecore)
is run in /etc/rc2.d/S20sysetup, but syslogd is not started until
/etc/rc2.d/S74syslog.
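
To make the ordering concrete, here is a minimal sketch in Python. It
assumes the start scripts in /etc/rc2.d are run in lexical order of
their names (which is how I understand the rc2 glob over S* to behave).
Only S20sysetup and S74syslog come from the actual system; the other
names are made up for illustration.

```python
def start_order(script_names):
    """Return the start scripts (names beginning with 'S') in lexical order.

    Assumption: rc2 runs /etc/rc2.d/S* in the order the shell glob
    expands them, i.e. sorted by name.
    """
    return sorted(name for name in script_names if name.startswith("S"))


def runs_before(script_names, first, second):
    """Return True if `first` would be run earlier than `second`."""
    order = start_order(script_names)
    return order.index(first) < order.index(second)


# S20sysetup and S74syslog are the real script names; the rest are hypothetical.
scripts = ["S74syslog", "K20example", "S20sysetup", "S99example"]

print(start_order(scripts))
print(runs_before(scripts, "S20sysetup", "S74syslog"))
```

Under that assumption, savecore (in S20sysetup) has already logged its
message well before syslogd (in S74syslog) exists to receive it.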

My recollection is that in the past there was some mechanism for
saving syslog messages somewhere (kernel message socket??) so that
syslogd would pick up the queued messages after starting. This does
not appear to be functional.

In particular, with syslogd -d running (in multiuser mode), I ran
savecore -vd and observed the syslog message being received and
forwarded properly. I then killed syslogd, reran savecore, and
restarted syslogd -d; it failed to process any queued messages.

This is kind of disturbing.

I'm not sure whether this is a bug in /dev/log (log(7)) or something
else. This behavior of queueing syslog messages is not mentioned in
log(7), syslogd(1m), syslog(3), or anywhere else I looked.


This also points out a more serious problem -- I think (I'm not 100%
certain) that running savecore is the only thing that detects the
fact that a machine has panicked and then rebooted.  This means
that there may in fact have been widespread kernel panics within
the 7.7 Solaris deployment that went unnoticed because they were
not logged (both because savecore was not run, and because syslogd was
not running to receive the log message).

I'm not entirely sure of the conclusions I've drawn above, so someone
should probably try to verify them.

--jhawk
