[681] in Release_Engineering
common collection point for syslog errors
daemon@ATHENA.MIT.EDU (mar@ATHENA.MIT.EDU)
Tue Jan 10 12:36:32 1989
From: <mar@ATHENA.MIT.EDU>
Date: Tue, 10 Jan 89 12:35:16 EST
To: rel-eng@ATHENA.MIT.EDU
Sending workstation syslog errors to a common collection point is in the release notes, but we haven't done it yet. I've been
investigating this for Dan, and would like to recommend the following:
1. The line
kern.notice,local7.notice @wslogger.mit.edu
be added to the standard /etc/syslog.conf, and that the update script be set up
to grep for "wslogger" and, if it's not there, append this line to
the existing file.
2. Ask Ron to make WSLOGGER.MIT.EDU an alias for one of the 750s.
Then make sure that this 750 is configured to log all of these
messages somewhere with sufficient disk space, into files that get
turned over (rotated) daily. Automated tools to scan the messages can be
written once we have some idea of what data we're collecting.
3. Change the way syslog is started at reboot:
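    if [ -f /etc/syslogd ]; then
        echo -n "Starting syslog: " >/dev/console
        # Start syslogd in the background after a per-host delay of
        # (2nd + 3rd + 4th octet of $ADDR) seconds, so that hosts don't
        # all send their boot messages at once. $ADDR is assumed to hold
        # this host's dotted-quad IP address, set earlier in the boot script.
        (sleep `echo $ADDR | awk -F. '{print $2+$3+$4}'`; /etc/syslogd)&
        echo "done." >/dev/console
    else
        echo "can't find syslog daemon!" >/dev/console
    fi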
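The append-if-missing step from item 1 might look like the following in the update script. This is a minimal sketch that assumes a POSIX shell. The function name add_wslogger and its optional file argument are hypothetical. A tab separates the selector from the action because older syslogd versions require one.

```sh
#!/bin/sh
# Hypothetical helper for the update script (item 1): add the wslogger
# line to syslog.conf only if no line mentioning "wslogger" exists yet.

add_wslogger() {
    conf=${1:-/etc/syslog.conf}   # file to patch; defaults to the live one
    # grep -q reports only via exit status; errors (e.g. missing file) are discarded
    if ! grep -q wslogger "$conf" 2>/dev/null; then
        # selector and action separated by a tab
        printf '%s\t%s\n' 'kern.notice,local7.notice' '@wslogger.mit.edu' >> "$conf"
    fi
}
```

Calling add_wslogger twice leaves exactly one wslogger line, so the update script can safely run it on every update.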
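For item 2, daily turnover of the log files on the 750 could be handled by a small script run once a day, for example from cron. This is a hypothetical sketch. The function name rotate_log, the example log path and the default of seven old copies are assumptions and are not part of the proposal.

```sh
#!/bin/sh
# Hypothetical daily rotation for the log host (item 2).
# rotate_log FILE [KEEP]: FILE -> FILE.1 -> FILE.2 ... -> FILE.KEEP,
# dropping the oldest copy and leaving an empty FILE behind.

rotate_log() {
    log=$1              # e.g. /usr/adm/wslog (hypothetical path)
    keep=${2:-7}        # number of old copies to retain
    i=$keep
    # shift FILE.(i-1) to FILE.i, overwriting the oldest copy
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$i"
        i=$prev
    done
    [ -f "$log" ] && mv "$log" "$log.1"
    : > "$log"          # start a fresh, empty log
}
```

After rotating, syslogd must be sent a HUP signal so that it reopens its log files. Otherwise it keeps writing to the renamed file.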
Commentary:
All kernel messages at priority notice or above are forwarded. This means
that we will get the disk errors we want, plus 12 lines at reboot time,
and possibly other errors as well. local7 messages are also sent so that
we can generate log messages ourselves from scripts. /usr/ucb/logger can
generate a log message for any facility other than kernel, and can easily
be used in shell scripts (for instance, if we wanted activate or deactivate
to log certain kinds of errors). I have purposely chosen a hostname
different from the one the server machines use (SYSLOGGER.MIT.EDU), as we
probably want to keep workstation errors separate from server errors.
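A script such as activate or deactivate could log through local7 with a wrapper like the one below. This is a sketch that assumes a logger accepting -p (facility.priority) and -t (tag), as BSD logger does. The function name ws_log and the LOGGER override variable are hypothetical. LOGGER exists only so the wrapper can be tested without a running syslogd.

```sh
#!/bin/sh
# Hypothetical wrapper: send a one-line message at local7.notice, which the
# "kern.notice,local7.notice @wslogger.mit.edu" rule forwards to wslogger.

LOGGER=${LOGGER:-logger}   # command that actually sends the message

ws_log() {
    tag=$1                 # name to tag the message with, e.g. activate
    shift
    # remaining arguments are joined into a single message string
    "$LOGGER" -p local7.notice -t "$tag" "$*"
}
```

For example, `ws_log deactivate "could not detach user locker"` would log one tagged line at local7.notice.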
The change in how syslogd is invoked at boot time introduces a
pseudo-random delay (derived from each host's IP address) before the
reboot messages are sent, in case of a campus-wide reboot. Without it, we
would have 800 workstations * 12 messages, or 9600 packets, thrown at the
server at once. The server probably won't have finished fscking its disks,
so it won't have its network up to receive these packets. However, they
will all still have to go through the gateways and across the spine. The
staggered start means that only about 40 packets per second would be sent
for about 4 minutes following a campus-wide reboot, which is acceptable.
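The numbers above can be checked with a few lines of shell. This sketch first reuses the awk expression from the boot script to compute one host's delay. The address used below is only an example. It then redoes the load arithmetic from the stated figures: 800 workstations, 12 messages per boot, and a spread of roughly 4 minutes (240 seconds).

```sh
#!/bin/sh
# Per-host delay used by the boot script: the sum of the 2nd, 3rd and
# 4th octets of the dotted-quad address, in seconds.
boot_delay() {
    echo "$1" | awk -F. '{print $2+$3+$4}'
}

workstations=800
msgs_per_boot=12
spread_secs=240                               # "about 4 minutes"
total=$((workstations * msgs_per_boot))       # packets after a campus-wide reboot
rate=$((total / spread_secs))                 # average packets per second
echo "total=$total packets, rate=$rate packets/sec"
```

Running it prints `total=9600 packets, rate=40 packets/sec`, and `boot_delay 18.10.20.30` prints 60.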
-Mark