[681] in Release_Engineering



common collection point for syslog errors

daemon@ATHENA.MIT.EDU (mar@ATHENA.MIT.EDU)
Tue Jan 10 12:36:32 1989

From: <mar@ATHENA.MIT.EDU>
Date: Tue, 10 Jan 89 12:35:16 EST
To: rel-eng@ATHENA.MIT.EDU

This is in the release notes, but we haven't done it yet.  I've been
investigating for Dan, and would like to recommend the following:

1.  The line
	kern.notice,local7.notice	@wslogger.mit.edu
    be added to the standard /etc/syslog.conf, and the update script set
    up to grep for "wslogger" and, if it's not there, append this line
    to the existing file.

2.  Request Ron to make WSLOGGER.MIT.EDU be an alias for one of the 750s.
    Then make sure that this 750 is configured to log all of these
    messages somewhere with sufficient disk space into files that get
    turned over daily.  Automated tools to scan the messages can be
    written after we have some idea of what data we're collecting.

3.  Change the way syslog is started at reboot:

if [ -f /etc/syslogd ]; then
	echo -n "Starting syslog: "				>/dev/console
	(sleep `echo $ADDR | awk -F. '{print $2+$3+$4}'`; /etc/syslogd)&
	echo "done."						>/dev/console
else
	echo "can't find syslog daemon!"			>/dev/console
fi
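
The update step in item 1 could be sketched roughly as follows.  This
is only an illustration, not the actual update script: the function
name add_wslogger_line and its optional file argument are hypothetical,
and the line appended is the one quoted in item 1.

```shell
# Hypothetical helper (not part of any existing update script):
# append the forwarding line from item 1 to a syslog.conf unless
# the file already mentions "wslogger".
add_wslogger_line() {
    conf=${1:-/etc/syslog.conf}
    if grep wslogger "$conf" >/dev/null 2>&1; then
        return 0        # line already present; leave the file alone
    fi
    # Selector and action are separated by a tab, as in item 1.
    printf 'kern.notice,local7.notice\t@wslogger.mit.edu\n' >>"$conf"
}

# Usage from the update script would be simply:
#   add_wslogger_line
```

Because the function checks before appending, running the update more
than once should leave only one copy of the line in the file.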

Commentary:

All kernel messages at priority notice or above are trapped.  This
means that we will get disk errors, which we want, plus 12 lines at
reboot time, and possibly other errors as well.  local7 messages are
also sent so that we can generate log messages ourselves out of
scripts.  /usr/ucb/logger can generate a log message for any subsystem
other than kernel, and can easily be used in shell scripts (if we
wanted to have activate or deactivate log certain kinds of errors, for
instance).  I have purposely chosen a different hostname from the ones
the server machines use (SYSLOGGER.MIT.EDU) as we probably want to
keep workstation errors separate from server errors.
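
As a sketch of how a script might log through local7, something like
the following could work.  The function name log_ws_event, the LOGGER
variable, and the example message are all hypothetical; only the
/usr/ucb/logger path and the local7.notice selector come from the
proposal above.

```shell
# Path to logger as given above; LOGGER is a hypothetical override
# so the function can be tried without the real program.
LOGGER=${LOGGER:-/usr/ucb/logger}

# Hypothetical helper: log_ws_event TAG MESSAGE...
# Sends MESSAGE at local7.notice, tagged with the calling script's name.
log_ws_event() {
    tag=$1
    shift
    "$LOGGER" -p local7.notice -t "$tag" "$*"
}

# Example call from a script such as activate (message is made up):
#   log_ws_event activate "could not attach system packs"
```

With the syslog.conf line from item 1 in place, a message logged this
way would be forwarded to wslogger along with the kernel messages.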

The change in invocation of syslogd at boot time is to introduce some
random delay in when the reboot messages are sent in the event of a
campus-wide reboot.  Without it, we would have 800 (workstations) * 12
(messages), or 9600 packets, being thrown at the server at once.  The
server probably won't have finished fscking its disks, so won't have
its network up to receive these packets.  However, they will still all
have to go through gateways and across the spine.  The random backoff
means that only about 40 packets per second would be sent for about 4
minutes following a campus-wide reboot, which is acceptable.
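
The arithmetic behind these figures can be checked with a short
sketch.  The function names boot_delay and avg_rate and the sample
address 18.72.0.3 are hypothetical; the delay formula is the one from
item 3, and the 240-second window is the "about 4 minutes" estimate
above, not a measured value.

```shell
# Delay in seconds for one workstation: the sum of the last three
# octets of its address, as in the sleep expression in item 3.
boot_delay() {
    echo "$1" | awk -F. '{print $2+$3+$4}'
}

# Average packets per second if every workstation's messages are
# spread evenly over a window.
# Arguments: workstations, messages per workstation, window in seconds.
avg_rate() {
    echo "$1 $2 $3" | awk '{print ($1*$2)/$3}'
}

boot_delay 18.72.0.3     # prints 75
avg_rate 800 12 240      # prints 40
```

The actual spread depends on how the workstation addresses are
distributed, since the delay is derived from the address rather than
drawn at random.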

					-Mark

