[994] in Release_7.7_team

home help back first fref pref prev next nref lref last post

Re: syslogd

daemon@ATHENA.MIT.EDU (John Hawkinson)
Mon Jun 16 03:18:56 1997

Date: Mon, 16 Jun 1997 03:18:48 -0400
To: Jonathon Weiss <jweiss@MIT.EDU>
Cc: release-team@MIT.EDU, op@MIT.EDU, testers@MIT.EDU, bugs@MIT.EDU
In-Reply-To: "[3229] in testers"
From: John Hawkinson <jhawk@MIT.EDU>


| Ops recently realized that we are not getting reboot syslogs for our
| solaris machines logged centrally.

It's times like this that various phrases about folks who reside in Hamelin
arise. More seriously, does someone own tracking bugs sent to testers
(as it appears that Mike owns tracking bugs sent to bugs)?

I reported this problem in testers[3017] (Sun, 21 Jul 1996 14:51:51 -0400):

/ The problem seems to be that the panic detection mechanism (savecore)
/ is run in /etc/rc2.d/S20sysetup, but syslogd is not started until
/ /etc/rc2.d/S74syslog.
/ 
/ My recollection has been that in the past there has been some
/ mechanism for syslogs being saved somewhere (kernel message socket??)
/ such that syslogd obtains queued sylogs after starting. This does
/ not appear to be functional.
...
/ I'm not sure whether this is bug in /dev/log (log(7)) or something
/ else. This behavior of queueing syslogs is not mentioned in log(7),
/ syslogd(1m), syslog(3), or anywhere else I looked.

The last paragraph was an attempt to point out the desired behavior was not
necessarily documented as being expected to work...

| This is particularly probalematic for our servers, but it would also
| be nice if cluster machines logged reboots.

You bet! Worse:

/ This also points out a more serious problem -- I think (I'm not 100%
/ certain) that the running of savecore is the only thing that detects
/ the fact that a machine has paniced and then rebooted.  This means
/ that there may in fat have been widespread cases of kernel panics
/ within the 7.7 Solaris deployment that were not noticed, as they were
/ not logged (both because savecore was not run, and because syslogd was
/ not running to receive the log message).

| Based on some tests we have run, it appears that the sun syslogd
| simply does not log the reboot messages, locally, remotely, or
| otherwise.

Yup, it appears to not pick up queued messages. The theory I had
was that it was reading the queued log messages at start time (which
is nothing special, that's how /dev/log works) and flushing them.
Someone with Solaris source should look, of course.

I had a sneaking suspicion that this problem was fixed in 2.5.1, but
I have no hard evidence to support that other than wishful thinking and
the old "How could Sun *still* let that bug be there?". [*]

| However, the athena syslogd seems to do so

Yup, my implementation just reads straight the /dev/log interface
and it just seems to work.

| (although it ends up with a somewhat garbled message, that may be
| what it is being given.

The Solaris syslogd calls isprint() on it's input, I believe. It's
possible that the Zephyr syslogd ought to do the same (though perhaps not;
perhaps it should just escape nonprintables). I never extensively
tested or stared at that particular interaction (I was hoping someone
would worry about it from my testers bug report, and then I forgot...).

I suspect this relates to these Sun Bug-IDs (but they doesn't have
useful version data):

-  Bug Id:     1072875                                  
-  Category:  kernel                                    
-  Subcategory:  other                                  
-  Release summary: mt-on5.0                            
-  Synopsis:  dmesg(1) prints garbage.                  
-  Description:                                         
-         Build a device driver that uses printf() or cmn_err(); for example
-         the one in bugtraq/etc/attached/1072873.  This calls cmn_err()
-         in several places.  The output on the console looks ok, but if
-         you review it with dmesg(1), it's full of garbage.
-                                                         
-         I.e.                                          
-                 foo: compiled 22:06:20, Oct 23 1991   
-                 foo_open, flag=1                      
-                 foo_close(0)                          
-                 foo_close(0)                          
-                 foo_close(0)                          
-         looks like                                    
-                 foo: compiled 22:06:20, Oct 23 1991   
-                 @foo_open, flag=1                     
-                 %Jfoo_close(0)                                
-                 1                                     
-                 @foo_close(0)                         
-                 c                                     
-                 foo_close(0)                          
-  History:                                             
-         Submitter:              falk    Date:   10/23/91
-         Dispatch Operator:      bugtraq Date:   11/30/93
-         Evaluator:              sebree  Date:   11/30/93
-         Closeout Operator:      sebree  Date:   11/30/93

- Bug Id:     1062548
-  Category:  kernel
-  Subcategory:  other
-  Release summary: 4.1
-  Synopsis:  "dmesg -" sends garbage to first line of /var/adm/msgbuf
-  Description:
- In testing the dmesg command with the dash option (dmesg _) I noticed that
- when the original /var/adm/msgbuf file is removed, dmesg creates the new
- one but sends garbage to the first line. Below is what I got after running
- "dmesg -":
- ^F0b^_p^UF^UFSunOS Release 4.1 (GENERIC_SMALL) #1: Wed Mar 7 10:54:19 PST 1990
-  
- There should be no garbage anywhere in the file.
-  History:
-         Submitter:              dnuss   Date:   06/18/91
-         Dispatch Operator:      bugtraq Date:   06/18/91
-         Evaluator:              craigr  Date:   04/05/93
-         Closeout Operator:      craigr  Date:   04/05/93

My search returned a few more, in summary:

bug      1247906   dmesg(1) returns garbage characters
bug      1165096   dmesg gets garbage characters in its output, and then never
                   logs anything after
bug      1072875   dmesg(1) prints garbage.
bug      1062548   "dmesg -" sends garbage to first line of /var/adm/msgbuf
bug      1267004   dmesg will produce garbage characters that will lock up user
                   window
bug      1225905   running "dmesg" on a CDE terminal outputs garbage and makes
                   dtterm unusable

The first one is really disturbing since it references 2.6. [Sigh]

| What are the possibilities of getting suns to run the athenized
| syslogd?

Here's a vote for it :-)

| Actually, while composing this, it occured to me that we might be able
| to solve the problem by simply having the Athena rc script log
| something at a level that will go to the remote logger. eg.:
|  
| /usr/bin/logger -p user.notice Yo, I am rebooting.
|  
| Ted, I have the feeling I'm forgetting somethign.  Will this, in fact,
| be sufficient for us?

I think it's insufficient because it does not notice panics. It's certainly
better than nothing, of course, but is kludgey and suboptimal. I think
that you might also want to use a facility other than "user" ("messages
generated by random user processes").

---
[*] Sun Bug-ID would appear that someone is royally, mightily, and totally
confused at Stanford University Networking:

- Bug Id:     1245180                                                     
-  Category:  utility                                                     
-  Subcategory:  program                                
-  State:  closed                               
-  Release summary: 5.5                                                   
-  Synopsis:  syslogd: getmsg() read  boot-messages from /dev/log on Solaris2.5
-         Integrated in releases:                                         
-  Patch id:                                                              
-  Description:                                                           
- getmsg() reads boot messages from /dev/log and add it to /var/adm/messages.
- This is no problem on Solaris2.4 and Solaris2.3, it happens on Solaris2.5.

I'm not sure what to make of this. It suggests that 2.5 did the
"right" thing, but someone may have "fixed" it, perhaps for 2.5.1 (though
the "Integrated in releases" field is blank). Corroborating it is:

- Bug Id:     1245451
-  Category:  utility
-  Subcategory:  logger
-  State:  accepted
-  Release summary: 5.5
-  Synopsis:  *syslogd* bug in syslogd failing to log messages every 12-48 hours
-             of operation
-         Integrated in releases:  
-  Patch id:   
-  Description:
...
 Work around:
- U of Liverpool plans to try to work around this by restarting syslogd
- every night via cron.  
-                          
- Royal Greenwich Observatory notes that this work around has an
- interesting side-effect.  Syslog put the latest dmesg info into
- /var/adm/messages, AGAIN.

This bug (124551) is ostensibly fixed as of revision -09
of Sun patch 103612-23 (a recommended patch):

- Patch-ID# 103612-23
- Keywords: security libc memory su getdate NIS+ libnsl nis_cachemgr rpc.nisd
-           NIS+
- Synopsis: SunOS 5.5.1: libc, libnsl, nis_cachemgr and rpc.nisd patch
- Date: Apr/21/97
- 
- Solaris Release: 2.5.1
- 
- SunOS Release: 5.5.1
...
- BugId's fixed with this patch:
-                           1212974 1213016 1219671 1221809 1223323 1223326
-                           1230570 1232758 1234630 1235867 1242395 1244872
-                           1244917 1245451 1246630 1246864 1247052 1248090
-                           1249373 1249903 1255623 1258916 1259200 1262666
-                           1264708 1265785 4005483 4005686 4006674 4016724
-                           4018883 4018887 4025665 4029971
...
- (from 103612-09)
- 
- 1245451 syslogd failing to log messages every 12-48 hours of operation

And of course, Athena has 103612-07 installed.  It's not clear that
this (presence of the fix in this patch) is necessarily problematic,
as the bug here really didn't have to do with syslog's interaction
with /dev/log and/or buffered messages, but simply that it hung every
12-48 hours due to a locking problem in the particular environment it
ran it.

It does not appear that any of the aforementioned bugs are fixed in
any Sun patches, though of course they may be fixed in releases.

--jhawk

home help back first fref pref prev next nref lref last post