[485] in Moira


Re: Mailhub ALARM CONDITION

daemon@ATHENA.MIT.EDU (Theodore Ts'o)
Sat Nov 28 01:36:05 1992

Date: Sat, 28 Nov 92 01:35:49 EST
From: tytso@ATHENA.MIT.EDU (Theodore Ts'o)
To: "Richard Basch" <basch@MIT.EDU>
Cc: network@MIT.EDU, bug-moira@MIT.EDU
In-Reply-To: Richard Basch's message of Sat, 28 Nov 1992 01:22:06 -0500,

Some more followup:

Here are the relevant log entries from that hung DCM:

Nov 25 23:05:07 <10227> dcm: checking aliases...
Nov 25 23:05:09 <10227> dcm:  running /u1/sms/bin/aliases.gen
Nov 25 23:05:13 <10227> Loading machines
Nov 25 23:05:17 <10227> Loading strings
Nov 25 23:06:31 <10227> Loading users
Nov 25 23:08:47 <10227> Loading lists
Nov 25 23:10:55 <10227> Loading members
Nov 25 23:14:10 <10227> Dumping information
Nov 25 23:18:46 <10227> dcm: starting update for ATHENA.MIT.EDU:aliases
Nov 26 00:00:12 <10227> dcm: starting update for ATHENA-AS-WELL.MIT.EDU:aliases
Nov 28 01:09:46 <10227> exited on termination signal

.... and here are the times that athena-as-well has rebooted in the past
week:

Dumping events from "Sat 21-Nov-92  0:08am" to "Sat 28-Nov-92  0:08am".
Looking for athena-as-well.

Mon Nov 23 07:38:19 1992   echo.athena-as-well.mit.edu timeout
Mon Nov 23 07:45:39 1992   echo.athena-as-well.mit.edu ok
Mon Nov 23 12:35:26 1992   echo.athena-as-well.mit.edu timeout
Mon Nov 23 12:42:46 1992   echo.athena-as-well.mit.edu ok
Mon Nov 23 18:53:39 1992   echo.athena-as-well.mit.edu timeout
Mon Nov 23 19:00:26 1992   echo.athena-as-well.mit.edu ok
Wed Nov 25 01:02:03 1992   echo.athena-as-well.mit.edu timeout
Wed Nov 25 01:08:49 1992   echo.athena-as-well.mit.edu ok
Fri Nov 27 14:42:02 1992   echo.athena-as-well.mit.edu timeout
Fri Nov 27 14:48:50 1992   echo.athena-as-well.mit.edu ok

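No real correlation here: the DCM started its update to athena-as-well
at 00:00:12 on Nov 26, nearly 23 hours after the previous outage ended
and more than a day before the next one began.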
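
For anyone who wants to redo this check mechanically, here is a minimal
sketch that compares the start of the hung update against the outage
windows listed above. The timestamps are copied from the log and the
event dump; the year on the DCM log line is assumed to be 1992, and the
helper names (parse, nearest_outage_gap) are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%a %b %d %H:%M:%S %Y"

# (timeout, ok) pairs copied from the event dump for athena-as-well.
OUTAGES = [
    ("Mon Nov 23 07:38:19 1992", "Mon Nov 23 07:45:39 1992"),
    ("Mon Nov 23 12:35:26 1992", "Mon Nov 23 12:42:46 1992"),
    ("Mon Nov 23 18:53:39 1992", "Mon Nov 23 19:00:26 1992"),
    ("Wed Nov 25 01:02:03 1992", "Wed Nov 25 01:08:49 1992"),
    ("Fri Nov 27 14:42:02 1992", "Fri Nov 27 14:48:50 1992"),
]

# "dcm: starting update for ATHENA-AS-WELL.MIT.EDU:aliases"
# (the log line carries no year; 1992 is assumed).
UPDATE_START = datetime(1992, 11, 26, 0, 0, 12)


def parse(stamp):
    """Parse a timestamp in the event dump's format."""
    return datetime.strptime(stamp, FMT)


def nearest_outage_gap(start, outages):
    """Smallest gap between start and any outage window (zero if inside one)."""
    gaps = []
    for down, up in outages:
        d, u = parse(down), parse(up)
        if d <= start <= u:
            return timedelta(0)
        gaps.append(min(abs(start - d), abs(start - u)))
    return min(gaps)


if __name__ == "__main__":
    print("nearest outage is", nearest_outage_gap(UPDATE_START, OUTAGES), "away")
```

A gap of zero would mean the update started while the host was
unreachable; anything on the order of hours supports the "no
correlation" reading.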

What was also interesting was that the hung mailhub checking programs
were doing something similar: there were three hung mailhub checking
programs, which were started on Wednesday, Thursday, and Friday morning
at 6:30.  I killed off one of them before I did this check, but ofiles
showed that the other two had been polling athena-as-well and somehow
managed to get their connections wedged, such that they didn't even go
away when athena-as-well rebooted today at 2pm.

Given the similarity of how the DCM and the mailhub checking programs
had wedged connections, perhaps we're seeing some obscure "feature" of
Buglix.... 
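
As a defensive measure on the polling side, a checker can put a deadline
on every network operation so that a wedged connection turns into an
error instead of a process that hangs for days. What follows is only a
sketch under assumptions: it is not the DCM or mailhub checker code,
poll_host and its arguments are hypothetical, and whether a timeout
would have sidestepped whatever the operating system was doing here is
unknown.

```python
import socket


def poll_host(host, port, timeout=30.0):
    """Return True if host accepts a connection and sends data within timeout.

    Hypothetical helper for illustration. The timeout applies to the
    connect and to the read, so neither can block indefinitely.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            # Ask the kernel to probe an idle peer as well; the probe
            # interval is system-dependent and usually long.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
            return bool(s.recv(1))
    except OSError:
        # Covers refused connections, resets, and timeouts alike.
        return False
```

The caller treats False as "host did not answer" and moves on, which is
the behavior the three stuck checkers would have needed.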

						- Ted
