[4955] in Athena Bugs
Re: New NOC
daemon@ATHENA.MIT.EDU (tom@MIT.EDU)
Sun May 20 13:24:13 1990
From: tom@MIT.EDU
Date: Sun, 20 May 90 13:23:43 EDT
To: dennis@MIT.EDU
Cc: network@MIT.EDU, bugs@MIT.EDU
To: network@MIT.EDU
Subject: New NOC
From: dbaron@MIT.EDU (Dennis Baron)
Date: Sun, 20 May 90 11:15:06 EDT
Sender: dennis@ATHENA.MIT.EDU
Looked nice but I think it died. I got:
Sender: THE NOC! <rcmd.doghouse>
Time: 10:36:50
echo service on kerberos.mit.edu has not responded in 13 seconds.
(2 attempts.)
A very strange thing happened. At approximately 10:58am, doghouse dropped
off the network (no errors). The monitor crashed as well, soon after it sent
the notice out about kerberos. It attempted to send two more notices within
the next 30 minutes, about two other events. No one saw the first notice,
and the second notice caused the crash shown below. The core dump is dated
at about the time doghouse dropped off the network, but well after the
program stopped updating its logs. In other words, it was stuck somewhere,
perhaps in the zephyr library (6.4r).
If this is so, then it makes sense that it took 549 seconds (see below) to
make three polls, since it may have been stuck in a previous notification
it was sending out. This machine is configured so that it should take about
15-30 seconds to note an outage.
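
To make that arithmetic concrete, here is a minimal Python sketch of a polling
loop running on a simulated clock. It is not the monitor's actual code:
FakeClock and run_polls are hypothetical names, and the 10-second poll timeout
and 519-second stall are assumptions chosen only so that the totals match the
expected 30 seconds and the 549 seconds reported in the trace.

```python
class FakeClock:
    """Simulated clock, so the example runs instantly and deterministically."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


def run_polls(clock, attempts, poll_seconds, blocked_seconds):
    """Return the seconds elapsed from the start of the first poll to the
    end of the last one.

    blocked_seconds models time the single-threaded loop spends stuck in
    some other blocking call (for example an earlier notification) before
    it gets back to polling. Both durations are assumptions.
    """
    start = clock.now
    for attempt in range(attempts):
        clock.sleep(poll_seconds)          # one poll that times out
        if attempt == 0:
            clock.sleep(blocked_seconds)   # hypothetical stall between polls
    return clock.now - start


healthy = run_polls(FakeClock(), attempts=3, poll_seconds=10, blocked_seconds=0)
stalled = run_polls(FakeClock(), attempts=3, poll_seconds=10, blocked_seconds=519)
print(healthy, stalled)
```

With no stall the three polls finish in 30 seconds. With the assumed stall the
same three polls report 549 seconds, even though the polling itself is no
slower.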
I'm sending this to bugs because it crashed in a standard library.
send_to_kdc.send_to_kdc(0x1aae0, 0x1afcc, 0x170bc) at 0x6eb7
get_ad_tkt.get_ad_tkt(0x134b8, 0x134bf, 0x170bc, 0x60) at 0x655e
krb_mk_req(0x7fffd5e0, 0x134b8, 0x134bf, 0x170bc, 0x0) at 0x5ae8
ZMakeAuthentication(0x7fffe040, 0x7fffdb40, 0x320, 0x7fffdb3c) at 0x3b71
Z_FormatHeader(0x7fffe040, 0x7fffdb40, 0x320, 0x7fffdb3c, 0x3b4c) at 0x4c3d
ZFormatNoticeList(0x7fffe040, 0x7fffe104, 0x4, 0x7fffde94, 0x7fffde90, 0x3b4c) at 0x53fd
ZSrvSendList(0x7fffe040, 0x7fffe104, 0x4, 0x3b4c, 0x534e) at 0x4259
ZSendList.ZSendList(0x7fffe040, 0x7fffe104, 0x4, 0x3b4c) at 0x4236
zsend(0x7fffe040, 0x7fffe104, 0x4, 0x1) at 0x183c
zsend_message(0x1285d, 0x22618, 0x22638, 0x1269c, 0x7fffe104, 0x4) at 0x1811
sm_notify.sm_notify(0x22600, 0x2ee50, 0x32e00, 0x7fffe224) at 0x177e
sm_event(event = 0x2265c, status = -102, level = 1, blurb = "timeout", message ="has not responded in 549 seconds.\n(3 attempts)", misc = "The following services affected: \necho.hyperion.mit.edu\n"), line 122 in "sm_events.c"
echo_monitor(0x22600) at 0x1dc4
sm_mainloop() at 0x68a
main.main(0x1, 0x7fffe71c, 0x7fffe724) at 0x25b