[4304] in Hotline Meeting
Power outage on Sat Mar 30 morning
daemon@ATHENA.MIT.EDU (salemme@ATHENA.MIT.EDU)
Sun Mar 31 02:42:17 1991
From: salemme@ATHENA.MIT.EDU
Date: Sun, 31 Mar 91 02:41:26 -0500
To: op@ATHENA.MIT.EDU
Cc: eichin@ATHENA.MIT.EDU, hoffmann@ATHENA.MIT.EDU, hotline@ATHENA.MIT.EDU,
Here is a brief status report regarding the power outage this morning,
and where we stand now...
- The power went out across much of the MIT campus Sat morning at
around 10:45; E40 was not affected; B11, B37, W20 and other buildings
were affected. (Note: I was paged by Mark Eichin of SIPB... Phys Plant
sent us no notification of this outage.) Electricity returned around 12:15,
stayed up until around 3pm when it went out for a few minutes again. It has
been up since then, as far as I can tell.
- Lucien Van Elsen and I shut down all machines in B11 around 11am. We went
to W20 to do the same, but could not get into the building, which had been
locked after being evacuated.
- Brian checked E40, B66, and other machine rooms. Consultants had shut down
some workstations in B11.
- At around 1:30, after the power had been back for an hour, Brian and I
started bringing up the machines in the B11 machine room. When the power
went out again at 3, we shut everything down again and left, returning
at 5:30 to try again.
- In addition to workstations that will need to be rebooted, and many other
problems that will probably be reported to hotline through the week, here
are the major problems that I'm aware of:
File servers: Themis is in bad shape; we've left it halted.
    I suspect a controller problem (since various disks
    couldn't come online on rebooting).
RVD servers: All but uranus seem to be ok now. Note that many of the
    RVD servers that were hit came back when the power
    came back, but *weren't providing* RVD service.
    (rvdshow showed "service not yet available"; rebooting
    caused the machine and service to come back)
time masters: uranus is the one for 18.83... it's way off...
    n10-210-p is on 18.84... can't reach it remotely...
    (the rest seem ok)
print servers: all public ones are ok (Brian and I rebooted a number
    of them). Here are the ones that I can't reach remotely
    which may need to be rebooted or looked at on Monday
    (all others seem ok... I ran 'lpc stat' on all remotely)
        castor.MIT.EDU: Connection refused
        e52-364-p.MIT.EDU: Connection refused
        e52-504-p.MIT.EDU: Connection refused
        elsa.MIT.EDU: Connection refused
        elsa.MIT.EDU: Connection refused
        iona.MIT.EDU: Connection refused
        m14n-336-p.MIT.EDU: Connection timed out
        m14n-336-p.MIT.EDU: Connection timed out
        m24-021-p.MIT.EDU: Connection timed out
        m24-021-p.MIT.EDU: Connection timed out
        m36-813-1.MIT.EDU: Connection timed out
        m36-813-1.MIT.EDU: Connection timed out
        m38-246-p.MIT.EDU: Connection refused
        m38-246-p.MIT.EDU: Connection refused
        m54-419-p.MIT.EDU: Connection refused
        n10-210-p.MIT.EDU: Connection timed out
        n10-210-p.MIT.EDU: Connection timed out
        pal-p.MIT.EDU: Connection refused
        pal-p.MIT.EDU: Connection refused
        sol.MIT.EDU: Connection timed out
        sol.MIT.EDU: Connection timed out
        tim.MIT.EDU: Connection refused
        eve: Permission denied
        p13-470-1: Permission denied
workstations in public clusters: we checked W20, B11, B37, B66
    to make sure machines were on and logins were possible
- I put up a motd telling users that Themis is unavailable.
- Thanks to Mark Eichin for letting us know about this outage as it was
occurring. Thanks to Lucien for helping shut things down. Thanks to Brian
for working diligently throughout the day.
- If noteworthy events occur on Sunday, I'll write again.
Anne (maybe I'll find chocolate eggs in B11!)
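
For whoever follows up on Monday, the list of unreachable print servers above could be grouped by failure type so each group can be handled together. What follows is only a sketch: it assumes the 'lpc stat' errors were saved as plain "host: message" lines, and the function name group_failures and the sample input are made up for illustration. Repeated host lines, as seen in the list, are counted once. As a rough guide, "Connection refused" typically means the machine is up but nothing is answering on the printer port, while "Connection timed out" typically means the machine itself is down or unreachable.

```python
from collections import defaultdict


def group_failures(lines):
    """Group 'host: error' lines by error text, listing each host once."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or unrecognized lines
        host, _, error = line.partition(":")
        host, error = host.strip(), error.strip()
        if host not in groups[error]:
            groups[error].append(host)
    return dict(groups)


# Hypothetical sample taken from a few of the lines in the report.
sample = [
    "castor.MIT.EDU: Connection refused",
    "elsa.MIT.EDU: Connection refused",
    "elsa.MIT.EDU: Connection refused",
    "sol.MIT.EDU: Connection timed out",
    "sol.MIT.EDU: Connection timed out",
    "eve: Permission denied",
]

report = group_failures(sample)
for error, hosts in report.items():
    print(f"{error}: {', '.join(hosts)}")
```

Hosts in the "timed out" group probably need someone to visit the machine; those in the "refused" group may only need the print daemon restarted.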