[4304] in Hotline Meeting

Power outage on Sat Mar 30 morning

daemon@ATHENA.MIT.EDU (salemme@ATHENA.MIT.EDU)
Sun Mar 31 02:42:17 1991

From: salemme@ATHENA.MIT.EDU
Date: Sun, 31 Mar 91 02:41:26 -0500
To: op@ATHENA.MIT.EDU
Cc: eichin@ATHENA.MIT.EDU, hoffmann@ATHENA.MIT.EDU, hotline@ATHENA.MIT.EDU,

Here is a brief status report regarding the power outage this morning,
and where we stand now...

- The power went out across much of the MIT campus Sat morning at
  around 10:45; E40 was not affected; B11, B37, W20 and other buildings
  were affected. (Note: I was paged by Mark Eichin of SIPB... Phys Plant
  sent us no notification of this outage.) Electricity returned around 12:15,
  stayed up until around 3pm when it went out for a few minutes again. It has
  been up since then, as far as I can tell.

- Lucien Van Elsen and I shut down all machines in B11 around 11am. We went
  to W20 to do the same, but could not get into the building, which was locked
  after having been evacuated.

- Brian checked E40, B66, and other machine rooms. Consultants had shut down
  some workstations in B11.

- At around 1:30, after the power had been back for an hour, Brian and I
  started bringing up the machines in the B11 machine room. When the power
  went out again at 3, we shut everything down again and left, returning
  at 5:30 to try again.

- In addition to workstations that will need to be rebooted, and many other
  problems that will probably be reported to hotline through the week, here
  are the major problems that I'm aware of:

	File servers: Themis is in bad shape; we've left it halted.
		      I suspect a controller problem (since various disks
		      couldn't come online on rebooting).

	RVD servers: All but uranus seem to be ok now. Note that many of the
		      RVD servers that were hit came back when the power
		      came back, but *weren't providing* RVD service.
		      (rvdshow showed "service not yet available"; rebooting
		      caused the machine and service to come back)

	time masters: uranus is the one for 18.83... it's way off...
		      n10-210-p is on 18.84... can't reach it remotely...
		      (the rest seem ok)

	print servers: all public ones are ok (Brian and I rebooted a number
		      of them). Here are the ones that I can't reach remotely
		      which may need to be rebooted or looked at on Monday
		      (all others seem ok... I ran 'lpc stat' on all remotely)

		castor.MIT.EDU: Connection refused
		e52-364-p.MIT.EDU: Connection refused
		e52-504-p.MIT.EDU: Connection refused
		elsa.MIT.EDU: Connection refused
		elsa.MIT.EDU: Connection refused
		iona.MIT.EDU: Connection refused
		m14n-336-p.MIT.EDU: Connection timed out
		m14n-336-p.MIT.EDU: Connection timed out
		m24-021-p.MIT.EDU: Connection timed out
		m24-021-p.MIT.EDU: Connection timed out
		m36-813-1.MIT.EDU: Connection timed out
		m36-813-1.MIT.EDU: Connection timed out
		m38-246-p.MIT.EDU: Connection refused
		m38-246-p.MIT.EDU: Connection refused
		m54-419-p.MIT.EDU: Connection refused
		n10-210-p.MIT.EDU: Connection timed out
		n10-210-p.MIT.EDU: Connection timed out
		pal-p.MIT.EDU: Connection refused
		pal-p.MIT.EDU: Connection refused
		sol.MIT.EDU: Connection timed out
		sol.MIT.EDU: Connection timed out
		tim.MIT.EDU: Connection refused
		eve: Permission denied
		p13-470-1: Permission denied

	workstations in public clusters: we checked W20, B11, B37, B66
		to make sure machines were on and logins were possible

- I put a motd on telling users that Themis is unavailable.

- Thanks to Mark Eichin for letting us know about this outage as it was
  occurring. Thanks to Lucien for helping shut things down. Thanks to Brian
  for working diligently throughout the day.

- If noteworthy events occur on Sunday, I'll write again.

			Anne (maybe I'll find chocolate eggs in B11!)
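
P.S. For whoever follows up on Monday: the print server list above is easier
to act on when grouped by failure type. As a rough rule of thumb (not verified
against these particular machines), "Connection refused" usually means the host
is up but nothing is answering on the service port, while "Connection timed out"
usually means the host itself is unreachable. The sketch below is only an
illustration: it assumes the output has been saved as "host: reason" lines, the
sample is a subset copied from the list above, and the function name
group_by_failure is made up for this example.

```python
from collections import defaultdict

# Sample lines copied from the report above (a subset, duplicates included).
REPORT = """\
castor.MIT.EDU: Connection refused
elsa.MIT.EDU: Connection refused
elsa.MIT.EDU: Connection refused
m14n-336-p.MIT.EDU: Connection timed out
m14n-336-p.MIT.EDU: Connection timed out
sol.MIT.EDU: Connection timed out
eve: Permission denied
"""


def group_by_failure(text):
    """Map each failure message to the sorted list of distinct hosts reporting it."""
    groups = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        host, sep, reason = line.partition(": ")
        if not sep:
            continue  # skip anything that is not of the form "host: reason"
        groups[reason.strip()].add(host.strip())
    return {reason: sorted(hosts) for reason, hosts in groups.items()}


if __name__ == "__main__":
    for reason, hosts in group_by_failure(REPORT).items():
        print(f"{reason} ({len(hosts)}): {', '.join(hosts)}")
```

Repeated hosts collapse to one entry per failure type, so each machine shows
up once on the list of things to visit.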

