[6149] in Hotline Meeting
Status of Artemis and Aphrodite
daemon@ATHENA.MIT.EDU (salemme@ATHENA.MIT.EDU)
Thu Sep 19 12:44:42 1991
From: salemme@ATHENA.MIT.EDU
Date: Thu, 19 Sep 91 12:45:15 -0400
To: hotline@ATHENA.MIT.EDU, op@ATHENA.MIT.EDU
Cc: carla@ATHENA.MIT.EDU, epeisach@ATHENA.MIT.EDU, probe@ATHENA.MIT.EDU
At about 10am this morning, users reported that aphrodite was very slow.
Brian found the B11 machine room very warm, the ac unit apparently not
working, and aphro's disk 2 offline. Brian notified hotline and called
PhysPlant about the ac. Someone from PhysPlant came and toggled the ac
switch, which got it running again. By around 10:45, the room temperature
seemed about normal.
Evidently as a result of the high temperature in the room, two disk drives
that sit side by side in the rack closest to the ac unit (drive number 2
for aphro and drive number 2 for artemis) started dropping offline. We had
to hard reboot both systems. Artemis (an AFS server) seemed to come back
with no problem. Aphro's drive repeatedly faulted, so it couldn't reboot.
We were able to get the drive spinning after about 20 min; we let it spin
for another 10 min, then rebooted, apparently successfully.
Both machines are now back in service, BUT both have reported many (around
200) "vmunix: uda0: soft error datagram: unit 2..." messages. We should
keep an eye on these throughout the afternoon and tomorrow in case of
further problems.
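One way to keep an eye on these might be to count the messages in the
system log from time to time. The following is only a minimal sketch in
Python, assuming the messages end up in a plain-text log file. The function
name count_soft_errors and the log path given on the command line are
hypothetical, and the pattern matches only the part of the message quoted
above, since the rest of it is elided.

```python
import re
import sys

# Assumption: only the quoted prefix of the message is known; the remainder
# of each log line is not matched or interpreted.
SOFT_ERROR = re.compile(r"vmunix: uda0: soft error datagram: unit (\d+)")


def count_soft_errors(lines):
    """Return a dict mapping drive unit number to soft error message count."""
    counts = {}
    for line in lines:
        match = SOFT_ERROR.search(line)
        if match:
            unit = int(match.group(1))
            counts[unit] = counts.get(unit, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_soft_errors.py <path-to-log-file>
    with open(sys.argv[1], errors="replace") as log:
        for unit, count in sorted(count_soft_errors(log).items()):
            print(f"unit {unit}: {count} soft error message(s)")
```

Running this a few times over the afternoon and comparing the counts would
show whether the errors are still accumulating or have stopped.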
Anne