[6159] in Hotline Meeting


Aphrodite

daemon@ATHENA.MIT.EDU (ylsul@ATHENA.MIT.EDU)
Fri Sep 20 17:31:19 1991

From: ylsul@ATHENA.MIT.EDU
To: op@ATHENA.MIT.EDU
Cc: hotline@ATHENA.MIT.EDU
Date: Fri, 20 Sep 91 17:31:31 EDT


	Drive 2 on Aphrodite faulted this afternoon around 2:50 pm. It was
the same drive that faulted when the refrigeration unit in the building
11 machine room failed. We halted Aphrodite and left the disks to spin
until the repairs on the refrigeration unit were completed. Aphrodite
was then brought back up at about 5:30 pm.

Pertinent errors: (note the hard error -- to date, there is only one)

Sep 20 14:50:38 aphrodite vmunix: uda1: hard error datagram: unit 4095: lbn 0: drive error (ctlr detected protocol) (code 11, subcode 2)
Sep 20 14:50:38 aphrodite vmunix: uda1: attempt to bring ra2 on line failed: unit offline (unknown drive) (code 3, subcode 0)
Sep 20 14:50:40 aphrodite last message repeated 41 times
Sep 20 14:50:40 aphrodite vmunix: uda1: a
Sep 20 14:50:40 aphrodite vmunix: ttempt to bring ra2 on line failed: unit offline (unknown drive) (code 3, subcode 0)
Sep 20 14:50:40 aphrodite vmunix: uda1: attempt to bring ra2 on line failed: unit offline (unknown drive) (code 3, subcode 0)
Sep 20 14:50:41 aphrodite last message repeated 14 times
Sep 20 14:50:41 aphrodite vmunix: uda1: soft error datagram (continuing): unit 4095: lbn 0: drive error (drive detected error) (code 11, subcode 7)
Sep 20 14:50:41 aphrodite vmunix: uda1: soft error datagram (continuing): unit 2: lbn 0: drive error (drive detected error) (code 11, subcode 7)
Sep 20 14:50:41 aphrodite vmunix: uda1: attempt to bring ra2 on line failed: unit offline (unknown drive) (code 3, subcode 0)
Sep 20 14:50:41 aphrodite last message repeated 9 times

***The fault button on drive 2 was reset here***

Sep 20 14:50:41 aphrodite vmunix: ra2: ra90, size = 2376153 sectors
Sep 20 14:50:42 aphrodite vmunix: ra2: unit 2, nspt 69, group 1, ntpc 13, rctsize 414, nrpt 1, nrct 4


Aphrodite is currently logging A LOT of:

Sep 20 16:51:02 aphrodite vmunix: uda1: soft error datagram: unit 2: level 0 retry 0, lbn 2305536: data error (6 symbol ecc) (code 8, subcode 13)
Sep 20 16:51:02 aphrodite vmunix: uda1: soft error datagram: unit 2: level 0 retry 0, lbn 2305537: data error (uncorrectable ecc) (code 8, subcode 7)
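
For anyone who wants to keep an eye on how fast these are piling up, here is a rough sketch of a tally over the uda error datagrams. It is only an illustration: the regular expression is inferred from the handful of messages quoted in this note (not from the driver source), it assumes each message sits on a single unwrapped line, and the names DATAGRAM, SAMPLE and tally are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this note (an assumption,
# not taken from the uda driver source). It expects one message per line.
DATAGRAM = re.compile(
    r"vmunix: (?P<ctlr>uda\d+): (?P<sev>hard|soft) error datagram"
    r"(?: \(continuing\))?: unit (?P<unit>\d+): "
    r".*?lbn (?P<lbn>\d+): (?P<desc>.+?) "
    r"\(code (?P<code>\d+), subcode (?P<subcode>\d+)\)$"
)

# Sample lines copied from the log excerpts above, with wrapped lines joined.
SAMPLE = [
    "Sep 20 14:50:38 aphrodite vmunix: uda1: hard error datagram: unit 4095: "
    "lbn 0: drive error (ctlr detected protocol) (code 11, subcode 2)",
    "Sep 20 14:50:38 aphrodite vmunix: uda1: attempt to bring ra2 on line "
    "failed: unit offline (unknown drive) (code 3, subcode 0)",
    "Sep 20 14:50:41 aphrodite vmunix: uda1: soft error datagram (continuing): "
    "unit 2: lbn 0: drive error (drive detected error) (code 11, subcode 7)",
    "Sep 20 16:51:02 aphrodite vmunix: uda1: soft error datagram: unit 2: "
    "level 0 retry 0, lbn 2305536: data error (6 symbol ecc) (code 8, subcode 13)",
    "Sep 20 16:51:02 aphrodite vmunix: uda1: soft error datagram: unit 2: "
    "level 0 retry 0, lbn 2305537: data error (uncorrectable ecc) (code 8, subcode 7)",
]


def tally(lines):
    """Count error datagrams by (severity, unit, description).

    Lines that are not error datagrams (for example the "attempt to bring
    ra2 on line failed" messages) are skipped.
    """
    counts = Counter()
    for line in lines:
        m = DATAGRAM.search(line.strip())
        if m:
            counts[(m["sev"], int(m["unit"]), m["desc"])] += 1
    return counts


if __name__ == "__main__":
    for (sev, unit, desc), n in sorted(tally(SAMPLE).items()):
        print(f"{n:4d}  {sev}  unit {unit}  {desc}")
```

Feeding it the real log file instead of SAMPLE (one message per line) should show whether the hard error count stays at one while the soft ECC errors on unit 2 keep growing.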

	There is a sensor in the building 11 machine room that SHOULD
have alerted Physical Plant that there was something wrong. It obviously
did not do its job. This needs to be repaired.




