[57828] in Hotline Meeting

SGI O2 capacitor-bank.mit.edu possible bad CPU/motherboard

daemon@ATHENA.MIT.EDU (Tom Yu)
Tue Jan 8 21:12:41 2002

To: hotline@mit.edu
From: Tom Yu <tlyu@MIT.EDU>
Date: 08 Jan 2002 21:12:39 -0500
Message-ID: <ldvofk4xofc.fsf@saint-elmos-fire.mit.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

I have an SGI O2, capacitor-bank.mit.edu, that has recently been
logging soft ECC errors.  Because the errors are reported in different
DIMM slots (1 and 3), I suspect motherboard or CPU trouble rather than
a single bad DIMM.  The system isn't crashing because of them, but I
would like to try swapping the CPU or the motherboard to see whether
that improves the situation.

Some sample syslog messages are:

Jan  2 00:01:39 6A:capacitor-bank unix: MEMORY Error/Addr 0x9020a0<SOFT ECC ERROR ON READ during MACE ACCESS>: 0x1cb6820
Jan  2 00:01:39 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in front side of DIMM Slot 1, data bit 33
Jan  2 00:01:39 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in front side of DIMM Slot 1, data bit 33 [filter /usr/sbin/sysmonpp failed: exit status 0x1]
Jan  3 00:01:27 6A:capacitor-bank unix: MEMORY Error/Addr 0x9023a3<SOFT ECC ERROR ON READ during MACE ACCESS>: 0x1cb6820
Jan  3 00:01:27 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in front side of DIMM Slot 1, data bit 33
Jan  3 00:01:27 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in front side of DIMM Slot 1, data bit 33 [filter /usr/sbin/sysmonpp failed: exit status 0x1]
Jan  5 19:17:48 6A:capacitor-bank unix: MEMORY Error/Addr 0x940000<SOFT ECC ERROR ON READ during CPU ACCESS>: 0x7ff0000
Jan  5 19:17:48 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in back side of DIMM Slot 3, check bit 15
Jan  5 19:17:48 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in back side of DIMM Slot 3, check bit 15 [filter /usr/sbin/sysmonpp failed: exit status 0x1]
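
To keep track of whether the errors continue to show up in more than
one slot, something along these lines could tally the alerts by DIMM
slot.  This is only a sketch: the regular expression is inferred from
the three sample alerts above and may not match other IRIX message
variants, and the names tally_ecc_alerts and SAMPLE are made up for
illustration.

```python
import re
from collections import Counter

# Pattern inferred from the sample "ALERT: Soft ECC Error" lines above;
# other message variants are an unknown and would need their own pattern.
ALERT_RE = re.compile(
    r"Soft ECC Error in (front|back) side of DIMM Slot (\d+), "
    r"(data|check) bit (\d+)"
)


def tally_ecc_alerts(lines):
    """Count soft ECC alerts per (slot, side, bit kind, bit number).

    Lines carrying the "[filter ... failed" suffix are skipped, because in
    the samples above they repeat the alert logged just before them.
    """
    counts = Counter()
    for line in lines:
        if "[filter " in line:
            continue
        match = ALERT_RE.search(line)
        if match:
            side, slot, kind, bit = match.groups()
            counts[(int(slot), side, kind, int(bit))] += 1
    return counts


# A few of the syslog lines quoted above, copied verbatim.
SAMPLE = r"""
Jan  2 00:01:39 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in front side of DIMM Slot 1, data bit 33
Jan  2 00:01:39 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in front side of DIMM Slot 1, data bit 33 [filter /usr/sbin/sysmonpp failed: exit status 0x1]
Jan  3 00:01:27 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in front side of DIMM Slot 1, data bit 33
Jan  5 19:17:48 1A:capacitor-bank unix: |$(0x37)ALERT: Soft ECC Error in back side of DIMM Slot 3, check bit 15
"""

if __name__ == "__main__":
    for key, count in sorted(tally_ecc_alerts(SAMPLE.splitlines()).items()):
        slot, side, kind, bit = key
        print(f"slot {slot} {side} side, {kind} bit {bit}: {count}")
```

The same function could be fed the lines of the live syslog file
instead of SAMPLE; a result that keeps naming more than one slot would
support the motherboard/CPU theory over a single bad DIMM.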

hinv shows:

CPU: MIPS R5000 Processor Chip Revision: 2.1
FPU: MIPS R5000 Floating Point Coprocessor Revision: 1.0
1 200 MHZ IP32 Processor
Main memory size: 128 Mbytes

The machine is in W92-145; I'd prefer to have spare parts dropped off,
if possible, or to be present when it's being worked on.

Thanks,
---Tom
