[1105] in linux-scsi channel archive


Oh yuck - now my Seagate disks are brain-dead?!

daemon@ATHENA.MIT.EDU (Andy Poling)
Sun Dec 15 13:46:03 1996

Date: 	Sun, 15 Dec 1996 13:42:34 -0500 (EST)
From: Andy Poling <andy@realbig.com>
To: linux-eata@i-connect.net
cc: linux-scsi@vger.rutgers.edu


A funny thing happened on the way to building a RAID-5 array on my DPT
3334UW.  It apparently somehow scrambled the disks' manufacturer params.

This SCSI system consists of a DPT PM3334UW (SCSI ID 0) and 5 Seagate
ST-31055W disks (SCSI IDs 1-5).

I installed PC-DOS and windoze on a spare IDE drive, installed the DPT
windoze software and started configuring away.  

Drive 1 has my Linux (2.0.26) system on it, so it stays as a single drive
until I get it working.  No problem.

Drives 2-5 get configured as a 4-way RAID-5 array, with drive 1 to
eventually be the hot spare once I get moved over to booting off the RAID
array.

I configured all drives to have read and write (write-back) caching on (all)
and to use SMART emulation, and scheduled (read-only) diagnostics for each
drive every morning.  Then I did a "Set System Configuration", verified that
it was building the array, exited windoze and rebooted to test whether the
claim of rebuilds spanning reboots was true.

Uh-oh.  The system reset, the DPT warbled its boot-up greeting and the
activity lights on the drives all went out... except for disk 5, which
started slowly flashing.  The rebuild seems to have stopped. :-) The BIOS
said drive 2 was not responding and the warning beeper started beeping.
Double uh-oh.

I ran windoze and Storage Manager and saw that drive 2 was red, drives 3
and 4 were white and drive 5 was black.  Oh boy.  It looks like I'm in
trouble deep now.

After a bunch of screwing around, I have determined that the drives spin up
on command, and identify themselves (partially) correctly as ST-31055W's,
but any attempt to read or write yields a check-condition (0x02) status with
a sense key of "medium error" (0x03), an additional sense code of 0x31 and
an additional sense code qualifier of 0x00.  This is according to my Adaptec
2940.  The DPT just says "device not ready".
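
For reference, here is a rough sketch of how those values sit in a
fixed-format SCSI sense buffer and how they decode.  The example buffer is
hypothetical: it is built by hand from the numbers quoted above, not captured
from the drives, and the lookup tables are only a small subset of the SCSI-2
tables.  The function name decode_sense is made up for illustration.

```python
# Subset of the SCSI-2 sense key table.
SENSE_KEYS = {
    0x00: "NO SENSE",
    0x01: "RECOVERED ERROR",
    0x02: "NOT READY",
    0x03: "MEDIUM ERROR",
    0x04: "HARDWARE ERROR",
    0x05: "ILLEGAL REQUEST",
    0x06: "UNIT ATTENTION",
}

# Subset of the SCSI-2 additional sense code / qualifier table.
ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x31, 0x00): "MEDIUM FORMAT CORRUPTED",
}


def decode_sense(sense: bytes) -> dict:
    """Decode fixed-format sense data (response code 0x70 or 0x71)."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F          # sense key is the low nibble of byte 2
    asc, ascq = sense[12], sense[13]
    return {
        "sense_key": SENSE_KEYS.get(key, "unknown (0x%02x)" % key),
        "asc": asc,
        "ascq": ascq,
        "asc_ascq": ASC_ASCQ.get((asc, ascq), "not in this subset"),
    }


# Hypothetical 18-byte buffer matching the values reported above:
# sense key 0x03, additional sense code 0x31, qualifier 0x00.
EXAMPLE_SENSE = bytes(
    [0x70, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x0A,
     0x00, 0x00, 0x00, 0x00, 0x31, 0x00, 0x00, 0x00,
     0x00, 0x00]
)

if __name__ == "__main__":
    print(decode_sense(EXAMPLE_SENSE))
```

Note that the check-condition status (0x02) is the status byte of the failed
command, which is a separate field from the sense key, even though sense key
0x02 happens to mean "not ready".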

When I boot Linux, the aic7xxx driver negotiates 16-bit transfers with all
the drives, but only reports the two working disks as negotiating
synchronous transfers at 20MHz.

So now, 3 of the 4 disks I configured into the RAID-5 array are playing
stupid.  I've never seen anything like this.

Help!  I'm stumped.  The obvious questions:

1) what happened?  Did I do something I shouldn't-a done?

2) what do I do with these disks?  Given that I still have two sane disks
   and they're all identical, I still hold out hope that I can somehow
   either restore the default manufacturer params (perhaps via one of the
   "reserved" jumpers on the disks?) or copy them from a working disk to a
   non-working one.  

My experience with SCSI drivers is limited to tape drives, but I assume the
get/set manufacturer params commands don't require that the medium be
ready?  Is there a tool out there anywhere for Linux (or DOS) to manipulate
these params?  I've gotten spoiled by IRIX's "fx".

Are the manufacturer params even likely the problem?

Any pointers will be gratefully accepted.  Time for the aspirin and antacid
I guess... :-)

-Andy

(this sig left intentionally blank - too many hats for one sig)


