[3046] in linux-scsi channel archive
RE: RAID & unhappy scsi driver
daemon@ATHENA.MIT.EDU (Mr M S Aitchison)
Tue Jan 6 16:28:41 1998
Date: Wed, 07 Jan 1998 09:56:35 +1300
From: physmsa@cantua.canterbury.ac.nz (Mr M S Aitchison)
To: linux-scsi@vger.rutgers.edu
We have had problems under Solaris with disk faults (Seagate again, as
it happens) which also make the system virtually dead, and I can agree
that problems like yours are annoying. While I can see it is not the
operating system's fault, I'd love there to be an option in the system
(optional because it would soak up some RAM), especially for SCSI, that
explains as clearly as possible what is going on when it is faced with
excessive retries, repeated errors, or important drives not
responding.
Something like the following fictional message:
SERIOUS WARNING: The sdb disk (SEAGATE ST15230N 0638)
is not responding; it has failed 8 times in the past 6 hours so
after a SCSI reset I will sync all the disks possible, disable logins,
and try to spin-down the disk for an hour in case it is overheating.
The response is governed by the /etc/panic.action file, read at startup.
The other SCSI devices are responding correctly. The faulty disk is
important to the system (contains /usr/local) and is not mirrored.
You should shut down the system and examine/repair the disk with
target: 01, Lun:00 on the first SCSI controller (Adaptec AIC-7880U).
The last error message (Jun 17 06:16:12) from the drive was:
Vendor 'SEAGATE': ASC = 0x31 (<vendor unique code 0x31>), ASCQ = 0x0, FRU = 0x7
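To make the "failed 8 times in the past 6 hours" part of that fictional message concrete, here is a minimal sketch of the counting logic only. Everything in it is hypothetical: the FailureTracker class, the warning_text() helper, and the default threshold and window are made up for illustration and do not correspond to any existing kernel code or to the (equally fictional) /etc/panic.action file.

```python
from collections import deque


class FailureTracker:
    """Hypothetical helper: count one device's failures in a sliding time window."""

    def __init__(self, threshold=8, window_seconds=6 * 3600):
        self.threshold = threshold            # failures needed to trigger a warning
        self.window_seconds = window_seconds  # how far back failures are counted
        self.times = deque()                  # timestamps (seconds) of recent failures

    def record_failure(self, now):
        """Record a failure at time `now`; return True once the threshold is reached."""
        self.times.append(now)
        # Forget failures that have fallen out of the window.
        while self.times and now - self.times[0] > self.window_seconds:
            self.times.popleft()
        return len(self.times) >= self.threshold


def warning_text(device, model, tracker):
    """Build the first sentence of a warning in the style of the message above."""
    hours = tracker.window_seconds // 3600
    return (f"SERIOUS WARNING: The {device} disk ({model}) is not responding; "
            f"it has failed {len(tracker.times)} times in the past {hours} hours")
```

A caller would feed record_failure() a timestamp each time the device reports an error, and only print the warning (and take whatever action is configured) when it returns True. Keeping just the timestamps inside the window is what bounds the memory cost.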
> From dledford@dialnet.net Mon Jan 5 22:39:59 1998
> On 05-Jan-98 Linas Vepstas wrote:
> >I am disappointed to point out the following kernel "bug":
> >
> >Recently set up RAID w/ several seagates & adaptec 2940 on 2.0.33
> >kernel.
> >After a few weeks, one of the drives failed.
> >
> >I was unhappy to find the machine all-but locked up as a result,
> >unpingable, untelnetable, etc. (although the keyboard did wake
> >up the sleeping monitor.) Apparently the aic7xxx driver entered
> >into some sort of infinite loop attempting to reset the scsi disk.
>
> Not likely. More likely the mid level SCSI code sent the same commands back
> time after time and they timed out, which resulted in the SCSI code calling
> the aic7xxx_reset() routine repeatedly.
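
The loop described in the quoted reply (command times out, bus is reset, same command is sent again) can be illustrated with a capped version. This is only a sketch under assumptions: run_with_reset_cap(), send_command, reset_bus and max_resets are hypothetical names, and this is not how the Linux mid-level SCSI code or the aic7xxx driver is actually written.

```python
def run_with_reset_cap(send_command, reset_bus, max_resets=3):
    """Retry a command after each timeout, but stop resetting after max_resets.

    send_command: hypothetical callable that raises TimeoutError on a timeout.
    reset_bus:    hypothetical callable that performs one reset.
    Returns "ok" if the command eventually succeeds, "offline" if we give up.
    """
    resets = 0
    while True:
        try:
            send_command()
            return "ok"
        except TimeoutError:
            if resets >= max_resets:
                # Give up on this device instead of resetting forever.
                return "offline"
            reset_bus()
            resets += 1
```

With a cap like this, a drive that never answers costs a fixed number of resets, after which the caller can mark it offline and report it rather than tying up the machine.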
-------------------------------------------------------------------------
Mark Aitchison, \_ Phone: +64 3 364-2947 home 337-1225
Dept of Physics & Astronomy, </ Fax: +64 3 364-2469 or 364-2999
University of Canterbury, /) E-mail: phys169@csc.canterbury.ac.nz
Christchurch, NEW ZEALAND. (/' www.phys.canterbury.ac.nz/~physmsa
-------------------------------------------------------------------------