[3046] in linux-scsi channel archive



RE: RAID & unhappy scsi driver

daemon@ATHENA.MIT.EDU (Mr M S Aitchison)
Tue Jan 6 16:28:41 1998

Date: 	Wed, 07 Jan 1998 09:56:35 +1300
From: physmsa@cantua.canterbury.ac.nz (Mr M S Aitchison)
To: linux-scsi@vger.rutgers.edu

We have had problems under Solaris with disk faults (Seagate again, as
it happens) which also left the system virtually dead, so I agree
that problems like yours are annoying.  While I can see it is not the
operating system's fault, I'd love there to be some facility in the
system (optional, 'cause it would soak up some RAM), especially for
SCSI, that explains as clearly as possible what is going on when faced
with excessive retries, repeated errors, or important drives not
responding.

Something like the following fictional message:

 SERIOUS WARNING: The sdb disk (SEAGATE  ST15230N         0638)
 is not responding; it has failed 8 times in the past 6 hours so
 after a SCSI reset I will sync all the disks possible, disable logins,
 and try to spin-down the disk for an hour in case it is overheating.
 The response is governed by the /etc/panic.action file, read at startup.
 The other SCSI devices are responding correctly. The faulty disk is
 important to the system (contains /usr/local) and is not mirrored.
 You should shut down the system and examine/repair the disk with
 target: 01, Lun:00 on the first SCSI controller (Adaptec AIC-7880U). 
 The last error message (Jun 17 06:16:12) from the drive was:
 Vendor 'SEAGATE': ASC = 0x31 (<vendor unique code 0x31>), ASCQ = 0x0, FRU = 0x7
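
To make the "failed 8 times in the past 6 hours" part concrete, here is
a rough sketch of how such a facility might count failures in a sliding
time window and build the first lines of the warning.  It is plain
user-space Python, not kernel code; the names FailureWindow and
format_warning are made up for illustration, and the thresholds are
just the ones from the fictional message.

```python
from collections import deque


class FailureWindow:
    """Track failure timestamps for one device in a sliding time window.

    Hypothetical helper: the defaults (8 failures in 6 hours) mirror the
    fictional warning message, not any real driver policy.
    """

    def __init__(self, max_failures=8, window_seconds=6 * 3600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = deque()

    def record_failure(self, now):
        """Record a failure at time `now` (in seconds).

        Returns True once the number of failures still inside the
        window reaches the threshold.
        """
        self.failures.append(now)
        # Drop failures that have aged out of the window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        return len(self.failures) >= self.max_failures


def format_warning(device, model, window, mount_point, mirrored):
    """Build a human-readable warning in the spirit of the message above."""
    hours = window.window_seconds // 3600
    mirror_note = "is mirrored" if mirrored else "is not mirrored"
    return (
        f"SERIOUS WARNING: The {device} disk ({model}) is not responding; "
        f"it has failed {len(window.failures)} times in the past {hours} hours. "
        f"The faulty disk contains {mount_point} and {mirror_note}."
    )
```

Whatever reports the errors would call record_failure() for each one and
only produce the warning (and take the configured action) when it
returns True, so a single isolated error stays quiet.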

> From dledford@dialnet.net Mon Jan  5 22:39:59 1998
> On 05-Jan-98 Linas Vepstas wrote:
> >I am disappointed to point out the following kernel "bug":
> >
> >Recently set up RAID w/ several seagates & adaptec 2940 on 2.0.33
> >kernel.
> >After a few weeks, one of the drives failed.
> >
> >I was unhappy to find the machine all-but locked up as a result,
> >un pingable, un telnetable, etc.  (although the keyboard did wake
> >up the sleeping monitor.)  Apparently the aic7xxx driver entered
> >into some sort of infinite loop attempting to reset the scsi disk.
> 
> Not likely.  More likely the mid level SCSI code sent the same commands back
> time after time and they timed out, resulting in the SCSI code calling the
> aic7xxx_reset() routine repeatedly.
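
The resend / time out / reset cycle described above only ends if
something bounds it.  As a rough illustration (a simulation in Python,
not the actual mid-level SCSI code or the aic7xxx driver), a retry loop
with a fixed reset budget could look like this.  The function name and
the send_command / reset_device callbacks are hypothetical.

```python
def issue_with_bounded_resets(send_command, reset_device, max_resets=3):
    """Send a command, resetting the device after each timeout.

    send_command() is assumed to return True on success and False on
    a timeout.  Returns "ok" on success, or "offline" once the reset
    budget is spent, instead of looping forever.
    """
    resets = 0
    while True:
        if send_command():
            return "ok"
        if resets >= max_resets:
            # Give up: the caller can mark the device offline and report it.
            return "offline"
        reset_device()
        resets += 1
```

With a drive that never answers, this issues max_resets resets and then
reports "offline", which is the point at which a clear message to the
administrator would be useful.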

-------------------------------------------------------------------------
Mark Aitchison,                 \_   Phone: +64 3 364-2947 home 337-1225
Dept of Physics & Astronomy,    </     Fax: +64 3 364-2469  or  364-2999
University of Canterbury,      /)   E-mail: phys169@csc.canterbury.ac.nz
Christchurch, NEW ZEALAND.    (/'     www.phys.canterbury.ac.nz/~physmsa
-------------------------------------------------------------------------

