[8203] in linux-scsi channel archive


Re: recovery behaviour with 1 bad + 1 good drive (aic7xxx)

daemon@ATHENA.MIT.EDU (Doug Ledford)
Thu Feb 24 00:58:30 2000

Message-ID: <38B4C01F.14CD5E4@redhat.com>
Date:   Thu, 24 Feb 2000 00:22:39 -0500
From: Doug Ledford <dledford@redhat.com>
MIME-Version: 1.0
To: Matthias Andree <ma@dt.e-technik.uni-dortmund.de>
Cc: Ishikawa <ishikawa@yk.rim.or.jp>,
	Guest section DW <dwguest@win.tue.nl>, linux-scsi@vger.rutgers.edu
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Matthias Andree wrote:
> 
> >  - SCSI bus slowed down due to certain errors. Again, I forgot
> >    what caused this type of messages, but there are occasions
> >    when the SunOS reported that the transfer between certain
> >    devices is graded down to a slower speed after seeing certain
> >    type of errors. I think this DOES involve RESET.
> 
> AFAICS, there is no reason, the speed is negotiated for every transfer,
> it's just cached at some point.

The speed is negotiated when needed, and once an agreement is in place it
stays there until a new one is negotiated or a bus/device reset occurs.
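
To make the lifetime of an agreement concrete, here is a minimal sketch (not
the aic7xxx code; the class, method names, and the (period, offset) tuple are
all hypothetical, and it assumes a target falls back to asynchronous
transfers once its agreement is gone):

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical marker for "no agreement in place": asynchronous transfers.
ASYNC: Tuple[int, int] = (0, 0)


@dataclass
class NegotiationCache:
    """Per-target transfer agreements, kept until renegotiated or reset."""

    # target id -> (sync period, offset); both values are illustrative only
    agreements: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def current(self, target: int) -> Tuple[int, int]:
        # No negotiation happens here: the stored agreement is simply reused.
        return self.agreements.get(target, ASYNC)

    def negotiate(self, target: int, period: int, offset: int) -> None:
        # A new negotiation replaces whatever agreement was in place.
        self.agreements[target] = (period, offset)

    def device_reset(self, target: int) -> None:
        # A device reset drops the agreement for that one target.
        self.agreements.pop(target, None)

    def bus_reset(self) -> None:
        # A bus reset drops every agreement on the bus.
        self.agreements.clear()
```

Ordinary commands only ever call current(); negotiate() runs when there is a
need for it, and the two reset paths are the only other way an agreement goes
away.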

> > Anyway, it would be nice to see linux's SCSI system to become more robust
> > in terms of handling these exceptional cases.
> > These are exceptional indeed, but when they happen, we are
> > often in a mess.
> 
> Still, it's a normal condition that a drive fails, be it that bad
> blocks are not transparently reassigned, be it that a CD-ROM is
> scratched.

A failing disk is not normal; it is an error condition.  If it happens more
often than we would like, then so be it, but that doesn't make it normal, just
a common error condition.  It is also far from the only error condition the
mid-level SCSI code tries to handle.  Quite a few error conditions lock up the
SCSI bus and require a reset in order to continue, yet are neither permanent
nor fatal.  In those cases the bus resets we throw are quite the life saver.

> > I would rather see the system run at reduced scsi bus
> > speed if necessary and/or a process or two become hung
> > than to face a non-working system not responding to our keyboard
> > in the morning.
> 
> No way. If the bus goes too fast, let the machine crash hard. The
> administrator can switch the speed, not the machine itself. If the bus
> cable is too long, the user should know that and set a lower speed.

Well, you and the SCSI-3 SPI spec are formally at odds then (and I disagree
with you as well).  If I want to implement the complete SCSI-3 SPI spec, that
includes domain validation, the whole point of which is to find the highest
reliable transfer speed by repeatedly testing the SCSI bus and to
automatically slow the bus down on speed-related failures.
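
As a rough sketch of that idea (this is not the procedure from the spec, just
the general shape; the function names, the passes_test callback, and the rate
values are all made up for illustration):

```python
from typing import Callable, Optional, Sequence


def highest_reliable_rate(
    rates: Sequence[int],
    passes_test: Callable[[int], bool],
    rounds: int = 3,
) -> Optional[int]:
    """Return the fastest rate that passes `rounds` test transfers in a row.

    `passes_test` is a hypothetical callback that runs one test transfer at
    the given rate and reports whether the data came back intact.
    """
    for rate in sorted(rates, reverse=True):
        if all(passes_test(rate) for _ in range(rounds)):
            return rate
    return None  # nothing passed; the caller decides what to do next


def step_down(rates: Sequence[int], current: int) -> Optional[int]:
    """Return the next slower rate after a speed-related failure, if any."""
    slower = [r for r in rates if r < current]
    return max(slower) if slower else None
```

For example, with illustrative rates of [5, 10, 20, 40, 80] and a bus that
only transfers cleanly at 20 or below, highest_reliable_rate() settles on 20,
and a later speed-related failure at 20 would have step_down() pick 10.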

-- 

 Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
      Please check my web site for aic7xxx updates/answers before
                      e-mailing me about problems

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.rutgers.edu

