[8171] in linux-scsi channel archive
Re: recovery behaviour with 1 bad + 1 good drive (aic7xxx)
daemon@ATHENA.MIT.EDU (Ricky Beam)
Wed Feb 23 00:52:42 2000
Date: Tue, 22 Feb 2000 23:04:48 -0500 (EST)
From: Ricky Beam <jfbeam@bluetopia.net>
To: Guest section DW <dwguest@win.tue.nl>
Cc: linux-scsi@vger.rutgers.edu
In-Reply-To: <20000222212936.A1097@win.tue.nl>
Message-ID: <Pine.LNX.4.04.10002222246490.12259-100000@beaker>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Tue, 22 Feb 2000, Guest section DW wrote:
>> Do you think a SCSI blacklist entry for 'don't bus reset a bus with
>> this piece of junk on' would help ?
>
>No. The problem is not the device, it is our code.
I would have to say it's both. Handling a CDROM the same as a hard drive,
for example, is a very bad idea. But not being able to reset a device and/or
the entire SCSI bus is a major flaw in the code -- it's also one of the
hardest things to test.
>If you read scsi_error.c the philosophy is: "Something went
>wrong, what can we do to get it working again?".
>And increasingly powerful measures are taken.
Also increasingly disruptive... a bus reset requires every command
submitted to every device on the bus to be aborted. This can be a really
ugly mess once a command has been sent to a device -- you may not be able
to stop it, at which point it's going to be in an indeterminate state,
eventually causing timeouts... I've got an old Plextor CD-ROM that
disappears from the bus when the driver does a bus reset. (Don't ask me
how it shows up following the driver's initial bus reset.)
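
To put a rough number on "increasingly disruptive", here is a minimal
sketch in plain C. It is not taken from scsi_error.c; the names
recovery_level, bus_state and commands_disrupted are made up for
illustration, and the three levels are a simplification. It only counts
how many in-flight commands each step would cut short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recovery levels, ordered from least to most disruptive. */
enum recovery_level {
    RECOVER_ABORT_CMD,    /* abort only the command that failed */
    RECOVER_DEVICE_RESET, /* reset the one misbehaving device */
    RECOVER_BUS_RESET     /* reset every device on the bus */
};

/* Hypothetical snapshot of a bus: outstanding commands per device. */
struct bus_state {
    size_t ndevices;
    const unsigned *inflight;
};

/*
 * Count the in-flight commands that would be cut short if `level`
 * were applied on behalf of device `target`.
 */
unsigned commands_disrupted(const struct bus_state *bus, size_t target,
                            enum recovery_level level)
{
    unsigned total = 0;
    size_t i;

    assert(target < bus->ndevices);

    switch (level) {
    case RECOVER_ABORT_CMD:
        /* Only the failed command itself, if there is one. */
        return bus->inflight[target] ? 1 : 0;
    case RECOVER_DEVICE_RESET:
        /* Everything queued to the one device. */
        return bus->inflight[target];
    case RECOVER_BUS_RESET:
        /* Everything queued to every device, good or bad. */
        for (i = 0; i < bus->ndevices; i++)
            total += bus->inflight[i];
        return total;
    }
    return 0;
}
```

With one sick drive and one busy healthy drive on the same bus, the last
case is the one that drags the healthy drive's commands down too.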
>Indeed, there never are any disk errors. And when there are,
>then I do not want to touch that drive anymore. I do not want
In some cases, the error is non-fatal. The system can continue properly
if it simply tries again. This is mostly true for removable media devices
and things with "power saving" modes.
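
Something like the following is what I mean by "simply tries again". It
is only a sketch: the result codes, issue_with_retry and the simulated
drive are all hypothetical, and how a real driver decides that an error
is transient (a drive still spinning up, say) is assumed rather than
shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a command's outcome. */
enum cmd_result { CMD_OK, CMD_TRANSIENT, CMD_FATAL };

/* Callback that issues the command once and reports the outcome. */
typedef enum cmd_result (*issue_fn)(void *ctx);

/*
 * Issue a command, retrying only while the failure is transient and
 * the retry budget is not used up. Fatal errors are returned at once.
 * The total number of attempts is stored in *attempts if non-NULL.
 */
enum cmd_result issue_with_retry(issue_fn issue, void *ctx,
                                 int max_retries, int *attempts)
{
    enum cmd_result r;
    int tries = 0;

    assert(issue != NULL && max_retries >= 0);

    do {
        r = issue(ctx);
        tries++;
    } while (r == CMD_TRANSIENT && tries <= max_retries);

    if (attempts)
        *attempts = tries;
    return r;
}

/* Simulated drive that needs a few attempts to wake up. */
struct spinup_sim { int calls_until_ready; };

enum cmd_result sim_issue(void *ctx)
{
    struct spinup_sim *s = ctx;

    if (s->calls_until_ready > 0) {
        s->calls_until_ready--;
        return CMD_TRANSIENT;
    }
    return CMD_OK;
}
```

The retry budget is what keeps this from turning into the endless
hammering described in the quoted text: a drive that never becomes ready
still comes back as a failure after max_retries + 1 attempts.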
...
>When I got to this machine the next morning, the disk was
>too hot to touch, and did not react to anything anymore.
A "hard failure" counter would be nice. How many times do you need to
walk into a brick wall before you realize it's a brick wall and move on?
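
A rough sketch of such a counter, in plain C. Nothing like this exists
in the code as far as I know; dev_health, HARD_FAILURE_LIMIT and the
limit of 3 are assumptions picked for illustration, and counting only
consecutive failures is one possible policy among several.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit: consecutive hard failures before giving up. */
#define HARD_FAILURE_LIMIT 3

/* Hypothetical per-device bookkeeping. */
struct dev_health {
    unsigned hard_failures; /* consecutive hard failures so far */
    bool offline;           /* set once the limit is reached */
};

/*
 * Record the outcome of one recovery attempt. A success clears the
 * count; reaching the limit marks the device offline for good.
 */
void dev_record_result(struct dev_health *h, bool hard_failure)
{
    assert(h != NULL);

    if (h->offline)
        return;                 /* already written off */
    if (!hard_failure) {
        h->hard_failures = 0;   /* it recovered, start over */
        return;
    }
    if (++h->hard_failures >= HARD_FAILURE_LIMIT)
        h->offline = true;      /* it's a brick wall, move on */
}

/* Should the error handler still bother with this device? */
bool dev_should_try(const struct dev_health *h)
{
    return !h->offline;
}
```

Once dev_should_try() returns false, the error handler would stop
sending resets on that device's behalf and leave the rest of the bus
alone.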
>[But this is an old discussion. Maybe I am alone with the point of
>view that bus resets are terrible. At least Eric seems to think
>that the probability that something useful is achieved by a reset
>is larger than the probability that the situation only gets worse.
>I think that bus resets should be initiated by a human only.]
Yes, it is. As I recall, it's been hashed about twice before. (I may still
have the mail.)
Perhaps you should use AHA1542s :-) Last time I cared to look (ahum, back
in 1.2), it panicked whenever it was "necessary" to do an abort/bus reset.
(I guess it was deemed an unnecessary portion of code.)
--Ricky