[726] in linux-scsi channel archive
Re: Serious problem with SCSI error handling
daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Fri Oct 27 02:39:10 1995
Date: Thu, 26 Oct 1995 10:04:02 -0700
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: eric@aib.com
Cc: jbotz@orixa.mtholyoke.edu, angio@aros.net, ncr53c810@mroe.cs.colorado.edu,
    linux-scsi@vger.rutgers.edu
In-Reply-To: <9510251938.ZM3838@aib.com> (eric@aib.com)
From: "Eric Youngdale" <eric@aib.com>
Date: Wed, 25 Oct 1995 19:38:36 -0400
I would beg to differ. The mid level code has all sorts of
checks so that it will retry commands that time out and attempt to abort or
reset if the bus would appear to be hung. Unfortunately, the low level
driver needs to know what to do when this sort of request comes along.
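
As an illustration only, the division of labour described above might be sketched as follows. This is not the actual mid level code: every name here (lowlevel_ops, abort_cmd, reset_bus, recover_command, the fake_* helpers) is hypothetical, and the three-stage order of retry, then abort, then reset is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command outcome as seen by the generic (mid level) code. */
enum cmd_status { CMD_OK, CMD_TIMEOUT };

/* Which recovery stage, if any, got the command through. */
enum recovery_result {
    RECOVERED_BY_RETRY,
    RECOVERED_BY_ABORT,
    RECOVERED_BY_RESET,
    RECOVERY_FAILED
};

/* Hypothetical entry points a low level driver would have to provide. */
struct lowlevel_ops {
    enum cmd_status (*issue)(void *hw); /* send the command once          */
    int (*abort_cmd)(void *hw);         /* 0 if the abort was accepted    */
    int (*reset_bus)(void *hw);         /* 0 if the reset was carried out */
};

/* Escalate: plain retries first, then abort, then reset. */
static enum recovery_result recover_command(const struct lowlevel_ops *ops,
                                            void *hw, int max_retries)
{
    int i;

    assert(ops != NULL && ops->issue != NULL);

    /* Stage 1: simply reissue the command a few times. */
    for (i = 0; i < max_retries; i++)
        if (ops->issue(hw) == CMD_OK)
            return RECOVERED_BY_RETRY;

    /* Stage 2: ask the driver to abort, then try once more. */
    if (ops->abort_cmd != NULL && ops->abort_cmd(hw) == 0 &&
        ops->issue(hw) == CMD_OK)
        return RECOVERED_BY_ABORT;

    /* Stage 3: ask the driver to reset, then try once more. */
    if (ops->reset_bus != NULL && ops->reset_bus(hw) == 0 &&
        ops->issue(hw) == CMD_OK)
        return RECOVERED_BY_RESET;

    return RECOVERY_FAILED;
}

/* Fake hardware for experimenting: it stays hung until it is reset. */
struct fake_hw { int hung; int issued; };

static enum cmd_status fake_issue(void *hw)
{
    struct fake_hw *h = hw;
    h->issued++;
    return h->hung ? CMD_TIMEOUT : CMD_OK;
}

/* The abort is accepted but does not clear the hang. */
static int fake_abort(void *hw) { (void)hw; return 0; }

/* The reset clears the hang. */
static int fake_reset(void *hw)
{
    struct fake_hw *h = hw;
    h->hung = 0;
    return 0;
}
```

The generic loop can only escalate; whether an abort or reset actually clears the condition depends entirely on what the driver's callbacks do, which is why a driver that leaves them unimplemented leaves the upper layer with nothing to fall back on.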
I would have to concur with this. Getting the driver error recovery code
working correctly is not an easy task, and takes extensive testing. Moreover,
it's often hard to generate errors to even test the driver error recovery.
I've yet to see the mid level code fail to retry when it should.
Most people do not experience these problems - usually the
drive will remap sectors automatically for you if it can and if this feature
is enabled. In other cases, the drive reports a bad sector, and the
filesystem or whatever must deal with it somehow.
This is an area that needs work, in my opinion. If an error occurs reading a
block that the EXT2 file system really wants (e.g. directory or indirect), it
tends to panic, killing the process attempting the access, and all too often
leaving that entire disk inaccessible to any other process. I think the file
system code should be as tolerant as possible of errors; perhaps a printk and
a sleep followed by a high level retry wouldn't hurt, since there isn't
anything else for that process to do anyway.
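
A minimal user-space sketch of that "log, sleep, retry" idea, purely for illustration: it is not EXT2 or kernel code, fprintf stands in for printk, and the names (block_read_fn, delay_fn, read_block_with_retry, flaky_read) are hypothetical. The delay is passed in as a callback so the caller decides how to sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical block reader: returns 0 on success, nonzero on I/O error. */
typedef int (*block_read_fn)(unsigned long block, void *buf, void *ctx);

/* Hypothetical delay hook; a caller might wrap sleep() here. */
typedef void (*delay_fn)(unsigned int seconds);

/*
 * Try to read a block up to max_tries times. Each failure is logged,
 * and the caller-supplied delay runs between attempts.
 * Returns 0 on success, -1 once every attempt has failed.
 */
static int read_block_with_retry(block_read_fn read_block, delay_fn delay,
                                 unsigned long block, void *buf, void *ctx,
                                 int max_tries, unsigned int delay_secs)
{
    int attempt;

    assert(read_block != NULL && max_tries > 0);

    for (attempt = 1; attempt <= max_tries; attempt++) {
        int err = read_block(block, buf, ctx);

        if (err == 0)
            return 0;

        /* Stand-in for a printk: report the failure and carry on. */
        fprintf(stderr, "read error %d on block %lu (attempt %d of %d)\n",
                err, block, attempt, max_tries);

        /* Wait before the next attempt, but not after the last one. */
        if (attempt < max_tries && delay != NULL)
            delay(delay_secs);
    }
    return -1;
}

/* Fake reader for experimenting: fails the first fail_first calls. */
struct flaky { int calls; int fail_first; };

static int flaky_read(unsigned long block, void *buf, void *ctx)
{
    struct flaky *f = ctx;

    (void)block;
    (void)buf;
    f->calls++;
    return f->calls <= f->fail_first ? 5 : 0;
}
```

With a reader that fails twice and then succeeds, a limit of five tries returns success on the third call; a reader that never recovers gives up after the limit, leaving the caller to decide what to do next rather than panicking outright.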
Leonard