[1747] in linux-scsi channel archive
Re: Bug in SCSI drivers w/2.0.29
daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Wed Apr 23 04:46:02 1997
Date: Wed, 23 Apr 1997 00:39:05 -0700
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: Titus_Boxberg@public.uni-hamburg.de
CC: linux-scsi@vger.rutgers.edu
In-reply-to: <335B249B.36D24EEE@public.uni-hamburg.de> (message from Titus von
Boxberg on Mon, 21 Apr 1997 10:26:03 +0200)
Date: Mon, 21 Apr 1997 10:26:03 +0200
From: Titus von Boxberg <Titus_Boxberg@public.uni-hamburg.de>
I recently encountered a bug in the SCSI drivers:
I'm using 2.0.29 with an Adaptec 2940, a hard disk, and a tape drive.
When you disallow the tape drive from disconnecting from the SCSI bus during
operation, you get a reproducible error whenever the tape executes a long
operation (like rewind).
The kernel trace shows that the SCSI bus is reset because the disk driver
(sd.c) thinks that the bus is dead.
The bug is that sd.c (or whoever acts on behalf of the sd.c command,
which has a much shorter timeout: 15s compared to 14000s for st.c)
doesn't recognize that a valid command (that of st.c) is using the
SCSI bus, and that everything is fine but simply taking very long.
scsi.c should reschedule the sd.c command for a later time, or at
least suppress the reset while the bus is executing a valid command from
another module (like the tape driver).
You can reproduce the error by disabling disconnection for the tape device
and running the command
tar cvWf /dev/st0 usr
You should back up a directory large enough that the rewind takes long
enough for sd.c to hit its timeout.
I'm no kernel hacker, so it would take me much longer to find the
appropriate place. I think this should be easy for someone who knows the
drivers.
The alternative is to leave the bug in place and to bail out during
autodetection if disconnection is disabled (it's really stupid to disable
disconnection...)
Yes, this is correct. Disabling disconnect/reconnect should only be done as a
last resort. The conservative approach is always to leave disconnect/reconnect
enabled.
We are looking at ways of improving this timeout handling. Unfortunately, this
particular case will probably always be problematic and require a runtime
configuration option. If the kernel has no way of knowing whether
disconnect/reconnect is disabled, it isn't necessarily a good idea to inhibit
aborting a timed out disk command. In cases where we know disconnect/reconnect
is disabled, however, holding any timeouts until after a long tape command
completes would be appropriate.
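
As a rough illustration of the policy described above, the decision could be
sketched as follows. This is a minimal model under stated assumptions, not
code from the Linux 2.0 SCSI layer: the names bus_state, disconnect_state,
and on_timeout are hypothetical, and times are plain seconds rather than
jiffies. A timed-out command is deferred only when disconnect/reconnect is
known to be disabled for the command currently holding the bus; when that
state is unknown, the sketch falls back to aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these are not the kernel's own structures. */
enum disconnect_state { DISC_UNKNOWN, DISC_ENABLED, DISC_DISABLED };

struct bus_state {
    bool holder_active;                      /* another command owns the bus */
    enum disconnect_state holder_disconnect; /* what we know about the holder */
    long holder_deadline;                    /* when the holder's own timeout expires (s) */
};

enum timeout_action { TIMEOUT_ABORT, TIMEOUT_DEFER };

/*
 * Called when a command's timeout expires at time `now`.
 * Defer only if the bus is held by a command that cannot disconnect
 * and that has not itself timed out yet; otherwise abort as usual.
 */
enum timeout_action on_timeout(const struct bus_state *bus, long now)
{
    assert(bus != NULL);

    if (bus->holder_active &&
        bus->holder_disconnect == DISC_DISABLED &&
        now < bus->holder_deadline)
        return TIMEOUT_DEFER;

    /* Unknown or enabled disconnect: inhibiting the abort is not safe. */
    return TIMEOUT_ABORT;
}
```

With the figures from the report (a 15s disk timeout against a 14000s tape
timeout), a disk command expiring at 15s while a non-disconnecting rewind
holds the bus would be deferred. The same expiry with the disconnect state
unknown would still be aborted, which is why a runtime configuration option
would be needed to tell the kernel about that state.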
Leonard