[1747] in linux-scsi channel archive

Re: Bug in SCSI drivers w/2.0.29

daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Wed Apr 23 04:46:02 1997

Date: 	Wed, 23 Apr 1997 00:39:05 -0700
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: Titus_Boxberg@public.uni-hamburg.de
CC: linux-scsi@vger.rutgers.edu
In-reply-to: <335B249B.36D24EEE@public.uni-hamburg.de> (message from Titus von
	Boxberg on Mon, 21 Apr 1997 10:26:03 +0200)

  Date: 	Mon, 21 Apr 1997 10:26:03 +0200
  From: Titus von Boxberg <Titus_Boxberg@public.uni-hamburg.de>

  I recently encountered a bug in the SCSI drivers.
  I'm using 2.0.29 with an Adaptec 2940, a hard disk, and a tape drive.

  When you disallow the tape to be disconnected from the SCSI bus during
  operation, you get a reproducible error when the tape executes a long
  operation (like rewind).
  A kernel trace shows that the SCSI bus is reset because the disk driver
  (sd.c) thinks that the bus is dead.
  The bug is that sd.c (or some other code acting on behalf of the sd.c
  command, which has a much shorter timeout: 15s compared to 14000s for
  st.c) doesn't recognize that a valid command (that of st.c) is using the
  SCSI bus and that everything is fine, just taking a very long time.
  scsi.c should reschedule the sd.c command for a later time, or at
  least suppress the reset while a valid command from another
  module (like the tape driver) is executing.

  You can reproduce the error with disconnection of tape device disabled
  together with the command
  tar cvWf /dev/st0 usr
  Back up a directory that is not small, so that the rewind takes long
  enough for sd.c to hit its timeout.

  I'm no kernel hacker, so it would take me much longer to find the
  appropriate place. I think this should be easy for someone who knows the
  drivers.
  The alternative is to leave the bug and to bail out during autodetection
  if disconnection is disabled (it's really stupid to disable
  disconnection...)

Yes, this is correct.  Disabling disconnect/reconnect should only be done as a
last resort.  The conservative approach is always to leave disconnect/reconnect
enabled.

We are looking at ways of improving this timeout handling.  Unfortunately, this
particular case will probably always be problematic and require a runtime
configuration option.  If the kernel has no way of knowing whether
disconnect/reconnect is disabled, it isn't necessarily a good idea to inhibit
aborting a timed out disk command.  In cases where we know disconnect/reconnect
is disabled, however, holding any timeouts until after a long tape command
completes would be appropriate.
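
As a rough illustration of that policy, here is a small simulation in Python.
It is not kernel code: Command, should_abort, and the disconnect_disabled flag
are hypothetical names invented for this sketch, and it assumes the timeout
logic can see which command currently owns the bus. Only the 15s and 14000s
timeouts come from the report above.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """A pending SCSI command (hypothetical model, not a kernel structure)."""
    device: str       # e.g. "sd0" or "st0"
    issued_at: float  # time the command was issued, in seconds
    timeout: float    # per-command timeout, in seconds


def should_abort(cmd, now, bus_owner, disconnect_disabled):
    """Return True if a timed-out command should be aborted.

    bus_owner is the command currently holding the bus, or None.
    disconnect_disabled is True only when disconnect/reconnect is known
    to be disabled for the bus owner's target.
    """
    if now - cmd.issued_at < cmd.timeout:
        return False  # cmd has not timed out yet

    other_owns_bus = bus_owner is not None and bus_owner is not cmd
    if other_owns_bus and disconnect_disabled:
        # Another command holds the bus and cannot disconnect. Hold cmd's
        # timeout for as long as that command is within its own timeout.
        if now - bus_owner.issued_at < bus_owner.timeout:
            return False

    # Either nothing else explains the delay, or we cannot tell whether
    # disconnect/reconnect is disabled: abort as usual.
    return True


# The scenario from the report: a tape rewind blocks a disk read.
rewind = Command("st0", issued_at=0.0, timeout=14000.0)
read = Command("sd0", issued_at=1.0, timeout=15.0)
print(should_abort(read, 60.0, rewind, True))   # False: timeout is held
print(should_abort(read, 60.0, rewind, False))  # True: abort as before
```

When disconnect_disabled is False, the sketch aborts exactly as the current
code would, which matches the case where the kernel cannot tell how the
target is configured.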

		Leonard
