[2534] in linux-scsi channel archive


Re: Timeout for aborting a SCSI command is too short.

daemon@ATHENA.MIT.EDU (Gerard Roudier)
Thu Sep 25 11:52:28 1997

Date: 	Wed, 24 Sep 1997 19:11:38 +0200 (MET DST)
From: Gerard Roudier <groudier@club-internet.fr>
To: "Leonard N. Zubkoff" <lnz@dandelion.com>
cc: linux-scsi@vger.rutgers.edu
In-Reply-To: <199709232318.QAA26037@dandelion.com>


On Tue, 23 Sep 1997, Leonard N. Zubkoff wrote:

>   Date: Sat, 20 Sep 1997 10:49:38 +0200 (MET DST)
>   From: Gerard Roudier <groudier@club-internet.fr>
> 
>   Leonard,
> 
>   It seems that the mid-level SCSI driver allows low-level drivers only
>   2 seconds to abort a command.
>   When the device has disconnected from the bus for a nexus, it is sane
>   to inform the device that this nexus is canceled before completing
>   the command.
>   Not doing so requires that no new command be sent to the device, in
>   order to avoid an overlapped command condition (especially when tags
>   are not used).
> 
>   Why does the mid-level SCSI driver force the timeout to be so short (2s)?
>   (Did I miss something?)
> 
>   A simple strategy for properly canceling a disconnected command is
>   to wait for the reselection, then send the device an ABORT message (or
>   ABORT TAG if the command is tagged), and then complete the
>   corresponding SCSI command with an error.
> 
>   A 2 second timeout for this to be done is not enough for devices like
>   tapes, which may reconnect a long time after having disconnected for a
>   nexus.
> 
>   I suggest that the abort timeout value be the same as the corresponding
>   command timeout value, or something derived from it (for example, half
>   the command timeout value).
> 
> 
>   Regards, Gerard.
> 
> I don't think the 2 second timeout is really a problem.  The usual host adapter
> implementation of an abort is for the initiator to select the target at the
> earliest possible opportunity and then issue an ABORT or ABORT TAG message as

That is precisely the active strategy I didn't wish to implement.

> appropriate.  In at least one case I've seen where an abort clears a catatonic
> drive condition, waiting for the target to reselect is useless as the target is
> never going to.

I agree that this active strategy is the right one, and that
waiting for the device to reconnect is too weak. However,
for what reason would the device never reselect?
A device is not allowed to just drop a nexus on the floor.
So, in my opinion, if we consider that the device will never reconnect
to the bus, we should send it at least a BUS DEVICE RESET message.
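
To make that escalation concrete, here is a minimal sketch in C. The
message codes are the SCSI-2 values; the function name and its two flags
are hypothetical, chosen only for illustration and not taken from any
existing driver.

```c
/* SCSI-2 message codes. */
#define MSG_ABORT            0x06
#define MSG_BUS_DEVICE_RESET 0x0c
#define MSG_ABORT_TAG        0x0d

/*
 * Hypothetical helper: choose the message to send to the target.
 *   tagged          - nonzero if the command was sent with a queue tag
 *   abort_timed_out - nonzero if an earlier abort attempt got no answer,
 *                     i.e. we consider the device will never reconnect
 */
int recovery_message(int tagged, int abort_timed_out)
{
    if (abort_timed_out)
        return MSG_BUS_DEVICE_RESET;   /* the nexus is lost: reset the device */
    return tagged ? MSG_ABORT_TAG : MSG_ABORT;
}
```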

Even if we actively abort the command, this 2 second timeout still seems
too short to me. It assumes that no device will hold the bus for more
than 2 seconds for a single selection.
I have heard that some devices, like scanners, may lock the bus for a long time.

In my opinion, there is no reason for this timeout to be so short.
In fact, its value should take the worst case into account. So its
deadline should be based on the deadlines of all commands currently
queued to the controller, assuming the worst case.
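
As a rough sketch of what I mean, in C: the structure and function below
are hypothetical and not part of the current driver. Times are in
arbitrary ticks, and counter wraparound is ignored to keep the example
short.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for one command queued to the controller. */
struct queued_cmd {
    unsigned long deadline;   /* absolute tick at which the command times out */
};

/*
 * Return an abort timeout, in ticks counted from `now`, that covers the
 * worst case: the latest deadline among all queued commands.
 * The result is never shorter than `min_ticks`.
 * Assumption: the tick counter does not wrap around.
 */
unsigned long abort_timeout(const struct queued_cmd *cmds, size_t n,
                            unsigned long now, unsigned long min_ticks)
{
    unsigned long worst = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Skip commands whose deadline has already passed. */
        if (cmds[i].deadline > now && cmds[i].deadline - now > worst)
            worst = cmds[i].deadline - now;
    }
    return worst > min_ticks ? worst : min_ticks;
}
```

With deadlines at ticks 150, 1000 and 90, now = 100 and a floor of 20
ticks, the abort would be given 900 ticks, the time left to the
longest-running command.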

With such considerations, even the per-device timeouts shouldn't be fixed
values ...

My goal is not to complicate things, even if it seems clear that the way
our current SCSI driver handles timeouts is far too simplistic.

If we want to keep things simple and to stick with fixed timeout values,
we should _never_ use _short_ values for timeouts.


Gerard.

