[1659] in linux-scsi channel archive


Re: SCSI_RESET_SYNCHRONOUS/ASYNCRONOUS

daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Tue Apr 1 12:37:15 1997

Date: 	Tue, 1 Apr 1997 09:09:29 -0800
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: eich@crunch.ikp.physik.th-darmstadt.de
CC: linux-scsi@vger.rutgers.edu
In-reply-to: <9704011359.AA21195@crunch> (message from Egbert Eich on Tue, 1
	Apr 1997 15:59:36 +0200)

  Date: 	Tue, 1 Apr 1997 15:59:36 +0200
  From: Egbert Eich <eich@crunch.ikp.physik.th-darmstadt.de>

  is there anyone on this list who is able to explain to me
  how the low-level scsi drivers are supposed to handle
  the flags SCSI_RESET_SYNCHRONOUS/SCSI_RESET_ASYNCHRONOUS
  defined in drivers/scsi/scsi.h?
  I haven't found any documentation on these flags in the
  sources.

Is linux-scsi archived somewhere?  These changes were announced with
explanation last year.  Appended below is the explanation.

		Leonard


From lnz@dandelion.com Mon Apr 22 08:35:56 1996
Date: 	Thu, 18 Apr 1996 11:32:21 -0700
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: linux-scsi@vger.rutgers.edu
Subject: SCSI Changes in Recent Kernels
Precedence: bulk

This message describes a number of modifications that I recently made to the
SCSI subsystem in the hope of increasing the robustness by correcting some
design problems and reducing the probability of race conditions occurring.  In
addition, I've installed some performance improvements that should ameliorate
some of the bottlenecks I discovered while performance tuning my BusLogic
driver.

Driver authors please note: Nothing I've done should make an existing driver
less reliable than it was before.  However, to get maximum benefit from these
changes you should read this message carefully and consider whether your driver
is vulnerable to some of these same problems.

No doubt there's something I've forgotten to mention here, so feel free to ask
or consult the BusLogic driver to see what it does.  I've also made these SCSI
changes to 1.2.13, and they are packaged with the new release of my BusLogic
driver.  It may well be overly paranoid in some areas at this point, but it can
survive commands timing out and being aborted or reset every 30 seconds for
many hours without crashing.

		Leonard


				RACE CONDITIONS

One of the problems I've reported on previously is that it was effectively
impossible for the mid level SCSI code and the driver to be sure that when a
timeout occurs, any subsequent action is referring to the correct command
rather than a new command that happens to be using the same Scsi_Cmnd
structure.  If an interrupt and command completion happens at the wrong
instant, the timeout code may already be irrevocably committed to calling the
driver's abort or reset function.  To resolve this problem, I have introduced
two new elements into the Scsi_Cmnd structure:

    /*
      A SCSI Command is assigned a nonzero serial_number when internal_cmnd
      passes it to the driver's queue command function.  The serial_number
      is cleared when scsi_done is entered indicating that the command has
      been completed.  If a timeout occurs, the serial number at the moment
      of timeout is copied into serial_number_at_timeout.  By subsequently
      comparing the serial_number and serial_number_at_timeout fields
      during abort or reset processing, we can detect whether the command
      has already completed.  This also detects cases where the command has
      completed and the SCSI Command structure has already been reused
      for another command, so that we can avoid incorrectly aborting or
      resetting the new command.
    */

    unsigned long serial_number;
    unsigned long serial_number_at_timeout;

The scsi_main_timeout, scsi_abort, and scsi_reset functions all compare these
elements to be certain that the present serial number matches the one for which
the timeout was requested.  The scsi_main_timeout routine now makes two passes
over the SCSI command structures for each host.  First, it determines which
commands have timed out and copies their serial_number field to
serial_number_at_timeout.  Then it makes another pass over the commands calling
scsi_times_out to handle abort/reset processing.  However, before each such
call and down inside scsi_abort and scsi_reset, we again make sure that we are
working on the correct command.
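
As an illustration only, the two-pass check described above might be sketched
in plain user-space C as follows.  This is not the kernel code: struct cmnd is
a cut-down stand-in for Scsi_Cmnd, and the deadline field, the function names,
and the clearing of stale snapshots in pass one are all assumptions made for
the example.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for Scsi_Cmnd. */
struct cmnd {
    unsigned long serial_number;            /* 0 when no command is active */
    unsigned long serial_number_at_timeout; /* snapshot taken in pass one */
    unsigned long deadline;                 /* assumed: tick at which it times out */
};

/* Pass one: snapshot the serial number of every command that has timed out.
   Clearing the snapshot for the others is an assumption of this sketch. */
void mark_timed_out(struct cmnd *cmds, size_t n, unsigned long now)
{
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].serial_number != 0 && now >= cmds[i].deadline)
            cmds[i].serial_number_at_timeout = cmds[i].serial_number;
        else
            cmds[i].serial_number_at_timeout = 0;
    }
}

/* Nonzero only if the command that timed out still occupies the structure. */
int still_timed_out(const struct cmnd *c)
{
    return c->serial_number_at_timeout != 0 &&
           c->serial_number == c->serial_number_at_timeout;
}

/* Pass two: count the commands that would be handed to abort/reset
   processing, re-checking each one immediately before acting on it. */
size_t handle_timed_out(const struct cmnd *cmds, size_t n)
{
    size_t handled = 0;
    for (size_t i = 0; i < n; i++)
        if (still_timed_out(&cmds[i]))
            handled++;
    return handled;
}
```

If a command completes (serial_number cleared) or its structure is reused
(serial_number changed) between the two passes, still_timed_out rejects it, so
the new command is never aborted or reset by mistake.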

Since the driver's abort or reset function is generally called with interrupts
enabled, it too is vulnerable to an interrupt occurring and the command
completing.  To handle this, the abort and reset functions must disable
interrupts or otherwise prevent any completion processing from occurring, and
include code such as:

  /*
    Acquire exclusive access to Host Adapter.
  */
  BusLogic_AcquireHostAdapterLock(HostAdapter, &Lock);
  /*
    If this Command has already completed, then no Abort is necessary.
  */
  if (Command->serial_number != Command->serial_number_at_timeout)
    {
      printk("scsi%d: Unable to Abort Command to Target %d - "
	     "Already Completed\n", HostAdapter->HostNumber, TargetID);
      Result = SCSI_ABORT_NOT_RUNNING;
      goto Done;
    }

The call to BusLogic_AcquireHostAdapterLock is equivalent to save_flags() and
cli() at the moment.

In the case of the reset function, life is a little more complicated.  The
reset function can be called either asynchronously when a timeout occurs, or
synchronously due to trying a reset when a command fails and gets halfway
through the allotted retries.  The present interface was incapable of
maintaining this critical distinction, and so a reset_flags argument was added
to the reset function.  For this reason, compiling hosts.c will cause a warning
until the driver's reset command function adopts the new argument.

The problem was that correct behavior for the driver is different depending on
whether a reset is asynchronous or synchronous.  For a synchronous reset, the
command cannot still be active in the driver, and the driver is responsible for
restarting it if it returns SCSI_RESET_SUCCESS.  For an asynchronous reset, the
driver is responsible for restarting the command *only* if it still has
ownership of it.  With the old interface, there was no way for the driver to
determine whether or not it should restart a command that it did not think was
presently active.

With the new interface, the driver's responsibilities are clear:

If SCSI_RESET_SYNCHRONOUS is set in reset_flags and SCSI_RESET_SUCCESS is
returned, then the driver is responsible for restarting the command by calling
the completion routine with result DID_RESET<<16.

If SCSI_RESET_ASYNCHRONOUS is set in reset_flags, then the driver is
responsible for restarting the command if and only if the command was actually
active at the driver level.

If neither of these is set, the call must be internal to the driver itself.

Not keeping these two cases distinct led to some nasty crashes since calling
the completion routine twice for the same command or incorrectly when the
command structure is being used for another operation is not very healthy.

Furthermore, I've introduced the return status SCSI_RESET_NOT_RUNNING so that
the reset function can avoid race conditions in the asynchronous case just as
the abort function must.
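
A rough sketch of how a driver's reset function might sort out these cases is
given below.  It is a user-space illustration under stated assumptions, not
code from any driver: the numeric flag values are placeholders rather than the
definitions in scsi.h, and struct cmnd, enum reset_action, classify_reset and
the active_in_driver argument are hypothetical names.

```c
/* Placeholder values for illustration; the real definitions live in
   drivers/scsi/scsi.h and may differ. */
#define SCSI_RESET_SYNCHRONOUS   0x01
#define SCSI_RESET_ASYNCHRONOUS  0x02

/* Hypothetical cut-down command structure. */
struct cmnd {
    unsigned long serial_number;
    unsigned long serial_number_at_timeout;
};

/* Hypothetical summary of what the driver should do next. */
enum reset_action {
    RESET_INTERNAL,        /* neither flag set: call came from the driver itself */
    RESET_AND_RESTART,     /* reset, then complete the command with DID_RESET<<16 */
    RESET_WITHOUT_RESTART, /* reset, but the driver does not own the command */
    RESET_NOT_RUNNING      /* lost the race: report SCSI_RESET_NOT_RUNNING */
};

enum reset_action classify_reset(const struct cmnd *c,
                                 unsigned int reset_flags,
                                 int active_in_driver)
{
    /* Synchronous: the command cannot still be active in the driver, so the
       driver restarts it when it reports success. */
    if (reset_flags & SCSI_RESET_SYNCHRONOUS)
        return RESET_AND_RESTART;

    if (reset_flags & SCSI_RESET_ASYNCHRONOUS) {
        /* The command completed, or its structure was reused, after the
           timeout fired. */
        if (c->serial_number != c->serial_number_at_timeout)
            return RESET_NOT_RUNNING;
        /* Restart if and only if the driver still owns the command. */
        return active_in_driver ? RESET_AND_RESTART : RESET_WITHOUT_RESTART;
    }

    return RESET_INTERNAL;
}
```

In a real driver the serial number comparison would have to be made with
interrupts disabled or completion processing otherwise blocked, as in the
abort example earlier in this message.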

Also, since I had to add the argument anyway, it has the additional flag bits
SCSI_RESET_SUGGEST_BUS_RESET and SCSI_RESET_SUGGEST_HOST_RESET which replace
the suggest_bus_reset field in the scsi_host structure.  The distinction
between bus reset and host adapter reset is necessary for multi-channel host
adapters.  Mike Neuffer has been working on the multi-channel support, and the
remaining changes to handle this correctly will be available from him soon.

At the present time, the only way for drivers to be safe from race conditions
is to look exceedingly carefully at the effects of an interrupt occurring at
any point in the driver's processing.  The comparison of serial_number and
serial_number_at_timeout in scsi_abort and scsi_reset will help, but there is
still a timing window where problems can arise, and of course there are
problems that are specific to each host adapter and driver.


				  PERFORMANCE

The handling of Scsi_Cmnd structures was a significant problem when large queue
depths were used and when large numbers of commands were executed per second.  In
the old scheme, allocate_device and request_queuable did a linear search
through all the Scsi_Cmnd structures allocated to the host adapter in order to
find one that had the correct target ID and logical unit for the device.  While
this was not a bottleneck with 7 devices each using a queue depth of 3 (i.e. 21
entries total), it was a significant problem for 7 or more disks with a queue
depth of 31 each.  This problem has been corrected by
introducing a device_queue element to the SCSI Device structure.  The
device_queue is a linked list of precisely those Scsi_Cmnd structures that are
assigned to the device; this bounds the linear searches in allocate_device and
request_queuable by the queue depth.
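
To make the idea concrete, a per-device command list could look roughly like
the sketch below.  The structures are cut-down stand-ins for Scsi_Cmnd and
Scsi_Device, and the in_use flag, attach_commands and allocate_command are
names invented for this example; the real allocate_device and
request_queuable do considerably more.

```c
#include <stddef.h>

/* Hypothetical cut-down command structure. */
struct cmnd {
    int in_use;          /* assumed: nonzero while a command occupies it */
    struct cmnd *next;   /* next command assigned to the same device */
};

/* Hypothetical cut-down device structure. */
struct device {
    struct cmnd *device_queue;  /* head of this device's own command list */
};

/* Link `depth` commands from `pool` onto the device, preserving pool order. */
void attach_commands(struct device *dev, struct cmnd *pool, size_t depth)
{
    dev->device_queue = NULL;
    for (size_t i = depth; i-- > 0; ) {
        pool[i].in_use = 0;
        pool[i].next = dev->device_queue;
        dev->device_queue = &pool[i];
    }
}

/* Find a free command by walking only this device's list, so the search
   cost is bounded by the device's queue depth rather than by the number of
   commands allocated to the whole host adapter. */
struct cmnd *allocate_command(struct device *dev)
{
    for (struct cmnd *c = dev->device_queue; c != NULL; c = c->next) {
        if (!c->in_use) {
            c->in_use = 1;
            return c;
        }
    }
    return NULL;  /* every command for this device is busy */
}
```

With 7 disks at a queue depth of 31, a search of this kind visits at most 31
entries instead of up to 217.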

Another major bottleneck was the processing in update_timeout.  Each time a
command was queued or completed, all Scsi_Cmnd structures in the system were
scanned and updated.  At a low I/O rate this wasn't a problem, but when
executing 3000 SCSI commands per second, we were calling update_timeout to scan
a list of 200 entries 6000 times per second, even though nothing would happen
more often than 100 times per second.  I've corrected this by short circuiting
the update_timeout processing whenever the current jiffies count matches the
last one.  This limits the processing to the clock rate (HZ).  It's not the
best long term solution, but it is quite safe and easy to implement.
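
The short circuit can be pictured with the following user-space sketch.  It is
only an approximation of the idea: struct timeout_state and
update_timeout_sketch are hypothetical names, the tick is passed in as an
argument rather than read from jiffies, and the actual scan of the command
list is reduced to a counter.

```c
/* Hypothetical bookkeeping for the sketch. */
struct timeout_state {
    unsigned long last_tick;  /* tick value seen by the last full scan */
    int have_last;            /* zero until the first scan has happened */
    unsigned long scans;      /* number of full scans performed */
};

/* Returns 1 if a full scan was performed, 0 if the call was short
   circuited because the tick has not advanced since the last scan. */
int update_timeout_sketch(struct timeout_state *st, unsigned long now)
{
    if (st->have_last && st->last_tick == now)
        return 0;             /* nothing can have changed within this tick */

    st->last_tick = now;
    st->have_last = 1;
    st->scans++;              /* stands in for walking every Scsi_Cmnd */
    return 1;
}
```

Using the figures above, 6000 calls per second over a 200-entry list amount to
1,200,000 entry visits per second; with the scan limited to 100 ticks per
second this drops to at most 20,000.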

Finally, I've implemented a mechanism whereby drivers can decide how deep a
queue to provide for each device after the bus has been scanned and before the
Scsi_Cmnd structures are allocated.  A driver can provide a select_queue_depths
routine by assigning to this slot in the Scsi_Host structure:

    void (*select_queue_depths)(struct Scsi_Host *, Scsi_Device *);

This routine should assign the desired queue depth to the Device->queue_depth
field, which will then be used in preference to the cmd_per_lun field in the
host structure.  Since it has access to the entire list of devices found, it
can be as intelligent as desired.

This allows for large queue depths on fast hard disks that support tagged
queuing, without forcing the same queue depth on disks that don't implement
tagged queuing or on CD-ROMs and tapes.  Look at BusLogic_SelectQueueDepth to
see one way of handling this.  It assigns a queue depth of 3 to each non-tagged
device, and then splits up the total queue depth available in the host adapter
among the disks that support tagged queuing.
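
One possible way to split the depth is sketched below.  It is a user-space
illustration and not the BusLogic routine itself: the structure is a cut-down
stand-in for Scsi_Device, split_queue_depths and its parameters are
hypothetical, and subtracting the non-tagged allocations from the total before
dividing is an assumption of this example.

```c
#include <stddef.h>

/* Hypothetical cut-down device structure. */
struct device {
    int tagged_queuing;   /* nonzero if the device supports tagged queuing */
    int queue_depth;      /* filled in by split_queue_depths */
    struct device *next;  /* next device found on the bus */
};

/* Give every non-tagged device a fixed depth, then divide whatever is left
   of the adapter's total evenly among the tagged devices. */
void split_queue_depths(struct device *list, int total_depth,
                        int untagged_depth)
{
    int tagged = 0;
    int remaining = total_depth;

    for (struct device *d = list; d != NULL; d = d->next) {
        if (d->tagged_queuing) {
            tagged++;
        } else {
            d->queue_depth = untagged_depth;
            remaining -= untagged_depth;
        }
    }
    if (tagged == 0)
        return;

    int share = remaining / tagged;
    if (share < 1)
        share = 1;        /* assumed floor so no device ends up with zero */

    for (struct device *d = list; d != NULL; d = d->next)
        if (d->tagged_queuing)
            d->queue_depth = share;
}
```

For example, with a total of 100, two non-tagged devices at depth 3 and three
tagged disks, each tagged disk would receive (100 - 6) / 3 = 31 under these
assumptions.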
