[1491] in linux-scsi channel archive
Re: elevator sorting for the scsi subsystem.
daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Sun Mar 2 15:54:06 1997
Date: Sun, 2 Mar 1997 12:44:37 -0800
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: Dario_Ballabio@milano.europe.dg.com
CC: linux-scsi@vger.rutgers.edu
In-reply-to: <199703022017.VAA21095@milano.europe.dg.com>
(Dario_Ballabio@milano.europe.dg.com)
Date: Sun, 2 Mar 1997 21:17:04 +0100
From: Dario_Ballabio@milano.europe.dg.com
If a request writes sectors 2 and 3, filling the two blocks with 1's,
and then another request writes sectors 1 and 2, filling them with 0's,
sector 2 should end up filled with 0's. If we sort the two
requests, or if the drive reorders them, sector 2 ends up
filled with 1's, which is not the expected result.
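The hazard described above can be reproduced with a small simulation. This is only an illustrative sketch: the `apply_writes` helper and the list standing in for the disk are hypothetical and do not correspond to any kernel or driver code.

```python
def apply_writes(requests, disk_size=4):
    """Apply (first_sector, values) write requests in the order given.

    The list `disk` is a stand-in for the medium; None marks a sector
    that was never written.
    """
    disk = [None] * disk_size
    for first_sector, values in requests:
        for offset, value in enumerate(values):
            disk[first_sector + offset] = value
    return disk


# Request A fills sectors 2 and 3 with 1's; request B then fills
# sectors 1 and 2 with 0's.
req_a = (2, [1, 1])
req_b = (1, [0, 0])

# Order in which the requests were issued: A, then B.
issued_order = apply_writes([req_a, req_b])

# Elevator-style sort by starting sector: B, then A.
sorted_order = apply_writes(sorted([req_a, req_b], key=lambda r: r[0]))

print("issued:", issued_order)
print("sorted:", sorted_order)
```

Sector 2 holds 0 when the requests run in issue order and 1 after sorting, which is the discrepancy in question.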
I checked that there are (rare) cases of overlapping write requests
in normal operation, typically at the very beginning of the
disk partition. My concern is that if we use simple queue tags
for write requests, cases like the one above are likely to happen sooner
or later, causing random disk corruption.
I believe that we can safely sort write requests only if
there are no overlapping requests in the batch to be sorted.
In any case I would always use ordered queue tags for write requests.
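The condition proposed above (sort only when no two requests in the batch overlap) might be sketched as follows. The request representation `(first_sector, sector_count)` and the names `overlaps` and `sort_if_disjoint` are assumptions made for illustration, not anything taken from the SCSI drivers.

```python
def overlaps(a, b):
    """Return True if two (first_sector, sector_count) ranges share a sector."""
    a_start, a_count = a
    b_start, b_count = b
    return a_start < b_start + b_count and b_start < a_start + a_count


def sort_if_disjoint(batch):
    """Sort a batch by starting sector only if no two requests overlap.

    If any pair overlaps, the batch is returned in its original order so
    that the later write still lands last.
    """
    for i in range(len(batch)):
        for j in range(i + 1, len(batch)):
            if overlaps(batch[i], batch[j]):
                return list(batch)
    return sorted(batch, key=lambda r: r[0])


# Sectors 2-3 followed by sectors 1-2 overlap on sector 2: order is kept.
print(sort_if_disjoint([(2, 2), (1, 2)]))
# Sectors 5-6 and 1-2 are disjoint: safe to sort.
print(sort_if_disjoint([(5, 2), (1, 2)]))
```

The pairwise check is quadratic in the batch size, which is acceptable for a sketch; a real implementation would more likely detect overlap while inserting into an already sorted queue.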
If overlapping I/O's are generated and the driver sorts the requests, then you
are correct that we could have a problem, but I don't think you can convince
the kernel to rewrite sector 2 until the first I/O writing sectors 2 and 3 is
completed. Once an I/O request is made in make_request, the buffer header is
locked (preventing further requests) until the I/O completes in
end_scsi_request.
However, even overlapping requests wouldn't be a problem if the disk does the
reordering, though they would be if the driver or disk controller does. Unless
the disk is specifically allowed to reorder the effect of commands by setting
the Queue Algorithm Modifier field in the Control Mode Page to 1, it will
prevent the case above (see below for an excerpt from the SCSI-2 spec).
I've been running systems with tagged queuing in this fashion, even with the
Queue Algorithm Modifier bit set to 1, for almost two years now, and have never
had such a corruption problem. And ordered queue tags are only generated when
necessary to avoid starvation.
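One way to picture "ordered queue tags only when necessary to avoid starvation" is a policy that normally sends SIMPLE QUEUE TAG and inserts an ORDERED QUEUE TAG once too much time has passed since the last one. The sketch below is an assumption about how such a policy could look, not a description of any particular driver; the function name and the 4-second interval are hypothetical. The message codes 20h and 22h are the SCSI-2 values for the two tag messages.

```python
SIMPLE_QUEUE_TAG = 0x20   # SCSI-2 SIMPLE QUEUE TAG message code
ORDERED_QUEUE_TAG = 0x22  # SCSI-2 ORDERED QUEUE TAG message code


def choose_queue_tag(now, last_ordered_time, max_interval=4.0):
    """Pick the tag message for the next command.

    Returns (tag_message, new_last_ordered_time). An ordered tag forces
    the target to finish everything queued before it, so older commands
    cannot be passed over indefinitely. `max_interval` (seconds) is a
    hypothetical tuning value.
    """
    if now - last_ordered_time > max_interval:
        return ORDERED_QUEUE_TAG, now
    return SIMPLE_QUEUE_TAG, last_ordered_time


print(choose_queue_tag(1.0, 0.0))  # within the interval: simple tag
print(choose_queue_tag(5.0, 0.0))  # interval exceeded: ordered tag
```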
If there is a special case where we can generate overlapping write requests as
you indicate, I think it would be better to prevent this special case from
occurring than constrain our ability to use tagged queuing.
Leonard
The queue algorithm modifier field (see table 97) specifies restrictions
on the algorithm used for reordering commands that are tagged with the
SIMPLE QUEUE TAG message.
Table 97 - Queue algorithm modifier
+===========-====================================+
|  Value    |  Definition                        |
|-----------+------------------------------------|
|  0h       |  Restricted reordering             |
|  1h       |  Unrestricted reordering allowed   |
|  2h - 7h  |  Reserved                          |
|  8h - Fh  |  Vendor-specific                   |
+================================================+
A value of zero in this field specifies that the target shall order the
actual execution sequence of the commands with a SIMPLE QUEUE tag such
that data integrity is maintained for that initiator. This means that,
if the transmission of new commands is halted at any time, the final value
of all data observable on the medium shall have exactly the same value as
it would have if the commands had been executed in the same received
sequence without tagged queuing. The restricted reordering value shall
be the default value.
A value of one in this field specifies that the target may reorder the
actual execution sequence of the commands with a SIMPLE QUEUE tag in any
manner. Any data integrity exposures related to command sequence order
are explicitly handled by the initiator through the selection of
appropriate commands and queue tag messages.
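For reference, the field described in the excerpt could be decoded from a Control Mode Page roughly as below. This is a sketch that assumes the SCSI-2 page layout (page code 0Ah in the low six bits of byte 0, Queue Algorithm Modifier in the upper four bits of byte 3); the function names are hypothetical, and the input is the page itself, with any MODE SENSE header and block descriptors already stripped.

```python
CONTROL_MODE_PAGE_CODE = 0x0A


def queue_algorithm_modifier(page):
    """Extract the 4-bit Queue Algorithm Modifier from a Control Mode Page."""
    if len(page) < 4:
        raise ValueError("page too short")
    if page[0] & 0x3F != CONTROL_MODE_PAGE_CODE:
        raise ValueError("not a Control Mode Page")
    return page[3] >> 4


def describe(value):
    """Map a field value to its Table 97 definition."""
    if value == 0x0:
        return "Restricted reordering"
    if value == 0x1:
        return "Unrestricted reordering allowed"
    if 0x2 <= value <= 0x7:
        return "Reserved"
    if 0x8 <= value <= 0xF:
        return "Vendor-specific"
    raise ValueError("field is only four bits wide")


# Example page with the modifier set to 1h (byte 3 = 10h).
example_page = bytes([0x0A, 0x06, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00])
print(describe(queue_algorithm_modifier(example_page)))
```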