[1505] in linux-scsi channel archive
Re: elevator sorting for the scsi subsystem.
daemon@ATHENA.MIT.EDU (Gerard Roudier)
Mon Mar 3 15:30:43 1997
Date: Mon, 3 Mar 1997 21:25:28 +0000 (GMT)
From: Gerard Roudier <groudier@club-internet.fr>
To: "Leonard N. Zubkoff" <lnz@dandelion.com>
cc: Dario_Ballabio@milano.europe.dg.com, linux-scsi@vger.rutgers.edu
In-Reply-To: <199703030111.RAA04519@dandelion.com>
On Sun, 2 Mar 1997, Leonard N. Zubkoff wrote:
> Date: Mon, 3 Mar 1997 01:03:24 +0000 (GMT)
> From: Gerard Roudier <groudier@club-internet.fr>
>
> Well. In my opinion, the right layer for such stuff in Linux
> is ll_rw_blk.
>
> As far as I remember, disk IOs are done
> this way under Linux.
>
> 1 - Plug the device.
> 2 - Queue IO in several chunks. IO chunks are then coalesced and
> reordered on the fly by make_request() and add_request().
>
> Actually, we should only coalesce as many requests as can be handed off to the
> host adapter as a single request. The present code is very poor in the case
> where the host adapter cannot accept as large a scatter/gather list as the
> make_request/add_request level created. In that case, we split the request,
> but the second part is not queued separately; it has to wait for the first part
> of the request to complete.
Agreed about this weakness of the present algorithm. However, it only
affects poor scsi adapters. I'm not sure they would get much better
performance from such an improvement.
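To illustrate the splitting weakness discussed above, here is a toy model
(all names are invented for illustration; this is not the actual scsi
code): a request whose scatter/gather list exceeds what the adapter
accepts is split, but the tail is not queued as an independent request,
so each extra piece costs a full serialized round trip to the adapter.

```c
#include <assert.h>

/* Hypothetical sketch: a request carries sg_count scatter/gather
 * segments built by make_request/add_request, while the adapter
 * accepts at most adapter_max_sg per command.  With the current
 * split-and-wait behaviour, each piece beyond the first must wait
 * for the previous one to complete. */
struct toy_request {
    int sg_count;   /* segments the block layer coalesced together */
};

/* Number of serialized trips to the adapter for one request. */
static int trips_needed(const struct toy_request *rq, int adapter_max_sg)
{
    return (rq->sg_count + adapter_max_sg - 1) / adapter_max_sg;
}
```

The point is simply that the second trip could overlap with the first
if the tail were queued separately, instead of waiting.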
Agreed, too, that the current algorithm is poor.
The coalescing and request-sorting strategy applied by ll_rw_blk should
probably be moved into the sd driver. That was not the choice made for
the current scsi implementation in Linux, and in my opinion it was a
bad choice.
If we don't want, or don't have enough time, to rewrite the scsi code,
we should at least make some short-term changes that let low-level
driver writers and maintainers manage critical situations simply.
I wrote, about six months ago, that we should allow low-level drivers
to tune at any time the credit of commands they can accept for a device.
I'm not sure that's possible with the current scsi code.
Such an improvement would allow QUEUE FULL status to be managed
intelligently, without having to queue too many commands inside
low-level drivers.
In the same way, such an improvement would let low-level drivers give
feedback at any time to the upper drivers, in order to minimize the
number of scsi commands that are not actually queued to the controller
and/or the device, when it is possible to control that.
That's a feedback technique; it's simple and it should work well, in
my opinion.
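A minimal sketch of what such a tunable command credit could look like
(all names here are invented for illustration; this is not the actual
Linux scsi code): the upper layers consume credit when issuing a
command, completions restore it, and the low-level driver can shrink
the cap at any time, for instance on a QUEUE FULL status.

```c
#include <assert.h>

/* Hypothetical per-device command credit, adjustable at any time. */
struct dev_credit {
    int credit;      /* commands the upper layers may still issue */
    int max_credit;  /* cap, corresponds to the queue_depth idea  */
};

/* Upper layer asks whether it may issue one more command. */
static int can_queue_cmd(struct dev_credit *dc)
{
    if (dc->credit <= 0)
        return 0;
    dc->credit--;
    return 1;
}

/* Normal completion: give the credit back, up to the cap. */
static void cmd_done(struct dev_credit *dc)
{
    if (dc->credit < dc->max_credit)
        dc->credit++;
}

/* Device returned QUEUE FULL: the low-level driver lowers the cap to
 * what the device actually accepted and stops new commands until
 * completions return, instead of hiding commands in its own queue. */
static void on_queue_full(struct dev_credit *dc, int outstanding)
{
    dc->max_credit = outstanding > 1 ? outstanding : 1;
    dc->credit = 0;
}
```

With something like this, a driver that sees QUEUE FULL can immediately
throttle the upper layers, and the kernel sorting/coalescing algorithms
keep working on requests that have not yet been committed to the driver.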
It is obvious (at least to me) that if low-level drivers very often
have to keep lots of commands in their internal queues, they prevent
the kernel optimization algorithms from working properly. Tuning the
command credit cleverly should allow low-level drivers to give relevant
feedback to the upper layers and so avoid this bad situation.
At the moment, the credit is set at init time with the queue_depth
field. Using a large value for queue_depth makes it very likely that
lots of commands will be locally queued in the scsi layers. Whenever
this condition is met, the kernel optimization algorithms can only do
a bad job, even if they were otherwise perfect.
I often read reports of large tagged command queues reducing
performance. That's true, but in my opinion it is often due to the
problem I describe above.
With a command credit that cannot be tuned at any time, but can only
be assigned at init time, we must use a reasonable value for the credit
(or queue_depth), like 4 or 8, or why not 12 when only high-end hard
disks are used on the scsi bus. Larger values have every chance, in my
opinion, of breaking the kernel optimization algorithms.
I use 8 with my Atlas, and I do not expect better performance at all
from a larger value with the current Linux scsi disk IO strategy.
Gerard.