[1505] in linux-scsi channel archive


Re: elevator sorting for the scsi subsystem.

daemon@ATHENA.MIT.EDU (Gerard Roudier)
Mon Mar 3 15:30:43 1997

Date: 	Mon, 3 Mar 1997 21:25:28 +0000 (GMT)
From: Gerard Roudier <groudier@club-internet.fr>
To: "Leonard N. Zubkoff" <lnz@dandelion.com>
cc: Dario_Ballabio@milano.europe.dg.com, linux-scsi@vger.rutgers.edu
In-Reply-To: <199703030111.RAA04519@dandelion.com>


On Sun, 2 Mar 1997, Leonard N. Zubkoff wrote:

>   Date: 	Mon, 3 Mar 1997 01:03:24 +0000 (GMT)
>   From: Gerard Roudier <groudier@club-internet.fr>
> 
>   Well. In my opinion, the right layer for such stuff in Linux 
>   is ll_rw_blk.
> 
>   As far as I remember, disk IOs are done 
>   this way under Linux.
> 
>   1 - Plug the device.
>   2 - Queue IO in several chunks. IO chunks are then coalesced and 
>       reordered on the fly by make_request() and add_request().
> 
> Actually, we should only coalesce as many requests as can be handed off to the
> host adapter as a single request.  The present code is very poor in the case
> where the host adapter cannot accept as large a scatter/gather list as the
> make_request/add_request level created.  In that case, we split the request,
> but the second part is not queued separately; it has to wait for the first part
> of the request to complete.

Agreed about this weakness of the present algorithm. However, it only 
affects poor scsi adapters. I am not sure they would perform much 
better with such an improvement.

Agreed about the fact that the current algorithm is poor.
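
As an illustration of the weakness discussed above, here is a minimal 
sketch of a merge test that refuses to coalesce past the adapter's 
scatter/gather limit, so that the remainder can be queued as a separate 
request instead of waiting behind the first part. The struct io_req, 
the function try_back_merge() and the sg_limit parameter are all 
hypothetical names for this sketch; they are not the ll_rw_blk or scsi 
data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request: a run of contiguous sectors.
 * This is NOT the kernel's struct request. */
struct io_req {
    unsigned long sector;      /* first sector of the run */
    unsigned long nr_sectors;  /* length of the run in sectors */
    unsigned int  nr_segments; /* scatter/gather entries needed */
};

/* Try to append 'next' to the end of 'req'.
 * The merge happens only if the two runs are contiguous and the merged
 * request still fits in sg_limit scatter/gather entries (assumed to be
 * what the host adapter accepts in a single command).
 * Returns 1 if merged, 0 if 'next' must be queued as its own request. */
static int try_back_merge(struct io_req *req, const struct io_req *next,
                          unsigned int sg_limit)
{
    assert(req != NULL && next != NULL);

    if (req->sector + req->nr_sectors != next->sector)
        return 0; /* not contiguous on disk */
    if (req->nr_segments + next->nr_segments > sg_limit)
        return 0; /* adapter could not take the result in one command */

    req->nr_sectors  += next->nr_sectors;
    req->nr_segments += next->nr_segments;
    return 1;
}
```

With such a test the upper layer never builds a request that has to be 
split later, at the cost of needing to know the adapter limit at the 
level where coalescing is done.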

The strategy applied by ll_rw_blk for coalescing and sorting requests 
should probably be moved into the sd driver. That was not the choice made 
for the current scsi implementation in Linux and, in my opinion, it was 
a bad choice.

If we do not want, or do not have enough time, to rewrite the scsi code, we 
should at least make some short-term changes that will allow low-level 
driver writers and maintainers to handle critical situations simply.

I wrote, about six months ago, that we should allow low-level drivers to 
tune, at any time, the credit of commands they can accept for a device.
I am not sure that is possible with the current scsi code.

Such an improvement would allow QUEUE FULL status to be handled 
intelligently, without having to queue too many commands inside 
low-level drivers.

In the same way, such an improvement would allow low-level drivers to give 
feedback at any time to the upper drivers, in order to minimize 
the number of scsi commands that are not actually queued to the 
controller and/or the device, when it is possible to control that.
This is a feedback technique: it is simple and it should work well, in 
my opinion.
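
To make the idea of a tunable credit concrete, here is a minimal sketch 
of one possible feedback scheme. Everything in it is an assumption made 
for illustration: the struct dev_credit, the function names, and the 
policy (drop the credit to the number of commands the device actually 
holds on QUEUE FULL, raise it by one on each normal completion). It is 
not an interface of the current scsi code.

```c
#include <assert.h>

/* Hypothetical per-device credit state. None of these names exist in
 * the Linux scsi code; they only illustrate the feedback idea. */
struct dev_credit {
    int credit;     /* commands the upper layer may have outstanding */
    int in_flight;  /* commands currently handed to the low-level driver */
    int max_credit; /* upper bound, e.g. the init-time queue_depth */
};

/* Upper layer asks: may I hand down one more command? */
static int may_issue(const struct dev_credit *d)
{
    return d->in_flight < d->credit;
}

/* Upper layer hands one command down. */
static void on_issue(struct dev_credit *d)
{
    assert(may_issue(d));
    d->in_flight++;
}

/* A command completed normally: free its slot and raise the credit
 * by one, never above max_credit. */
static void on_complete(struct dev_credit *d)
{
    assert(d->in_flight > 0);
    d->in_flight--;
    if (d->credit < d->max_credit)
        d->credit++;
}

/* The device answered QUEUE FULL: the command was not accepted, so it
 * goes back to the upper layer and the credit drops to the number of
 * commands the device really holds (but never below 1). */
static void on_queue_full(struct dev_credit *d)
{
    assert(d->in_flight > 0);
    d->in_flight--;
    d->credit = d->in_flight > 0 ? d->in_flight : 1;
}
```

The rejected command stays in the upper layer, where it can still be 
sorted and coalesced, instead of sitting in a queue private to the 
low-level driver.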

It is obvious (at least to me) that if low-level drivers very often 
have to keep lots of commands in their internal queues, they prevent the 
kernel optimization algorithms from working properly. Tuning the credit of 
commands cleverly should allow low-level drivers to give relevant 
feedback to the upper layers in order to avoid this bad situation.

At the moment, the credit is set at init time with the queue_depth 
field. Using a large value for queue_depth means there is a high risk of 
having lots of commands queued locally in the scsi layers.
Each time this condition is met, the kernel optimization algorithms 
can only do bad work, even if they were perfect.
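
A small worked example of the arithmetic behind this, under simplifying 
assumptions: 'pending' requests are waiting for one disk, the upper 
layer hands down at most queue_depth of them, and the device really 
accepts only 'device_accepts' commands at once. The function names and 
the numbers used below are hypothetical.

```c
#include <assert.h>

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Commands the upper layer has handed down to the scsi layers. */
static int handed_down(int pending, int queue_depth)
{
    assert(pending >= 0 && queue_depth >= 0);
    return min_int(pending, queue_depth);
}

/* Commands held in the low-level driver because the device would not
 * take them: they can no longer be sorted or coalesced. */
static int stuck_locally(int pending, int queue_depth, int device_accepts)
{
    int down = handed_down(pending, queue_depth);
    return down > device_accepts ? down - device_accepts : 0;
}

/* Requests still visible to the upper layer, where the sorting and
 * coalescing algorithms can work on them. */
static int still_sortable(int pending, int queue_depth)
{
    return pending - handed_down(pending, queue_depth);
}
```

For instance, with 40 pending requests, a queue_depth of 32 and a device 
that accepts 8 commands, 24 commands are stuck in the low-level driver 
and only 8 remain sortable. With a queue_depth of 8, none are stuck and 
32 remain sortable.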

I often read reports about large tagged command queueing reducing 
performance. That is true but, in my opinion, it is often due to 
the problem I describe above.

With a command credit that cannot be tuned at any time, but can only 
be assigned at init time, we must use a reasonable value for the 
credit (or queue_depth), like 4, 8, or why not 12 when we use only 
high-end hard disks on a scsi bus. Using a larger value has every 
chance, in my opinion, of breaking the kernel optimization algorithms.

I use 8 with my Atlas, and I do not expect better performance at all 
with a larger value, given the current Linux scsi disk IO strategy.


Gerard.
