[1495] in linux-scsi channel archive


Re: elevator sorting for the scsi subsystem.

daemon@ATHENA.MIT.EDU (Dario_Ballabio@milano.europe.dg.com)
Sun Mar 2 17:33:32 1997

Date: 	Sun, 2 Mar 1997 23:30:00 +0100
From: Dario_Ballabio@milano.europe.dg.com
To: linux-scsi@vger.rutgers.edu, groudier@club-internet.fr

>Cc: linux-scsi@vger.rutgers.edu, lnz@dandelion.com
>In-Reply-To: <199703021849.TAA18785@milano.europe.dg.com>
>On Sun, 2 Mar 1997 Dario_Ballabio@milano.europe.dg.com wrote:
>
>> The actual problem with ll_rw_blk.c is that it effectively always sorts in
>> ascending order (and the sawtooth doubles the seek distance compared to the
>> elevator sort), but only if the queue to the SCSI device is already filled up.
>> When there is still space on the queue to the SCSI device, requests are
>> released immediately and hence they are not sorted at all.
>> So we have the paradox that requests are likely to be sorted when the
>> device queue depth is small, but not sorted when the device queue depth is
>> large.
>> The elevator sort implementation in the EATA driver sorts within the
>> device queue depth, and hence its performance increases proportionally
>> to the queue depth (the correct performance factor is Q/3, where Q is
>> the device queue depth).
>> As for optimization inside the device, I do not know how safe it is to
>> rely on it for a mix of read and write requests. We may end up forced
>> to use ordered queue tags for write requests, losing most of the gain
>> possible with a good elevator sorting implementation.
>
>First, I think that there is no reason to use ordered queue tags if the 
>kernel and disk firmware act correctly.
Agreed, as long as simple queue tags keep data integrity.
>
>Then, unless I am very tired, a performance factor of Q/3 makes no sense 
>to me. I need to know what 'performance' and 'factor' mean in this 
>context and what I should calculate with such values.
I used the wrong term. I meant the "average seek distance reduction factor".
>
>My opinion about disk IO optimization algorithms is that the right 
>places for them are:
>
>1. As close as possible to the application program.
>2. As close as possible to the physical components of disks.
>
>(1) makes applications very complex, so the kernel is a better 
>    candidate for optimization algorithms like command reordering/sorting, 
>    IO clustering, (asynchronous) read-ahead, etc.
>    However, doing such optimization at the application level should 
>    not be excluded in some rare cases.
>
>(2) the disk firmware is obviously the right place for optimization, 
>    since all physical parameters are known to it at all times.
>
>So, any optimization done elsewhere in the scsi layers is just 
>"crystal ball based", in my opinion, and is worse than nothing 
>whenever it increases latency.
>
>I think that the real virtues of the driver+HBA pair are:
>
>1. Allowing upper layers (scsi drivers) to use all available scsi 
>   features and disk features.
>
>2. Having very low latency.
>
>If kernel algorithms are not good enough, we must (try to) improve them.
>
Ok, this is exactly my point too.
>Any sort algorithm in a low-level driver that increases performance 
>only proves that there is room for improvement in the kernel.
As a matter of fact, this is probably the case. The work that I did
would be more useful if integrated at a higher level, so that all the
scsi drivers could immediately benefit from it.
>In my opinion, the same holds for supposedly intelligent controllers.
>Such layers of the scsi subsystem act as a prosthesis for silly 
>kernels and/or hard disk firmware when they are able to increase 
>performance.
>
>
>Gerard.
>
Sorting effects increase with the size of the batch being sorted
(the average seek distance is reduced by a factor of Q/3), so the
batch size Q should be as large as possible.
My point is that if we sort only read requests, batches are likely
to be much smaller than if we sort both reads and writes.
Batches with a mix of reads and writes can be safely reordered as long
as there are no overlapping requests, which is by itself a very rare
event. I implemented all of this in the driver because it is much
simpler for me to work at that level; it is not intended to be the
right thing forever.

db
