[1496] in linux-scsi channel archive


Re: elevator sorting for the scsi subsystem.

daemon@ATHENA.MIT.EDU (Gerard Roudier)
Sun Mar 2 19:07:12 1997

Date: 	Mon, 3 Mar 1997 01:03:24 +0000 (GMT)
From: Gerard Roudier <groudier@club-internet.fr>
To: Dario_Ballabio@milano.europe.dg.com
cc: linux-scsi@vger.rutgers.edu
In-Reply-To: <199703022230.XAA23929@milano.europe.dg.com>


On Sun, 2 Mar 1997 Dario_Ballabio@milano.europe.dg.com wrote:

> >I think that the real virtues of the driver+HBA pair are:
> >
> >1. Allow upper layers (scsi drivers) to use all available scsi features 
> >   and disks features.
> >
> >2. Have very low latency.
> >
> >If kernel algorithms are not good enough, we must (try to) improve them.
> >
> Ok, this is exactly my point too.
> >Any sort algorithm in a low-level driver that increases performance 
> >only proves that there is room for improvement in the kernel.
> As a matter of fact this is probably the case. The work that I did
> would be more useful if integrated at a higher level, so all the
> scsi drivers could immediately benefit from it.

Well. In my opinion, the right layer for such stuff in Linux 
is ll_rw_blk.

As far as I remember, disk IO is done 
this way under Linux:

1 - Plug the device.
2 - Queue IO in several chunks. IO chunks are then coalesced and 
    reordered on the fly by make_request() and add_request().
3 - Unplug the device.

(2) seems to proceed as follows:

make_request():
- try to coalesce the request with another one.
  If it is possible, just return, else call add_request()

add_request():
- if no current_request (or no plug), call request_fn() and return.
  For a scsi device, call request_fn() unconditionally, but the plug 
  status is RQ_INACTIVE and prevents the driver from starting IO until 
  the plug is removed.

Etc ...
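
The plug / queue / unplug sequence above can be sketched as a toy model. 
This is only an illustration of the flow as described here, not the 
kernel code: the ToyQueue class, the Request fields and the demo() helper 
are hypothetical names, merging is limited to a request that starts exactly 
where a queued one ends, and the scsi special case (request_fn() called 
unconditionally but held back by the plug) is simplified to "do not call 
request_fn() while plugged".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    sector: int      # first sector of the transfer
    nr_sectors: int  # transfer length in sectors
    write: bool      # True for a write, False for a read


class ToyQueue:
    """Toy per-device queue; not the kernel's data structures."""

    def __init__(self) -> None:
        self.plugged = False
        self.pending: List[Request] = []     # kept sorted by sector
        self.dispatched: List[Request] = []  # what the simulated driver received

    def plug(self) -> None:
        self.plugged = True

    def make_request(self, new: Request) -> None:
        # Try to coalesce with a queued request of the same direction
        # that ends exactly where the new one starts.
        for req in self.pending:
            if req.write == new.write and req.sector + req.nr_sectors == new.sector:
                req.nr_sectors += new.nr_sectors
                return
        self.add_request(new)

    def add_request(self, new: Request) -> None:
        # Sorted insertion stands in for the on-the-fly reordering.
        pos = 0
        while pos < len(self.pending) and self.pending[pos].sector <= new.sector:
            pos += 1
        self.pending.insert(pos, new)
        if not self.plugged:
            self.request_fn()

    def unplug(self) -> None:
        self.plugged = False
        self.request_fn()

    def request_fn(self) -> None:
        # Hand everything queued so far to the simulated driver.
        self.dispatched.extend(self.pending)
        self.pending.clear()


def demo() -> List[Tuple[int, int]]:
    """Queue four reads while plugged, then unplug and report what was sent."""
    q = ToyQueue()
    q.plug()
    for sector in (800, 100, 108, 400):
        q.make_request(Request(sector, 8, write=False))
    q.unplug()
    return [(r.sector, r.nr_sectors) for r in q.dispatched]
```

In this model nothing reaches the driver until unplug(), so the requests 
queued in between get a chance to be merged and sorted first.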

Everything can be improved, and probably this stuff too. Unless I have 
missed something important, I do not agree with the need to reorder 
IO requests in another place of the kernel.

In my opinion, layers that are between add_request() and the HBA 
must not keep requests or scsi commands in queues.
These commands must be queued to the controller as quickly as possible, 
and good, well-tuned hard disks (with tagged commands enabled) must 
disconnect quickly in order to be ready to accept other commands 
as quickly as possible.

The queue depth must be tuned in order to allow that.
A very low latency helps a lot.

> >In my opinion, that's the same with supposedly intelligent controllers.
> >Such layers of the scsi subsystem act as a prosthesis for silly 
> >kernels and/or hard disk firmware when they are able to increase 
> >performance.

> Sorting effects increase with the size of the batch which is
> going to be sorted (in terms of average seek distance by a factor
> of Q/3), so the size Q of the batch should be as large as possible.
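
The quoted Q/3 figure can be checked roughly with a small simulation. 
This is a sketch under assumptions that are not in the thread: request 
positions are uniformly random over the stroke, the head starts at one 
edge, the sorted batch is served in a single one-way sweep, and only seek 
distance is counted (not seek time or rotational delay). The function 
names are illustrative.

```python
import random


def total_seek(positions, start=0.0):
    """Sum of head movements when servicing positions in the given order."""
    head, total = start, 0.0
    for pos in positions:
        total += abs(pos - head)
        head = pos
    return total


def seek_ratio(q, trials=2000, seed=1):
    """FIFO seek distance divided by sorted-sweep seek distance, batch size q."""
    rng = random.Random(seed)
    fifo = swept = 0.0
    for _ in range(trials):
        # Positions are expressed as a fraction of the full stroke.
        batch = [rng.random() for _ in range(q)]
        fifo += total_seek(batch)
        swept += total_seek(sorted(batch))
    return fifo / swept
```

Under these assumptions the unsorted batch costs about 1/3 of a stroke 
per request, while the sorted sweep costs about 1/(Q+1), so the ratio 
grows roughly like Q/3.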

Only the firmware is able to know the rotational position of the disk.

60 seconds / 5400 rpm = 11.1 ms per revolution
60 seconds / 7200 rpm =  8.3 ms per revolution

Those numbers are comparable to the disk access time.
Does your algorithm try to guess such things?
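
The arithmetic above, as a small sketch. The function names are 
illustrative, and the half-revolution average assumes the wanted sector 
is equally likely to be anywhere on the track when the seek completes.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to come under the head: half a revolution."""
    return revolution_ms(rpm) / 2.0


for rpm in (5400, 7200):
    print(rpm, round(revolution_ms(rpm), 1), round(avg_rotational_latency_ms(rpm), 1))
```

A sort on sector number alone cannot see this delay, which is of the 
same order as the seek it tries to save.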

> My point is that if we just sort read requests, batches are likely
> to be much smaller than if we sort both read and writes.
> Batches with a mix of reads and writes can be safely reordered as long
> as there are no overlapping requests, which is by itself a very rare
> event. I implemented all of this in the driver because it is much
> simpler for me working at that level, it is not intended to be the
> right thing forever.
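
The "no overlapping requests" condition quoted above could be tested 
along these lines. This is a sketch, not the code from the driver under 
discussion: the (first_sector, nr_sectors, is_write) tuple and the 
function name are hypothetical, and it assumes that two overlapping 
reads are harmless, so only overlaps involving a write block reordering.

```python
from typing import List, Tuple

# (first_sector, nr_sectors, is_write): a hypothetical request shape.
Req = Tuple[int, int, bool]


def safe_to_reorder(batch: List[Req]) -> bool:
    """True if no write in the batch overlaps any other request."""
    max_end = 0        # furthest end sector over all requests seen so far
    max_write_end = 0  # furthest end sector over writes seen so far
    for start, length, is_write in sorted(batch, key=lambda r: r[0]):
        if is_write and start < max_end:
            return False  # this write overlaps an earlier read or write
        if start < max_write_end:
            return False  # this request overlaps an earlier write
        end = start + length
        max_end = max(max_end, end)
        if is_write:
            max_write_end = max(max_write_end, end)
    return True
```

Sorting by start sector means each request only has to be compared with 
the furthest end seen so far, instead of with every other request.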

In my opinion, the reasons for the performance increase you observed 
are among the following:

1 - ll_rw_blk() does something wrong that I have misunderstood.
2 - The driver/HBA pair has such high latency, and/or the disk cache is 
    so full, that the disk does not want to disconnect.
3 - The queue depth is too large for the hard disk (implies 2).
4 - The hard disk firmware is silly.
5 - The hard disk parameter values that are related to disconnections 
    are badly tuned.
6 - Your benchmarks are questionable.


Gerard.
