[379] in linux-scsi channel archive


Re: Reading more than buffersize

daemon@ATHENA.MIT.EDU (Drew Eckhardt)
Sat Jul 15 20:54:48 1995

To: eric@aib.com (Eric Youngdale)
cc: mucci@cs.utk.edu, linux-scsi@vger.rutgers.edu
In-reply-to: Your message of "Sat, 15 Jul 1995 14:31:00 EDT."
             <m0sXBzc-0009WNC@aib.com> 
Date: Sat, 15 Jul 1995 14:23:49 -0600
From: Drew Eckhardt <drew@poohsticks.org>

In message <m0sXBzc-0009WNC@aib.com>, eric@aib.com writes:
>
>>> Instead of doing things to a buffer and then copying to memory, it should 
>>> lock the relevant pages of user memory in core (there are mlock patches 
>>> floating around somewhere for this), and do however many SCSI commands are
>>> necessary (based on the number of scatter/gather segments supported by the 
>>> low-level driver being used) to transfer data to those locations.
>>
>>Yes! The 0-copy approach! Most of the newer OSs are providing such things
>>as 0 copy device reads and 0 copy network stacks where possible. (I think
>>HP is pushing the envelope on this approach.) We need a generic interface
>>to do DMA direct to user space...
>
>	This would be an interesting approach.  The thing that would
>be tricky about it is that we need some generic interface to look up
>the physical address based upon a virtual address in the user's space,

I don't think that's the worst part - 

One difficult bit is keeping things coherent with the buffer 
cache in a way that's compatible with good performance.  If you do a single 
direct read/write, it would probably be optimal to just set your pages 
to COW and give them a context in the buffer cache as well. 

However, the usual case will be using the same I/O buffer to loop over
an entire data set.  To get better performance here, you probably 
want to remap the memory to get a set of contiguous regions below the 
controller's DMA addressing limit, and invalidate the relevant 
buffer cache entries so you always use the same set of (now contiguous) 
pages for that buffer.  
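A user-space sketch of why the remap pays off, assuming 4 KB pages and a hypothetical `struct sg_seg` (neither is an existing kernel interface): once the buffer's pages are physically adjacent, they collapse into a few scatter/gather entries, and any page that reaches past the controller's DMA limit is caught before the command is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KB pages, as on i386 */

/* Hypothetical scatter/gather entry: one physically contiguous run. */
struct sg_seg {
    uint32_t phys;   /* physical start address */
    uint32_t len;    /* length in bytes */
};

/*
 * Merge the buffer's physical pages (in buffer order, page-aligned) into
 * as few s/g segments as possible. Returns the number of segments, or -1
 * if a page reaches past dma_limit (bounce or remap needed) or if more
 * than max_segs entries would be required.
 */
int coalesce_pages(const uint32_t *pages, size_t npages, uint32_t dma_limit,
                   struct sg_seg *out, size_t max_segs)
{
    size_t nseg = 0;
    for (size_t i = 0; i < npages; i++) {
        assert(pages[i] % SKETCH_PAGE_SIZE == 0);
        if ((uint64_t)pages[i] + SKETCH_PAGE_SIZE > dma_limit)
            return -1;
        if (nseg > 0 && out[nseg - 1].phys + out[nseg - 1].len == pages[i]) {
            /* physically adjacent to the previous run: extend it */
            out[nseg - 1].len += SKETCH_PAGE_SIZE;
        } else {
            if (nseg == max_segs)
                return -1;
            out[nseg].phys = pages[i];
            out[nseg].len = SKETCH_PAGE_SIZE;
            nseg++;
        }
    }
    return (int)nseg;
}
```

With pages at 0x100000, 0x101000, 0x200000, 0x201000 and 0x202000 and a 16 MB limit, this yields two entries (8 KB and 12 KB) instead of five single-page ones.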

In this case, you may also want to set up your own vm_region,
make sure no one steals your memory out from under you, and do the 
right magic when people share it.

>plus we have to be aware that we could (and probably will) have a new
>physical page at each page boundary.  

Not necessarily.  For some reads ("some" meaning reads through host adapters
that support only a limited number of scatter/gather segments, for practical
purposes), it will be cheaper to allocate contiguous pages, map those in (with 
no zero-fill), copy the untouched bits of the two end pages into that 
buffer, and discard the old pages.
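The copying cost of that trick is easy to estimate. A minimal sketch, assuming 4 KB pages and a read into [addr, addr + len); the names are hypothetical: only the bytes of the first and last pages that lie outside the buffer need copying, since the DMA overwrites everything in between.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KB pages */

/* Hypothetical summary of the work for one read. */
struct copy_cost {
    unsigned long npages;      /* fresh contiguous pages to allocate */
    unsigned long head_bytes;  /* bytes before the buffer in its first page */
    unsigned long tail_bytes;  /* bytes after the buffer in its last page */
};

struct copy_cost end_page_copy_cost(uintptr_t addr, unsigned long len)
{
    struct copy_cost c;
    assert(len > 0);
    uintptr_t end = addr + len;   /* one past the last byte read */
    c.npages = (unsigned long)((end - 1) / SKETCH_PAGE_SIZE
                               - addr / SKETCH_PAGE_SIZE + 1);
    c.head_bytes = (unsigned long)(addr % SKETCH_PAGE_SIZE);
    c.tail_bytes = (end % SKETCH_PAGE_SIZE)
                       ? (unsigned long)(SKETCH_PAGE_SIZE - end % SKETCH_PAGE_SIZE)
                       : 0;
    return c;
}
```

For example, an 8 KB read starting 256 bytes into a page spans three pages and needs 256 + 3840 bytes copied; a page-aligned 8 KB read needs no copying at all.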

With writes, we may have better luck doing similar things before committing
the data to disk (aka the cluster patches).

Also, there's the re-ordering of pages I mentioned above which will
really help.

>Also, we have to use bounce buffers
>if the low-level driver does not support DMA to addresses > 16MB
>(see the disk code for how to do this).

See above.
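For comparison, a sketch of what bouncing costs, assuming the 16 MB (24-bit ISA) DMA limit mentioned above and a hypothetical `struct sg_seg`: every segment that reaches past the limit must be staged through low memory and copied, which is the work that remapping the buffer below the limit avoids. For simplicity, the whole segment is counted even when only part of it lies above the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISA_DMA_LIMIT 0x1000000UL   /* 16 MB: 24-bit ISA DMA addresses */

/* Hypothetical scatter/gather entry. */
struct sg_seg {
    uint32_t phys;   /* physical start address */
    uint32_t len;    /* length in bytes */
};

/* Bytes that would have to be copied through a bounce buffer below limit. */
uint64_t bounce_bytes(const struct sg_seg *sg, size_t n, uint64_t limit)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sg[i].len > 0);
        if ((uint64_t)sg[i].phys + sg[i].len > limit)
            total += sg[i].len;   /* whole segment is bounced */
    }
    return total;
}
```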


>	We would also need some way of signaling whether it is
>possible/wise to split up transfers into smaller requests, or whether
>we should do the whole thing as one request.  Note that some host
>adapters (e.g. the 1542) have a limit to the size of the scatter-gather table,
>so you are automatically limited to 16 * 4 KB = 64 KB transfers per command
>(since we assume that adjacent pages in VM are not adjacent in PM).

See above.
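A sketch of the splitting Eric describes, assuming 4 KB pages and the pessimistic rule that every page of the user buffer needs its own scatter/gather entry; `split_into_commands` is a hypothetical helper, not an existing driver function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KB pages */

/*
 * Split [addr, addr + len) into SCSI commands that each use at most
 * sg_tablesize entries, one entry per (partial) page. Stores each
 * command's byte count in cmd_len and returns the number of commands,
 * or -1 if more than max_cmds would be needed.
 */
int split_into_commands(uintptr_t addr, size_t len, unsigned sg_tablesize,
                        size_t *cmd_len, size_t max_cmds)
{
    size_t ncmd = 0;
    assert(sg_tablesize > 0);
    while (len > 0) {
        size_t bytes = 0;
        for (unsigned seg = 0; seg < sg_tablesize && len > 0; seg++) {
            /* an entry runs at most to the next page boundary */
            size_t chunk = SKETCH_PAGE_SIZE - addr % SKETCH_PAGE_SIZE;
            if (chunk > len)
                chunk = len;
            addr += chunk;
            len -= chunk;
            bytes += chunk;
        }
        if (ncmd == max_cmds)
            return -1;
        cmd_len[ncmd++] = bytes;
    }
    return (int)ncmd;
}
```

With a 16-entry table (the 1542 case), a page-aligned 128 KB buffer becomes two 64 KB commands, while a 64 KB buffer starting 512 bytes into a page touches 17 pages and splits into commands of 65024 and 512 bytes.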

>
>	Do we have any volunteers to take on this little project?  I
>can provide suggestions and code review, but I am a bit busy at the
>moment with other things so I don't want to take this on now.  It
>would be pretty easy to prototype and debug most of the code in user space.

0-copy transfers to 

	1.  a raw disk device
	2.  normal files using the SCSI disk driver

are at the top of my to-do list when I finish cleaning up the NCR driver
and tweaking it a bit (disconnect/reconnect works most of the time 
on all of my SCSI devices :-)), but I won't object if anyone else 
cares to play with it.

>While this would be a nice hack, it would probably be preferable to 
>do this as a sort of generic interface to all scsi devices, and have 
>hooks to get to this

Yup - allow specification of an arbitrary read/write command and 
various read/write parameters.
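A rough sketch of what such a request might carry, with hypothetical type and function names: the caller supplies the CDB itself, so any read/write command can be used, along with the buffer, direction and length. Only the READ(10)/WRITE(10) layout (opcodes 0x28 and 0x2A, big-endian LBA in bytes 2-5, block count in bytes 7-8) comes from the SCSI standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transfer direction. */
enum xfer_dir { XFER_READ, XFER_WRITE };

/* Hypothetical generic request: an arbitrary CDB plus buffer parameters. */
struct generic_xfer {
    unsigned char cdb[16];   /* room for up to a 16-byte CDB */
    unsigned char cdb_len;
    enum xfer_dir dir;
    void *buf;               /* user buffer; its pages would be locked */
    size_t len;              /* bytes to transfer */
};

/* Fill in a READ(10) or WRITE(10) request for nblocks blocks at lba. */
void build_rw10(struct generic_xfer *x, enum xfer_dir dir, uint32_t lba,
                uint16_t nblocks, void *buf, unsigned block_size)
{
    assert(block_size > 0);
    memset(x->cdb, 0, sizeof x->cdb);
    x->cdb[0] = (dir == XFER_READ) ? 0x28 : 0x2A;
    x->cdb[2] = (unsigned char)(lba >> 24);    /* LBA, most significant first */
    x->cdb[3] = (unsigned char)(lba >> 16);
    x->cdb[4] = (unsigned char)(lba >> 8);
    x->cdb[5] = (unsigned char)lba;
    x->cdb[7] = (unsigned char)(nblocks >> 8); /* transfer length in blocks */
    x->cdb[8] = (unsigned char)nblocks;
    x->cdb_len = 10;
    x->dir = dir;
    x->buf = buf;
    x->len = (size_t)nblocks * block_size;
}
```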

