[56568] in North American Network Operators' Group


Re: 923Mbits/s across the ocean

daemon@ATHENA.MIT.EDU (Richard A Steenbergen)
Mon Mar 10 19:36:18 2003

Date: Mon, 10 Mar 2003 19:35:36 -0500
From: Richard A Steenbergen <ras@e-gerbil.net>
To: Iljitsch van Beijnum <iljitsch@muada.com>
Cc: nanog@nanog.org
In-Reply-To: <20030311002844.J68016-100000@sequoia.muada.com>
Errors-To: owner-nanog-outgoing@merit.edu


On Tue, Mar 11, 2003 at 12:41:15AM +0100, Iljitsch van Beijnum wrote:
> > On the receive side, the socket buffers must be large enough to
> > accommodate all the data received between application read()'s,
> 
> That's not true. It's perfectly acceptable for TCP to stall when the
> receiving application fails to read the data fast enough. (TCP then
> simply announces a window of 0 to the other side so the communication
> effectively stops until the application reads some data and a >0 window
> is announced.) If not, the kernel would be required to buffer unlimited
> amounts of data in the event an application fails to read it from the
> buffer for some time (which is a very common situation).

Ok, I think I was unclear. You don't NEED to have buffers large enough to
accommodate all that data received between application read()'s, unless
you are trying to achieve maximum performance. I thought that was the
general framework we were all working under. :)
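
For anyone following along at home, a rough sketch of what that tuning looks
like in C (the 12 MB figure assumes roughly 1 Gbit/s at a 100 ms RTT, i.e. the
bandwidth-delay product of a path like the one in this thread; adjust to
taste, and note that most kernels cap it with a sysctl such as
net.core.rmem_max on Linux, which may need raising as well):

    /* Sketch: size the socket buffers to the bandwidth-delay product so
     * the sender is never throttled by a full receive window. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int make_fat_pipe_socket(void)
    {
        int fd, bdp;

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return -1;
        }

        bdp = 12 * 1024 * 1024;  /* ~1 Gbit/s * 100 ms RTT, rounded up */

        /* Set these before connect()/listen(), so the TCP window scale
         * option gets negotiated large enough on the SYN. */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bdp, sizeof(bdp)) < 0)
            perror("SO_RCVBUF");
        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bdp, sizeof(bdp)) < 0)
            perror("SO_SNDBUF");

        return fd;
    }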

> > locally. Jumbo frames help too, but their real benefit is not the
> > simplistic "hey look, there's 1/3rd the number of frames/sec" view that many
> > people see. The good stuff comes from techniques like page flipping, where
> > the NIC DMAs data into a memory page which can be flipped through the
> > system straight to the application, without copying it throughout. Some
> > day TCP may just be implemented on the NIC itself, with ALL work
> > offloaded, and the system doing nothing but receiving nice page-sized
> > chunks of data at high rates of speed.
> 
> Hm, I don't see this happening to a usable degree as TCP has no concept
> of records. You really want to use fixed size chunks of information here
> rather than pretending everything's a stream.

We're talking about optimizations for high-performance transfers... It can't
always be a stream.
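
The page flipping itself lives in the NIC and the driver, so there is nothing
for an application to call directly; the closest userland-visible cousin is
zero-copy transmit a la sendfile(2). A rough sketch, assuming the Linux
signature (socket out, file in):

    /* Sketch: zero-copy transmit with sendfile(2).  The kernel hands the
     * file's page-cache pages to the socket without copying them through
     * userspace -- the transmit-side cousin of receive-side page flipping. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <sys/sendfile.h>

    int send_whole_file(int sock_fd, const char *path)
    {
        struct stat st;
        off_t offset = 0;
        ssize_t sent;
        int file_fd;

        file_fd = open(path, O_RDONLY);
        if (file_fd < 0) {
            perror("open");
            return -1;
        }
        if (fstat(file_fd, &st) < 0) {
            perror("fstat");
            close(file_fd);
            return -1;
        }

        while (offset < st.st_size) {
            sent = sendfile(sock_fd, file_fd, &offset, st.st_size - offset);
            if (sent <= 0) {
                perror("sendfile");
                break;
            }
        }

        close(file_fd);
        return (offset == st.st_size) ? 0 : -1;
    }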

> > IMHO the 1500-byte MTU of Ethernet
> > will continue to prevent good end-to-end performance like this for a
> > long time to come. But alas, I digress...
> 
> Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
> to support a per-neighbor MTU? This should make backward-compatible
> adoption of jumbo frames a possibility. (Maybe retrofit ND into v4 while
> we're at it.)

Not necessarily sure that's the right thing to do, but SOMETHING has got to
be better than what passes for path MTU discovery now. :)
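
For the curious: Linux at least lets an application force PMTUD on and read
back whatever the kernel currently believes the path MTU is, via the
Linux-specific IP_MTU_DISCOVER and IP_MTU socket options on a connected
socket. A rough sketch:

    /* Sketch: turn on "always set DF" path MTU discovery and report the
     * kernel's current path MTU estimate for a connected socket.
     * IP_MTU and IP_MTU_DISCOVER are Linux-specific. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int report_path_mtu(int connected_fd)
    {
        int pmtud = IP_PMTUDISC_DO;   /* always set DF, never fragment */
        int mtu = 0;
        socklen_t len = sizeof(mtu);

        if (setsockopt(connected_fd, IPPROTO_IP, IP_MTU_DISCOVER,
                       &pmtud, sizeof(pmtud)) < 0) {
            perror("IP_MTU_DISCOVER");
            return -1;
        }
        if (getsockopt(connected_fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0) {
            perror("IP_MTU");
            return -1;
        }

        printf("kernel's path MTU estimate: %d bytes\n", mtu);
        return mtu;
    }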

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
