[106144] in North American Network Operators' Group


Re: Software router state of the art

daemon@ATHENA.MIT.EDU (Kevin Oberman)
Wed Jul 23 17:23:32 2008

To: "William Herrin" <herrin-nanog@dirtside.com>
In-Reply-To: Your message of "Wed, 23 Jul 2008 16:51:50 EDT."
	<3c3e3fca0807231351i5f2fc6f4g4a670e0f405342c3@mail.gmail.com> 
Date: Wed, 23 Jul 2008 14:23:18 -0700
From: "Kevin Oberman" <oberman@es.net>
Cc: Naveen Nathan <naveen@lastninja.net>, nanog@merit.edu
Errors-To: nanog-bounces@nanog.org


> Date: Wed, 23 Jul 2008 16:51:50 -0400
> From: "William Herrin" <herrin-nanog@dirtside.com>
> Sender: wherrin@gmail.com
> 
> On Wed, Jul 23, 2008 at 3:59 PM, Kevin Oberman <oberman@es.net> wrote:
> >> The first bottleneck is the interrupts from the NIC. With a generic
> >> Intel NIC under Linux, you start to lose a non-trivial number of
> >> packets around 700mbps of "normal" traffic because it can't service
> >> the interrupts quickly enough.
> >
> > Most modern high performance network cards support MSI (Message Signaled
> > Interrupts) which generate real interrupts only on an intelligent
> > basis, and only at a controlled rate. Windows, Solaris and FreeBSD have
> > support for MSI and I think Linux does, too. It requires both hardware
> > and software support.
> 
> "ethtool -c". Thanks Sargun for putting me on to "I/O Coalescing."
> 
> But cards like the Intel Pro/1000 have 64k of memory for buffering
> packets, both in and out. Few have very much more than 64k. 64k means
> 32k to tx and 32k to rx. Means you darn well better generate an
> interrupt when you get near 16k so that you don't fill the buffer
> before the 16k you generated the interrupt for has been cleared. Means
> you're generating an interrupt at least for every 10 or so 1500 byte
> packets.
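
As a rough check on that arithmetic, here is a minimal Python sketch. It
assumes the figures quoted above (32k receive buffer, interrupt raised at
the 16k mark, 1500 byte packets) plus a fully loaded 1 Gbps link. The
function names are made up for illustration and nothing here is measured
from real hardware.

```python
def packets_per_interrupt(threshold_bytes: int, packet_bytes: int) -> float:
    """Packets that arrive before the buffer reaches the interrupt threshold."""
    return threshold_bytes / packet_bytes


def interrupts_per_second(line_rate_bps: float, threshold_bytes: int) -> float:
    """Interrupt rate if one interrupt fires per threshold_bytes received."""
    bytes_per_second = line_rate_bps / 8
    return bytes_per_second / threshold_bytes


RX_BUFFER = 32 * 1024        # assumed: half of the 64k on-card memory
THRESHOLD = RX_BUFFER // 2   # assumed: interrupt when the rx half is half full
PACKET = 1500                # bytes, as in the quoted text
LINE_RATE = 1e9              # assumed: 1 Gbps, link kept full

print(f"packets per interrupt: {packets_per_interrupt(THRESHOLD, PACKET):.1f}")
print(f"interrupts per second: {interrupts_per_second(LINE_RATE, THRESHOLD):.0f}")
```

Under those assumptions this works out to about 11 packets per interrupt
and roughly 7,600 interrupts per second, which agrees with the "every 10
or so" estimate.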

You have just hit on a huge problem with most (all?) 1G and 10G
hardware. The buffers are way too small for optimal performance. In any
case where the RTT is anything more than half a millisecond, you exhaust
the window and stall the stream.
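
The half-millisecond figure can be sanity-checked with a short Python
sketch. It makes the simplifying assumption that the whole 64k buffer is
the amount of data that can be in flight at once, and the helper name is
hypothetical.

```python
def max_rtt_seconds(buffer_bytes: int, rate_bps: float) -> float:
    """Longest RTT a buffer can cover before the sender runs out of window."""
    return buffer_bytes * 8 / rate_bps


BUFFER = 64 * 1024  # assumed: the full 64k treated as in-flight data

for label, rate in (("1G", 1e9), ("10G", 10e9)):
    rtt_ms = max_rtt_seconds(BUFFER, rate) * 1000
    print(f"{label}: buffer is exhausted beyond an RTT of {rtt_ms:.3f} ms")
```

At 1G this comes to a little over half a millisecond, and at 10G to about
a tenth of that.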

I need to move multi-gigabit streams across the country and between the
US and Europe. Those are a bit too far apart for those tiny buffers to
be of any use at all. This would require 3 GB of buffers. This same
problem also makes TCP off-load of no use at all.
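
To show how the buffer requirement scales with distance, here is a
minimal Python sketch of the bandwidth-delay product. The rate and RTT
values are illustrative assumptions only, not the actual paths or
the inputs behind the 3 GB figure, which this message does not state.

```python
def bdp_bytes(rate_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return rate_bps * rtt_seconds / 8


# Hypothetical paths: (label, rate in bit/s, round-trip time in seconds)
paths = [
    ("cross-country, 10G, 70 ms RTT (assumed)", 10e9, 0.070),
    ("US to Europe, 10G, 150 ms RTT (assumed)", 10e9, 0.150),
]

for label, rate, rtt in paths:
    megabytes = bdp_bytes(rate, rtt) / 1e6
    print(f"{label}: about {megabytes:.1f} MB in flight per stream")
```

Even a single stream on these assumed paths needs buffering on the order
of a hundred megabytes, several orders of magnitude beyond a 64k card
buffer.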
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751


