[121105] in North American Network Operators' Group


Re: D/DoS mitigation hardware/software needed.

daemon@ATHENA.MIT.EDU (Valdis.Kletnieks@vt.edu)
Sun Jan 10 12:27:05 2010

To: Roger Marquis <marquis@roble.com>
In-Reply-To: Your message of "Sun, 10 Jan 2010 08:19:27 PST."
	<20100110161927.F281F2B2164@mx5.roble.com>
From: Valdis.Kletnieks@vt.edu
Date: Sun, 10 Jan 2010 12:25:59 -0500
Cc: nanog@nanog.org, Joe Greco <jgreco@ns.sol.net>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org


On Sun, 10 Jan 2010 08:19:27 PST, Roger Marquis said:
> > Then you need to get rid of that '90's antique web server and get
> > something modern.  When you say "interrupt-bound hardware," all you
> > are doing is showing that you're not familiar with modern servers
> > and quality operating systems that are designed to mitigate things
> > like DDoS attacks.
> 
> "Modern" servers?   IP is processed in the kernel on web servers,
> regardless of OS.  Have you configured a kernel lately?  Noticed there
> are ~3,000 lines in the Linux config file alone?  _Lots_ of device
> drivers in there, which are interrupt driven and have to be timeshared.

Yes, but all the fast network adapters are able to do a lot of stuff like
interrupt coalescing so you don't need to take an interrupt on every packet.
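To put rough numbers on that, here is a back-of-the-envelope sketch of what coalescing does to the interrupt rate. It is a simplified steady-state model with evenly spaced packets, not a description of any particular NIC or driver. The function and parameter names are made up for illustration; they loosely mirror the rx-frames / rx-usecs style of knobs that ethtool exposes on many drivers.

```python
def interrupts_per_second(pps, max_frames, max_usecs):
    """Estimate the interrupt rate under a simple coalescing model.

    Assumed model (hypothetical, for illustration only): the adapter
    raises an interrupt once max_frames packets have queued up or once
    max_usecs microseconds have passed, whichever comes first. Packets
    are assumed to arrive evenly spaced at pps packets per second.
    """
    if pps <= 0:
        return 0.0
    by_frames = pps / max_frames       # rate if only the frame count triggers
    by_timer = 1_000_000 / max_usecs   # rate if only the timer triggers
    # Whichever trigger fires more often wins, but there can never be
    # more interrupts than packets.
    return min(pps, max(by_frames, by_timer))
```

Under this model, a flood of a million packets per second with a 64-frame / 100-usec setting comes out to well under 2% of the one-interrupt-per-packet rate, which is the whole point of coalescing.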

And "have you configured a kernel lately" is another red herring - yes, there
are indeed 4,533 lines in the current Fedora .config. But that's because
that config turns on everything under the sun.  I just checked, and my current
kernel config has only 960 '=y' lines, and another 220 '=m' lines - and a large
portion of those could easily be turned off.  I have a minimal config file that
comes in under 730 non-comment lines.
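For anyone who wants to run the same tally on their own kernel config, a minimal sketch follows. It assumes the usual .config layout (one CONFIG_FOO=value per line, comments starting with '#'); the function name and the result keys are just illustrative choices.

```python
import sys
from collections import Counter


def count_config(text):
    """Tally built-in (=y), modular (=m) and total non-comment lines."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        counts["non_comment"] += 1
        if line.endswith("=y"):
            counts["builtin"] += 1
        elif line.endswith("=m"):
            counts["module"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to the config file as the first argument.
    with open(sys.argv[1]) as fh:
        for key, value in sorted(count_config(fh.read()).items()):
            print(key, value)
```

Lines that set strings or numbers (for example CONFIG_FOO="bar") count toward the non-comment total only, so the three figures will not add up to one another.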

> No servers I know do realtime processing (RT kernels don't) or process IP
> in ASICs.

That's because in general, processing the IP in an ASIC simply Does Not Work
as well as you might hope.  Alan Cox did a nice discussion of some of the
issues here:

http://lkml.indiana.edu/hypermail/linux/kernel/0307.1/2116.html

One should read his last paragraph carefully, and note that what he
wrote back in 2003 is still true today:

http://www.internet2.edu/lsr/history.html

> What configurations of Linux / BSD / Solaris / etc does web / email / ntp
> / sip / iptables / ipfw / ... and doesn't have issues with kernel
> locking?

So let me get this straight - you perceive a problem with locking inside
the kernel, where if you're lucky the lock is in an already-hot cache line
and your biggest worry is cache line ping-ponging, and if you're unlucky
you actually have to go out the northbridge and hit main memory, at main
memory access speeds.

And to fix this, you're going to move one of the things contending for the
lock off the CPU, so now every time the lock is contended, it has to go out
through the PCI bridge to an external card?

How the heck is that supposed to help?  You're suggesting the same "go talk
to another card" solution that the router vendors learned is the *last* thing
you want to do - calling out to the supervisor card rather than handling it
onboard the line card is guaranteed performance death.

>  Test it on your own servers by mounting a damaged DVD on the
> root directory, and dd'ing it to /dev/null.  Notice how the ATA/SATA/SCSI
> driver impacts the latency of everything on the system.  How would you
> replicate that on a firmware and ASIC drive appliance?

There are two little things you went astray on here:

1) I've in fact had to do this while doing data recovery.  It doesn't do
squat to the latency of anything that doesn't have to go through the same
controller as the DVD.  Everything else works just fine. Heck, it isn't even
enough to cause audio playback skips (and those are noticeable even at
the millisecond level).

2) Your latency hit is because the controller is *busy* while trying to
re-read and error-correct a bad block.

So yeah - trying to do I/O through a controller that's taking a several-second
time-out dealing with bad media will cause a latency hit *for that I/O*. What's
your point?



