[291] in linux-net channel archive
Re: csum_partial_copyffs (1.3.0) loses big on Pentium
daemon@ATHENA.MIT.EDU (Matti E. Aarnio [OH1MQK])
Tue May 9 08:37:00 1995
From: "Matti E. Aarnio [OH1MQK]" <mea@mea.cc.utu.fi>
To: ftom@netcom.com (Tom May)
Date: Tue, 9 May 1995 13:16:50 +0300 (EET DST)
Cc: linux-net@vger.rutgers.edu
In-Reply-To: <199505090616.XAA12479@netcom7.netcom.com> from "Tom May" at May 8, 95 11:16:49 pm
> Hi,
>
> It looks like the new function csum_partial_copyffs() in the 1.3.0 net
> code is a win on a 486, but it is an extremely bad lose on a Pentium.
....
> And, if you paid any attention to those results, you may have noticed
> that the 486 is running the old code *FASTER* than the Pentium on the
> "mixed" and "large" packet tests. If anybody can explain what's going
> on, please do.
My guess is that this is a pipeline stall.
That is, the Pentium has two integer units, which share a
common register file. If unit 1 is changing register
X, unit 2 must delay any instruction that needs data from that
register until the data arrives there.
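To illustrate the kind of dependency I mean, here is a minimal C sketch.
It is NOT the kernel's csum_partial_copyffs(); the function names are
made up for illustration, and whether a given compiler and CPU actually
pair the two adds is an assumption, not something measured here.
The first loop carries every add through one register; the second keeps
two independent sums, so neither add has to wait for the other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: one accumulator. Each add needs the result
 * of the previous add, so the adds form a single dependency chain. */
static uint32_t sum_single(const uint16_t *w, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += w[i];
    return acc;
}

/* Hypothetical example: two accumulators. The two adds in the loop
 * body touch different variables, so they do not depend on each other. */
static uint32_t sum_paired(const uint16_t *w, size_t n)
{
    uint32_t a = 0, b = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a += w[i];
        b += w[i + 1];
    }
    if (i < n)          /* odd word count: pick up the last word */
        a += w[i];
    return a + b;
}

/* Fold the 32-bit sum into 16 bits, adding the carries back in
 * (ones-complement style). Assumes n is small enough that the
 * 32-bit sum itself did not overflow. */
static uint16_t fold16(uint32_t s)
{
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);
    return (uint16_t)s;
}
```

Both versions give the same sum; only the shape of the dependency
chain differs.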
This is also why the original (it is still the same, I think)
bogomips loop produces surprisingly low figures for Pentiums.
There is a proven speed enhancement from adding a couple of NOPs
to the loop so that the P5 won't hit a pipeline stall.
(But doing so must be matched in the microsecond delay code!
Linus has, however, apparently decided against making such
a change, since it would not affect system performance
but would invalidate benchmark comparisons between old and new versions.)
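The reason the two must match can be shown with a bit of arithmetic.
This is only a sketch with hypothetical helper names and made-up
figures, not the kernel's delay code: the calibrated loops-per-second
figure is used to compute how many iterations a delay needs, so if the
delay loop runs at a different rate than the calibration loop, every
delay is off by that ratio.

```c
#include <stdint.h>

/* Hypothetical helper: iterations needed for a delay of `usecs`
 * microseconds, given the calibrated loops-per-second figure. */
static uint64_t delay_loops(uint64_t loops_per_sec, uint64_t usecs)
{
    return loops_per_sec * usecs / 1000000u;
}

/* Hypothetical helper: how long `loops` iterations really take if the
 * delay loop actually runs at `real_loops_per_sec`. */
static uint64_t actual_usecs(uint64_t loops, uint64_t real_loops_per_sec)
{
    return loops * 1000000u / real_loops_per_sec;
}
```

With made-up numbers: if the calibration loop (with NOPs) measures
8000000 loops/sec but the delay loop (without them) only manages
4000000, a requested 10 usec delay becomes 80 iterations and really
takes 20 usec.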
... but you Assembler Hackers knew this already, didn't you?
> Sincerely,
> Tom.
/Matti Aarnio <mea@utu.fi>