[291] in linux-net channel archive

Re: csum_partial_copyffs (1.3.0) loses big on Pentium

daemon@ATHENA.MIT.EDU (Matti E. Aarnio [OH1MQK])
Tue May 9 08:37:00 1995

From: "Matti E. Aarnio [OH1MQK]" <mea@mea.cc.utu.fi>
To: ftom@netcom.com (Tom May)
Date: 	Tue, 9 May 1995 13:16:50 +0300 (EET DST)
Cc: linux-net@vger.rutgers.edu
In-Reply-To: <199505090616.XAA12479@netcom7.netcom.com> from "Tom May" at May 8, 95 11:16:49 pm

> Hi,
> 
> It looks like the new function csum_partial_copyffs() in the 1.3.0 net
> code is a win on a 486, but it is an extremely bad lose on a Pentium.
....
> And, if you paid any attention to those results, you may have noticed
> that the 486 is running the old code *FASTER* than the Pentium on the
> "mixed" and "large" packet tests.  If anybody can explain what's going
> on, please do.

	My guess is that this is a pipeline stall.
	The Pentium has two integer units that share a
	common register file.  If unit 1 is changing register
	X, unit 2 must delay any instruction that needs data from
	that register until the data arrives there.
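
	To illustrate the kind of dependency I mean, here is a minimal
	sketch in C.  It is not the kernel's checksum code; the names
	fold16, sum_chained and sum_split are made up for this example.
	The first loop feeds every add into the same accumulator, so each
	add waits on the one before it.  The second keeps two independent
	accumulators, so neighbouring adds do not need each other's
	result.  Both return the same 16-bit ones-complement sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide sum down to 16 bits with end-around carry. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One accumulator: every add depends on the result of the previous add. */
uint16_t sum_chained(const uint16_t *w, size_t n)
{
    uint64_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc += w[i];
    return fold16(acc);
}

/* Two accumulators: the adds to a and b are independent of each other. */
uint16_t sum_split(const uint16_t *w, size_t n)
{
    uint64_t a = 0, b = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        a += w[i];
        b += w[i + 1];
    }
    if (i < n)          /* odd word count: pick up the last word */
        a += w[i];
    return fold16(a + b);
}
```

	Whether the second form really runs faster depends on what the
	compiler emits and on the CPU's pairing rules, so treat it as a
	picture of the idea and not as a measured result.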

	This is also why the original (it is still the same, I think)
	bogomips loop produces surprisingly low figures on Pentiums.
	There is a proven speed enhancement from adding a couple of NOPs
	to the loop so that the P5 doesn't go into a pipeline stall.
	(But doing so must be matched in the microsecond delay code!
	 However, Linus has (apparently) decided against making such
	 a change, which would not affect system performance
	 but would invalidate benchmarking between old and new versions.)

	... but you Assembler Hackers knew this already, didn't you?

> Sincerely,
> Tom.

	
	/Matti Aarnio	<mea@utu.fi>
