[290] in linux-net channel archive
csum_partial_copyffs (1.3.0) loses big on Pentium
daemon@ATHENA.MIT.EDU (Tom May)
Tue May 9 03:37:30 1995
Date: Mon, 8 May 1995 23:16:49 -0700
From: ftom@netcom.com (Tom May)
To: linux-net@vger.rutgers.edu
Hi,
It looks like the new function csum_partial_copyffs() in the 1.3.0 net
code is a win on a 486, but a big loss on a Pentium.
Here are some timing tests I ran on my machines. The "old code" is a
single loop that combines the copy and checksum operations. It is my
own version of the 1.3.0 csum_partial_copyffs() routine and is faster
than the original 1.3.0 routine on both the 486 and the Pentium. The
"new code" is a function that calls memcpy_fromfs() and then runs a
dedicated checksum loop. (The memcpy_fromfs() used here is also my
improved version.)
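To make the two strategies concrete, here is a minimal portable C sketch. It rests on several assumptions: plain memcpy() stands in for the kernel's memcpy_fromfs() (there is no user-space access here), the checksum is a simple byte-wise RFC 1071 ones'-complement sum rather than the 1.3.0 assembly, and the names copy_and_csum(), copy_then_csum() and csum_only() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold carries until the sum fits in 16 bits (ones'-complement addition). */
static uint16_t csum_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Dedicated checksum pass over 16-bit big-endian words; an odd trailing
 * byte is padded with a zero low byte. Returns the folded 16-bit sum,
 * not its complement. */
static uint16_t csum_only(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return csum_fold(sum);
}

/* "Old code" style: one loop that copies and checksums at the same time. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (len & 1) {
        dst[len - 1] = src[len - 1];
        sum += (uint32_t)src[len - 1] << 8;
    }
    return csum_fold(sum);
}

/* "New code" style: copy everything first, then run the checksum loop. */
static uint16_t copy_then_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    memcpy(dst, src, len);
    return csum_only(dst, len);
}
```

Both functions produce the same copy and the same sum; the only difference is whether the data is walked once or twice, which is the difference the timings compare.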
Here are the results on a 486DX2/66:
Running regression test, just a moment.
Regression passed.
Timing old code, small packets: 1.08 seconds
Timing new code, small packets: 1.31 seconds
Timing old code, mixed packets: 4.86 seconds
Timing new code, mixed packets: 6.24 seconds
Timing old code, large packets: 8.67 seconds
Timing new code, large packets: 11.25 seconds
As you can see, the "old code" combined loop wins on the 486. But on
a 60MHz Pentium it is a different story:
Running regression test, just a moment.
Regression passed.
Timing old code, small packets: 0.64 seconds
Timing new code, small packets: 0.79 seconds
Timing old code, mixed packets: 5.65 seconds
Timing new code, mixed packets: 1.89 seconds
Timing old code, large packets: 10.73 seconds
Timing new code, large packets: 3.04 seconds
Re-separating the move from the checksum operation makes the "new
code" about 3x faster on mixed packets and about 3.5x faster on large
packets, although the combined loop still wins on small packets.
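The speedup figures can be checked directly from the Pentium numbers above. This small C sketch (the struct and function names are made up for illustration) divides the old-code time by the new-code time for each packet mix; a ratio above 1 means the separated copy-then-checksum version is faster.

```c
#include <stdio.h>

/* Pentium 60MHz timings from the results above, in seconds. */
struct timing {
    const char *mix;
    double old_s;   /* combined copy+checksum loop */
    double new_s;   /* memcpy_fromfs() followed by a checksum loop */
};

static const struct timing p60[] = {
    { "small", 0.64, 0.79 },
    { "mixed", 5.65, 1.89 },
    { "large", 10.73, 3.04 },
};

/* Ratio > 1.0 means the "new code" wins. */
static double speedup(const struct timing *t)
{
    return t->old_s / t->new_s;
}

void print_speedups(void)
{
    size_t i;

    for (i = 0; i < sizeof p60 / sizeof p60[0]; i++)
        printf("%-6s %.2fx\n", p60[i].mix, speedup(&p60[i]));
}
```

On these numbers it gives about 0.81x for small packets, 2.99x for mixed and 3.53x for large, so the separated loop wins clearly on mixed and large packets but still loses on small ones.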
And, if you paid any attention to those results, you may have noticed
that the 486 is running the old code *FASTER* than the Pentium on the
"mixed" and "large" packet tests. If anybody can explain what's going
on, please do.
Perhaps we need separate routines for each processor, selected with a
-DPENTIUM compilation flag or something similar.
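A compile-time switch along those lines might look like the sketch below. It is only an illustration: PENTIUM is the flag name suggested above, csum_copy() is a hypothetical name, memcpy() stands in for memcpy_fromfs(), and the byte-wise checksum is not the kernel's assembly. Building with -DPENTIUM selects the copy-then-checksum version; the default build keeps the single combined loop that did better on the 486.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add one 16-bit big-endian word (or a zero-padded odd byte) to the sum. */
static uint64_t add_word(uint64_t sum, const uint8_t *p, size_t left)
{
    return sum + (((uint32_t)p[0] << 8) | (left > 1 ? p[1] : 0));
}

/* Fold carries back in to get a 16-bit ones'-complement sum. */
static uint16_t fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

#ifdef PENTIUM
/* Pentium build: copy first, then checksum in a separate pass over dst. */
uint16_t csum_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    memcpy(dst, src, len);
    for (i = 0; i < len; i += 2)
        sum = add_word(sum, dst + i, len - i);
    return fold(sum);
}
#else
/* 486 (default) build: one loop that copies and checksums together. */
uint16_t csum_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < len; i += 2) {
        dst[i] = src[i];
        if (len - i > 1)
            dst[i + 1] = src[i + 1];
        sum = add_word(sum, src + i, len - i);
    }
    return fold(sum);
}
#endif
```

Both branches return the same result, so callers do not change; only the memory access pattern differs between the two builds.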
The code I used for these tests was attached as a uuencoded archive, ffs.tar.gz.
Sincerely,
Tom.