[9368] in cryptography@c2.net mail archive
Rijndael in Assembler for x86?
daemon@ATHENA.MIT.EDU (Mikael Johansson)
Mon Sep 17 18:04:47 2001
Message-ID: <003801c13fc1$dac276a0$0200a8c0@ROL>
From: "Mikael Johansson" <mikael.johansson@wineasy.se>
To: <cryptography@wasabisystems.com>
Date: Mon, 17 Sep 2001 23:43:55 +0200
Peter Trei wrote:
> > From: iang@abraham.cs.berkeley.edu
> >
> > In article <87d74urezs.fsf@snark.piermont.com>,
> > Perry E. Metzger <perry@piermont.com> wrote:
> > >
> > >Helger Lipmaa <helger@tcs.hut.fi> writes:
> >
> > >> Why not just use C code?
> > >
> > >Because it is typically slower by many times than hand tuned assembler.
> I'll chime in with Perry here - The newer processors are insanely complex
> beasties, with multiple execution units allowing some internal parallelism,
> subject to register contention and under very complex rules. Anyone who
> thinks they can do better optimizing within a small window is naive, or
> much, much better than the average run of programmer.
That said, when you are not targeting desktop-class processors, or are
targeting less standard ones, there is quite a lot to be gained from
hand-optimizing...
I have spent three stints at Ericsson Mobile Communications, working part-time
alongside my studies, optimizing RSA under various conditions.
The first time I went in, I worked on their implementation of WTLS
(specifically the handshake part of the protocol) and was handed a code
snippet containing the core add-and-multiply loop, which took 25 minutes
(sic!) to execute in a standard handshake scenario. Hand-optimizing the C
code, together with a minor algorithm change, brought the execution time down
to less than 1.5 seconds.
This is an extreme example, but still...
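For readers who have not met one, here is a minimal sketch of what a generic multiply-and-add kernel for RSA-style multiprecision arithmetic typically looks like in portable C. It assumes 32-bit limbs stored little-endian and a 64-bit intermediate product. This is a textbook schoolbook illustration, not the Ericsson code, and the names mul_add_limbs and mp_mul are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: r[0..n-1] += a[0..n-1] * b, returning the final carry limb.
 * The 64-bit intermediate cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64-1. */
uint32_t mul_add_limbs(uint32_t *r, const uint32_t *a, size_t n, uint32_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] * b + r[i] + carry;
        r[i] = (uint32_t)t;   /* low 32 bits stay in this limb */
        carry = t >> 32;      /* high 32 bits carry into the next limb */
    }
    return (uint32_t)carry;
}

/* Hypothetical schoolbook product r = a * b (little-endian limbs).
 * r must have room for an + bn limbs and must not overlap a or b. */
void mp_mul(uint32_t *r, const uint32_t *a, size_t an,
            const uint32_t *b, size_t bn)
{
    for (size_t i = 0; i < an + bn; i++)
        r[i] = 0;
    /* One pass of the kernel per limb of b; the returned carry lands in a
     * limb that no earlier pass has touched yet. */
    for (size_t j = 0; j < bn; j++)
        r[j + an] = mul_add_limbs(r + j, a, an, b[j]);
}
```

Nearly all of the time in a modular exponentiation is spent inside a loop of this shape. That is why rewriting just this inner loop can pay off so heavily.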
> Back when I was doing proof-of-principle for the DES crack, I spent a *lot*
> of time optimizing DES code for the Pentium. While hand-optimizing for
> that processor more than doubled the speed, the really big gains all
> came from a higher-level understanding of the problem; in particular my
> insight on speeding up key schedule generation about 80x, and the
> perversion of the Pentium II MMX registers to run 'bitslice' (no, I didn't
> do that) algorithms, testing 64 keys in parallel.
>
> The optimizing compilers have generally exceeded human ability in
> low-level optimizing - not that that will stop me from trying, now and
> then.
>
> BTW, the code used for the DES crackers bears about as much
> resemblance to regular DES code as a top-fuel dragster does to
> a Toyota Corolla - it's tweaked to a fare-thee-well for one function,
> and totally useless for all others.
>
> Peter Trei
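The 'bitslice' trick Peter mentions can be illustrated with a minimal sketch. Here a 64-bit C integer (uint64_t) stands in for a Pentium II MMX register. Bit k of each of 64 independent inputs is transposed into word s[k], so every bitwise AND or XOR evaluates the same gate for all 64 lanes at once. The three-input function and the names scalar_f, slice_inputs and sliced_f are hypothetical; a real DES S-box is a far larger gate network.

```c
#include <stdint.h>

/* Hypothetical toy circuit (NOT a DES S-box): f(x0, x1, x2) = (x0 AND x1) XOR x2,
 * evaluated on one 3-bit input at a time. */
unsigned scalar_f(unsigned x)
{
    unsigned x0 = x & 1u, x1 = (x >> 1) & 1u, x2 = (x >> 2) & 1u;
    return (x0 & x1) ^ x2;
}

/* Transpose 64 three-bit inputs so that bit `lane` of s[k] is bit k of in[lane]. */
void slice_inputs(const unsigned in[64], uint64_t s[3])
{
    for (int k = 0; k < 3; k++) {
        s[k] = 0;
        for (int lane = 0; lane < 64; lane++)
            s[k] |= (uint64_t)((in[lane] >> k) & 1u) << lane;
    }
}

/* The same circuit, bitsliced: one AND and one XOR compute f for all 64 lanes;
 * bit `lane` of the result equals scalar_f(in[lane]). */
uint64_t sliced_f(const uint64_t s[3])
{
    return (s[0] & s[1]) ^ s[2];
}
```

In a key search, each lane would carry a different candidate key through the whole cipher expressed as gates. That is how one register pass can test 64 keys, at the cost of code that is useless for ordinary one-block-at-a-time encryption.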
// Mikael Johansson
---------------------------------------------------------------------
The Cryptography Mailing List