
Re: kdc performance and rcache


To: "Donald T. Davis" <don@cam.ov.com>
Cc: "Barry Jaspan" <bjaspan@MIT.EDU>, krbdev@MIT.EDU
From: Ken Raeburn <raeburn@cygnus.com>
Date: 28 Jun 1996 20:25:49 -0400
In-Reply-To: "Donald T. Davis"'s message of Fri, 28 Jun 1996 11:23:46 -0400


I've been keeping public benchmarking in mind.  I can do exclusively
as-reqs, or almost exclusively tgs-reqs (ratio about 200:1, I'll make
it adjustable).  But I've hard-coded the list of principals from our
database, as well as my own name; that'll need to be fixed.  If
anyone's got good data on real usage patterns at large, busy sites,
I'd be interested.  I'm probably thrashing the database more than a
typical large site would for a comparable number of packets per
second.  (E.g., many tgs-reqs would probably be for the same
services.)
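
In outline, the mix selection looks something like this (just a
sketch; next_request and its parameters are made-up names, not the
actual harness code):

enum req_type { AS_REQ, TGS_REQ };

/* Adjustable request mix: out of every (ratio + 1) requests, one is
   an AS-REQ and the rest are TGS-REQs.  ratio = 200 gives roughly
   the 200:1 tgs:as mix mentioned above; ratio = 0 gives exclusively
   as-reqs. */
static enum req_type next_request(unsigned long counter, unsigned long ratio)
{
    return (counter % (ratio + 1) == 0) ? AS_REQ : TGS_REQ;
}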

It might be nice to publicize some numbers, but I think they should
look good first.  We've roughly doubled our throughput, but the
numbers are still really poor.

Like I said before, we need a faster DES, more efficient database lock
handling, and less copying of objects.  For some opaque objects,
coalescing small malloc invocations might help too.
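
To make that last point concrete, here's a minimal sketch of the
coalescing idea (krb5_blob and blob_create are made-up names, not
library API): put an object's header and its variable-length contents
in one malloc instead of two.

#include <stdlib.h>
#include <string.h>

/* The header and its data live in one block, halving the malloc/free
   traffic for this object. */
typedef struct {
    size_t length;
    unsigned char data[1];   /* contents follow the header (C89 idiom) */
} krb5_blob;

static krb5_blob *blob_create(const unsigned char *src, size_t len)
{
    /* sizeof(*b) already includes one data byte, so this is enough
       room for len bytes of contents. */
    krb5_blob *b = malloc(sizeof(*b) + len);
    if (b == NULL)
        return NULL;
    b->length = len;
    memcpy(b->data, src, len);
    return b;                /* released with a single free(b) */
}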


Another item we found high on the list was asn1_encode_generaltime.
Most SPARC systems can't do multiplication, division, or remainder
on-chip, so library routines are called.  This means sprintf("%02d")
is expensive.  We've eliminated the sprintf call and avoided doing
division directly in the common cases (meaning, with dates before the
year 2100), but the call to gmtime is expensive too.  A small cache
helps a lot in my tests, though it might help less under a realistic
heavy load.  (And with a small fixed size and a tiny key, this cache
should cost very little even when it misses.)
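
The digit trick, in outline (a sketch with made-up names; the real
asn1_encode_generaltime looks different): write each two-digit field
through a 100-entry lookup table instead of sprintf("%02d"), and pick
the century by comparison instead of division for years before 2100.

#include <time.h>

static char two_digits[100][2];     /* "00" .. "99", filled once */

static void init_two_digits(void)
{
    int i;
    for (i = 0; i < 100; i++) {     /* the only divisions, at setup */
        two_digits[i][0] = (char)('0' + i / 10);
        two_digits[i][1] = (char)('0' + i % 10);
    }
}

static char *put2(char *p, int v)   /* v must be in 0..99 */
{
    p[0] = two_digits[v][0];
    p[1] = two_digits[v][1];
    return p + 2;
}

/* buf must hold at least 16 bytes: "YYYYMMDDHHMMSSZ" plus a NUL. */
static void format_generaltime(const struct tm *tm, char *buf)
{
    int year = tm->tm_year + 1900;
    char *p = buf;

    if (year >= 1900 && year < 2000) {   /* common cases: century by */
        p = put2(p, 19);                 /* comparison, no division  */
        p = put2(p, year - 1900);
    } else if (year >= 2000 && year < 2100) {
        p = put2(p, 20);
        p = put2(p, year - 2000);
    } else {                             /* rare: fall back to division */
        p = put2(p, year / 100);
        p = put2(p, year % 100);
    }
    p = put2(p, tm->tm_mon + 1);
    p = put2(p, tm->tm_mday);
    p = put2(p, tm->tm_hour);
    p = put2(p, tm->tm_min);
    p = put2(p, tm->tm_sec);
    *p++ = 'Z';
    *p = '\0';
}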

However, there's no asn1 context in which to put a cache...  How would
people feel about my creating one, pointed to by the asn1buf, and
passed into asn1buf_create?  Then the KDC can have optimal time
conversion, and the library code is still thread-safe.  (Well, as much
as it is now.)
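
Roughly what I have in mind, with all names and signatures
hypothetical (this isn't the current library API): a tiny
direct-mapped gmtime cache living in a context the caller hands to
asn1buf_create, so each asn1buf sees only its own cache and nothing
needs a lock.

#include <time.h>

#define TIME_CACHE_SLOTS 4   /* power of two: indexing needs no division */

struct asn1_context {
    struct {
        time_t    key;       /* timestamp cached in this slot */
        int       valid;
        struct tm value;     /* its gmtime() result */
    } time_cache[TIME_CACHE_SLOTS];
    /* room for other per-caller state later */
};

/* Return the broken-down UTC time for t, calling gmtime only on a
   miss.  The context is private to one asn1buf, hence one caller, so
   this is exactly as thread-safe as the surrounding code already is. */
static const struct tm *cached_gmtime(struct asn1_context *ctx, time_t t)
{
    int slot = (int)((unsigned long)t & (TIME_CACHE_SLOTS - 1));

    if (!ctx->time_cache[slot].valid || ctx->time_cache[slot].key != t) {
        struct tm *tm = gmtime(&t);      /* the expensive call */
        if (tm == NULL)
            return NULL;
        ctx->time_cache[slot].key = t;
        ctx->time_cache[slot].value = *tm;
        ctx->time_cache[slot].valid = 1;
    }
    return &ctx->time_cache[slot].value;
}

/* The asn1buf would then carry a pointer to the context, supplied at
   creation, e.g. (hypothetical signature):
       asn1_error_code asn1buf_create(asn1buf **buf,
                                      struct asn1_context *ctx);      */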

On a related note, I noticed that about 20-25% of the time
conversions had the value 0 (Jan 1 1970), coming from a "last-req"
field.  Does this sound reasonable?


Also, how do people feel about gcc-specific optimizations?

Graph profiling grossly exaggerates the call overhead (as measured by
how much time mcount takes), but I like to use it to gauge when too
many calls are being made.  It also points out one place where
performance may degrade when moving to a machine type with a less
efficient calling sequence.

That call overhead was reduced by about 20% when I converted three
tiny low-level asn1 functions (e.g., asn1_remove_octet) to macros.
Together, I think these three eliminated somewhere around 1000
function calls per packet processed, maybe fewer, and didn't change
the program size noticeably.  The problem is, one of these three
required a
local variable, so I used a gcc extension, conditionalized the macro
on having gcc version 2, and kept the function around for other
compilers to call.
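
The shape of that change, roughly (stand-in types and a made-up body;
the real asn1_remove_octet differs): gcc's statement-expression
extension supplies the local variable, and other compilers keep
calling the out-of-line function.

typedef int asn1_error_code;   /* stand-ins for the library's types */
typedef unsigned char asn1_octet;

typedef struct {
    asn1_octet *next;          /* next byte to consume */
    asn1_octet *bound;         /* one past the last valid byte */
} asn1buf;

/* The out-of-line version is always compiled, for non-gcc callers. */
static asn1_error_code asn1_remove_octet_fn(asn1buf *buf, asn1_octet *o)
{
    if (buf->next > buf->bound)
        return 1;              /* overrun */
    *o = *buf->next++;
    return 0;
}

#if defined(__GNUC__) && __GNUC__ >= 2
/* A gcc statement expression ({ ... }) evaluates to its last
   expression, and lets us declare b_ so buf is evaluated only once. */
#define asn1_remove_octet(buf, o)                         \
    ({ asn1buf *b_ = (buf);                               \
       (b_->next > b_->bound) ? 1                         \
                              : (*(o) = *b_->next++, 0); })
#else
#define asn1_remove_octet(buf, o) asn1_remove_octet_fn(buf, o)
#endif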

In theory I'd be opposed to lots of compiler-specific hacks, but gcc
is ubiquitous enough that I don't worry about a few gcc-specific
hacks, as long as the code still works with other compilers.
