[2896] in Kerberos-V5-bugs
krb5-libs/564: Subtle bug in replay cache code
daemon@ATHENA.MIT.EDU (Ken Hornstein)
Wed Mar 18 20:01:08 1998
Resent-From: gnats@rt-11.MIT.EDU (GNATS Management)
Resent-To: krb5-unassigned@RT-11.MIT.EDU
Resent-Reply-To: krb5-bugs@MIT.EDU, kenh@cmf.nrl.navy.mil
Date: Wed, 18 Mar 1998 19:59:55 -0500 (EST)
From: Ken Hornstein <kenh@cmf.nrl.navy.mil>
Reply-To: kenh@cmf.nrl.navy.mil
To: krb5-bugs@MIT.EDU
>Number: 564
>Category: krb5-libs
>Synopsis: replay cache code can call expunge twice, which hoses things up
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: krb5-unassigned
>State: open
>Class: sw-bug
>Submitter-Id: unknown
>Arrival-Date: Wed Mar 18 20:01:01 EST 1998
>Last-Modified:
>Originator: Ken Hornstein
>Organization:
Navel Research Laboratory
>Release: 1.0pl5
>Environment:
System: SunOS elvis 4.1.4 6 sun4c
Architecture: sun4
>Description:
>How-To-Repeat:
Here's sample code that illustrates this problem. Run this program with
a single argument (such as "test"), and watch the temporary files get
created.
#include <stdio.h>
#include <krb5.h>
#include <com_err.h>
int
main(int argc, char *argv[])
{
krb5_context context;
krb5_error_code code;
krb5_rcache rc;
krb5_deltat d = 5*60;
char buf[500];
krb5_donot_replay rep;
rep.server = "host/foo.bar@BAR";
rep.client = "foo@BAR";
rep.cusec = 0;
rep.ctime = 99999;
krb5_init_context(&context);
rc = (krb5_rcache) malloc(sizeof(*rc));
krb5_rc_resolve_type(context, &rc, krb5_rc_default_type(context));
if ((code = krb5_rc_resolve(context, rc, argv[1])) != 0) {
com_err("rc_resolve", code, "");
exit(1);
}
if ((code = krb5_rc_initialize(context, rc, d)) != 0) {
com_err("rc_initialize", code, "");
exit(1);
}
while ((code = krb5_rc_store(context, rc, &rep)) == 0)
rep.cusec++;
com_err("rc_store", code, "");
exit(0);
}
>Fix:
This seems to solve the problem. Basically, an extra structure field is
introduced (similar to "recovering"), which prevents the second call to
krb5_rc_dfl_expunge().
cvs diff: Diffing .
Index: rc_dfl.c
===================================================================
RCS file: /afs/cmf/project/cvsroot/krb5/lib/krb5/rcache/rc_dfl.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 rc_dfl.c
--- rc_dfl.c 1997/06/02 21:57:35 1.1.1.1
+++ rc_dfl.c 1998/03/19 00:22:30
@@ -115,6 +115,7 @@
krb5_rc_iostuff d;
#endif
char recovering;
+ char expunging;
}
;
@@ -287,6 +288,7 @@
t->d.fd = -1;
#endif
t->recovering = 0;
+ t->expunging = 0;
return 0;
cleanup:
@@ -459,7 +461,7 @@
krb5_rc_free_entry(context, &rep);
if (retval)
krb5_rc_io_close(context, &t->d);
- else if (expired_entries > EXCESSREPS)
+ else if (! t->expunging && expired_entries > EXCESSREPS)
retval = krb5_rc_dfl_expunge(context, id);
t->recovering = 0;
return retval;
@@ -578,10 +580,12 @@
free(name);
if (retval)
return retval;
+ ((struct dfl_data *)id->data)->expunging = 1;
retval = krb5_rc_dfl_recover(context, id);
if (retval)
return retval;
t = (struct dfl_data *)id->data; /* point to recovered cache */
+ t->expunging = 0;
}
tmp = (krb5_rcache) malloc(sizeof(*tmp));
>Audit-Trail:
>Unformatted:
>Deocription:
There is a problem with the replay cache code that only happens to show
up on long-running daemons (such as the KDC).
If you check the directory that holds the replay cache for your KDC, you
will probably notice a WHOLE bunch of files named something like:
krb5_RC84act
krb5_RC84acu
krb5_RC84acv
krb5_RC84acw
krb5_RC84acx
krb5_RC84acy
krb5_RC84acz
krb5_RC84ada
And over time, these will eventually fill up the partition holding your
replay cache, and then bad things start to happen.
The bug here is fairly subtle, and was introduced by one of the PR's
that I submitted a while ago (but I can't take all of the blame; my
PR was based on a patch that jik sent to me :-) ). After a while
replay caches would grow really large, and daemons that only did
authentication once (such as telnetd, rlogind, etc etc) would literally
spend minutes trying to recover the cache. However, since recover
discards expired entries, the there never was a case where krb5_rc_store()
would cause the replay cache to be expunged.
The original patch counted the number of expired entries, and if they
grew past some threshold would cause the cache to be recovered. That's
a good thing. _However_, this causes a problem for expunges that
are triggered by krb5_rc_store(). In this case, the first thing that
the krb5_rc_dfl_expunge() does is close the cache and recover it ...
and that recover calls krb5_rc_dfl_expunge() _itself_, and this confuses
things to the point where the temporary file used by the first call
to expunge is left around.