[15185] in Athena Bugs
Sun4 8.0J: lertstop has serious problems
daemon@ATHENA.MIT.EDU (Matt)
Mon Jun 9 01:42:03 1997
Date: Mon, 9 Jun 1997 01:42:02 -0400
From: Matt <matt@MIT.EDU>
To: mbarker@MIT.EDU
Cc: bugs@MIT.EDU, ops@MIT.EDU
System name: minos
Type and version: SPARC/5 8.0J
Display type: cgthree
What were you trying to do?
remove the 'stale' entries from the lert db, since many of the users
lerted in 1996 have been deactivated or purged by now
What's wrong:
minos# lertstop a
(and it spins...I control-c'd it after 76 minutes of CPU time)
...and it trashes the db
Before: -rw-rw-r-- 1 root other 157668 Jun 8 23:58 /tmp/lertdb
After: -rw-rw-r-- 1 root other 15360000 Jun 9 01:30 /tmp/lertdb2
What should have happened:
It should have removed category a from the DB and given me back my
prompt
What other relevant things do you want to tell us:
Glad you asked...for starters, the lert db is *big*
minos# wc /tmp/lertdb
5552 22208 157668 /tmp/lertdb
(/tmp/lertdb is the output of lertdump)
so it is in fact possible that it is not spinning, but working
very hard to go through the whole db
What else is wrong:
Even if it had finished in real time, it would have removed
some users it should not have. It seems that if a user is
in multiple categories and one of those categories gets lertstop'd,
then the user will disappear entirely (from the lertdb) instead of
just being removed from the category that was lertstop'd. It also
seems to introduce corruption into the db under
certain similar circumstances (this is very bad!).
For example:
(mhbraun@forever) /var/ops/lert/% ./lertload a < /tmp/users1
(mhbraun@forever) /var/ops/lert/% ./lertload b < /tmp/users2
(mhbraun@forever) /var/ops/lert/% ./lertload c < /tmp/users3
(mhbraun@forever) /var/ops/lert/% ./lertdump
name: mwhitson categories: ac
name: nathanw categories: bc
name: mhbraun categories: a
name: kretch categories: ac
name: jweiss categories: a
name: joanna categories: bc
name: ted categories: b
name: dkk categories: b
name: cat categories: bc
(this output is in fact correct for what I loaded)
(mhbraun@forever) /var/ops/lert/% ./lertstop a
(mhbraun@forever) /var/ops/lert/% ./lertdump
name: mwhbccat categories: c    <------ ???
name: nathanw categories: bc
name: joanna categories: bc
name: ted categories: b
name: dkk categories: b
name: cat categories: bc
(and notice that kretch is no longer in the db, when he should still be in c)
(mhbraun@forever) /var/ops/lert/%
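For reference, the behavior expected here can be sketched in a few lines. This is a hypothetical illustration, not the lert source in /mit/ops/src/lert/src: it models the database as the user-to-category-letters mapping that lertdump prints, and the names parse_dump and lertstop_expected are invented for this sketch. A single pass over the records removes only the stopped letter and drops a user only when no categories remain.

```python
# Hypothetical sketch of the expected lertstop semantics, modelled on
# lertdump output ("name: <user> categories: <letters>").
# This is NOT the real lert code; names are illustrative only.

def parse_dump(text):
    """Parse lertdump-style lines into a {user: category_letters} dict."""
    db = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect exactly: name: <user> categories: <letters>
        if len(fields) == 4 and fields[0] == "name:" and fields[2] == "categories:":
            db[fields[1]] = fields[3]
    return db

def lertstop_expected(db, category):
    """Remove one category letter from every user in a single pass.

    A user is dropped only if no categories remain; users in other
    categories keep those categories untouched.
    """
    result = {}
    for user, cats in db.items():
        remaining = cats.replace(category, "")
        if remaining:  # still lerted for at least one other category
            result[user] = remaining
    return result
```

Applied to the first dump with category "a", this would keep mwhitson and kretch in c, drop only mhbraun and jweiss, and leave every b and bc user as loaded.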
Where is the source for the binaries that produced this:
/mit/ops/src/lert/src
What does this mean in the great scheme of things:
we are up to category 's', which means we have 7 categories left
(8 if you count 'j', which we skipped for some reason). We seem to use
from 1 to 4 a month, depending on the number of failures (and on whether
different users from the same failure need to get different
notices), and accounts will likely be lerting people for deactivation
in the near future. We really need this fixed by the fall.
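The runway estimate above is simple arithmetic. A short sketch, assuming categories are the single lowercase letters a through z (which the count of 7 after 's' implies); the helper names are hypothetical:

```python
import string

def categories_left(current: str) -> int:
    """Count the letters after `current`, up to and including 'z'."""
    letters = string.ascii_lowercase
    return len(letters) - letters.index(current) - 1

def months_left(remaining: int, per_month: int) -> float:
    """Months until the letters run out at a steady usage rate."""
    return remaining / per_month

left = categories_left("s")  # t through z
for rate in (1, 4):
    print(f"{rate}/month: {months_left(left, rate):.2f} months")
```

At 4 a month, 7 letters last 1.75 months (2 months with 'j' reclaimed); at 1 a month they last 7 months.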
Please describe any relevant documentation references:
/mit/ops/src/lert/doc/lert.dvi
The documentation does not indicate what will
happen if I control-C lertstop; i.e., it would be nice to know
whether that will corrupt the db.