[806] in Moira

Hesiod DCM

daemon@ATHENA.MIT.EDU (David Krikorian)
Wed Oct 19 03:52:29 1994

Date: Wed, 19 Oct 94 03:52:25 -0400
From: David Krikorian <dkk@MIT.EDU>
To: moiradev@MIT.EDU
Cc: op@MIT.EDU

The Hesiod DCM "failed" tonight, but I manually completed it.

The update succeeded on Apollo, the first server to be tried, but the
control script thought it failed.  I have no idea why.  I can't see
any error opportunities here.  (See /mit/moiradev/src/gen/hesiod.sh.)

The error was MR_NAMED, which should only happen if the hesiod.sh
script had waited an *hour* without seeing a new named start.  Here is
the timing of the update and the error:

From moira:/moira/dcm.log

Oct 19 00:19:50 <2189> dcm: starting update for APOLLO.MIT.EDU:hesiod
Oct 19 00:24:22 <2189> dcm: name daemon failed to start installation of APOLLO.MIT.EDU:hesiod failed, code = 47836475
Oct 19 00:24:23 <2189> dcm: DCM updating APOLLO.MIT.EDU:hesiod: name daemon failed to start

The time at which named actually started:

apollo# ls -l /etc/named.pid
-rw-r--r--  1 root            6 Oct 19 00:26 /etc/named.pid

So the hesiod.sh script waited no more than about 4-1/2 minutes
(rather than 60), while named took about 6 minutes to get started
(including untarring the .db files).  The script then exited with
status $MR_NAMED, leaving itself and the tarfile in /tmp/.
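
For comparison, here is a minimal sketch of the kind of wait loop I would expect to produce an MR_NAMED only after the full timeout. This is not the code in hesiod.sh; the function name, the marker file, and the polling interval are all made up for illustration, and it assumes named rewrites its pid file on startup.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real hesiod.sh.
# wait_for_new_pid PIDFILE MARKER TIMEOUT_SECS [INTERVAL_SECS]
# Returns 0 once PIDFILE exists and is newer than MARKER,
# or 1 if TIMEOUT_SECS elapse first.
wait_for_new_pid() {
    pidfile=$1
    marker=$2
    timeout=$3
    interval=${4:-5}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # find prints the path only if pidfile was modified after marker
        if [ -f "$pidfile" ] &&
           [ -n "$(find "$pidfile" -newer "$marker" 2>/dev/null)" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}
```

A caller would touch the marker file just before restarting named, then run something like `wait_for_new_pid /etc/named.pid /tmp/marker 3600 || exit $MR_NAMED` (the marker path is an assumption). With a loop of this shape, an exit after 4-1/2 minutes should not be possible unless the timeout or the interval arithmetic is wrong.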

Anyone over there in Dev have any ideas?
The script is very straightforward...
