[806] in Moira
Hesiod DCM
daemon@ATHENA.MIT.EDU (David Krikorian)
Wed Oct 19 03:52:29 1994
Date: Wed, 19 Oct 94 03:52:25 -0400
From: David Krikorian <dkk@MIT.EDU>
To: moiradev@MIT.EDU
Cc: op@MIT.EDU
The Hesiod DCM "failed" tonight, but I manually completed it.
The update succeeded on Apollo, the first server to be tried, but the
control script thought it failed. I have no idea why. I can't see
anywhere in the script where an error could arise. (See /mit/moiradev/src/gen/hesiod.sh.)
The error was MR_NAMED, which should only happen if the hesiod.sh script
had waited an *hour* without seeing a new named start. Here is the
timing of the update and the error:
From moira:/moira/dcm.log
Oct 19 00:19:50 <2189> dcm: starting update for APOLLO.MIT.EDU:hesiod
Oct 19 00:24:22 <2189> dcm: name daemon failed to start installation of APOLLO.MIT.EDU:hesiod failed, code = 47836475
Oct 19 00:24:23 <2189> dcm: DCM updating APOLLO.MIT.EDU:hesiod: name daemon failed to start
The time named actually started successfully:
apollo# ls -l /etc/named.pid
-rw-r--r-- 1 root 6 Oct 19 00:26 /etc/named.pid
So the hesiod.sh script waited no more than 4-1/2 minutes (rather
than 60) before exiting with status $MR_NAMED, leaving itself and the
tarfile in /tmp/. Meanwhile, named took about 6 minutes to get started
(including untarring the .db files).
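
For anyone who wants to check the arithmetic, here is a small sketch that works out the intervals from the timestamps quoted above. The helper name to_secs is made up for illustration, and since ls only shows minutes for named.pid, 00:26:00 is assumed as the earliest possible time for it.

```shell
#!/bin/sh
# to_secs HH:MM:SS -> seconds since midnight (hypothetical helper).
to_secs() {
    h=${1%%:*}          # hours field
    rest=${1#*:}
    m=${rest%%:*}       # minutes field
    s=${rest#*:}        # seconds field
    # Strip one leading zero so values like "08" are not read as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

start=$(to_secs 00:19:50)   # dcm: starting update
fail=$(to_secs 00:24:22)    # dcm: name daemon failed to start
pid=$(to_secs 00:26:00)     # named.pid mtime, minute resolution only (assumed :00)

echo "script gave up after $((fail - start)) seconds"         # 272
echo "named.pid written at least $((pid - start)) seconds in" # 370
```

That is 4 minutes 32 seconds before the failure was logged, and at least 6 minutes 10 seconds before named.pid was written.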
Anyone over there in Dev have any ideas?
The script is very straightforward...
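
For readers without the source handy, the behavior I would expect (wait up to an hour for a new named, then give up) might look roughly like the sketch below. This is NOT the contents of hesiod.sh. The function name, its arguments, and the use of a marker file to detect a "new" named.pid are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch only; not taken from hesiod.sh.
# wait_for_new_pidfile PIDFILE MARKER TIMEOUT_SECS [INTERVAL_SECS]
# Returns 0 as soon as PIDFILE exists and is newer than MARKER,
# or 1 if roughly TIMEOUT_SECS pass without that happening.
wait_for_new_pidfile() {
    pidfile=$1 marker=$2 timeout=$3 interval=${4:-5}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # find prints the path only if PIDFILE exists and its mtime
        # is later than MARKER's; errors for a missing file are hidden.
        if [ -n "$(find "$pidfile" -newer "$marker" 2>/dev/null)" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1    # the real script reportedly exits with $MR_NAMED here
}
```

A caller would touch the marker file just before restarting named and then run something like wait_for_new_pidfile /etc/named.pid /tmp/marker 3600. With a loop of this shape, giving up after 4-1/2 minutes would need either a much smaller timeout than intended or some other exit path, which is what puzzles me.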