[489] in Moira


Mailhub ALARM CONDITION

daemon@ATHENA.MIT.EDU (Mark Rosenstein)
Mon Nov 30 11:58:40 1992

Date: Mon, 30 Nov 92 11:58:08 -0500
From: Mark Rosenstein <mar@MIT.EDU>
To: tytso@Athena.MIT.EDU
Cc: root@tsx-11.MIT.EDU, network@MIT.EDU, bug-moira@Athena.MIT.EDU
In-Reply-To: Theodore Ts'o's message of Sat, 28 Nov 92 00:47:09 EST <9211280547.AA11586@tsx-11.MIT.EDU>

Just a few comments on what y'all discovered this weekend.

As Richard surmised, while a DCM is updating a server, new config
files for that service will not be generated to avoid changing the
files out from under the running DCM.

As to detecting this problem: we already use keep-alives on the TCP
connection.  However, keep-alives are only turned on after the
connection is fully open and protocol version numbers have been
exchanged.  So the problem here was that we never got far enough to
turn on the error detection.  With the current modularity boundaries,
there is no good way to fix this; I will have to think about it some.
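
To illustrate the ordering issue in general terms (this is not Moira's
code; the function names, port, and timeout value below are hypothetical),
here is a minimal Python sketch in which SO_KEEPALIVE is set on the socket
before connect(), so the option is already in effect while the version
exchange takes place. Note that the kernel's default keep-alive idle time
is often on the order of hours, so this alone may not detect a stall
quickly.

```python
import socket


def new_keepalive_socket():
    """Return an unconnected TCP socket with SO_KEEPALIVE already enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Enable keep-alive before connect(), not after the handshake,
    # so there is no window in which the option is off.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def open_update_connection(host, port, connect_timeout=30.0):
    """Hypothetical helper: connect with keep-alive enabled from the start.

    connect_timeout is an assumed value that bounds only connect() itself;
    it is not a session timeout.
    """
    sock = new_keepalive_socket()
    sock.settimeout(connect_timeout)
    try:
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    # Back to blocking mode; a long-running update is not cut off by a timer.
    sock.settimeout(None)
    return sock
```

The connect timeout is deliberately separate from any notion of how long
an update may run, since that duration has no single right value.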

We have considered a long session timeout for the DCM before.  The
problem with this is that there is no right value for the timeout.
How long does an update take?  We've had legitimate updates take 3.5
hours.  But if we use a time limit on that order, we have a false
sense of security about detecting problems with updates that should
take less than a minute.

So for now I'll just look at eliminating the window before turning on
the keep-alives.
					-Mark
