[49308] in Hotline Meeting
large scale AFS outage *TONIGHT*
daemon@ATHENA.MIT.EDU (Jonathon Weiss)
Sat Nov 27 17:58:21 1999
Message-Id: <199911272258.RAA02724@aa1-240-186.detroit.usabestnet.net>
From: Jonathon Weiss <jweiss@MIT.EDU>
To: athena-outage@MIT.EDU
Date: Sat, 27 Nov 1999 17:58:14 EST
Over the last 24 hours Athena Server Operations has become
increasingly aware of a kernel problem that can, under certain
circumstances, cause Solaris 2.6 machines to panic. Since an AFS
server panicking can cause (sometimes serious) data loss, since we are
in the middle of a long weekend, since we've seen a number of machines
panic in the last 24 hours, and since there won't be another good
opportunity to install the patch until after finals are over, we are
planning on installing the patch tonight. Unfortunately, installing
the patch requires rebooting the server, which requires an outage.
However, we feel the cost of having a short controlled outage for each
server is smaller than the risk of much longer outages with data loss
that may occur if the machines panic.
As a result, starting tonight (Saturday night) at 11:59 PM, we will
be shutting down all of the file-servers in the Athena and dev cells,
one or two at a time. Status updates will be sent to -i consult and -c
filsrv.
We apologize for the short notice of this outage, but we feel that
we're better off having the outage tonight, while many people are out
of town for the long weekend.
Jonathon