[33310] in North American Network Operators' Group
Re: Operate until failure
daemon@ATHENA.MIT.EDU (Eric A. Hall)
Mon Jan 8 18:55:40 2001
Message-ID: <3A5A51FB.5E148452@ehsco.com>
Date: Mon, 08 Jan 2001 15:49:15 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
To: Sean Donelan <sean@donelan.com>
Cc: nanog@merit.edu
> One issue with highly redundant data centers is that the failure modes
> are "interesting." You don't want to shut down due to a single UPS
> failure, so you don't use something simple like PowerChute Plus. You
> most likely don't want to shut down based on any automatic signal.
> However, you do want a way for an operator to gracefully shut down a
> lot of equipment quickly when the decision is made.
The old Deltec stuff was good about this. They had it so that a server
daemon would notify different groups at different stages.
  Power lost   -> notify group A (printers, PCs)
  Low battery  -> notify group B (secondary servers)
  Dead battery -> notify group C (primary servers, comms)
They also had different outlets assigned to different "groups", so if a
device wasn't able to understand the network alert (routers and firewalls
don't have agents), it could still be powered off as part of its group.
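
The staged notification and outlet groups described above can be sketched
roughly as follows. This is only an illustration of the idea, not Deltec's
actual software or API: the names Stage, NOTIFY_GROUPS, AGENTLESS and
actions_for are hypothetical, and putting the routers and firewalls on
group C's outlets is an assumption made for the example.

```python
from enum import IntEnum


class Stage(IntEnum):
    """UPS events in the order they occur during an outage."""
    POWER_LOST = 1
    LOW_BATTERY = 2
    DEAD_BATTERY = 3


# Stage -> (group name, devices whose agents receive the network alert).
# The grouping mirrors the three stages listed in the message.
NOTIFY_GROUPS = {
    Stage.POWER_LOST: ("A", ["printers", "PCs"]),
    Stage.LOW_BATTERY: ("B", ["secondary servers"]),
    Stage.DEAD_BATTERY: ("C", ["primary servers", "comms"]),
}

# Devices with no agent are handled by switching off their outlet group.
# Assumption for this sketch: they are plugged into group C's outlets.
AGENTLESS = {"C": ["routers", "firewalls"]}


def actions_for(stage):
    """Return the (action, group, device) tuples triggered by one stage."""
    group, devices = NOTIFY_GROUPS[stage]
    actions = [("notify", group, device) for device in devices]
    # Agentless devices cannot hear the alert, so cut their outlets instead.
    actions += [("outlet_off", group, device)
                for device in AGENTLESS.get(group, [])]
    return actions


if __name__ == "__main__":
    for stage in Stage:
        for action in actions_for(stage):
            print(stage.name, action)
```

In the manual-shutdown case from the quoted message, an operator would
trigger the same stages by hand instead of waiting for a UPS signal.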
Deltec got bought by somebody and I'm sure a lot of this stuff has changed
since I last looked at it, but it was a good design.
--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/