[33310] in North American Network Operators' Group

Re: Operate until failure

daemon@ATHENA.MIT.EDU (Eric A. Hall)
Mon Jan 8 18:55:40 2001

Message-ID: <3A5A51FB.5E148452@ehsco.com>
Date: Mon, 08 Jan 2001 15:49:15 -0800
From: "Eric A. Hall" <ehall@ehsco.com>
MIME-Version: 1.0
To: Sean Donelan <sean@donelan.com>
Cc: nanog@merit.edu
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Errors-To: owner-nanog-outgoing@merit.edu



> One issue with highly redundant data centers is the failure modes
> are "interesting."  You don't want to shut down due to a single UPS
> failure, so you don't use something simple like PowerChute Plus. You
> most likely don't want to shut down based on any automatic signal.
> However, you do want a way for an operator to gracefully shut down a
> lot of equipment quickly when the decision is made.

The old Deltec stuff was good about this. They had it so that a server
daemon would notify different groups at different stages.

	Power lost->notify group A (Printers, PCs)
	Low battery->notify group B (Secondary servers)
	Dead battery->notify group C (Primary servers, comms)
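
For anyone who wants to picture the staging, here is a rough Python sketch
of the mapping above. It is not Deltec's software or protocol; the names
PowerEvent, STAGES and notify are made up for illustration, and the "send"
callback stands in for whatever alert mechanism a real daemon would use.

```python
from enum import Enum


class PowerEvent(Enum):
    # Hypothetical event names for the three stages listed above.
    POWER_LOST = "power lost"
    LOW_BATTERY = "low battery"
    DEAD_BATTERY = "dead battery"


# Stage -> (group label, members), copied from the list above.
STAGES = {
    PowerEvent.POWER_LOST: ("A", ["printers", "PCs"]),
    PowerEvent.LOW_BATTERY: ("B", ["secondary servers"]),
    PowerEvent.DEAD_BATTERY: ("C", ["primary servers", "comms"]),
}


def notify(event, send=print):
    """Send one alert per member of the group tied to this event."""
    group, members = STAGES[event]
    for member in members:
        send(f"{event.value}: shut down {member} (group {group})")
    return group
```

Calling notify(PowerEvent.LOW_BATTERY) would alert only group B, leaving
the primary servers and comms gear up until the battery is nearly gone.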

They also had different outlets on different "groups", so if a device
wasn't able to understand the network alert (routers and firewalls
don't have agents), it could be powered off as part of a group.
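
A rough sketch of that fallback, in Python. The Device record and the
plan_shutdown helper are hypothetical and only illustrate the idea: within
one outlet group, devices with an agent get the network alert, and the
rest are handled by switching off their outlets.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    outlet_group: str   # which switched outlet group feeds it
    has_agent: bool     # True if it can act on a network alert


def plan_shutdown(devices, outlet_group):
    """Split one outlet group into devices to alert and devices to power off."""
    members = [d for d in devices if d.outlet_group == outlet_group]
    alert = [d.name for d in members if d.has_agent]
    cut_power = [d.name for d in members if not d.has_agent]
    return alert, cut_power
```

In practice you would presumably alert the agent-equipped machines first
and cut the group's outlets only after they have had time to go down.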

Deltec got bought by somebody and I'm sure a lot of this stuff has changed
since I last looked at it, but it was a good design.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/

