[119018] in North American Network Operators' Group
RE: HE.net, Fremont-2 outage?
daemon@ATHENA.MIT.EDU (Alex Rubenstein)
Wed Nov 4 14:06:55 2009
From: Alex Rubenstein <alex@corp.nac.net>
To: "nanog@nanog.org" <nanog@nanog.org>
Date: Wed, 4 Nov 2009 14:06:03 -0500
In-Reply-To: <200911041854.nA4Isnbv021842@aurora.sol.net>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
> Yup. Related: "100% availability" is a marketing person's dream; it
> sounds good in theory but is unattainable in practice, and is a
> reliable sign of non-100%-reliability.
You are confusing two different things.
Availability != Reliability.
For instance, an airplane is designed to be 100% reliable, but much less
available. To keep a 747 from crashing (100% reliability) it needs
significant downtime (not 100% available).
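To put rough numbers on that distinction, here is a minimal sketch. The
function names and the figures (18 hours in service per day, 1000 flights)
are made up for illustration and are not real 747 data.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of total time the system is in service."""
    return uptime_hours / (uptime_hours + downtime_hours)


def reliability(completed: int, attempted: int) -> float:
    """Fraction of attempted operations that finished without failure."""
    return completed / attempted


if __name__ == "__main__":
    # Hypothetical aircraft: 6 hours of maintenance per day, zero failed flights.
    print("availability:", availability(18, 6))
    print("reliability: ", reliability(1000, 1000))
```

The two numbers move independently: the maintenance downtime lowers
availability, and it is exactly what buys the reliability.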
> And even for those who follow best practices... You can inspect and
> maintain things until you're blue in the face. One day a contractor
> will drop a wrench into a PDU or UPS or whatever and spectacular things
> will happen.
That's where policies, procedures and methods come in (read: SAS 70).
> Or a battery develops a strange fault.
Get more than one string, and more than one UPS, with monitoring. Batteries
are NOT the Achilles heel everyone wants to make you believe they are.
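A back-of-the-envelope sketch of why the extra string helps, assuming each
string fails independently with the same probability. Both the independence
assumption and the 1% figure are hypothetical; a common-mode event such as
the dropped wrench mentioned above is not covered by this arithmetic.

```python
def all_fail_probability(p_fail_single: float, n_units: int) -> float:
    """Probability that every one of n independent, identical units fails."""
    if not 0.0 <= p_fail_single <= 1.0:
        raise ValueError("p_fail_single must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return p_fail_single ** n_units


if __name__ == "__main__":
    # Hypothetical: each battery string has a 1% chance of being faulty
    # at the moment it is needed.
    for n in (1, 2, 3):
        print(n, "string(s):", all_fail_probability(0.01, n))
```

Monitoring matters because the calculation only holds if a failed string is
noticed and replaced before the next one goes.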
"Question everything, assume nothing, discuss all, and resolve quickly."
-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben --
-- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --