[31605] in North American Network Operators' Group
RE: availability and resiliency
daemon@ATHENA.MIT.EDU (Roeland M.J. Meyer)
Fri Sep 29 19:45:37 2000
Message-ID: <1148622BC878D411971F0060082B042C3755@hawk.lvrmr.mhsc.com>
From: "Roeland M.J. Meyer" <rmeyer@MHSC.com>
To: Andrew Brown <atatat@atatdot.net>,
"Roeland M.J. Meyer" <rmeyer@MHSC.com>
Cc: nanog@merit.edu
Date: Fri, 29 Sep 2000 15:59:01 -0700
MIME-Version: 1.0
Content-Type: text/plain
Errors-To: owner-nanog-outgoing@merit.edu
> >Hosts meeting three nines, or better, typically have redundant power
> >supplies and integrated UPS, bootable RAID for the OS, redundant NICs,
> >and SMP CPU configurations.
>
> um...is an smp cpu configuration really going to help your uptime? or
> are there operating systems or hardware out there that can say to
> themselves "hmph! cpu 2 seems not to be working correctly...i'd
> better spin it down."
That is a natural function these days: fail-safe to the "off" state.
Most SMP OSes can recognise when one CPU in an SMP set goes down.
> just for fun a few years back i decided to check if the sun e4000 we
> had had hot-swappable cpus (i figured it didn't, but why not try it?)
> and i pulled one of the boards. it didn't like it too much.
Nothing I listed needs to be hot-swappable to reach 99.9%. Enough
online hot spares will keep the system up long enough for you to
replace the entire box. 99.9% allows over 8 hours of outage per year
(about 8.76 hours). You should be able to swap out a host in less
than 0.5 hours.
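
For anyone who wants to check the arithmetic behind the 99.9% figure, here is a minimal sketch. The function names are made up for illustration, and it assumes a 365-day year and that each host swap counts fully as outage time, which is a pessimistic reading if hot spares carry the load during the swap.

```python
# Assumption: a non-leap year of 365 days.
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_budget_hours(availability: float) -> float:
    """Hours of outage per year permitted by an availability target.

    `availability` is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def swaps_within_budget(availability: float, swap_hours: float) -> int:
    """Whole number of host swaps of `swap_hours` each that fit in the budget.

    Assumption: every swap is counted entirely as outage time.
    """
    if swap_hours <= 0:
        raise ValueError("swap_hours must be positive")
    return int(downtime_budget_hours(availability) // swap_hours)


if __name__ == "__main__":
    budget = downtime_budget_hours(0.999)
    print(f"99.9% availability allows {budget:.2f} hours of outage per year")
    print(f"Half-hour swaps that fit: {swaps_within_budget(0.999, 0.5)}")
```

Under those assumptions, three nines leaves room for a number of half-hour box swaps per year, even if each one takes the service fully down.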