[31616] in North American Network Operators' Group


Re: availability and resiliency

daemon@ATHENA.MIT.EDU (Jay Tribick)
Sat Sep 30 15:36:02 2000

Date: Sat, 30 Sep 2000 20:34:01 +0100
From: Jay Tribick <jay.tribick@carrier1.net>
To: Adrian Chadd <adrian@creative.net.au>
Cc: Valdis.Kletnieks@vt.edu, nanog@merit.edu
Message-ID: <20000930203401.F26046@mail.noc.carrier1.net>
Mail-Followup-To: Adrian Chadd <adrian@creative.net.au>,
	Valdis.Kletnieks@vt.edu, nanog@merit.edu
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20001001031350.H72673@ewok.creative.net.au>; from adrian@creative.net.au on Sun, Oct 01, 2000 at 03:13:50AM +0800
Errors-To: owner-nanog-outgoing@merit.edu


> > and retry the failing machine instruction on a hot-spare.  That's after a
> > reset-and-retry on the failing processor has proven it's a hard failure and
> > not a soft one.
> > 
> > The mind boggles.... ;)
> 
> .. and the concept of this happening on Wintel hardware running anything
> is sheer ludicrousy. Whoever mentioned that SMP can help you get high uptime
> boxes is smoking heavy crack in most cases.
> 
> Note that the big-end Alpha and Sun gear is NUMA, not SMP. Different kettle
> of fish there, and if you need an explanation as to why it's more likely to
> happen with NUMA and not SMP, there are lots of hardware books out there. :-)

If you're looking at implementing "5 9's", check out Sun's FT1800 - very nice
box (read: looks nice ;), easy to admin, and so far has been rock solid for
us (not that Solaris crashes much anymore anyway, but at least you no longer
have to worry about hardware resilience with the FT).

All you have to worry about then is disparate power and software stability.
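
For a sense of what "5 9's" leaves you to play with, here is a rough sketch of
the arithmetic. The function names and the example figures for power and
software are made up for illustration, and it assumes a 365-day year and
independent failures, so treat it as back-of-envelope only.

```python
# Back-of-envelope availability arithmetic (illustrative, not a measurement).
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year for an availability of `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up.

    Assumes failures are independent, which real sites rarely guarantee.
    """
    total = 1.0
    for availability in components:
        total *= availability
    return total


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")

    # Hypothetical figures: fault-tolerant hardware, power, software.
    overall = serial_availability(0.99999, 0.9999, 0.999)
    print(f"overall availability: {overall:.5f}")
```

Five nines works out to a little over five minutes a year, and the chain is
only as good as its weakest link, which is why power and software end up
being the things left to worry about.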

--
Regards,

Jay Tribick 
Senior Systems Engineer
Carrier1 
Voice: 	+44 207 531 3874

