[72909] in North American Network Operators' Group

Re: Quick question.

daemon@ATHENA.MIT.EDU (Colm MacCarthaigh)
Sun Aug 1 13:06:34 2004

Date: Sun, 1 Aug 2004 18:05:51 +0100
From: Colm MacCarthaigh <colm@stdlib.net>
To: Michel Py <michel@arneill-py.sacramento.ca.us>
Cc: Nanog <nanog@nanog.org>
Reply-To: colm@stdlib.net
In-Reply-To: <DD7FE473A8C3C245ADA2A2FE1709D90B0DB322@server2003.arneill-py.sacramento.ca.us>
Errors-To: owner-nanog-outgoing@merit.edu


On Sun, Aug 01, 2004 at 09:44:13AM -0700, Michel Py wrote:
> In other words, I don't really care if the second processor reduces the
> MTBF from 200k hours to 60k hours, but I do care if the second processor
> reduces the time to restore service from 24 hours to 20 minutes (7.5
> minutes for SNMP to fail the query twice, 1.5 minutes for the tech to
> find out that either it's frozen or there's a BSOD, 6 minutes to have
> someone go there and reset, 5 minutes to reboot).

With the right form factor (nice easy-to-open rackmount unit) it will take 
just as little time to swap in an on-site cold-spare. That way you get the 
nice MTBF and the short restore time. Also, if you have multiple similar 
machines, you drastically reduce your spares inventory.
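
To put rough numbers on the trade-off, here is a minimal sketch using the
usual steady-state approximation, availability = MTBF / (MTBF + MTTR). The
MTBF and restore figures are the ones quoted above. The assumption that a
cold-spare swap takes the same 20 minutes is mine, for illustration only,
and the function and variable names are made up for this example.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Restore-time breakdown from the quoted message, in minutes:
# SNMP fails twice + tech diagnoses + someone walks over and resets + reboot.
restore_minutes = 7.5 + 1.5 + 6 + 5

scenarios = {
    # Single CPU, no quick recovery path: 200k h MTBF, 24 h to restore.
    "single CPU, 24 h restore": availability(200_000, 24),
    # Dual CPU: MTBF drops to 60k h, restore drops to 20 minutes.
    "dual CPU, 20 min restore": availability(60_000, restore_minutes / 60),
    # Single CPU with an on-site cold spare. ASSUMPTION: the swap takes
    # the same 20 minutes as the reset-and-reboot path.
    "single CPU + cold spare": availability(200_000, restore_minutes / 60),
}

for name, a in scenarios.items():
    downtime_min_per_year = (1 - a) * 365 * 24 * 60
    print(f"{name}: {a:.7f} (~{downtime_min_per_year:.1f} min/year down)")
```

Under those assumptions the short restore time dominates, and keeping the
200k-hour MTBF on top of it gives the best figure of the three. If the swap
takes noticeably longer than 20 minutes, rerun it with your own number.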

> Insignificant in my experience, and does not balance what Alexei
> mentioned yesterday: a duallie will keep the system up when a faulty
> process hogs 100% CPU, because the second one is still available. That
> also increases availability ratio.

These days you can achieve the same with hyper-threading, for example,
and keep the long MTBF :)

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net