[87015] in North American Network Operators' Group


Re: trollage (Re: Akamai server reliability)

daemon@ATHENA.MIT.EDU (Joel Jaeggli)
Mon Nov 28 15:52:03 2005

Date: Mon, 28 Nov 2005 12:51:30 -0800 (PST)
From: Joel Jaeggli <joelja@darkwing.uoregon.edu>
To: Chris Owen <owenc@hubris.net>
Cc: nanog@merit.edu
In-Reply-To: <Pine.LNX.4.58.0511281336390.14488@corp.hubris.net>
Errors-To: owner-nanog@merit.edu


On Mon, 28 Nov 2005, Chris Owen wrote:

> As far as I can tell the only thing that will get a box replaced is if it
> can't be booted/pinged.  We've pointed out dead CPU fans before (even on
> the incoming replacement boxes) and they've never seemed to care.  If it
> runs it runs.  If it doesn't they replace the entire box.

Having built a fair number of machines to live for 5 years or longer in 
data centers I will never visit, I've found there's relatively little you 
want to triage onsite on a rackmount PC. Drives in hot-plug enclosures and 
removable power-supply modules are about it. Smart-hands are good for 
racking and stacking, swapping disks, recabling the OOB network, swapping 
media and so forth. It's not really a good use of someone else's time to 
have them performing experimental surgery on PCs. Much better to simply 
ship out another one and ship the old one back in the same box.

Decent modern 1U chassis still have enough airflow to stay adequately cool 
with a couple of fans failed. Further, there are now enough sensors in a 
PC to tell when you're getting into trouble: RPM indicators for all the 
fans; intake, processor, and outlet temperatures; thermal sensors in each 
of the drives; etc. Our success rate at identifying machines before they 
fail has gotten substantially better over time.
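A minimal sketch, in Python, of the kind of threshold check those sensors make possible. The sensor names, the numeric limits, the number of tolerated fan failures, and the ok/watch/replace policy are all hypothetical values chosen for illustration; they are not taken from any particular chassis or monitoring tool.

```python
from dataclasses import dataclass, field

# Hypothetical limits for illustration only; real values come from the
# chassis, CPU, and drive vendors' specifications.
MIN_FAN_RPM = 3000
TOLERATED_FAN_FAILURES = 2  # assumption: chassis stays cool with two fans down
LIMITS_C = {"intake": 35.0, "cpu": 80.0, "outlet": 50.0, "drive": 50.0}


@dataclass
class Readings:
    """One poll of a machine's sensors (fan speeds in RPM, temperatures in C)."""
    fan_rpm: list[int]
    intake_c: float
    cpu_c: float
    outlet_c: float
    drive_c: list[float] = field(default_factory=list)


def assess(r: Readings) -> tuple[str, list[str]]:
    """Return ("ok" | "watch" | "replace", reasons) for one set of readings."""
    reasons = []

    # Count fans spinning below the minimum speed.
    failed_fans = sum(1 for rpm in r.fan_rpm if rpm < MIN_FAN_RPM)
    if failed_fans:
        reasons.append(f"{failed_fans} fan(s) below {MIN_FAN_RPM} rpm")

    # Compare every temperature sensor, including each drive, to its limit.
    temps = [("intake", r.intake_c), ("cpu", r.cpu_c), ("outlet", r.outlet_c)]
    temps += [("drive", t) for t in r.drive_c]
    hot = [(name, t) for name, t in temps if t > LIMITS_C[name]]
    for name, t in hot:
        reasons.append(f"{name} at {t:.1f} C exceeds {LIMITS_C[name]:.1f} C")

    # Overheating, or more dead fans than the chassis tolerates: ship a new box.
    if hot or failed_fans > TOLERATED_FAN_FAILURES:
        return "replace", reasons
    # A fan down but temperatures still fine: keep it running, keep watching.
    if failed_fans:
        return "watch", reasons
    return "ok", reasons
```

For example, a box with one stopped fan but normal intake, CPU, outlet, and drive temperatures comes back as "watch" rather than "replace", which matches the point that a 1U chassis can ride out a fan failure as long as the temperatures say it is still cool.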

> Given all their redundancy I suppose that is probably the way to go.

> Chris
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Chris Owen                ~ Garden City (620) 275-1900 ~  Lottery (noun):
> President                 ~ Wichita     (316) 858-3000 ~    A stupidity tax
> Hubris Communications Inc ~       www.hubris.net       ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>

-- 
--------------------------------------------------------------------------
Joel Jaeggli  	       Unix Consulting 	       joelja@darkwing.uoregon.edu
GPG Key Fingerprint:     5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2

