[164369] in North American Network Operators' Group


Re: What to expect after a cooling failure

daemon@ATHENA.MIT.EDU (George Herbert)
Wed Jul 10 05:08:07 2013

In-Reply-To: <1373426894.69598008@apps.rackspace.com>
From: George Herbert <george.herbert@gmail.com>
Date: Wed, 10 Jul 2013 02:07:29 -0700
To: Erik Levinson <erik.levinson@uberflip.com>
Cc: NANOG mailing list <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

Numbers from memory and filed off a bit for anonymity, but....

A site I was consulting with had statistically large numbers of x86 servers (say, 3000), SPARC enterprise gear (100), NetApp units (60) and NetApp drives (5000+) go through a roughly 42C excursion.  It was much hotter at ceiling level, but fortunately the ceilings were high (20 feet).  It got within about 1C of the (wet-pipe) sprinkler system head fuse temp... (shudder)

Both NetApp and x86 server PSUs had significantly increased failure rates for the next year.  Say in rough numbers 10% failed in the year.  About 2% were instant fails.

Hard drives had a significantly higher fail rate for the next year, also in the 10% range.

No change in rate of motherboard or CPU or RAM failures was noted that I recall.


George William Herbert
Sent from my iPhone

On Jul 9, 2013, at 8:28 PM, "Erik Levinson" <erik.levinson@uberflip.com> wrote:

> As some may know, yesterday 151 Front St suffered a cooling failure after Enwave's facilities were flooded.
>
> One of the suites that we're in recovered quickly but the other took much longer and some of our gear shut down automatically due to overheating. We remotely shut down many redundant and non-essential systems in the hotter suite, and remotely transferred some others to the cooler suite, to ensure that we had a minimum of all core systems running in the hotter suite. We waited until the temperatures returned to normal, and brought everything back online. The entire event lasted from approx 18:45 until 01:15. Apparently ambient temperature was above 43 degrees Celsius at one point on the cool side of cabinets in the hotter suite.
>
> For those who have gone through such events in the past, what can one expect in terms of long-term impact...should we expect some premature component failures? Does anyone have any stats to share?
>
> Thanks
>
> --
> Erik Levinson
> CTO, Uberflip
> 416-900-3830
> 1183 King Street West, Suite 100
> Toronto ON  M6K 3C5
> www.uberflip.com
>

