[164358] in North American Network Operators' Group

Re: What to expect after a cooling failure

daemon@ATHENA.MIT.EDU (Erik Levinson)
Tue Jul 9 23:51:21 2013

Date: Tue, 9 Jul 2013 23:50:58 -0400 (EDT)
From: "Erik Levinson" <erik.levinson@uberflip.com>
To: "Bryan Tong" <contact@nullivex.com>
In-Reply-To: <CAAARkvLbYRDYQ=wOOh0iieBo8x=rXs1LdoVjY=yaD8h1vR8n2A@mail.gmail.com>
Cc: NANOG mailing list <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

Thanks. I should also mention that most of the gear was still on, but we had
turned off many VMs on physical servers within the first 2.5 hours, so the
CPU and hard drive I/O load was around zero on those servers. Most of the
servers in the hotter suite had fans running at over 75%, vs. about 35% in
the cooler suite, and ambient temperature was down to 32 degrees Celsius
within four hours.

--
Erik Levinson
CTO, Uberflip
416-900-3830
1183 King Street West, Suite 100
Toronto ON  M6K 3C5
www.uberflip.com

-----Original Message-----
From: "Bryan Tong" <contact@nullivex.com>
Sent: Tuesday, July 9, 2013 11:42pm
To: "Erik Levinson" <erik.levinson@uberflip.com>
Cc: "NANOG mailing list" <nanog@nanog.org>
Subject: Re: What to expect after a cooling failure

Hello,

In my experience with heating issues, the only things that really degrade
quickly in the event of overheating are hard drives. If you had them spun
down, they should be fine.

CPU / Memory / Motherboards will be fine.

The only other things I can think of that might have issues are PSUs, but if
they were powered off they should be fine as well. Maybe melted wires, but I
don't think it was hot enough for that.

Thanks

On Tue, Jul 9, 2013 at 9:28 PM, Erik Levinson <erik.levinson@uberflip.com> wrote:

> As some may know, yesterday 151 Front St suffered a cooling failure after
> Enwave's facilities were flooded.
>
> One of the suites that we're in recovered quickly but the other took much
> longer, and some of our gear shut down automatically due to overheating. We
> remotely shut down many redundant and non-essential systems in the hotter
> suite, and remotely transferred some others to the cooler suite, to ensure
> that we had a minimum of all core systems running in the hotter suite. We
> waited until the temperatures returned to normal, and brought everything
> back online. The entire event lasted from approx 18:45 until 01:15.
> Apparently ambient temperature was above 43 degrees Celsius at one point on
> the cool side of cabinets in the hotter suite.
>
> For those who have gone through such events in the past, what can one
> expect in terms of long-term impact... should we expect some premature
> component failures? Does anyone have any stats to share?
>
> Thanks
>
> --
> Erik Levinson
> CTO, Uberflip
> 416-900-3830
> 1183 King Street West, Suite 100
> Toronto ON  M6K 3C5
> www.uberflip.com

--
--------------------
Bryan Tong
Nullivex LLC | eSited LLC
(507) 298-1624


