[98104] in North American Network Operators' Group


Re: San Francisco Power Outage

daemon@ATHENA.MIT.EDU (Brandon Galbraith)
Tue Jul 24 20:57:14 2007

Date: Tue, 24 Jul 2007 18:57:34 -0500
From: "Brandon Galbraith" <brandon.galbraith@gmail.com>
To: "Seth Mattinen" <sethm@rollernet.us>
Cc: "nanog list" <nanog@merit.edu>
In-Reply-To: <46A68310.4070000@rollernet.us>
Errors-To: owner-nanog@merit.edu



On 7/24/07, Seth Mattinen <sethm@rollernet.us> wrote:
>
>
> I have a question: does anyone seriously accept "oh, power trouble" as a
> reason your servers went offline? Where's the generators? UPS? Testing
> said combination of UPS and generators? What if it was important? I
> honestly find it hard to believe anyone runs a facility like that and
> people actually *pay* for it.
>
> If you do accept this is a good reason for failure, why?
>
> ~Seth
>

I'm unable to find a link at the moment, but many moons ago power was lost
at the 350 E Cermak Equinix facility in Chicago. At the time, we didn't have
production equipment there (only a firewall in a shared colo cage/cabinet).
This occurred on a Friday evening and lasted for quite some time into
Saturday morning because their generators would start up but would refuse to
continue running. I believe the root cause was a problem related to
insulation on the power cables somewhere. I understand testing is done
frequently, but I'm also aware that if I want full redundancy, I'm going to
have two physically separate locations. There are some events you can't plan
for, as well as failure modes that aren't easily/quickly resolved.

-brandon


