[116899] in North American Network Operators' Group


RE: Data Center testing

daemon@ATHENA.MIT.EDU (Dylan Ebner)
Wed Aug 26 11:33:31 2009

From: Dylan Ebner <dylan.ebner@crlmed.com>
To: Dan Snyder <sliplever@gmail.com>, Ken Gilmour <ken.gilmour@gmail.com>
Date: Wed, 26 Aug 2009 15:32:42 +0000
In-Reply-To: <1c2d53bb0908240638t446b0d18tfc711960b84350fc@mail.gmail.com>
Cc: NANOG list <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

I would hope that the data center engineers built and ran a suite of tests to find failure points before the network infrastructure was put into production. That said, changes are made constantly to the infrastructure, and it can become very difficult very quickly to know if the failovers are still going to work. This is one place where the power and network in a datacenter diverge. The power systems may take on additional load over the course of the life of the facility, but the transfer switches and generators do not get many changes made to them. Also, network infrastructure tests are not going to be zero impact if there is a config problem. Generator tests are much easier. You can start up the generator and do a load test. You can also load test the UPS systems. Then you can initiate your failover. Network tests are not going to be zero impact even if there isn't a problem. Let's say you wanted to power fail an edge router participating in BGP: it can take 30 seconds for that router's routes to get withdrawn from the BGP tables of the world. The other problem is that network failures always seem to come from "unexpected" issues. I always love it when I get an outage report from my ISPs or datacenter and they say an "unexpected issue" or "unforeseen issue" caused the problem.
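
One way to put a number on that impact during a failover test is to run a steady probe across the path being failed and measure the longest gap between successful replies. The following is only a rough sketch, assuming the probe results have already been collected as (timestamp, ok) pairs; the function name and the sample data are hypothetical and not taken from any particular tool.

```python
from typing import Iterable, Tuple


def longest_outage(probes: Iterable[Tuple[float, bool]]) -> float:
    """Longest gap, in seconds, between the last good probe before a run
    of failures and the first good probe after it.

    Failure runs with no good probe before them, or no recovery after
    them, are ignored because their length cannot be bounded.
    """
    longest = 0.0
    last_ok = None       # timestamp of the most recent successful probe
    in_outage = False    # True while we are inside a run of failed probes
    for ts, ok in sorted(probes):
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            in_outage = False
        else:
            in_outage = True
    return longest


# Hypothetical one-second probes: good at t=0 and t=1, failing from
# t=2 through t=31, good again at t=32.
sample = [(0, True), (1, True)]
sample += [(t, False) for t in range(2, 32)]
sample += [(32, True)]
print(longest_outage(sample))
```

Comparing this figure across test runs, before and after infrastructure changes, gives a simple check on whether failover behavior has drifted.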


Dylan
-----Original Message-----
From: Dan Snyder [mailto:sliplever@gmail.com]
Sent: Monday, August 24, 2009 8:39 AM
To: Ken Gilmour
Cc: NANOG list
Subject: Re: Data Center testing

We have done power tests before and had no problem.  I guess I am looking for someone who does testing of the network equipment outside of just power tests.  We had an outage due to a configuration mistake that became apparent when a switch failed.  It didn't cause a problem however when we did a power test for the whole data center.

-Dan


On Mon, Aug 24, 2009 at 9:31 AM, Ken Gilmour <ken.gilmour@gmail.com> wrote:

> I know Peer1 in Vancouver regularly sends out notifications of
> "non-impacting" generator load testing, like monthly. Also InterXion
> in Dublin, Ireland has occasionally sent me notification that there
> was a power outage of less than a minute; however, their backup
> successfully took the load.
>
> I only remember one complete outage in Peer1 a few years ago... Never
> seen any outage in InterXion Dublin.
>
> Also I don't ever remember any power failure at AiNet (Deepak will
> probably elaborate)
>
> 2009/8/24 Dan Snyder <sliplever@gmail.com>:
> > Does anyone know of any data centers that do failure testing of
> > their networking equipment regularly? I mean to verify that
> > everything fails over properly after changes have been made over
> > time.  Are there any best-practice guides for doing this?
> >
> > Thanks,
> > Dan
> >
>


