[115025] in North American Network Operators' Group


Re: Facility wide DR/Continuity

daemon@ATHENA.MIT.EDU (Stefan)
Wed Jun 3 11:02:14 2009

In-Reply-To: <F3318834F1F89D46857972DD4B411D700FC91A85@EXCHANGE.thenap.com>
Date: Wed, 3 Jun 2009 10:01:38 -0500
From: Stefan <netfortius@gmail.com>
To: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

On Wed, Jun 3, 2009 at 7:09 AM, Drew Weaver <drew.weaver@thenap.com> wrote:

> Hi All,
>
> I'm attempting to devise a method which will provide continuous operation
> of certain resources in the event of a disaster at a single facility.
>
> The types of resources that need to be available in the event of a disaster
> are ecommerce applications and other business critical resources.
>
> Some of the questions I keep running into are:
>
>   - Should the additional sites be connected to the primary site
>     (and/or the Internet directly)?
>   - What is the best way to handle the routing? Obviously two devices
>     cannot occupy the same IP address at the same time, so how do you
>     provide that instant 'cut-over'? I could see using application
>     balancers to do this but then what if the application balancers
>     fail, etc?
>
> Any advice from folks on list or off who have done similar work is greatly
> appreciated.
>
> Thanks,
> -Drew

In environments where a DR site is deemed critical, my experience is that
the critical business applications also have a test or development
environment alongside the production one. Seen that way, you can equip the
DR site with the test/devel systems and keep one "instance" of production
always available there, and the main remaining challenge is data sync.
Various SAN solutions would resolve that (SAN sync-ing over WAN/MAN/etc.).
Virtualization of the critical systems may add some benefits here: clone
the critical VMs at the DR site and, with the synced storage available,
you'll be able to bring those machines up in very little time. Just make
sure you have some sort of L2 connectivity between the sites, maybe EoS or
tunneling over an L3 link. There is plenty of information to be found when
searching for virtual machine mobility and inter-site connectivity.

Voice has to be considered as well. For PSTN, make arrangements with your
provider to re-route numbers (8xx) in case of disaster. VoIP may add some
extra capabilities in terms of reachability over the Internet, in case your
DR site cannot accommodate everyone. Customer service (C/S) people, for
example, are critical for interfacing with customers during a disaster (no
information means a bigger loss, because of perception issues), so they
have to be able to connect even from home.

As far as an "immediate" switch from one site to the other goes, DNS is the
primary concern (unless some wise people have hardcoded IPs all over).
There are other issues people tend to forget, though, at the core of some
clients. Take the Oracle "fat" client and its TNS names: I've seen those
associated with IPs instead of host names, etc.
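To illustrate the hardcoded-IP point, here is a rough sketch of how one
might audit a tnsnames.ora file for HOST entries that are IP literals, and
so would not follow a DNS change during cut-over. The function name, the
regular expression and the sample entries (ORDERS, BILLING, the 192.0.2.10
address, db.example.com) are all made up for illustration, and the parsing
is deliberately simplistic, so treat it as a starting point rather than a
complete TNS parser.

```python
import ipaddress
import re

# Matches the HOST parameter of an ADDRESS entry, e.g. (HOST = 192.0.2.10).
# Simplified on purpose: it does not try to parse the full TNS grammar.
HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def hardcoded_ip_hosts(tnsnames_text):
    """Return the HOST values that are IP literals rather than host names."""
    found = []
    for value in HOST_RE.findall(tnsnames_text):
        try:
            ipaddress.ip_address(value)
        except ValueError:
            continue  # not an IP literal, so presumably a DNS name
        found.append(value)
    return found


# Hypothetical file contents, for illustration only.
SAMPLE = """
ORDERS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.0.2.10)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orders))
  )
BILLING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = billing))
  )
"""

if __name__ == "__main__":
    for host in hardcoded_ip_hosts(SAMPLE):
        print("hardcoded IP in TNS entry:", host)
```

Run against the real files on client machines, anything this flags is a
candidate for replacement with a host name before you rely on DNS for the
switch.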

Disclaimer: the above = one of many aspects. Have seen DNS comments already,
so I won't repeat those aspects.

HTH,
-- 
***Stefan
http://twitter.com/netfortius
