[66611] in North American Network Operators' Group

Re: One-element vs two-element design

daemon@ATHENA.MIT.EDU (Brent_OKeeffe@asc.aon.com)
Fri Jan 16 23:10:01 2004

To: Timothy Brown <tim@tux.org>
Cc: nanog@merit.edu
From: Brent_OKeeffe@asc.aon.com
Date: Fri, 16 Jan 2004 23:09:23 -0500
Errors-To: owner-nanog-outgoing@merit.edu


One key consideration is the ability to perform maintenance on redundant
devices in the N+1 model without impacting the availability of the network.

Brent




Timothy Brown <tim@tux.org>
Sent by: owner-nanog@merit.edu
01/16/2004 10:14 PM

 
        To:     nanog@merit.edu
        cc: 
        Subject:        One-element vs two-element design



I fear this may be a mother of a debate.

In my (short?) career, I've been involved in several designs, some successful,
some less so.  I've recently been asked to contribute a design for one of the
networks I work on.  The design brings with it a number of challenges, but
also, unlike a greenfield network, has a lot of history.

One of the major decisions I'm facing is a choice between a one-element and a
two-element design.  By elements, I really mean N or N+1.  For quite some time
now, vendors have been improving hardware to the point where most components
in a given device, with the exception of a line card, can be made redundant.
This includes things like routing and switching processors, power supplies,
busses, and even, in the case of vendor J and several others, in-flight
restarts of particular portions of the software, either during scheduled
maintenance or to correct a problem.

I have traditionally belonged to the school of thought that it is best to have
two devices of equal power and on the same footing, and, in multiple-site
configurations, four devices of equal power and equal footing.  N+1 feels like
a safe argument to make, so that is the philosophy I tend to adopt.  N+2 or
N+whatever doesn't seem to add much additional security to the network's
model of availability.  N+1 does add complexity, but I prefer to think of it
in terms of, "Well, I can manage software or design complexity in my
configurations, but I can't manage the loss of a single device which holds my
network together."  Now I must view this assertion in the context of
better-designed hardware and cheap spares on hand.
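The diminishing return from N+1 to N+2 can be made concrete with the standard MTBF arithmetic. Below is a minimal sketch, assuming identical devices whose failures are statistically independent, per-device availability MTBF / (MTBF + MTTR), and a service that stays up while at least one device is up. The MTBF and MTTR figures and the helper names are hypothetical, chosen only for illustration.

```python
"""Sketch: availability gained by each extra device, assuming independent failures."""

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one device: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(a: float, devices: int) -> float:
    """Probability that at least one of `devices` identical units is up.

    Assumes failures are statistically independent (no shared power feed,
    no common software bug), which is the optimistic case.
    """
    return 1.0 - (1.0 - a) ** devices


def downtime_minutes_per_year(a: float) -> float:
    """Expected unavailable minutes per year for availability `a`."""
    return (1.0 - a) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not vendor data.
    a = availability(mtbf_hours=50_000, mttr_hours=4)
    for label, n in (("N", 1), ("N+1", 2), ("N+2", 3)):
        sys_a = parallel_availability(a, n)
        print(f"{label:<4} downtime ~ {downtime_minutes_per_year(sys_a):.6f} min/yr")
```

With these made-up inputs, one device works out to roughly 42 minutes of expected downtime a year, a second device brings that down to about a fifth of a second, and a third removes almost nothing further. The independence assumption is the catch: a shared power feed or a common software defect correlates the failures, and then the extra device buys far less than the formula suggests.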

Of course, like many other folks, I have tried to drink as deeply as I can
from the well of knowledge.  I've studied Cisco Press' High Availability
Network Fundamentals at length, and I understand MTBF calculations and some of
the design issues in building a highly available network.  But from a cost
perspective, it seems that a single, larger box may be able to offer me as
much redundancy as two equally configured boxes handling the same traffic
load.  Then again, there's that little demon on my shoulder telling me that I
could always lose a complete device to a power issue or a short, and then I'd
be up a creek.
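The "single larger box vs. two boxes" trade-off can be sketched with series/parallel availability blocks. This is a rough model under stated assumptions: the big box duplicates its route processors and power supplies but still has one chassis (backplane, power feed) that everything depends on; the two-box design has no internal redundancy, and the two boxes fail independently. Line cards are left out of both designs. All component availabilities and names below are hypothetical.

```python
"""Sketch: one chassis with redundant internals vs. two simpler boxes."""

from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def series(*parts: float) -> float:
    """All parts must be up, so availabilities multiply."""
    return prod(parts)


def parallel(*parts: float) -> float:
    """At least one part must be up; assumes independent failures."""
    return 1.0 - prod(1.0 - p for p in parts)


def downtime_minutes_per_year(a: float) -> float:
    """Expected unavailable minutes per year for availability `a`."""
    return (1.0 - a) * MINUTES_PER_YEAR


# Hypothetical per-component availabilities, for illustration only.
ROUTE_PROCESSOR = 0.9995
POWER_SUPPLY = 0.9998
CHASSIS = 0.99995  # backplane, power feed, etc.: one per box, never duplicated

# One larger box: duplicated processors and supplies, but a single chassis
# that all traffic still depends on (the "power issue or short" case).
big_box = series(
    parallel(ROUTE_PROCESSOR, ROUTE_PROCESSOR),
    parallel(POWER_SUPPLY, POWER_SUPPLY),
    CHASSIS,
)

# Two boxes with no internal redundancy, either of which can carry the load.
single_small_box = series(ROUTE_PROCESSOR, POWER_SUPPLY, CHASSIS)
two_boxes = parallel(single_small_box, single_small_box)

if __name__ == "__main__":
    print(f"one redundant box: {downtime_minutes_per_year(big_box):.2f} min/yr")
    print(f"two simple boxes:  {downtime_minutes_per_year(two_boxes):.2f} min/yr")
```

In this sketch the lone chassis term dominates the big box's downtime, so duplicating processors and supplies can never push it above the chassis's own availability. The two-box design escapes that limit only if the boxes really do fail independently, for example on separate power and not both hitting the same software bug.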

We have a history of using the N+1 model on the specific network I'm talking
about, and it has worked very well so far in the face of occasional software
failures from a vendor we have occasionally ridiculed here on nanog-l.
However, in considering a comprehensive redesign, I've found that another
vendor offers significantly more software stability, so I'm re-evaluating the
need for multiple devices.

My mind's more or less already made up, but I'd like to hear the design
philosophies of other members of the operational community on adopting an
N+1 approach.  In particular, I'd love to hear about a catastrophic
operational failure that proves or disproves either of the options.

Tim

ObDisclaimer:  Please contact me off-list if you're okay with your thoughts
on this matter being published in a book targeted to the operations community.




