[46649] in North American Network Operators' Group


RE: Quick Question on Industry Standard

daemon@ATHENA.MIT.EDU (Gary Blankenship)
Sun Apr 7 07:49:24 2002

From: "Gary Blankenship" <garyb@foundrynet.com>
To: <kgraham@ican.net>
Cc: <nanog@merit.edu>
Date: Sun, 7 Apr 2002 20:49:24 +0900
Message-ID: <001201c1de2a$449b84c0$965ba8c0@sejapan>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0013_01C1DE75.B4832CC0"
In-Reply-To: <05924A4A9DEDAD46A21EE3C8C64B090DF3B941@cheetah.zoo.q9networks.com>
Errors-To: owner-nanog-outgoing@merit.edu



Kim:
 
>Any thoughts?
 
OK, I'll bite.  
 
CAUTION:  As always, my email is long, wordy, technical and sometimes
skirts off topic; however, I've got to put up with free marketing
references to Cisco/Juniper at every turn on NANOG.  It's nice to get
Foundry's name here every once in a while.  We're all good companies.
For the most part (95%), I'll stay on HA operational topics for the
NANOG reader.
 
Some recommended books on this subject are listed below.  I will refer
to these books during this email:
 
Top Down Network Design by Priscilla Oppenheimer (Cisco Press)
Designing Enterprise Routing and Switching Architectures by Howard
Berkowitz (Macmillan Press)
High Availability Network Fundamentals by Chris Oggerino (Cisco Press)
 
Lots of other references exist, including industry standards by
Telcordia (how's your calculus?).  See High Availability Network
Fundamentals for a good root reference list.
 
I guess the main thing to do is look on page 48 of Priscilla's book.
She categorizes customer requirements and recommends a method to
prioritize those requirements for network design tradeoffs.  I've added
"profitability" to the list for service providers.  You can tailor it to
your business goals.  I go back to this a lot and it helps me know where
availability sits as a design requirement.
 
Now we have to think about what you mean by "Industry Standard".  This
definitely depends on the industry; however, it varies per company based
on design requirements mapped to business goals of YOUR company in that
industry.  Obviously, having better availability is just one part of a
multi-faceted competitive business plan.  In some industries, it is
assumed to be basic to have high availability.  Others ignore it.
 
Some components you must look at are:
 
Human Error/Process Error
Physical Infrastructure Security and Robustness
Equipment Quality
Technologies
Special Events, Risks, and Threats (Sean Donelan digging up your fiber,
hacker attack, governmental or organizational shutdown/de-peering, war
or political unrest, resource shortages, economy, insert your
imagination here)
Maintenance
 
On the technology side, basically...  The lower in the stack you push
your redundant failover technology, the faster your failover.  SONET
APS can fail over in 50ms, over and over again.  L2 and L3 protocols
continue to operate as normal with minimal In-Flight Packet (IFP) loss.
This is exactly why the 10GbE Forum is promoting APS in the 10GbE WAN
PHY!
 
Foundry Networks (my company) has two new technologies that can give you
sub-second failover and avoid the failover of L3 and slower L2
redundancy protocols (RSTP and STP).  The technologies are Metro Ring
Protocol (MRP) and Virtual Switch Redundancy Protocol (VSRP).  Both of
these are currently in beta (soon to be released), but I've been playing
with them for the past week.  VSRP is VRRP on L2 steroids (sub-second
failover).  It is easy to understand (one switch is actively forwarding
while the other is on standby).  All of these L2 protocols are
interoperable on the same devices in the same networks (RSTP, STP, VSRP,
MRP).  A customer can run STP with a provider VSRP edge and MRP core.
VLAN stacking and STP tunneling are supported for those of you looking
at Metro business plans.  Below is an example of HA technology with MRP.
Take a look at this topology:
 
      _____P1A
PE1    1    |       ___P2A___P2C
      \_____P1B/       2          |       _____P4A
                      \___P2B___P3A/     3      |
                                               \_____PE2
 
I've got link P2B to P3A running MPLS (LER to LER, don't ask why, it's
just a lab) OC-48 (wire speed 2.5G) with Draft Martini L2 VPN.  Link P2A
to P2C is 802.3ae draft 4.2 compliant 10GbE.  All other links are GbE.
I've got 50 VLANs.  25 of them travel clockwise around the rings and 25
of them travel counterclockwise.  Each group of 25 is placed in a
topology group and runs an instance of MRP on the lead (master) VLAN of
that topology group.  Rings are 1, 2, and 3.  I really hope my diagram
shows up OK for the readers of this email.
 
The MRP ring masters are PE1 for ring 1, P1B for ring 2, and PE2 for
ring 3.  MRP masters send out Ring Health Packets (RHPs) around the ring
every 100ms (configurable).  They originate these out of their primary
ports and receive them on their secondary ports.  MRP masters block
forwarding on their secondary ports as long as they receive the RHPs.
They transition to forwarding (ring broken) when they stop receiving the
RHPs.
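
For those who think better in code, here is a rough Python sketch of the master behavior just described.  It is only a toy model, not Foundry's implementation: the class name, the method names, and the "three missed RHPs" dead multiplier are my own assumptions for illustration.  Only the 100ms RHP interval comes from the lab setup above.

```python
from dataclasses import dataclass


@dataclass
class RingMaster:
    """Toy model of an MRP ring master's secondary port (illustrative only)."""

    hello_ms: int = 100         # RHP interval described above (configurable)
    dead_multiplier: int = 3    # ASSUMPTION: missed RHPs before declaring a break
    last_rhp_ms: int = 0        # time the last RHP arrived on the secondary port
    secondary_forwarding: bool = False  # blocked while the ring is healthy

    def on_rhp_received(self, now_ms: int) -> None:
        # An RHP made it all the way around: the ring is intact,
        # so the secondary port must block to prevent a loop.
        self.last_rhp_ms = now_ms
        self.secondary_forwarding = False

    def on_tick(self, now_ms: int) -> None:
        # No RHP for too long: treat the ring as broken and
        # open the secondary port so traffic takes the other way around.
        if now_ms - self.last_rhp_ms >= self.hello_ms * self.dead_multiplier:
            self.secondary_forwarding = True


if __name__ == "__main__":
    master = RingMaster()
    master.on_rhp_received(now_ms=100)
    master.on_tick(now_ms=200)
    print("forwarding after 100ms of silence:", master.secondary_forwarding)
    master.on_tick(now_ms=400)
    print("forwarding after 300ms of silence:", master.secondary_forwarding)
```

The point of the sketch is that failure detection is bounded by the RHP interval times whatever dead multiplier the implementation uses, which is why a short, configurable interval matters.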
 
Now let's assume that all traffic is taking the bottom path via MRP
primary paths on the masters.  OK, let's start pinging (192.168.1.40 is
PE2 loopback address):
 
PE1#ping 192.168.1.40 count 100000000 time 800
Sending 100000000, 16-byte ICMP Echo to 192.168.1.40, timeout 800 msec,
TTL 64
Type Control-c to abort
    511000Request timed out. < Here I unplug PE1 to P1B link (primary
path).  1 In-Flight Packet (IFP) lost.
    854000Request timed out. < Here I unplug P1B to P2B link (primary
path).  1 IFP lost.
   1160000Request timed out. < Here I unplug P3A to PE2 link (primary
path).  2 IFPs lost.  All traffic on secondary path now.
Request timed out.
   1513000Request timed out. < Here I plug P3A to PE2 link back.  2
IFPs lost.
Request timed out.
   1638000Request timed out. < Plug P1B to P2B link back in.  1 IFP
lost.
   1823000Request timed out. < Plug PE1 to P1B link back.  2 IFPs lost.
Request timed out.
  1^674000
 
Not too bad considering that MRP is a software technology, eh?  Also,
the CPUs of all the devices are at 1%!
 
802.17 Resilient Packet Ring (RPR) is supposed to do EXACTLY what MRP
does, but faster 'cause it is in HW.  Personally, I don't think the
industry needs another L2 technology.  Ethernet will be just fine with
APS in the WAN PHY (Coming this year I'm sure)!   RPR is not Ethernet
and will be more expensive.  I was a Token Ring fan.  I've learned to
respect Ethernet and I regard Ethernet as the clear winner. My XBOX(TM)
at home has Ethernet (NOTE:  My XBOX has only rebooted suddenly on me 3
times as opposed to ZERO for my PS2.  Thanks MS!)!   ATM Segmentation
and Reassembly on OC-192 will be a lot more costly than simple 10GbE as
well.  I'm not even sure if SAR has the capability to do it at wire
speed today.  I've seen nothing on this from the ATM front.  Ethernet
will be at 40GbE (OC-768) before ATM SAR is at OC-192.  My money is on
Ethernet.  LINX is just one of many folks running 10GbE today!   It took
them 3 minutes to make the conversion, from what I read in the press
release.  I wonder how long it would take to do an RPR upgrade from GbE?
(I haven't seen a working RPR network yet.  I have seen MRP on 10GbE,
GbE, and POS.)  ATM?
 
We can see here that the technology can get us to the point of 100%
availability (I don't consider one or two packets per user session on a
link failover as downtime.  Do you?); however, as you can see from my
design, I've got Single Points of Failure.  I can easily design more
rings (at more cost) and remove these SPOFs.  Now the only question is:
What are your business goals and your acceptable amount of downtime?  I
want to point you to Howard Berkowitz's book for some advice on downtime
tolerance.  I don't want to explain it here; however, the unit of
measurement is called the "Fulford".  Howard talks about a network
design requirement of no more than two Fulfords a year.  Hilarious
scenario, but often true.  Howard is also quick to point out that simply
having redundant links does not equal high availability!  Good read
(although Howard can get a bit repetitious, it helps drive the main
points home).
 
I think that there is no acceptable industry standard that you can
simply overlay onto an individual company's requirements.  It is all
customized.  Some folks are happy with slow convergence, as long as the
phone doesn't ring.  Some users accept a provider being down for 30
minutes every Sunday night.  It's all relative.  Some providers have
governmental reporting requirements if they have downtime.
 
One other thing.  If an organization doesn't have downtime reporting
processes and run charts, then I feel they really don't know what
they've got.   You can calculate device serial and parallel availability
by using MTBF/MTTR calculations.  These are all probabilities.  There is
a much bigger picture.  How many times does a network engineer mess
something up and then hide it?  I think everyone on this list has made
a serious mistake and caused network downtime WITHOUT reporting it.  I
caused about 10 seconds of downtime on a large service provider network
by accidentally removing the default route (fat fingered!) not two
months ago.  The guy in charge said:  "Only 10 seconds?  Don't worry
'bout it.  Nobody will ever know."  Now you know.  Plan, Do, Check,
Act, Repeat.  You must have processes to track and improve your
availability.  Otherwise you are doing nothing but talking 'bout it and
you are still clueless.
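
For the MTBF/MTTR arithmetic mentioned above, here is a minimal Python sketch of the usual textbook formulas: availability = MTBF / (MTBF + MTTR), serial availability is the product of the parts, and parallel availability is one minus the product of the unavailabilities.  The function names and the sample MTBF/MTTR figures are made up for illustration; plug in your own vendor and operations numbers.  Note that the parallel formula assumes failures are independent, which real networks often violate (shared power, shared conduit, shared humans).

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def serial(*avails: float) -> float:
    """All components must be up: multiply the availabilities."""
    result = 1.0
    for a in avails:
        result *= a
    return result


def parallel(*avails: float) -> float:
    """Any one component is enough: 1 minus the product of unavailabilities.

    Assumes independent failures, which is optimistic in practice.
    """
    unavailable = 1.0
    for a in avails:
        unavailable *= 1.0 - a
    return 1.0 - unavailable


if __name__ == "__main__":
    # HYPOTHETICAL figures, not taken from any real device data sheet.
    switch = availability(mtbf_hours=50_000, mttr_hours=4)
    link = availability(mtbf_hours=20_000, mttr_hours=8)

    one_path = serial(switch, link, switch)
    two_paths = parallel(one_path, one_path)

    print(f"single path:     {one_path:.6f}")
    print(f"redundant paths: {two_paths:.9f}")
```

These numbers only cover equipment.  They say nothing about the fat-fingered default route, which is why the reporting process matters as much as the math.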
 
Bottom line, there is no hard and fast rule.  Everyone wants 5 nines
(99.999%), yet how can you get that using routing protocol (L3) or
spanning tree (L2) redundancy technologies?  I'm a big L3 fan in my
designs, but I understand the convergence factors.  VSRP on an L2 core
could give you sub-second failover.  L3 BGP may be your only choice.
OSPF with link aggregation and auto-cost adjustment on link failure is
another option (hey, aggregating 4X10GbE and losing one link can have a
significant impact on a network core).  For example, you can get 50ms
to 5 second failover using Rapid Spanning Tree, but it takes the normal
STP failover time when returning to the original path.  That is almost
a minute of downtime.  Not too good on the statistics.
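
To put 5 nines in perspective, here is a small Python sketch that turns an availability target into a yearly downtime budget and counts how many failover events of a given length fit inside it.  The function names are mine, and the 50 second figure is only my stand-in for "almost a minute" of classic STP reconvergence; measure your own network before trusting it.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def max_events_per_year(availability_pct: float, seconds_per_event: float) -> int:
    """How many outages of a given length fit inside the yearly budget."""
    budget_seconds = downtime_budget_minutes(availability_pct) * 60
    return int(budget_seconds // seconds_per_event)


if __name__ == "__main__":
    target = 99.999
    print(f"{target}% allows {downtime_budget_minutes(target):.2f} minutes/year")

    # ASSUMPTION: 50 seconds per event stands in for "almost a minute" of STP.
    print("50 s STP-style events per year:", max_events_per_year(target, 50))
    print("1 s sub-second-class events per year:", max_events_per_year(target, 1))
```

Five nines works out to a little over five minutes a year, so a handful of minute-long reconvergence events spends the whole budget before any human error is counted.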
 
I worked on US Military networks before.  Their redundancy is not only
terrestrial, but uses aircraft and satellites.  Wanna buy a used AWACS?
See how it all relates?  How much money you got?  What are your goals?
How smart are your humans (training is the most upstream process)?  Save
money and use monkeys?  Now back to your original question:  
 
>Is this a hard and fast rule or is this a value that we all try and
>emulate as best we can?  Do I have the value incorrect?  Is it higher or
>lower?
 
Set your own standard.  I doubt you'll find the right answer on
NANOG.  If you want my generic answer, I'd say you want 99.999%
availability from all network endpoints to network endpoints during
times of network utilization.  I doubt you'll hear many complaints
from users/customers at this level.  Please be careful when jumping this
high.  You could pull a muscle (take away from another key requirement
such as Cost, Manageability, Security, Reliability, etc.).
 
Gary Blankenship 
Systems Engineer - Japan

