[98694] in North American Network Operators' Group
RE: Extreme congestion (was Re: inter-domain link recovery)
daemon@ATHENA.MIT.EDU (Rod Beck)
Wed Aug 15 16:04:27 2007
Date: Wed, 15 Aug 2007 20:40:27 +0100
From: "Rod Beck" <Rod.Beck@hiberniaatlantic.com>
To: Chiloé Temuco <dzlboi@gmail.com>, <nanog@merit.edu>
Errors-To: owner-nanog@merit.edu
This is a multi-part message in MIME format.
------_=_NextPart_001_01C7DF74.4DB9AB44
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Is this a declaration of principles? There is no reason why being
'Tier 1' would mean a carrier has no incentive to shape or even block
traffic, particularly if it has a lot of eyeballs.
Roderick S. Beck
Director of EMEA Sales
Hibernia Atlantic
1, Passage du Chantier, 75012 Paris
http://www.hiberniaatlantic.com
Wireless: 1-212-444-8829
Landline: 33-1-4346-3209
AOL Messenger: GlobalBandwidth
rod.beck@hiberniaatlantic.com
rodbeck@erols.com
``Unthinking respect for authority is the greatest enemy of truth.''
Albert Einstein.
-----Original Message-----
From: owner-nanog@merit.edu on behalf of Chiloé Temuco
Sent: Wed 8/15/2007 6:06 PM
To: nanog@merit.edu
Subject: Re: Extreme congestion (was Re: inter-domain link recovery)

Congestion and applications...
My opinion:

A tier 1 provider does not care what traffic it carries. That is all a
function of the application, not the network.

A tier 2 provider may do traffic shaping, etc.

A tier 3 provider may decide to block traffic patterns.

________________________________

More or less... The network was intended to move data from one machine
to another. The less manipulation in the middle, the better. No
manipulation of the payload is the name of the game.

That being said, it is entirely a function of the application to time
out and to drop out-of-order packets, etc.

ONS is designed around this principle.

In streaming data, it is often better to live with bad or missing data
than to try to put out-of-order or bad data in the buffer.
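
One way to picture the "drop it rather than buffer it" idea is the minimal Python sketch below. It is only an illustration, not anything from ONS or from the thread: the function name `playout`, the `(sequence number, payload)` tuples, and the lack of sequence-number wraparound handling are all assumptions.

```python
from typing import Iterable, List, Tuple


def playout(arrivals: Iterable[Tuple[int, str]]) -> Tuple[List[str], List[int]]:
    """Play packets in arrival order, discarding any that arrive late.

    A packet is "late" when its sequence number is not newer than the last
    one already played. Gaps (missing packets) are simply accepted.
    Hypothetical helper; sequence numbers are assumed never to wrap.
    """
    played: List[str] = []
    dropped: List[int] = []
    last_seq = None
    for seq, payload in arrivals:
        if last_seq is not None and seq <= last_seq:
            dropped.append(seq)  # out of order: do not put it back in the buffer
            continue
        played.append(payload)
        last_seq = seq
    return played, dropped


# Packet 3 shows up after packet 4, so it is discarded instead of reordered.
played, dropped = playout([(1, "a"), (2, "b"), (4, "d"), (3, "c"), (5, "e")])
print(played)   # ['a', 'b', 'd', 'e']
print(dropped)  # [3]
```

The receiver here never asks for a retransmit; whether missing data is tolerable is left to the application, which is the point being made above.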

A good example is digital over-the-air TV. If you didn't build in
enough error correction, you'll get digital breakup, etc. It is
impossible to recover any of that data.

If reliable transport of data is required, that is a function of the
application.

ONS is an Optical Networking Standard in the development stage.

-Chiloe Temuco
On 8/15/07, Stephen Wilcox <steve.wilcox@packetrade.com> wrote:

Hey Sean,

On Wed, Aug 15, 2007 at 11:35:43AM -0400, Sean Donelan wrote:
> On Wed, 15 Aug 2007, Stephen Wilcox wrote:
> >(Check slide 4) - the simple fact was that with something like 7 of 9
> >cables down the redundancy is useless .. even if operators maintained
> >N+1 redundancy which is unlikely for many operators that would imply
> >50% of capacity was actually used with 50% spare.. however we see
> >around 78% of capacity is lost. There was simply too much traffic and
> >not enough capacity.. IP backbones fail pretty badly when faced with
> >extreme congestion.
>
> Remember the end-to-end principle. IP backbones don't fail with extreme
> congestion, IP applications fail with extreme congestion.

Hmm I'm not sure about that... a 100% full link dropping packets causes
many problems:
L7: Applications stop working, humans get angry
L4: TCP/UDP drops cause retransmits, connection drops, retries etc
L3: BGP sessions drop, OSPF hellos are lost.. routing fails
L2: STP packets dropped.. switching fails

I believe any or all of the above could occur on a backbone which has
just failed massively and now has 20% capacity available, such as
occurred in SE Asia.
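
The figures quoted above (7 of 9 cables down, N+1 planning at 50% utilisation) can be checked with a few lines of Python. This is a rough sketch only: it assumes all nine cables have equal capacity and that offered traffic stays the same after the failure, neither of which is stated in the thread, and the function names are made up for illustration.

```python
from fractions import Fraction


def surviving_fraction(total_links: int, failed_links: int) -> Fraction:
    """Fraction of capacity left, assuming every link has equal capacity."""
    if total_links <= 0 or not 0 <= failed_links <= total_links:
        raise ValueError("need 0 <= failed_links <= total_links and total_links > 0")
    return Fraction(total_links - failed_links, total_links)


def overload_ratio(utilization: Fraction, surviving: Fraction) -> Fraction:
    """Offered load divided by remaining capacity (above 1 means congestion)."""
    if surviving == 0:
        raise ValueError("no capacity left")
    return utilization / surviving


surviving = surviving_fraction(9, 7)               # 2/9 of capacity remains
lost = 1 - surviving                               # 7/9 of capacity is gone
ratio = overload_ratio(Fraction(1, 2), surviving)  # (1/2) / (2/9) = 9/4

print(f"capacity lost: {float(lost):.0%}")               # capacity lost: 78%
print(f"offered load vs capacity: {float(ratio):.2f}x")  # offered load vs capacity: 2.25x
```

Under those assumptions a network planned for 50% utilisation ends up offered roughly 2.25 times what the surviving cables can carry, which is consistent with the "around 78% lost" and "20% capacity available" figures in the thread.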

> Should IP applications respond to extreme congestion conditions better?
alert('Connection dropped')
"Ping timed out"

kinda icky but it's not the application's job to manage the network

> Or should IP backbones have methods to predictably control which IP
> applications receive the remaining IP bandwidth? Similar to the telephone
> network special information tone -- All Circuits are Busy. Maybe we've
> found a new use for ICMP Source Quench.

yes and no.. for a private network perhaps, but for the Internet
backbone where all traffic is important (right?), differentiation is
difficult unless applied at the edge, and when you have major failure
and congestion I don't see what you can do that will have any reasonable
effect. perhaps you are a government contractor and you reserve some
capacity for them and drop everything else, but what is really out there
as a solution?

FYI I have seen telephone networks fail badly under extreme congestion.
COs have small CPUs that don't do a whole lot - set up calls, send busy
signals .. once a call is in place it doesn't occupy CPU time as the
path is locked in place elsewhere. however, if something occurs to cause
a serious amount of busy circuits then CPU usage goes through the roof
and you can cause cascade failures of whole COs

telcos look to solutions such as call gapping to intervene when they
anticipate major congestion, and not rely on the network to handle it
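
For readers unfamiliar with call gapping, the idea can be sketched roughly as below: after one call attempt is admitted, further attempts are refused until a fixed gap has passed. This is a simplified Python illustration, not how any particular switch implements it; the class name `CallGap`, the `gap_seconds` parameter, and passing the clock in as `now` are all assumptions made for the example.

```python
class CallGap:
    """Admit at most one call attempt per gap interval; refuse the rest.

    Hypothetical, simplified model of call gapping. Real switches apply
    the control per destination or per route; that is omitted here.
    """

    def __init__(self, gap_seconds: float) -> None:
        if gap_seconds < 0:
            raise ValueError("gap_seconds must be non-negative")
        self.gap_seconds = gap_seconds
        self._next_allowed = float("-inf")  # the first attempt is always admitted

    def admit(self, now: float) -> bool:
        """Return True if an attempt arriving at time `now` may proceed."""
        if now >= self._next_allowed:
            self._next_allowed = now + self.gap_seconds
            return True
        return False  # refused at the edge, before it loads the switch


gap = CallGap(gap_seconds=1.0)
attempts = [0.0, 0.2, 0.9, 1.0, 1.5, 2.1]
print([gap.admit(t) for t in attempts])
# [True, False, False, True, False, True]
```

The refused attempts never reach the congested exchange, which is what keeps the busy-signal load from snowballing.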

> Even if the IP protocols recover "as designed," does human impatience mean
> there is a maximum recovery timeout period before humans start making the
> problem worse?

I'm not sure they were designed to do this.. the ARPANET wasn't intended
to be massively congested.. the redundant links were in place to cope
with loss of a node and usage was manageable.

Steve

This e-mail and any attachments thereto is intended only for use by the
addressee(s) named herein and may be proprietary and/or legally
privileged. If you are not the intended recipient of this e-mail, you
are hereby notified that any dissemination, distribution or copying of
this email, and any attachments thereto, without the prior written
permission of the sender is strictly prohibited. If you receive this
e-mail in error, please immediately telephone or e-mail the sender and
permanently delete the original copy and any copy of this e-mail, and
any printout thereof. All documents, contracts or agreements referred or
attached to this e-mail are SUBJECT TO CONTRACT. The contents of an
attachment to this e-mail may contain software viruses that could damage
your own computer system. While Hibernia Atlantic has taken every
reasonable precaution to minimize this risk, we cannot accept liability
for any damage that you sustain as a result of software viruses. You
should carry out your own virus checks before opening any attachment.
------_=_NextPart_001_01C7DF74.4DB9AB44--