[98694] in North American Network Operators' Group


RE: Extreme congestion (was Re: inter-domain link recovery)

daemon@ATHENA.MIT.EDU (Rod Beck)
Wed Aug 15 16:04:27 2007

Date: Wed, 15 Aug 2007 20:40:27 +0100
From: "Rod Beck" <Rod.Beck@hiberniaatlantic.com>
To: Chiloé Temuco <dzlboi@gmail.com>, <nanog@merit.edu>
Errors-To: owner-nanog@merit.edu


Is this a declaration of principles? There is no reason why being 'Tier 1' should mean that a carrier has no incentive to shape or even block traffic, particularly if it has a lot of eyeballs.

Roderick S. Beck
Director of EMEA Sales
Hibernia Atlantic
1, Passage du Chantier, 75012 Paris
http://www.hiberniaatlantic.com
Wireless: 1-212-444-8829.
Landline: 33-1-4346-3209
AOL Messenger: GlobalBandwidth
rod.beck@hiberniaatlantic.com
rodbeck@erols.com
``Unthinking respect for authority is the greatest enemy of truth.'' Albert Einstein.



-----Original Message-----
From: owner-nanog@merit.edu on behalf of Chiloé Temuco
Sent: Wed 8/15/2007 6:06 PM
To: nanog@merit.edu
Subject: Re: Extreme congestion (was Re: inter-domain link recovery)

Congestion and applications...

My opinion:

A tier 1 provider does not care what traffic it carries.  That is all a function of the application, not the network.

A tier 2 provider may do traffic shaping, etc.

A tier 3 provider may decide to block traffic patterns.

________________________________


More or less...  The network was intended to move data from one machine to another...  The less manipulation in the middle the better...  No manipulation of the payload is the name of the game.

That being said, it's entirely a function of the application to time out and drop out-of-order packets, etc.

ONS is designed around this principle.

In streaming data... often it is better to get bad or missing data than to try and put out-of-order or bad data in the buffer...

A good example is digital over-the-air TV...  If you didn't build in enough error correction... then you'll have digital breakup, etc.  It is impossible to recover any of that data.

If reliable transport of data is required... That is a function of the application.
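
A minimal sketch of the "leave it to the application" idea above, in Python. It is not ONS or any real protocol: the class name `PlayoutBuffer`, the method `push` and the sequence-number scheme are assumptions made up for illustration. The receiver hands packets to the decoder as they arrive and discards anything whose slot has already been played, rather than trying to reinsert it.

```python
class PlayoutBuffer:
    """Toy streaming receiver: late packets are dropped, never reinserted."""

    def __init__(self):
        self.next_seq = 0    # lowest sequence number still worth playing
        self.played = []     # payloads handed to the decoder, in arrival order
        self.dropped = []    # sequence numbers discarded as late

    def push(self, seq, payload):
        if seq < self.next_seq:
            # Its slot has already passed: drop it instead of rewinding the stream.
            self.dropped.append(seq)
            return False
        # Anything skipped between next_seq and seq is treated as lost (a gap in playback).
        self.played.append(payload)
        self.next_seq = seq + 1
        return True


buf = PlayoutBuffer()
# Packet 2 arrives after packet 3 and is therefore discarded.
for seq, payload in [(0, "a"), (1, "b"), (3, "d"), (2, "c"), (4, "e")]:
    buf.push(seq, payload)

print(buf.played, buf.dropped)
```

The stream plays with a one-packet gap instead of stalling or replaying out of order; an application that needs every byte would have to add its own retransmission on top.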

ONS is an Optical Networking Standard in the development stage.

-Chiloe Temuco

On 8/15/07, Stephen Wilcox <steve.wilcox@packetrade.com> wrote:

	Hey Sean,

	On Wed, Aug 15, 2007 at 11:35:43AM -0400, Sean Donelan wrote:
	> On Wed, 15 Aug 2007, Stephen Wilcox wrote:
	> >(Check slide 4) - the simple fact was that with something like 7 of 9
	> >cables down the redundancy is useless .. even if operators maintained
	> >N+1 redundancy which is unlikely for many operators that would imply
	> >50% of capacity was actually used with 50% spare.. however we see
	> >around 78% of capacity is lost. There was simply too much traffic and
	> >not enough capacity.. IP backbones fail pretty badly when faced with
	> >extreme congestion.
	>
	> Remember the end-to-end principle.  IP backbones don't fail with extreme
	> congestion, IP applications fail with extreme congestion.

	Hmm I'm not sure about that... a 100% full link dropping packets causes many problems:
	L7: Applications stop working, humans get angry
	L4: TCP/UDP drops cause retransmits, connection drops, retries etc
	L3: BGP sessions drop, OSPF hellos are lost.. routing fails
	L2: STP packets dropped.. switching fails

	I believe any or all of the above could occur on a backbone which has just failed massively and now has 20% capacity available such as occurred in SE Asia
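
To put rough numbers on the figures quoted above (7 of 9 cables lost, N+1 operators running at about 50% utilisation), here is a back-of-the-envelope calculation in Python. It assumes all nine cables have equal capacity and that demand does not change after the cut; both are simplifying assumptions, not facts from the thread, and the function name `overload_factor` is made up for the example.

```python
from fractions import Fraction


def overload_factor(total_links, failed_links, utilisation):
    """Offered load divided by surviving capacity, assuming equal-capacity links."""
    surviving = Fraction(total_links - failed_links, total_links)
    return utilisation / surviving


lost = Fraction(7, 9)                            # fraction of capacity lost
factor = overload_factor(9, 7, Fraction(1, 2))   # 50% utilisation before the cut

print(f"capacity lost: {float(lost):.0%}")
print(f"offered load vs surviving capacity: {float(factor):.2f}x")
```

Under those assumptions a network that was half full before the cut is offered a little more than twice what the surviving links can carry, which matches the "too much traffic and not enough capacity" observation.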

	> Should IP applications respond to extreme congestion conditions better?
	alert('Connection dropped')
	"Ping timed out"

	kinda icky but it's not the application's job to manage the network

	> Or should IP backbones have methods to predictably control which IP
	> applications receive the remaining IP bandwidth?  Similar to the telephone
	> network special information tone -- All Circuits are Busy.  Maybe we've
	> found a new use for ICMP Source Quench.

	yes and no.. for a private network perhaps, but for the Internet backbone where all traffic is important (right?), differentiation is difficult unless applied at the edge, and when you have major failure and congestion i don't see what you can do that will have any reasonable effect. perhaps you are a government contractor and you reserve some capacity for them and drop everything else but what is really out there as a solution?
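
The "reserve some capacity and drop everything else" option mentioned above can be sketched as strict-priority allocation. This is an illustrative toy in Python, not a description of any router feature; the function name and the traffic figures are hypothetical.

```python
def strict_priority(capacity, priority_demand, best_effort_demand):
    """Serve priority traffic first; best effort gets whatever capacity is left."""
    priority_served = min(priority_demand, capacity)
    best_effort_served = min(best_effort_demand, capacity - priority_served)
    return priority_served, best_effort_served


# Hypothetical figures in Gbit/s: a path cut down to 20, with 15 of priority
# traffic and 80 of everything else still being offered.
priority_served, best_effort_served = strict_priority(20, 15, 80)

print(priority_served, best_effort_served)
```

The priority class is untouched while best-effort traffic loses most of its packets, which illustrates the objection in the text: the mechanism only helps if someone has already decided whose traffic counts as priority.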

	FYI I have seen telephone networks fail badly under extreme congestion. COs have small CPUs that don't do a whole lot - set up calls, send busy signals .. once a call is in place it doesn't occupy CPU time as the path is locked in place elsewhere. however, if something occurs to cause a serious amount of busy circuits then CPU usage goes through the roof and you can cause cascade failures of whole COs

	telcos look to solutions such as call gapping to intervene when they anticipate major congestion, and don't rely on the network to handle it
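
A rough Python sketch of the call-gapping idea mentioned above: admit at most one call attempt per gap interval and reject the rest immediately, so the switch spends almost no CPU on them. Real telephone switches have more parameters than this; the class name `CallGap`, the 5-second interval and the arrival times are all hypothetical.

```python
class CallGap:
    """Admit at most one call attempt per `gap_seconds`; reject the rest at once."""

    def __init__(self, gap_seconds):
        self.gap_seconds = gap_seconds
        self.next_allowed = 0.0   # earliest time the next call may be admitted

    def admit(self, now):
        if now < self.next_allowed:
            return False          # inside the gap: reject without further processing
        self.next_allowed = now + self.gap_seconds
        return True


gap = CallGap(gap_seconds=5.0)
arrivals = [0.0, 1.0, 4.9, 5.0, 6.0, 10.0, 10.1]   # call attempts, in seconds
admitted = [t for t in arrivals if gap.admit(t)]

print(admitted)
```

The point of the design is that rejection is cheap and happens before the congested part of the network is involved, which is the "intervene rather than rely on the network" approach described in the text.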

	> Even if the IP protocols recover "as designed," does human impatience mean
	> there is a maximum recovery timeout period before humans start making the
	> problem worse?

	i'm not sure they were designed to do this.. the arpanet wasn't intended to be massively congested.. the redundant links were in place to cope with loss of a node and usage was manageable.

	Steve









