[85392] in North American Network Operators' Group


Re: Level 3's side of the story

daemon@ATHENA.MIT.EDU (Richard A Steenbergen)
Sat Oct 8 15:07:48 2005

Date: Sat, 8 Oct 2005 15:07:13 -0400
From: Richard A Steenbergen <ras@e-gerbil.net>
To: nanog@merit.edu
In-Reply-To: <4347D686.2020303@equinephotoart.com>
Errors-To: owner-nanog@merit.edu


On Sat, Oct 08, 2005 at 07:24:06AM -0700, JC Dill wrote:
>
> Cogent was a "tier 1" until prior de-peering incidents left them unable
> to reach other networks.  They solved this by buying filtered transit
> thru Verio to reach the networks they couldn't reach via peering.

For the record, Cogent was never a Tier 1. They have never had Sprint
peering (unless you count the 30 seconds between acquisition of a company
that did have it, and the depeering notice, years ago). Cogent's history
of depeering debacles, at least as best as I can remember them, is:

ATDN (AS1668) depeers Cogent, December 18 2002.
http://www.cctec.com/maillists/nanog/historical/0212/msg00366.html
http://www.cctec.com/maillists/nanog/historical/0212/msg00412.html

ATDN is in the process of shutting off legacy transit and peering on its
path to tier-1-dom, and disconnects Cogent due to ratio (also technically
Cogent is still on a trial peering session). At this time, Cogent is a
full transit customer of AboveNet (AS6461), ATDN is still a full transit
customer of Level 3, and Cogent is a peer of Level 3. Following the
depeering, Cogent shifts 100% of the traffic to their (3) peers, which
become severely congested nearly 24/7 for several weeks. Despite being
able to send some traffic to AboveNet transit, they decide to leave
traffic congested to (3) to see if ATDN will repeer (not knowing that AOL
customers don't know what peering is and thus won't be nearly as vocal as
Cogent's customers). Traffic stays congested until Cogent's peering
capacity with (3) is upgraded. ATDN later switches their routing with (3)
from transit to customer-only routes (removing the last of their transit
paths), at which point Cogent shifts traffic to newly acquired Verio
transit to reach them.



Teleglobe (AS6453) depeers Cogent, some time in Feb 2005?
Don't ask me why but I can't find a NANOG thread discussing this.

Teleglobe depeers Cogent due to various ratio and market pressure issues.
Of note is that Cogent has recently entered the Canadian market where
Teleglobe has a strong presence, and has started giving away free or
nearly free transit to large inbound networks. Teleglobe is a Sprint
customer, and Cogent reaches Sprint through Verio. Teleglobe is caught
completely off-guard when Cogent refuses to accept the route via Sprint
transit, and blocks traffic between the networks. This continues for
several days, until eventually routes are leaked/added from Teleglobe to
SAVVIS (AS3561), who Cogent peers with. This continues for a few days more
until Teleglobe finally agrees to repeer Cogent.



France Telecom (OpenTransit/AS5511) depeers Cogent, April 14 2005
http://www.merit.edu/mail.archives/nanog/2005-04/msg00484.html

FT depeers Cogent due to, well, a variety of issues and general
unhappiness surrounding Cogent's entrance into their markets through the
purchase of Lambdanet. FT is a Sprint customer, Cogent is already
receiving Sprint routes via Verio but intentionally blocks these routes so
that they have no path to FT. The rumored resolution to the dispute is
that a FT customer sues Cogent in France, and a French judge either does
or is about to fine the hell out of Cogent unless connectivity is
restored. At this point Cogent caves, and begins accepting the routes via
Sprint (via Verio).



Of course I am certain there are a lot more depeerings (both from and to
Cogent) that did not make the news, but these are the big notable events
that dramatically impacted connectivity. For anyone keeping score, the
last two times Cogent was depeered, it responded by intentionally blocking
connectivity to the network in question, despite the fact that both of
those networks were Sprint customers and thus perfectly reachable under
the Sprint transit Cogent gets from Verio. While no one has come forward
to say if the Cogent/Verio agreement is structured for full transit or
only Sprint/ATDN routes, Cogent has certainly set a precedent for
intentionally disrupting connectivity in response to depeering, as a scare
tactic to keep other networks from depeering them.
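
The post does not say how the blocking was actually configured. Purely as
an illustration, one way a network could refuse transit-learned routes
toward a network that depeered it is an import filter on the AS path. The
sketch below is Python, not router configuration, and is not a description
of Cogent's setup. The function name is hypothetical; AS5511 and AS6453
come from the post, while AS2914 (Verio) and AS1239 (Sprint) are added
here as assumptions.

```python
# Hypothetical import filter: reject any route whose AS path contains a
# network we refuse to reach via transit. Illustration only.
BLOCKED_ASNS = frozenset({5511, 6453})  # France Telecom, Teleglobe (from the post)


def accept_route(as_path):
    """Return True if no blocked AS appears anywhere in the AS path."""
    return BLOCKED_ASNS.isdisjoint(as_path)


# AS paths as they might appear on a transit session with Verio
# (AS2914, assumed), which reaches Sprint (AS1239, assumed) and
# Sprint's customers.
routes = {
    "Sprint itself": (2914, 1239),
    "France Telecom via Sprint": (2914, 1239, 5511),
    "Teleglobe via Sprint": (2914, 1239, 6453),
}

for name, path in routes.items():
    verdict = "accepted" if accept_route(path) else "rejected"
    print(f"{name}: {verdict}")
```

With a filter like this the Sprint routes stay usable while the two
Sprint customers become unreachable, which matches the behaviour described
above.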


> L3 was hoping to force Cogent to increase that transit to include the
> traffic destined for L3's customers, thus raising Cogent's transport
> costs at no additional (transport) cost to L3.

As I've already pointed out, L3 depeering Cogent is in fact a major
revenue loss for L3. Not only will they not make any money off of Cogent
(since we both know Cogent will NEVER give them money for direct transit),
but Cogent will heavily depref them and shift many many gigabits of
traffic away from L3 and onto their competitors, traffic that L3 was
previously billing their customers for. They'll also lose customers during
the unreachability, and even if Cogent buckles and buys transit they'll
lose some outbound traffic from their multihomed customers due to a longer
as-path length to reach Cogent and many of Cogent's routes (11k of them
remember).
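
The as-path point can be sketched with the first two steps of BGP
best-path selection (highest local preference, then shortest AS path).
Everything here is an assumption for illustration: AS3356 (Level 3),
AS174 (Cogent) and AS2914 (Verio) are not given by number in this post,
AS64500 is a documentation ASN standing in for a multihomed customer's
other provider, and real routers apply more tie-breakers than these two.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    via: str                  # which upstream the customer learned it from
    local_pref: int           # higher is preferred
    as_path: Tuple[int, ...]  # nearest AS first


def best_route(routes):
    """Simplified selection: highest local-pref, then shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# A multihomed customer's view of a Cogent prefix while Level 3 still
# peers with Cogent: both upstreams offer a two-hop path.
before = [
    Route("Level 3", 100, (3356, 174)),
    Route("other provider", 100, (64500, 174)),
]

# After the depeering, assuming Cogent buys transit from Verio: Level 3
# now reaches Cogent through Verio, one AS hop longer.
after = [
    Route("Level 3", 100, (3356, 2914, 174)),
    Route("other provider", 100, (64500, 174)),
]

print("before:", best_route(before).via)  # a tie here; later tie-breakers decide
print("after: ", best_route(after).via)
```

In the "after" case the customer's outbound traffic toward Cogent leaves
via the other provider, so it no longer crosses (or gets billed on) the
Level 3 link.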

Let me be perfectly clear here, under absolutely no line of logic will L3
see an increase in revenue from this, period. If you think they will, you
don't understand how the Internet works. What L3 will see from this is a
REDUCTION IN BILLABLE TRAFFIC AND BACKBONE UTILIZATION.

> >3)  Possible traffic issues.  Was Cogent guilty of not transporting the
> >Level3-bound packets within the Cogent network to the closest
> >point-of-entry peer to the host in the Level3 network, therefore
> >"costing" Level3 transit of their own packets?
>
> Possible, in fact probable.  Most ISPs hand off traffic to peers under a
> "hot potato" policy, they hand it off at the closest point where they
> connect.  If the traffic is equal in both directions then this works.
> If the traffic is not equal, then this lowers the cost of the network
> that has high outbound traffic, as the other network bears the brunt of
> the total cost for transporting the combined traffic between their
> respective customers.

Do you know why people hot potato traffic? Because MEDs suck. In addition
to the obvious aggregation issues (for example, how do you assign a MED
value to 4.0.0.0/8 when it is used around the world?), they usually end up
producing sub-optimal routing. IGP cost is a view of what it costs YOU to
get the packet off your network. A MED value set to the opposing network's
IGP cost is a view of what it costs THEM to get the packet off their
network. Neither is a complete view of reality, and the MED view just
happens to be worse.
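
The aggregation problem can be shown with a toy model. A MED is attached
per announced prefix per session, so one aggregate gets one value at each
exit no matter where inside the aggregate the destination sits. The exit
names, MED values, host addresses and host locations below are all made
up for illustration.

```python
import ipaddress

AGGREGATE = ipaddress.ip_network("4.0.0.0/8")

# Hypothetical MED received for the aggregate at each interconnect
# (lower wins).
MED_AT_EXIT = {"New York": 10, "San Jose": 50}

# Hypothetical hosts inside the aggregate and where they really are.
HOST_LOCATION = {"4.1.2.3": "Boston", "4.200.9.9": "Los Angeles"}


def cold_potato_exit(dest):
    """Pick the exit with the lowest MED for the covering aggregate."""
    if ipaddress.ip_address(dest) not in AGGREGATE:
        raise ValueError(f"{dest} is not covered by {AGGREGATE}")
    # The decision depends only on the prefix, never on the host inside it.
    return min(MED_AT_EXIT, key=MED_AT_EXIT.get)


for host, city in HOST_LOCATION.items():
    print(f"{host} ({city}) -> exit at {cold_potato_exit(host)}")
```

Both hosts get the same exit, so in this toy model the Los Angeles
traffic is hauled to New York first. Fixing that means announcing more
specific prefixes, which is the deaggregation cost discussed further down.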

Consider a simple scenario. You operate a major network, you peer with
someone who operates a major network, you both intelligently aggregate
your prefixes and work with your customers to make certain that everything
in BGP maps to a specific geographic region, and you both interconnect
with each other in the usual "maximum reasonable extent possible" locations
(New York, Ashburn, Chicago, Dallas, San Jose, Los Angeles, Seattle,
Atlanta, Miami) across the US. Now let's say you have a customer who is in
Chicago, and they're sending data to a customer in, oh let's say Denver. In
hot-potato routing, you get the packet off your network in Chicago, and
then the other network uses its more complete and detailed understanding
of where this packet is going within its own network to know that
Chicago->Denver is a straight shot.

In a cold potato situation however, you are only looking at the other
network's IGP cost, not your own. Denver is pretty much dead center in the
middle of San Jose, Chicago, and Dallas, and which one is "closer" is
really up for grabs. On the vast majority of networks, Dallas is actually
closest by IGP cost, with San Jose a close second, and Chicago a close
third. If you're cold potato'ing to try and improve routing, even under
the most ideal conditions possible (which given the current financial
state of the carriers involved RARELY happens these days), you're going to
end up hauling packets to some out of the way place like SJC or DFW, and
then the other network is going to end up hauling packets back to Denver.
You both lose the "saving money by hauling traffic less" game, and your
customers lose in suboptimal routing.
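
To put rough numbers on the Chicago to Denver case, here is a small
sketch. The cost values are invented; they only preserve the ordering
described above (Dallas closest to Denver, then San Jose, then Chicago),
and they pretend both networks measure cost in the same units, which is
itself an unrealistic assumption.

```python
# What it costs MY network to carry the packet from the Chicago customer
# to each interconnect (hypothetical values).
MY_COST_FROM_CHICAGO = {"Chicago": 0, "Dallas": 20, "San Jose": 45}

# What it costs THEIR network to carry the packet from each interconnect
# to Denver, i.e. the value they would advertise as a MED (hypothetical).
THEIR_COST_TO_DENVER = {"Dallas": 18, "San Jose": 20, "Chicago": 22}


def exit_by_my_igp():
    """Hot potato: leave at the interconnect cheapest for me."""
    return min(MY_COST_FROM_CHICAGO, key=MY_COST_FROM_CHICAGO.get)


def exit_by_their_med():
    """Cold potato: leave at the interconnect with the lowest MED."""
    return min(THEIR_COST_TO_DENVER, key=THEIR_COST_TO_DENVER.get)


def total_cost(exit_city):
    """End-to-end cost across both networks for a given interconnect."""
    return MY_COST_FROM_CHICAGO[exit_city] + THEIR_COST_TO_DENVER[exit_city]


hot = exit_by_my_igp()
cold = exit_by_their_med()
print(f"hot potato:  exit {hot}, total cost {total_cost(hot)}")
print(f"cold potato: exit {cold}, total cost {total_cost(cold)}")
```

With these made-up numbers the hot potato path costs 22 end to end and
the cold potato path costs 38, because the MED saves the other network 4
units while costing you 20.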

The heart of the problem is that you need to consider your cost + their
cost to have MEDs be effective, even if you solved all the implementation
issues that you will never practically solve. Unfortunately since two
networks have no way to coordinate metrics on the same "scale" (my 43ms
may be 4300 igp cost, your 43ms may be 43, and joe bob's 43ms cost may be
9182), you have no reliable way to "add" the two costs.
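
A short sketch of why adding the raw numbers fails. The two scale factors
follow the 4300-vs-43 example above (100 units per ms versus 1 unit per
ms); the exit names and latencies are hypothetical, and in practice
neither side knows the other's scale factor, which is the whole problem.

```python
MY_UNITS_PER_MS = 100    # my 43ms shows up as IGP cost 4300
THEIR_UNITS_PER_MS = 1   # their 43ms shows up as IGP cost 43

# Hypothetical real latencies on each side of two interconnects.
my_latency_ms = {"A": 5, "B": 10}
their_latency_ms = {"A": 40, "B": 10}

# What each network actually exposes: its own metric on its own scale.
my_igp = {x: ms * MY_UNITS_PER_MS for x, ms in my_latency_ms.items()}
their_med = {x: ms * THEIR_UNITS_PER_MS for x, ms in their_latency_ms.items()}


def naive_best_exit():
    """Add my IGP cost and their MED as if they shared a scale."""
    return min(my_igp, key=lambda x: my_igp[x] + their_med[x])


def true_best_exit():
    """The exit that really minimises end-to-end latency."""
    return min(my_latency_ms, key=lambda x: my_latency_ms[x] + their_latency_ms[x])


print("naive sum picks:", naive_best_exit())
print("truly best exit:", true_best_exit())
```

Here the naive sum is dominated by whichever network uses the bigger
numbers, so it picks exit A (540 vs 1010) even though exit B has less than
half the real end-to-end latency (20ms vs 45ms).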

Now, networks who are looking for equity in the ratios have a choice. They
can either:

* Spend thousands of man hours deaggregating (and then listen to
  you complain about polluting the routing table with prefixes)
* Spend millions of dollars deploying more gear into more interconnection
  locations in areas of network presence but not peering presence (Denver,
  St Louis, Kansas City, New Orleans, Tampa, Phoenix, etc etc etc), all
  in areas without well defined peering locations where they are likely
  to end up in buildings across the block but which cost thousands of
  dollars to connect, or
* Establish these as smaller interconnections across telco circuits,
  again spending thousands of dollars a month more in circuits, hundreds
  of thousands of dollars in ports, tens of thousands of man hours
  managing capacity at five dozen new interconnections around the world,
  all while reducing to almost zero the ability for a major
  interconnection to fail over to another major interconnection during a
  maintenance, fiber cut, network event, etc.
* Break their customers' routing in the process of doing all this.

-OR-

* Depeer said network, expect that they will buy transit from Verio or
  any of the other dozens of networks who provide this service, and that
  whoever ends up interconnecting with them to deliver the traffic will
  have equitable traffic.

Now, which one do you think they're going to pick?

> There are ways to deal with it though, like cold potato routing.

Spoken like someone who has never dealt with the reality of running a
large network, or dealt with customers wondering why you are routing their
traffic across the country and back again. Anyone who values the quality
of their connectivity will stick to arm-chair engineering and not actually
building a network this way.

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
