[181016] in North American Network Operators' Group
Re: Open letter to Level3 concerning the global routing issues on
daemon@ATHENA.MIT.EDU (Rafael Possamai)
Sat Jun 13 22:54:09 2015
X-Original-To: nanog@nanog.org
In-Reply-To: <5.1.1.6.2.20150613215121.01f39398@efes.iucc.ac.il>
From: Rafael Possamai <rafael@gav.ufsc.br>
Date: Sat, 13 Jun 2015 14:06:33 -0500
To: Hank Nussbacher <hank@efes.iucc.ac.il>
Cc: NANOG <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org
A lot of these things are for show only, like a big corporation donating
to non-profits and sponsoring "feel good" events. You can see that a lot of
these same businesses also lobby Washington like crazy, so there you go...
This was either an isolated incident or they really don't care much.
On Sat, Jun 13, 2015 at 1:54 PM, Hank Nussbacher <hank@efes.iucc.ac.il>
wrote:
> At 17:32 12/06/2015 +0200, Martin Millnert wrote:
>
> Interesting that Level3 is a member of http://www.routingmanifesto.org/
>
> or see
>
>
> http://www.internetsociety.org/news/network-operators-around-world-demonstrate-their-commitment-secure-and-resilient-internet
>
> to quote Level3
> "As one of the most connected Internet providers in the world, security of
> the Internet is top-of-mind at Level 3 Communications. We are dedicated to
> supporting and protecting the Internet ecosystem and work each day to
> safeguard customers' critical communications. The Internet is a shared
> responsibility, and only through these important collaborative efforts can
> we continue to ensure the protection of this collective infrastructure."
>
> -Hank
>
>
>> Dear Level3,
>>
>> The Internet is a cooperative effort, and it works well only when its
>> participants take constructive actions to address errors and remedy
>> problems.
>> Your position as a major Internet Carrier bestows upon you a certain
>> degree of responsibility for the correct operation of the Internet all
>> across (and beyond) the planet. You have many customers. Customers will
>> always occasionally make mistakes. You as a major Internet Carrier have
>> a responsibility to limit, not amplify, your customers' mistakes.
>> Other major carriers implement technical measures that severely limit
>> the damage from customer mistakes and keep it from having global impact.
>> Other major carriers also implement operational procedures in addition
>> to technical measures.
>> In combination, these measures drastically reduce the outage-hours as a
>> result of customer configuration errors.
>>
>> At 08:44 UTC on Friday 12th of June, one of your transit customers,
>> Telekom Malaysia (AS4788) began announcing the full Internet table back
>> to you, which you accepted and propagated to your peers and customers,
>> causing global outages for close to 3 hours.
>> [ https://twitter.com/DynResearch/status/609340592036970496 ]
>> During this 3-hour window, it appears (from your own service outage
>> reports) that you did nothing to stop the global Internet outage, but
>> that Telekom Malaysia themselves eventually resolved it. This lack of
>> action on your end, and your disregard for the correct operation of the
>> global Internet is astonishing. These mistakes do not need to happen.
>> AS4788 under normal circumstances announces ~1900 IPv4 prefixes to the
>> Internet. You accepted several hundred thousand prefixes from them - a
>> max prefix setting would have severely limited the damage. We expect
>> that these are your practices as well, but they failed. When they do, it
>> should not take ~3 hours to shut down the session(s).
>>
>> Many operators, in despair, turned down their peering sessions with you
>> once it was clear you were causing the outages and no immediate fix was
>> in sight. This improved the situation for some, but not for all. Had
>> you deployed proper IRR-filtering to filter the bad announcements the
>> impact would've been far less critical.
>>
>> As a direct consequence of your ~3 hours of inaction, as a local
>> example, Swedish payment terminals were experiencing problems all over
>> the country. The Swedish economy was directly affected by your inaction.
>> There were queues when I was buying lunch! Imagine the food rage. The
>> situation was probably similar at other places around the globe where
>> people were awake.
>>
>> Operators around the planet are curious:
>> - Did Level3 not detect or understand that it was causing global
>> Internet outages for ~3 hours?
>> - If Level3 did in fact detect or understand it was causing global
>> Internet outages, why did it not properly and immediately remedy the
>> situation?
>> - What is Level3 going to do to address these questions and begin work
>> on restoring its credibility as a carrier?
>>
>> We all understand that mistakes do happen (in applying customer
>> interface templates, etc.). However, the Internet is all too pervasive in
>> everyday life today for anything but swift action by carriers to remedy
>> breakage after the fact. It is absolutely not sufficient to let a
>> customer spend 3 hours to detect and fix a situation like this one. It
>> is unacceptable that no swift action was taken on your end to limit the
>> global routing issues you caused.
>>
>> Sincerely,
>> Martin Millnert
>> Member of Internet Community - no carrier / ISP affiliation.
>>
>
>
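
For anyone following along who hasn't run into these knobs: below is a rough
Python sketch of the two safeguards the quoted letter mentions, a max-prefix
limit and a prefix filter of the kind one might build from IRR data. It is a
toy model only, not how Level3 or any router vendor implements it. The class
name, the field names, the limit of 3 and AS64500 (a documentation ASN) are
all made up for illustration.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network


@dataclass
class CustomerSession:
    """Toy model of one BGP customer session (hypothetical, not a router API)."""
    asn: int
    max_prefixes: int                            # operator-chosen limit (assumed value)
    allowed: set = field(default_factory=set)    # stand-in for an IRR-derived prefix list
    accepted: set = field(default_factory=set)
    up: bool = True

    def receive(self, prefix: str) -> bool:
        """Return True if the announcement is accepted."""
        if not self.up:
            return False
        net = ip_network(prefix)
        # Filter step: if a prefix list is configured, drop anything not on it.
        if self.allowed and net not in self.allowed:
            return False
        self.accepted.add(net)
        # Max-prefix step: once the limit is exceeded, shut the session down
        # and withdraw everything learned over it.
        if len(self.accepted) > self.max_prefixes:
            self.up = False
            self.accepted.clear()
            return False
        return True


if __name__ == "__main__":
    # A customer expected to send a handful of prefixes suddenly sends more.
    session = CustomerSession(asn=64500, max_prefixes=3)
    for i in range(5):
        ok = session.receive(f"10.0.{i}.0/24")
        print(f"10.0.{i}.0/24 accepted={ok} session_up={session.up}")
```

In this sketch the session drops as soon as the fourth prefix arrives, so the
leak stops at the customer edge instead of being propagated onward. With a
prefix list configured, unexpected prefixes are dropped without tearing the
session down at all.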