[154385] in North American Network Operators' Group


Re: FYI Netflix is down

daemon@ATHENA.MIT.EDU (Jon Lewis)
Tue Jul 3 13:15:01 2012

Date: Tue, 3 Jul 2012 13:13:39 -0400 (EDT)
From: Jon Lewis <jlewis@lewis.org>
To: Nanog <nanog@nanog.org>
In-Reply-To: <alpine.BSF.2.00.1207021618330.95272@murf.icantclick.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

On Mon, 2 Jul 2012, david raistrick wrote:

> On Mon, 2 Jul 2012, James Downs wrote:
>
>>> back-plane / control-plane was unable to cope with the requests.  Netflix 
>>> uses Amazon's ELB to balance the traffic and no back-plane meant they were 
>>> unable to reconfigure it to route around the problem.
>> 
>> Someone needs to define back-plane/control-plane in this case. (and what 
>> wasn't working)
>
> Amazon resources are controlled (from a consumer viewpoint) by API - that API 
> is also used by amazon's internal toolkits that support ELB (and RDS..). 
> Those (http accessed) API interfaces were unavailable for a good portion of 
> the outages.

It seems like if you're going to outsource your mission-critical 
infrastructure to "cloud", you should probably pick at least two unrelated 
cloud providers and, if at all possible, not outsource the systems that 
balance/direct traffic. If you're really serious about it, have at 
least two of those traffic-directing systems set up at different 
facilities, so that if the primary goes offline, the secondary takes 
over.  If a cloud provider fails, you redirect to another.
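
As a rough illustration of the "if a provider fails, redirect to another" 
idea, here is a minimal Python sketch of the selection logic only. The 
provider names, the pick_provider function, and the health table are all 
hypothetical and made up for this example; a real setup would run actual 
health checks from traffic-directing systems you operate yourself, and 
would then have to update DNS or routing accordingly.

```python
from typing import Callable, Optional, Sequence


def pick_provider(
    providers: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first provider, in preference order, that passes its
    health check. Returns None if every provider is down."""
    for name in providers:
        if is_healthy(name):
            return name
    return None


# Hypothetical provider names and health states, for illustration only.
# In practice is_healthy would probe each provider from outside it.
health = {"cloud-a": False, "cloud-b": True}

active = pick_provider(
    ["cloud-a", "cloud-b"],
    lambda name: health.get(name, False),
)
print(active)
```

The point of keeping this logic outside either provider is that the 
decision does not depend on the failed provider's own API being reachable.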

----------------------------------------------------------------------
  Jon Lewis, MCP :)           |  I route
  Senior Network Engineer     |  therefore you are
  Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________

