[67932] in North American Network Operators' Group
Re: Converged Networks Threat (Was: Level3 Outage)
daemon@ATHENA.MIT.EDU (Petri Helenius)
Wed Feb 25 14:46:13 2004
Date: Wed, 25 Feb 2004 21:37:24 +0200
From: Petri Helenius <pete@he.iki.fi>
To: David Meyer <dmm@1-4-5.net>
Cc: nanog@merit.edu
In-Reply-To: <20040225191916.GA8564@1-4-5.net>
Errors-To: owner-nanog-outgoing@merit.edu
David Meyer wrote:
>
> No doubt. However, the problem is: What constitutes
> "unnecessary system complexity"? A designed system's
> robustness comes in part from its complexity. So it's not
> that complexity is inherently bad; rather, it is just
> that you wind up with extreme sensitivity to outlying
> events which is exhibited by catastrophic cascading
> failures if you push a system's complexity past some
> point; these are the so-called "robust yet fragile"
> systems (think NE power outage).
>
>
I think you hit the nail on the head. I view complexity as a
diminishing-returns play: each increase in complexity benefits a
decreasing percentage of the users. One way to manage complexity is to
split large systems into smaller pieces and try to make the pieces
independent enough to survive the failure of a neighboring piece. Many
telecommunications equipment vendors claim this approach, at least in
their marketing materials. The question then becomes, "What good is a
backbone router without a BGP process?" So far I haven't seen a router
with disposable entities on a per-interface or per-peer basis, so that
if the BGP speaker for the session to 10.1.1.1 crashes, the system would
still be able to maintain its relationship with 10.2.2.2. Obviously,
single-device availability becomes moot if we can figure out a way to
route/switch around the failed device quickly enough. Today we don't
even have a generic IP-layer liveness protocol, so by default packets
will be blackholed for a fixed period, until a routing protocol notices
it is missing its neighbor's hello packets. (I'm aware of work towards
this goal.)
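
The per-peer "disposable entity" idea can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical router process where each BGP session has its own handler; the names `PeerSession` and `deliver_update`, the `crash` flag, and the prefix are illustrative inventions, not any vendor's API. A failure in the 10.1.1.1 session is contained so the 10.2.2.2 session keeps running.

```python
from dataclasses import dataclass, field


@dataclass
class PeerSession:
    """Hypothetical per-peer BGP session state (illustrative, not a real router API)."""
    address: str
    crash: bool = False                        # simulate a bug that crashes this peer's handler
    routes: list = field(default_factory=list)

    def process_update(self, prefix: str) -> None:
        if self.crash:
            raise RuntimeError(f"session to {self.address} crashed")
        self.routes.append(prefix)


def deliver_update(peers, prefix):
    """Hand an update to every peer, containing any failure to that peer.

    Returns the set of peer addresses whose sessions survived.
    """
    surviving = set()
    for peer in peers:
        try:
            peer.process_update(prefix)
        except Exception:
            # Only this peer's session is dropped; the loop and the
            # other sessions keep running.
            continue
        surviving.add(peer.address)
    return surviving


peers = [PeerSession("10.1.1.1", crash=True), PeerSession("10.2.2.2")]
print(deliver_update(peers, "192.0.2.0/24"))  # {'10.2.2.2'}
```

In-process exception handling is the weakest form of containment: a fault that corrupts shared state would still take everything down, so a real design would more plausibly run each session in its own process or memory-protected task.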

In summary, I feel systems should be designed to run independently in
all failure modes. If you lose 1 to n neighbors, the system should be
self-sufficient in figuring out the situation near-immediately and
continue working while negotiating with its neighbors about the overall
picture.
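
Local, self-sufficient detection of neighbor loss can be sketched with a hello/hold timer. This assumes a neighbor is declared down once `hold_time` seconds pass without a hello; the class `NeighborTable`, the 30-second hold time, and the timestamps are illustrative assumptions, not values from any particular protocol. Until the hold time expires, traffic toward a dead neighbor is blackholed. After it expires, the node knows on its own which neighbors are gone and can keep forwarding via the survivors.

```python
class NeighborTable:
    """Hello/hold-timer liveness tracking for neighbors (illustrative sketch)."""

    def __init__(self, hold_time: float) -> None:
        # Seconds without a hello before a neighbor is declared down.
        self.hold_time = hold_time
        # neighbor address -> timestamp of the most recent hello
        self.last_hello = {}

    def hello(self, neighbor: str, now: float) -> None:
        self.last_hello[neighbor] = now

    def alive(self, now: float) -> set:
        return {n for n, t in self.last_hello.items() if now - t < self.hold_time}

    def down(self, now: float) -> set:
        return set(self.last_hello) - self.alive(now)


table = NeighborTable(hold_time=30.0)
for n in ("10.1.1.1", "10.2.2.2", "10.3.3.3"):
    table.hello(n, now=0.0)
# Only 10.2.2.2 keeps sending hellos; the other two go silent after t=0.
table.hello("10.2.2.2", now=20.0)

print(sorted(table.alive(now=25.0)))  # ['10.1.1.1', '10.2.2.2', '10.3.3.3']
print(sorted(table.down(now=35.0)))   # ['10.1.1.1', '10.3.3.3']
print(sorted(table.alive(now=35.0)))  # ['10.2.2.2']
```

Shortening the hold time narrows the blackhole window, but at the cost of more frequent hellos and a higher risk of falsely declaring a busy neighbor down.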
Pete