[77635] in North American Network Operators' Group


Re: Resilience: faults, causes, statistics, open issues

daemon@ATHENA.MIT.EDU (David Andersen)
Fri Jan 28 13:44:41 2005

In-Reply-To: <F005CD411D18D3119C8F00508B0874801245C8C9@ehubunt100.eth.ericsson.se>
Cc: nanog@merit.edu
From: David Andersen <dga@lcs.mit.edu>
Date: Fri, 28 Jan 2005 13:43:51 -0500
To: András Császár (IJ/ETH) <Andras.Csaszar@ericsson.com>
Errors-To: owner-nanog-outgoing@merit.edu



On Jan 28, 2005, at 5:30 AM, András Császár (IJ/ETH) wrote:
>
> Just some comments about the root causes of BGP related problems,
> maybe you find something useful from the research perspective,
> although probably this is not going to be new for you.
>
> I found a few author groups with very related and useful papers:
>
> - Tim Griffin and co.
> - Nick Feamster and co.
> - Jennifer Rexford and co.
> - Lixin Gao and co.

   Yup.  That particular group you mentioned has a lot of interplay.

> These people often have joint publications but sometimes separate as
> well. Also, Craig Labovitz and co have some very useful papers in the
> area of routing convergence time.

Yes.  There's also Morley Mao's convergence work.
>
>
> As I see things now, in case of BGP, routing divergence, configuration
> and policies have a very strong correlation.
>
> A high level conclusion (what you probably can expect from half year
> paper- and presentation-reading research) is that the first root cause
> of BGP problems is the absence of a >>widely deployed and practical<<
> formal language for policies. Since there is no formal language, there
> is no compiler, and so you have unwanted anomalies resulting from your
> config.

   In a sense.  I think that this is one of the root causes, but it's
perhaps not the only one.  I think we can group the causes into two areas:

   a)  Fundamental BGP problems
         (e.g., the convergence/flap damping issues, etc.).   By
"fundamental" I don't mean uncorrectable - I simply mean that they're
"features" of the protocol as it exists today.  Some may be fundamental
trade-offs in global routing;  I don't know.

   b)  The abovementioned policy issue

Some of the issues in (a) can be corrected through (b) - for example,
the Gao/Rexford examination of what policies can be permitted if you
want to ensure stable routing.  Given that BGP is a strongly
policy-driven beast, many, many of its problems do arise from this.
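
A minimal sketch of the kind of policy constraints the Gao/Rexford work
is usually summarized as imposing: prefer customer-learned routes, export
peer/provider routes only to customers, and keep the customer-provider
graph free of cycles.  The names here (Rel, may_export, prefers,
has_provider_cycle) are hypothetical and only for illustration; this is
not a verification tool, just the rules written down as checks.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, as seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Export rule: customer routes may go to anyone; routes learned
    from a peer or provider may go only to customers."""
    return learned_from is Rel.CUSTOMER or export_to is Rel.CUSTOMER


def prefers(a: Rel, b: Rel) -> bool:
    """Preference rule: a customer-learned route is strictly preferred
    over a peer- or provider-learned route."""
    return a is Rel.CUSTOMER and b is not Rel.CUSTOMER


def has_provider_cycle(providers: dict[str, set[str]]) -> bool:
    """Depth-first search for a cycle in the customer -> provider graph.
    `providers` maps each AS name to the set of its providers."""
    WHITE, GREY, BLACK = 0, 1, 2
    state: dict[str, int] = {}

    def visit(asn: str) -> bool:
        state[asn] = GREY
        for p in providers.get(asn, ()):
            s = state.get(p, WHITE)
            if s == GREY:  # back edge: some AS is its own indirect provider
                return True
            if s == WHITE and visit(p):
                return True
        state[asn] = BLACK
        return False

    return any(state.get(a, WHITE) == WHITE and visit(a)
               for a in list(providers))
```

For example, may_export(Rel.PEER, Rel.PEER) is False (no transit between
two peers), while may_export(Rel.CUSTOMER, Rel.PROVIDER) is True.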

> So, in the end, although we can possibly identify the root causes
> behind BGP problems, I'm not sure they can ever be fully ceased. OK, I
> can imagine a formal language and config compiler, and one can find
> verification tools as well, but I can hardly imagine e.g. the sharing
> of policies (although some papers write about methods how to infer the
> necessary knowledge from measurements).

Agreed.  I think we'll make steps, though, and I think that groups of
collaborating providers can probably implement some of the solutions
between themselves in ways that make sense.

> p.s. Sorry for the long mail :) :)

No worries - quite interesting.  (to me, at least!)

   -Dave

