[77635] in North American Network Operators' Group
Re: Resilience: faults, causes, statistics, open issues
daemon@ATHENA.MIT.EDU (David Andersen)
Fri Jan 28 13:44:41 2005
In-Reply-To: <F005CD411D18D3119C8F00508B0874801245C8C9@ehubunt100.eth.ericsson.se>
Cc: nanog@merit.edu
From: David Andersen <dga@lcs.mit.edu>
Date: Fri, 28 Jan 2005 13:43:51 -0500
To: András Császár (IJ/ETH) <Andras.Csaszar@ericsson.com>
Errors-To: owner-nanog-outgoing@merit.edu
On Jan 28, 2005, at 5:30 AM, András Császár (IJ/ETH) wrote:
>
> Just some comments about the root causes of BGP related problems,
> maybe you find something useful from the research perspective,
> although probably this is not going to be new for you.
>
> I found a few author groups with very related and useful papers:
>
> - Tim Griffin and co.
> - Nick Feamster and co.
> - Jennifer Rexford and co.
> - Lixin Gao and co.
Yup. That particular group you mentioned has a lot of interplay.
> These people often have joint publications but sometimes separate as
> well. Also, Craig Labovitz and co have some very useful papers in the
> area of routing convergence time.
Yes. There's also Morley Mao's convergence work.
>
>
> As I see things now, in case of BGP, routing divergence, configuration
> and policies have a very strong correlation.
>
> A high level conclusion (what you probably can expect from half year
> paper- and presentation-reading research) is that the first root cause
> of BGP problems is the absence of a >>widely deployed and practical<<
> formal language for policies. Since there is no formal language, there
> is no compiler, and so you have unwanted anomalies resulting from your
> config.
In a sense. I think that this is one of the root causes, but it's
perhaps not the only one. I think we can group it into two areas:
a) Fundamental BGP problems
(e.g., the convergence/flap damping issues, etc.). By
"fundamental" I don't mean uncorrectable - I simply mean that they're
"features" of the protocol as it exists today. Some may be fundamental
trade-offs in global routing; I don't know.
b) The abovementioned policy issue
Some of the issues in (a) can be corrected through (b) - for example,
the Gao/Rexford examination of what policies can be permitted if you
want to ensure stable routing. Given that BGP is a strongly
policy-driven beast, many, many of its problems do arise from this.
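To make the Gao/Rexford point a bit more concrete, here is a rough sketch of the kind of policy constraint involved, as I understand their guidelines: prefer customer-learned routes over peer- and provider-learned ones, and only re-export peer- or provider-learned routes to customers. The names Rel, PREFERENCE, may_export and best_route are made up for illustration, and ranking peers above providers is an extra assumption of mine, not something the guidelines require. Locally originated routes are not modeled.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor AS, seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


# Assumed ranking: customer routes must beat peer and provider routes;
# placing peer above provider is an illustrative choice only.
PREFERENCE = {Rel.CUSTOMER: 3, Rel.PEER: 2, Rel.PROVIDER: 1}


def may_export(learned_from: Rel, send_to: Rel) -> bool:
    """Customer-learned routes may go to any neighbor; routes learned
    from a peer or provider may only be sent on to customers."""
    return learned_from is Rel.CUSTOMER or send_to is Rel.CUSTOMER


def best_route(candidates):
    """Pick from (learned_from, as_path_len) tuples: highest relationship
    preference first, then shortest AS path as a tie-breaker."""
    return max(candidates, key=lambda c: (PREFERENCE[c[0]], -c[1]))


if __name__ == "__main__":
    # A peer-learned route must not be re-exported to another peer.
    print(may_export(Rel.PEER, Rel.PEER))
    # A longer customer route still wins over a shorter provider route.
    print(best_route([(Rel.PROVIDER, 1), (Rel.CUSTOMER, 4)]))
```

A config checker built on something like may_export could flag export statements that break the constraint, which is roughly the "compiler" role discussed above.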
> So, in the end, although we can possibly identify the root causes
> behind BGP problems, I'm not sure they can ever be fully ceased. OK, I
> can imagine a formal language and config compiler, and one can find
> verification tools as well, but I can hardly imagine e.g. the sharing
> of policies (although some papers write about methods how to infer the
> necessary knowledge from measurements).
Agreed. I think we'll make steps, though, and I think that groups of
collaborating providers can probably implement some of the solutions
between themselves in ways that make sense.
> p.s. Sorry for the long mail :) :)
No worries - quite interesting. (to me, at least!)
-Dave