[138812] in North American Network Operators' Group


Re: bfd-like mechanism for LANPHY connections between providers

daemon@ATHENA.MIT.EDU (Sudeep Khuraijam)
Thu Mar 17 01:33:45 2011

From: Sudeep Khuraijam <skhuraijam@liveops.com>
To: Jeff Wheeler <jsw@inconcepts.biz>
Date: Wed, 16 Mar 2011 22:33:39 -0700
In-Reply-To: <AANLkTi=jUVP+s+qA_6oO7HRQWjC5q1J1mCtNYkccoFMN@mail.gmail.com>
Cc: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org


On Mar 16, 2011, at 6:05 PM, Jeff Wheeler wrote:

>>There is a difference of several orders of magnitude between BFD
>>keepalive intervals (in ms) and BGP (in seconds), with generally
>>configurable multipliers vs. hold timer.
>>With real-time media and ever faster last miles, the BGP hold timer
>>may find itself inadequate, if not inappropriate, in some cases.

>For eBGP peerings, your router must re-converge to a good state in < 9
>seconds to see an order of magnitude improvement in time-to-repair.
>This is typically not the case for transit/customer sessions.



Not so, if your goal is peer deactivation and failover. Also, you miss
the point: once the event is detected, the rest of the process starts.
I am talking about event detection. One may want a hold timer longer
than 30 seconds but have peer state deactivated instantly on link
failure. If that's the design goal AND link state is not passed
through, then BFD-driven BGP deactivation is a good choice.
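
To put rough numbers on the detection-time gap described above, here is a
minimal sketch. It assumes the usual BFD relationship of detection time =
negotiated interval x detect multiplier; the 300 ms interval, the
multiplier of 3, and the 90 second BGP hold time are illustrative values,
not figures from this thread.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Time without received BFD packets before the session is declared down."""
    return interval_ms * multiplier


# Illustrative values only (assumptions, not taken from the thread).
bfd_ms = bfd_detection_time_ms(interval_ms=300, multiplier=3)
bgp_hold_ms = 90 * 1000  # BGP hold time expressed in milliseconds

ratio = bgp_hold_ms / bfd_ms
print(f"BFD detects in {bfd_ms} ms, BGP hold timer expires after {bgp_hold_ms} ms")
print(f"detection is {ratio:.0f}x faster with these values")
```

This covers event detection only; what the router does after the event
is a separate question.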

>To make a risk/reward choice that is actually based in reality, you
>need to understand your total time to re-converge to a good state, and
>how much of that is BGP hold-time.  You should then consider whether
>changing BGP timers (with its own set of disadvantages) is more or
>less practical than using BFD.



Yes, I see that, and I said "in some cases", not all or most cases.
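
The time-to-repair argument in the quoted text can be made concrete with
a simple model: repair time = detection time + re-convergence time. The
sketch below solves for the longest re-convergence time that still gives
a chosen overall speedup. The 90 second hold time and 1 second BFD
detection time are assumptions picked for illustration (the thread does
not state them); with those values the limit for a 10x improvement comes
out just under 9 seconds.

```python
def speedup(hold_s: float, bfd_s: float, convergence_s: float) -> float:
    """Ratio of repair time with hold-timer detection to repair time with BFD."""
    return (hold_s + convergence_s) / (bfd_s + convergence_s)


def max_convergence_for_speedup(hold_s: float, bfd_s: float, factor: float) -> float:
    """Largest convergence time C that satisfies (hold + C) / (bfd + C) >= factor.

    Rearranging gives C <= (hold - factor * bfd) / (factor - 1).
    """
    return (hold_s - factor * bfd_s) / (factor - 1)


# Assumed values: 90 s BGP hold time, 1 s BFD detection, 10x target.
limit_s = max_convergence_for_speedup(hold_s=90.0, bfd_s=1.0, factor=10.0)
print(round(limit_s, 2))                    # 8.89
print(round(speedup(90.0, 1.0, 60.0), 2))   # 2.46 when convergence takes 60 s
```

With a slow re-convergence the total repair time improves far less than
the detection time does, which is the quoted point; the detection event
itself is still seen much sooner, which is the point of the reply.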


>Let's put it another way: if CPU/FIB convergence time were not a
>significant issue, do you think vendors would be working to optimize

This is orthogonal to my point. The table-size taxes, best-path
algorithms, and the speed with which you can re-FIB and rewrite the
ASICs are constant in both cases. But that's post-event.

>this process, that we would have concepts like MPLS FRR and PIC, and

Those are out of scope in the context of this thread and have
completely different roles.

>that each new router product line upgrade comes with a yet-faster CPU?


For things they can sell more licenses for, such as 3DES, keying
algorithms, virtual instances, other things on BGP: stuff that allows
service providers to charge a lot more money while running on common
infrastructure, such as MPLS and FRR, and a zillion other things like
stateful redundancy, higher housekeeping needs, in-service upgrades,
and anything else with a list price. And it's cheaper than the old CPU.

>Of course not.  Vendors would just have said, "hey, let's get
>together on a lower hold time for BGP."


Because it would be horrible code design. Link detection is a common
service. Besides, BGP process threads can run longer than the minimum
intervals for link detection. Vendors would have to write checkpoints
within the BGP code to come up and service the link state machine. And
wait, it's a user-configurable checkpoint!! So came BFD: write a simple
state machine and make it available to all protocols.
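
The "simple state machine available to all protocols" idea can be
sketched as a small liveness detector that protocol clients subscribe
to. This is a toy with only up/down states, not the actual BFD state
machine, and every name in it (LivenessSession, register, tick, the
"bgp" and "ospf" client labels) is hypothetical.

```python
from typing import Callable, List, Tuple


class LivenessSession:
    """Toy link-liveness detector shared by several protocol clients."""

    def __init__(self, interval_ms: int, multiplier: int, now_ms: int = 0) -> None:
        self.detect_ms = interval_ms * multiplier  # silence tolerated before "down"
        self.last_rx_ms = now_ms
        self.up = True
        self._clients: List[Callable[[str], None]] = []

    def register(self, callback: Callable[[str], None]) -> None:
        """Any protocol can subscribe; none needs its own fast timers."""
        self._clients.append(callback)

    def packet_received(self, now_ms: int) -> None:
        self.last_rx_ms = now_ms
        if not self.up:
            self.up = True
            self._notify("up")

    def tick(self, now_ms: int) -> None:
        """Called periodically; declares the session down after too much silence."""
        if self.up and now_ms - self.last_rx_ms >= self.detect_ms:
            self.up = False
            self._notify("down")

    def _notify(self, event: str) -> None:
        for callback in self._clients:
            callback(event)


events: List[Tuple[str, str]] = []
session = LivenessSession(interval_ms=300, multiplier=3)
session.register(lambda e: events.append(("bgp", e)))
session.register(lambda e: events.append(("ospf", e)))

session.packet_received(now_ms=300)
session.tick(now_ms=900)    # 600 ms of silence: still up
session.tick(now_ms=1200)   # 900 ms of silence: declared down, clients notified
print(events)
```

The routing protocol only reacts to the "down" callback, so its own
long-running work never has to be interrupted to poll the link.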


>As I stated, I'll change my opinion of BFD when implementations
>improve.  I understand the risk/reward situation.  You don't seem to
>get this, and as a result, your overly-simplistic view is that "BGP
>takes seconds" and "BFD takes milliseconds."

I have no doubt that you understand your risk/reward, but you don't for
every other environment.

For event detection leading to a state change leading to peer
deactivation, "my overly-simplistic view" is the fact (not as you put
it, but as it was written, unedited). How you want to act in response
depends on design.

>is that "BGP
>takes seconds" and "BFD takes milliseconds."

That's what you read, not what I wrote. I was comparing the speed of
event detection.

Now, like I said, for speed of deactivation, "BGP hold timer may find
itself inadequate, if not inappropriate, in some cases" in this same
context. But as I mentioned, we don't know the pain we are trying to
solve for, or the requirements that drove this thread in the first
place. So I simply put the facts and a business driver.


BFD is no different than deactivating a peer based on link failure.
Your view is that there is no case for it. My point is: it arrived
yesterday; it's just a damn hard thing to monetize upstream in transit.


>>For a provider to require a vendor instead of RFC compliance is sinful.

>Many sins are more practical than the alternatives.

Few, maybe.


--
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator  /  Innovative Network Concepts
