[161675] in North American Network Operators' Group
Re: Is multihoming hard? [was: DNS amplification]
daemon@ATHENA.MIT.EDU (Owen DeLong)
Sun Mar 24 22:37:01 2013
From: Owen DeLong <owen@delong.com>
In-Reply-To: <CA+TcGd-1L3HxkdtE6=E6DqjnkD2N9u+_v7wtVuJkrWTDCf5-Tg@mail.gmail.com>
Date: Sun, 24 Mar 2013 19:31:38 -0700
To: Kyle Creyts <kyle.creyts@gmail.com>
Cc: "nanog@nanog.org" <nanog@nanog.org>
I assume those people will not bother with any attempt to multihome in
any form.
They are not, therefore, part of what is being discussed here.
Owen
On Mar 23, 2013, at 19:47 , Kyle Creyts <kyle.creyts@gmail.com> wrote:
> You do realize that there are quite a few people (home broadband
> subscribers?) who just "go do something else" when their internet goes
> down, right?
>
> There are people who don't understand the difference between "a site
> being slow" and packet-loss. For many of these people, losing internet
> service carries zero business impact, and relatively little life impact;
> they might even realize they have better things to do than watch cat
> videos or scroll through endless social media feeds.
>
> Will they really demand ubiquitous, unabridged connectivity?
>
> When?
>
> On Mar 23, 2013 12:58 PM, "Owen DeLong" <owen@delong.com> wrote:
> >
> >
> > On Mar 23, 2013, at 12:12 , Jimmy Hess <mysidia@gmail.com> wrote:
> >
> > > On 3/23/13, Owen DeLong <owen@delong.com> wrote:
> > >> A reliable cost-effective means for FTL signaling is a hard problem
> > >> without a known solution.
> > >
> > > Faster than light signalling is not merely a hard problem.
> > > Special relativity doesn't provide that information may travel
> > > faster than the maximum speed C. If you want to signal faster than
> > > light, then slow down the light.
> > >
> > >> An idiot-proof simple BGP configuration is a well known solution.
> > >> Automating it would be relatively simple if there were the will to
> > >> do so.
> > >
> > > Logistical problems... if it's a multihomed connection, which of the
> > > two or three providers manages it, and gets to blame the other
> > > provider(s) when anything goes wrong: or are you gonna rely on the
> > > customer to manage it?
> > >
> >
> > The box could (pretty easily) be built with a "Primary" and
> > "Secondary" port.
> >
> > The cable plugged into the primary port would go to the ISP that sets
> > the configuration. The cable plugged into the other port would go to
> > an ISP expected to accept the announcements of the prefix provided by
> > the ISP on the primary port.
> >
> > BFD could be used to illuminate a tri-color LED on the box for each
> > port, which would be green if BFD state is good and red if BFD state
> > is bad.
> >
> > At that point, whichever one is red gets the blame. If they're both
> > green, then traffic is going via the primary and the primary gets the
> > blame.
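
The LED-and-blame rule described above can be sketched in a few lines of
Python. This is only an illustration, assuming each port's BFD session
state has already been reduced to a boolean; the names led_color and
who_to_blame are hypothetical and do not come from any BFD implementation.

```python
from typing import List


def led_color(bfd_up: bool) -> str:
    """Green when the port's BFD session is good, red when it is bad."""
    return "green" if bfd_up else "red"


def who_to_blame(primary_up: bool, secondary_up: bool) -> List[str]:
    """Apply the rule from the text: any red port gets the blame; if
    both ports are green, traffic is on the primary, so blame the primary."""
    blamed = []
    if not primary_up:
        blamed.append("primary")
    if not secondary_up:
        blamed.append("secondary")
    # Both green: traffic is going via the primary.
    return blamed or ["primary"]


if __name__ == "__main__":
    for p_up, s_up in [(True, True), (True, False), (False, True)]:
        print(led_color(p_up), led_color(s_up), who_to_blame(p_up, s_up))
```

The case where both ports are red is not covered by the text; the sketch
simply reports both providers.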
> >
> > If you absolutely have to troubleshoot which provider is broken, then
> > start by unplugging the secondary. If it doesn't start working in 5
> > minutes, then clearly there's a problem with the primary regardless of
> > what else is happening.
> >
> > Lather, rinse, repeat for the secondary.
> >
> > > Someone might be able to make a protocol that lets this happen,
> > > which would need to detect on a per-route basis any
> > > performance/connectivity issues, but I would say it's not any known
> > > implementation of BGP.
> >
> > A few additional options to DHCP could actually cover it from the
> > primary perspective.
> >
> > For the secondary provider, it's a little more complicated, but could
> > be mostly automated so long as the customer identifies the primary
> > provider and/or provides an LOA for the authorized prefix from the
> > primary to the secondary.
> >
> > The only complexity in the secondary case is properly filtering the
> > announcement of the prefix assigned by the primary.
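
As a rough illustration of the filtering step on the secondary provider's
side, the check could look something like the Python sketch below. It
assumes the filter accepts only the exact prefix named in the LOA (a real
filter might also allow more-specifics), and the prefix shown is a
hypothetical one from the IPv6 documentation range, not anything from this
thread.

```python
import ipaddress


def accept_announcement(announced: str, authorized: str) -> bool:
    """Accept only the exact prefix the LOA authorizes.

    Exact match is an assumption of this sketch; the text only says the
    announcement must be properly filtered.
    """
    return ipaddress.ip_network(announced) == ipaddress.ip_network(authorized)


# Hypothetical prefix standing in for the one assigned by the primary.
AUTHORIZED = "2001:db8:1234::/48"

if __name__ == "__main__":
    for prefix in ("2001:db8:1234::/48", "2001:db8:ffff::/48"):
        print(prefix, accept_announcement(prefix, AUTHORIZED))
```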
> >
> > >> 1. ISPs are actually motivated to prevent customer mobility, not
> > >>    enable it.
> > >
> > >> 2. ISPs are motivated to reduce, not increase the number of
> > >>    multi-homed sites occupying slots in routing tables.
> > >
> > > This is not some insignificant thing. The ISPs have to maintain
> > > routing tables as well; ultimately the ISP's customers are in bad
> > > shape, if too many slots are consumed.
> > >
> >
> > I never said it was insignificant. I said that solving the multihoming
> > problem in this manner was trivial if there was will to do so. I also
> > said that the above were contributing factors in the lack of will to
> > do so.
> >
> > > How about
> > > 3. Increased troubleshooting complexity when there are potential
> > >    issues or complaints.
> > >
> >
> > I do not buy that it is harder to troubleshoot a basic BGP
> > configuration than a multi-carrier NAT-based solution that goes
> > woefully awry.
> >
> > I'm sorry, I've done the troubleshooting on both scenarios and I have
> > to say that if you think NAT makes this easier, you live in a
> > different world than I do.
> >
> > > The concept of a "fool proof" BGP configuration is clearly a new sort
> > > of myth.
> >
> > Not really.
> >
> > Customer router accepts default from primary and secondary providers.
> > So long as default remains, primary is preferred. If primary default
> > goes away, secondary is preferred.
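
The failover rule above amounts to a very small decision function. The
Python sketch below is only a model of that logic, assuming the box tracks
which providers are currently advertising a default route; select_egress
and the "primary"/"secondary" labels are hypothetical names. On a real
router the preference would typically be expressed through BGP policy (for
example a higher local preference on the primary session), which the text
does not spell out.

```python
from typing import Optional, Set


def select_egress(defaults_received: Set[str]) -> Optional[str]:
    """Pick the provider to send traffic to, given which providers are
    currently advertising a default route to the customer box."""
    if "primary" in defaults_received:
        return "primary"    # primary default present: always preferred
    if "secondary" in defaults_received:
        return "secondary"  # primary default gone: fall back
    return None             # no default from either provider


if __name__ == "__main__":
    print(select_egress({"primary", "secondary"}))
    print(select_egress({"secondary"}))
    print(select_egress(set()))
```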
> >
> > Customer box gets prefix (via DHCP-PD or static config or whatever
> > either from primary or from RIR). Advertises prefix to both primary
> > and secondary.
> >
> > All configuration of the BGP sessions is automated within the box
> > other than static configuration of customer prefix (if static is
> > desired).
> >
> > Primary/Secondary choice is made by plugging providers into the
> > Primary or Secondary port on the box.
> >
> > > The idea that the protocol on its own, with a very basic config,
> > > does not ever require any additional attention, to achieve expected
> > > results; where expected results include isolation from any faults
> > > with the path from one of the user's two, three, or four providers,
> > > and balancing for optimal throughput and best latency/loss to every
> > > destination.
> >
> > I have installed these configurations at customer sites for several of
> > my consulting clients that wanted to multihome their SMBs.
> >
> > Some of them have been running for more than 8 years without a
> > single issue.
> >
> > For all of the above requirements, no. You can't do that with the most
> > advanced manual BGP configurations today.
> >
> > However, if we reduce it to:
> >
> > 1. The internet connection stays up so long as one of the two
> > providers is up.
> >
> > 2. Traffic prefers the primary provider so long as the primary
> > provider is up.
> >
> > 3. My addressing remains stable so long as I remain connected to
> > the primary provider (or if I use RIR based addressing, longer).
> >
> > Then what I have proposed actually is achievable, does work, and
> > does actually meet the needs of 99+% of organizations that wish to
> > multihome.
> >
> > > BGP multihoming doesn't prevent users from having issues because:
> > >
> > > o Connectivity issues that are a responsibility of one of their
> > >   providers, that they might have expected multihoming to protect
> > >   them against (latency, packet loss).
> >
> > Correct. However, this is true of ANY multihoming solution. The
> > dual-provider NAT solution certainly does NOT improve this.
> >
> > > o Very poor performance of one of their links; or poor performance
> > >   of one of their links to their favorite destination
> >
> > See above.
> >
> > > o Asymmetric paths; which means that when latency or loss is poor,
> > >   the customer doesn't necessarily know which provider to blame, or
> > >   if both are at fault, and the providers can spend a lot of time
> > >   blaming each other.
> >
> > See above.
> >
> > > These are all solvable problems, but at cost, and therefore not for
> > > massmarket lowest cost ISP service.
> >
> > My point is that the automated simple BGP solution I propose can
> > provide a better customer experience than the currently popular
> > NAT-based multihoming with simpler troubleshooting and lower costs.
> >
> > > It's not as if they can have
> > > "Hello, DSL technical support... did you try shutting off your
> > > other peers and retesting?"
> >
> > ROFL.
> >
> > > The average end user won't have a clue -- they will need one of the
> > > providers, or someone else to be managing that for them, and
> > > understand how each provider is connected.
> >
> > Again, you're setting a much higher goal than I was.
> >
> > My goal was to do something better than what is currently being done.
> > (Connect a router to two providers and use NAT to choose between
> > them).
> >
> > > I don't see large ISPs training up their support reps for DSL
> > > $60/month services, to handle BGP troubleshooting, and multihoming
> > > management/repair.
> >
> > But they already get stuck with this in the current NAT-based solution
> > which is even harder to troubleshoot and creates even more problems.
> >
> > Owen
> >
> >