[89023] in North American Network Operators' Group
Re: shim6 @ NANOG (forwarded note from John Payne)
daemon@ATHENA.MIT.EDU (Kevin Day)
Wed Mar 1 02:53:49 2006
In-Reply-To: <63799F06-3544-470F-B6A7-86F85ED38DA1@isc.org>
Cc: Randy Bush <randy@psg.com>, NANOG list <nanog@nanog.org>
From: Kevin Day <toasty@dragondata.com>
Date: Wed, 1 Mar 2006 01:56:14 -0600
To: Joe Abley <jabley@isc.org>
Errors-To: owner-nanog@merit.edu
On Mar 1, 2006, at 12:47 AM, Joe Abley wrote:
>
>> o a small to medium multi-homed tier-n isp
>
> A small-to-medium, multi-homed, tier-n ISP can get PI space from
> their RIR, and don't need to worry about shim6 at all. Ditto larger
> ISPs, up to and including the largest.
>
If you include "Web hosting company" in your definition of ISP,
that's not true. Unless you're providing connectivity to 200 or more
networks, you can't get a /32. If all of your use is internal (fully
managed hosting), or you aren't selling leased lines or anything, you
are not considered an LIR by the current IPv6 policies.
Even the proposed ARIN 2006-4 assignment policy for "end sites"
doesn't help a lot of small to mid sized hosting companies. For that,
to just get a /48, you need to already have a /19 or larger, and be
using 80% of that. That's 6553 IPs being utilized. If you're running
a managed hosting company (name based vhosts) and deploying 1 IP per
web server, you're pretty huge before you've hit 6553 devices. Even
assuming 20% of that is wasted, you're still talking about more than
5000 servers. At 40 1U servers per rack, you need to have 125 racks of
packed-to-the-gills servers before you'd qualify for PI space. That
excludes every definition I have of "small-to-medium" in the hosting
arena.
You don't get PI space, and Shim6 is looking like your only
alternative for multihoming.
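
For anyone who wants to check the arithmetic above, here is a rough sketch. The /19, the 80% utilization threshold, the 20% waste allowance and the 40 servers per rack come from the text; the function name and the 10.0.0.0/19 prefix are placeholders made up for illustration, not real allocations or policy tooling.

```python
import ipaddress

def pi_threshold(prefix="10.0.0.0/19", utilization_pct=80,
                 waste_pct=20, servers_per_rack=40):
    """Estimate how big a one-IP-per-server host must be to qualify.

    All names here are hypothetical; the prefix is RFC 1918 space used
    only so that the address count can be computed.
    """
    total_ips = ipaddress.ip_network(prefix).num_addresses
    # Addresses that must be in use to meet the utilization threshold.
    ips_in_use = total_ips * utilization_pct // 100
    # Servers left after writing off a share of those addresses as waste.
    servers = ips_in_use * (100 - waste_pct) // 100
    # Whole racks needed for that many 1U servers (ceiling division).
    racks = -(-servers // servers_per_rack)
    return total_ips, ips_in_use, servers, racks

total_ips, ips_in_use, servers, racks = pi_threshold()
# The text rounds the server count down to 5000 before dividing by 40.
racks_for_5000 = 5000 // 40
```

Under these assumptions a /19 holds 8192 addresses, 80% of that is 6553, and the rounded figure of 5000 servers fills 125 racks.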
>
> Content providers have a different set of problems, since a server
> with N simultaneously-active clients, each with an average of M
> available locators needs to deal with N*M worth of state, which is
> presumably M times worse than the situation today.
>
> For very large content providers, aggregating very large numbers of
> simultaneous clients through load balancers or other middleboxes,
> this is quite possibly not something that is going to be a simple
> matter of upgrading to a shim6-capable firmware release.
>
Yes, and content providers have other issues as well when it comes to
IPv6 policy... I'm betting only the top 1 or 2 CDN/content providers
out there qualify for a /32. Many content providers set up multiple
non-interconnected POPs in different geographical locations. The only
way this can be accomplished is by making separate announcements in
each POP for each space. This means either being able to deaggregate,
or to get a block for each POP. I don't know of *ANY* that are
deploying 5000+ servers per POP.
> Actually, I think the problem with shim6 is that there are far too
> few operators involved in designing it. This has evidently led to a
> widespread perception of an ivory tower with a moat around it.
I think the issue was... When I first heard of shim6, I thought
"Oooh, that's really clever. A lot of small businesses/enterprises
will use that, they don't need to deal with BGP, adding a new
provider is just a drop in." Then, when we got to deploying IPv6, the
reaction turned negative: "Oh, wait, they expect EVERYONE who uses PA
space to do this? That's not cool."
> To gain real relevance it needs to be deployed; to be deployed, it
> needs to be embraced by enterprise operators and content providers.
>
> If these operators dismiss it out of hand on principal, and refuse
> to actually find out whether the general approach is able to solve
> problems or not, then irrelevance does indeed seem inevitable.
> However, the only alternative on the table is a v6 swamp.
>
> How about some actual technical complaints about shim6?
I'm just one guy, one ASN, and one content/hosting network. But I can
tell you that switching to shim6 instead of speaking BGP would be a
complete overhaul of how we do things.
Putting routing decisions in the control of servers we don't operate
scares me. I wouldn't rely on 90% of our customers to get this right
unless it were completely idiot-proof. Even if it were, I don't see how
we can trust that users aren't messing with things to "game the
system" somehow.
We deal with long-lived TCP sessions (hours/days). I don't see how
routing updates can happen that won't result in a disconnect/
reconnect, which isn't acceptable. With current BGP technologies, if
I need to move traffic off a transit port, I can do so without
relying on all of our servers to know anything about it; the move is
instant and non-disruptive. Shim6 requires a keepalive to expire for
the end nodes to realize something is broken, then re-negotiate the
remaining routing decisions. With BGP, I can see if one of my transit
links goes down directly, and compensate before users start getting
impatient.
We have peering arrangements with about 120 ASNs. How do we mix BGP
IPv6 peering and Shim6 for transit?
So far it looks like Shim6 is going to rely on DNS. The DNS caching
issue is a real problem. We need changes to happen faster than DNS
caching will allow.
Our network is complicated. We have a /21 that's split into 4 /23s.
One for each non-interconnected POP. We only advertise the /23 for
each POP out to transit, but we give peers access to our entire
network wherever they peer with us and we pay to haul/tunnel it
around. How do we even do this without PI space, let alone through
shim6?
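
To make the layout concrete, here is a minimal sketch of the announcement plan described above. The prefix 10.20.0.0/21 and the POP names are placeholders (the real block is not given), and announcing all four /23s to peers is an assumption about how "our entire network" is exposed.

```python
import ipaddress

# Placeholder aggregate; stands in for the real /21, which is not stated.
aggregate = ipaddress.ip_network("10.20.0.0/21")

# Split the /21 into four /23s, one per non-interconnected POP.
pop_blocks = list(aggregate.subnets(new_prefix=23))
pop_names = ["pop-a", "pop-b", "pop-c", "pop-d"]  # hypothetical names

announcements = {}
for name, block in zip(pop_names, pop_blocks):
    announcements[name] = {
        # Transit at a POP only hears that POP's own /23.
        "transit": [block],
        # Peers at any POP hear every block; traffic for the other POPs
        # is then hauled or tunnelled internally.
        "peers": list(pop_blocks),
    }

for name, plan in announcements.items():
    print(name, "transit:", plan["transit"][0],
          "peers:", ", ".join(str(b) for b in plan["peers"]))
```

The per-POP transit announcements are exactly the deaggregation that PA space under a single provider's aggregate does not permit.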
For quite the foreseeable future, we'd be running IPv4 and IPv6 at
the same time, over the same transit connections. We'd have to TE our
IPv6 bits completely differently than our IPv4 bits, even though we'd
be billed for the aggregate usage of both. Building automated tools for
tweaking total usage per transit port is hard enough in BGP. Having
to tweak both BGP and some external shim6 method of TE when the goal
is a common aggregate number is going to be a very difficult issue.
Some of our applications are extremely sensitive to jitter/latency.
We've spent ages tweaking route-maps manually (and through automated
continual tweaking) to make sure we avoid any congested links. We
also rely on BGP communities set by our providers to give us some more
information when it comes to route decisions. (If NSP A tells me
through communities that they peer directly with someone, where NSP B
is crossing the country, then hitting another NSP before the Origin
ASN, we prefer NSP A). I don't see how information like this, or
tweaking to that level is even possible with Shim6. BGP works well
for applications like this because each network the traffic passes
through can add its own hints (communities, prepending, etc.) to the
route, and lots of us rely on them.
We'd still be relying on PA space. No matter how great DHCPv6 is,
there will be significant renumbering pain when providers are
changed. Static ACLs, firewall rules, etc. If you're including
customer machines in the renumbering, many simply won't do it.
Putting the logic behind traffic engineering and routing decisions
into thousands of boxes seems a step backwards from putting the
decision on our border/edges. Many more places where things can
break. If we want to do things in a non-standard way, every box has
to support it. If there are refinements to Shim6 later, we're faced
with either not using them or forcing our customers to upgrade their
OS.
How do we deal with "backup connections"? I.e. connections that are
only used if all others are down. Right now we advertise only a
supernet out to our "backup transit" provider, and the more specifics
to our main providers. (Yes, I realize this isn't perfect, but it
works fine for us.)
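
The backup arrangement works because routers prefer the longest matching prefix. Here is a toy sketch of that selection, assuming a placeholder 10.20.0.0/21 supernet and made-up next-hop labels; it models only longest-prefix match, not real BGP path selection.

```python
import ipaddress

def best_route(rib, destination):
    """Return the next hop of the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in rib if addr in net]
    if not matches:
        return None
    # Longest prefix (largest prefixlen) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

supernet = ipaddress.ip_network("10.20.0.0/21")  # placeholder block

# The backup provider hears only the supernet; the main providers hear
# the more specifics.
rib = [(supernet, "backup-transit")]
rib += [(net, "main-transit") for net in supernet.subnets(new_prefix=23)]

normal = best_route(rib, "10.20.2.10")

# If the main providers go away, the more specifics are withdrawn and
# only the supernet via the backup provider remains.
rib_after_failure = [(net, hop) for net, hop in rib if hop != "main-transit"]
failover = best_route(rib_after_failure, "10.20.2.10")

print(normal, failover)
```

In normal operation the /23 via the main providers wins; the /21 via the backup provider only attracts traffic once the more specifics are gone.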
Please don't get me wrong, I think Shim6 is great for a lot of
people. Being able to let ANYONE multihome with no impact on the
world is great. BUT, there needs to be a fallback to the BGP/IPv4-ish
way for people who need the "power user" set of tools, or there is
going to be a huge pushback from a lot of groups when asked to switch
to IPv6. This fallback has to be available to anyone who can justify
the need, not just "anyone bigger than X size".
-- Kevin