[89023] in North American Network Operators' Group


Re: shim6 @ NANOG (forwarded note from John Payne)

daemon@ATHENA.MIT.EDU (Kevin Day)
Wed Mar 1 02:53:49 2006

In-Reply-To: <63799F06-3544-470F-B6A7-86F85ED38DA1@isc.org>
Cc: Randy Bush <randy@psg.com>, NANOG list <nanog@nanog.org>
From: Kevin Day <toasty@dragondata.com>
Date: Wed, 1 Mar 2006 01:56:14 -0600
To: Joe Abley <jabley@isc.org>
Errors-To: owner-nanog@merit.edu



On Mar 1, 2006, at 12:47 AM, Joe Abley wrote:
>
>>   o a small to medium multi-homed tier-n isp
>
> A small-to-medium, multi-homed, tier-n ISP can get PI space from
> their RIR, and doesn't need to worry about shim6 at all. Ditto larger
> ISPs, up to and including the largest.
>

If you include "Web hosting company" in your definition of ISP,
that's not true. Unless you're providing connectivity to 200 or more
networks, you can't get a /32. If all of your use is internal (fully
managed hosting), or you aren't selling leased lines or anything, you
are not considered an LIR by the current IPv6 policies.

Even the proposed ARIN 2006-4 assignment policy for "end sites"  
doesn't help a lot of small to mid sized hosting companies. For that,  
to just get a /48, you need to already have a /19 or larger, and be  
using 80% of that. That's 6553 IPs being utilized. If you're running  
a managed hosting company (name based vhosts) and deploying 1 IP per  
web server, you're pretty huge before you've hit 6553 devices. Even  
assuming 20% of that is wasted, you're still talking about more than  
5000 servers. 40 1U servers per rack, you need to have 125 racks of  
packed to the gills servers before you'd qualify for PI space. That  
excludes every definition I have of "small-to-medium" in the hosting  
arena.
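
The arithmetic above can be checked with a quick sketch. It assumes the
figures given in the paragraph (a /19, 80% utilization, 20% waste, 40
servers per rack); the variable names are only illustrative.

```python
# Rough check of the qualification numbers quoted above.
# All inputs come from the paragraph; the names are illustrative.

addresses_in_slash19 = 2 ** (32 - 19)                 # size of an IPv4 /19
in_use_threshold = addresses_in_slash19 * 80 // 100   # 80% utilization, rounded down
servers = in_use_threshold * 80 // 100                # allow 20% of those IPs as waste
racks_for_5000 = 5000 // 40                           # 40 1U servers per rack

print(addresses_in_slash19, in_use_threshold, servers, racks_for_5000)
```

The 125-rack figure corresponds to the round number of 5000 servers;
using the unrounded server count gives a slightly higher rack count.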

You don't get PI space, and Shim6 is looking like your only  
alternative for multihoming.
>
> Content providers have a different set of problems, since a server  
> with N simultaneously-active clients, each with an average of M  
> available locators needs to deal with N*M worth of state, which is  
> presumably M times worse than the situation today.
>
> For very large content providers, aggregating very large numbers of  
> simultaneous clients through load balancers or other middleboxes,  
> this is quite possibly not something that is going to be a simple  
> matter of upgrading to a shim6-capable firmware release.
>

Yes, and content providers have other issues as well when it comes to  
IPv6 policy... I'm betting only the top 1 or 2 CDN/content providers  
out there qualify for a /32. Many content providers set up multiple  
non-interconnected POPs in different geographical locations. The only  
way this can be accomplished is by making separate announcements in  
each POP for each space. This means either being able to deaggregate,  
or to get a block for each POP. I don't know of *ANY* that are  
deploying 5000+ servers per POP.



> Actually, I think the problem with shim6 is that there are far too  
> few operators involved in designing it. This has evidently led to a  
> widespread perception of an ivory tower with a moat around it.

I think the issue was this: when I first heard of shim6, I thought
"Oooh, that's really clever. A lot of small businesses/enterprises
will use that, they don't need to deal with BGP, adding a new
provider is just a drop in." Then, when we got to deploying IPv6, we
discovered "Oh, wait, they expect EVERYONE who uses PA space to do
this? That's not cool." and the reaction turned negative.

> To gain real relevance it needs to be deployed; to be deployed, it  
> needs to be embraced by enterprise operators and content providers.
>
> If these operators dismiss it out of hand on principle, and refuse
> to actually find out whether the general approach is able to solve  
> problems or not, then irrelevance does indeed seem inevitable.  
> However, the only alternative on the table is a v6 swamp.
>
> How about some actual technical complaints about shim6?

I'm just one guy, one ASN, and one content/hosting network. But I can
tell you that switching to shim6 instead of speaking BGP would
be a complete overhaul of how we do things.

Putting routing decisions in the control of servers we don't operate  
scares me. I wouldn't rely on 90% of our customers to get this right  
unless it was completely idiot proof. Even if it was, I don't see how  
we can trust that users aren't messing with things to "game the  
system" somehow.

We deal with long lived TCP sessions (hours/days). I don't see how
routing updates can happen that won't result in a disconnect/reconnect,
which isn't acceptable. With current BGP technologies, if
I need to move traffic off a transit port, I can do so without
relying on all of our servers to know anything about it; the move is
instant and non-disruptive. Shim6 requires a keepalive to expire for
the end nodes to realize something is broken, then re-negotiate the  
remaining routing decisions. With BGP, I can see if one of my transit  
links goes down directly, and compensate before users start getting  
impatient.

We have peering arrangements with about 120 ASNs. How do we mix BGP  
IPv6 peering and Shim6 for transit?

So far it looks like Shim6 is going to rely on DNS. The DNS caching  
issue is a real problem. We need changes to happen faster than DNS  
caching will allow.

Our network is complicated. We have a /21 that's split into 4 /23s.  
One for each non-interconnected POP. We only advertise the /23 for  
each POP out to transit, but we give peers access to our entire  
network wherever they peer with us and we pay to haul/tunnel it  
around. How do we even do this without PI space, let alone through  
shim6?
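
To make the layout concrete, here is a minimal sketch of the split
described above. The prefix 10.0.0.0/21 and the POP names are
placeholders, not our real numbers; only the /21-into-four-/23s shape
comes from the text.

```python
import ipaddress

# Placeholder prefix and POP names (hypothetical, for illustration only).
aggregate = ipaddress.ip_network("10.0.0.0/21")
pop_names = ["pop-a", "pop-b", "pop-c", "pop-d"]

# A /21 divides into exactly four /23s, one per POP.
pop_blocks = dict(zip(pop_names, aggregate.subnets(new_prefix=23)))

# Transit at each POP hears only that POP's own /23.
transit_announcements = {pop: [block] for pop, block in pop_blocks.items()}

# Peers hear every /23 wherever they peer; we carry the traffic between POPs.
peer_announcements = {pop: list(pop_blocks.values()) for pop in pop_blocks}

for pop, block in pop_blocks.items():
    print(pop, block)
```

With provider-assigned space there is no single aggregate to carve up
this way, which is the point of the question.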

For quite the foreseeable future, we'd be running IPv4 and IPv6 at  
the same time, over the same transit connections. We'd have to TE our  
IPv6 bits completely differently than our IPv4 bits, even though we'd  
be billed for the aggregate usage of both. Automated tweaking of
total usage per transit port is hard enough in BGP. Having
to tweak both BGP and some external shim6 method of TE when the goal  
is a common aggregate number is going to be a very difficult issue.

Some of our applications are extremely sensitive to jitter/latency.  
We've spent ages tweaking route-maps manually (and through automated  
continual tweaking) to make sure we avoid any congested links. We  
also rely on BGP communities set by our providers to give us some more
information when it comes to route decisions. (If NSP A tells me  
through communities that they peer directly with someone, where NSP B  
is crossing the country, then hitting another NSP before the Origin  
ASN, we prefer NSP A). I don't see how information like this, or  
tweaking to that level is even possible with Shim6. BGP works well  
for applications like this because each network the traffic passes  
through can add its own hints (Communities, prepending, etc) to the  
route, that lots of us use.

We'd still be relying on PA space. No matter how great DHCPv6 is,
there will be significant renumbering pain when providers are  
changed. Static ACLs, firewall rules, etc. If you're including  
customer machines in the renumbering, many simply won't do it.

Putting the logic behind traffic engineering and routing decisions  
into thousands of boxes seems a step backwards from putting the  
decision on our border/edges. Many more places where things can  
break. If we want to do things in a non-standard way, every box has  
to support it. If there are refinements to Shim6 later, we're faced
with either not using them, or forcing our customers to upgrade their
OS.

How do we deal with "backup connections"? I.e. connections that are  
only used if all others are down.  Right now we advertise only a  
supernet out to our "backup transit" provider, and the more specifics  
to our main providers. (Yes, I realize this isn't perfect, but it  
works fine for us.)
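
For anyone unfamiliar with why this works, a rough sketch of
longest-prefix matching follows. The prefixes and the "main-transit" /
"backup-transit" labels are made up for illustration, and real routers
obviously consider far more than prefix length.

```python
import ipaddress

def best_route(routes, destination):
    """Return the next hop of the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, via) for net, via in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

# Hypothetical addressing: supernet to the backup, more specifics to the mains.
supernet = ipaddress.ip_network("10.0.0.0/21")
routes = [(supernet, "backup-transit")]
routes += [(net, "main-transit") for net in supernet.subnets(new_prefix=23)]

# Normal case: the /23 wins over the /21, so traffic arrives via main transit.
normal = best_route(routes, "10.0.2.10")

# Main transit down: the more specifics are withdrawn, only the supernet is left.
after_failure = [(net, via) for net, via in routes if via != "main-transit"]
fallback = best_route(after_failure, "10.0.2.10")

print(normal, fallback)
```

The backup link carries traffic only once the more specifics disappear,
and that behaviour comes from the announcements alone, with no host
involvement.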



Please don't get me wrong, I think Shim6 is great for a lot of  
people. Being able to let ANYONE multihome with no impact on the  
world is great. BUT, there needs to be a fallback to the BGP/IPv4-ish  
way for people who need the "power user" set of tools, or there is  
going to be a huge pushback from a lot of groups when asked to switch  
to IPv6. This fallback has to be available to anyone who can justify
the need, not just "anyone bigger than X size".


-- Kevin

