IPv6 multihoming without NAT: the problem

Every time I write about IPv6 multihoming issues and the need for NPT66, I get a comment or two saying “but I thought this is already part of IPv6 stack – can’t you have two or more IPv6 addresses on the same interface?” The commentators are right, you can have multiple IPv6 addresses on the same interface; the problem is which one to choose for outgoing sessions.

The source address selection rules are specified in RFC 3484 (Greg translated that RFC into an easy-to-consume format a while ago), but they are not very helpful as they cannot be influenced by the CPE router. Let’s look at the details.

Phase 1 – single ISP connection

We have a simple SMB network: a single CPE router connected to one ISP and a host sitting behind the router (ignore the PE-B part for the moment). The CPE router asks ISP A for a delegated prefix (using the IA_PD option in DHCPv6) and uses part of that prefix to address its LAN interface.

This is how you configure the CPE router if you’re using Cisco IOS:

Simple CPE router configuration

ipv6 unicast-routing
!
interface FastEthernet0/0
 description Inside interface
 ipv6 address ISPA ::1/64
 ipv6 nd router-preference High
 ipv6 nd ra interval 10
!
interface FastEthernet1/0
 description ISP A uplink
 ipv6 address autoconfig default
 ipv6 dhcp client pd ISPA

The CPE router configuration is not complete; you would also need a DHCPv6 server on the inside interface to pass the DNS server’s IPv6 address to the clients. A complete (and tested) configuration is included in the materials you get with the Building IPv6 Service Provider Core webinar.

The IPv6 client receives RA messages sent by the CPE and creates an IPv6 address from the advertised /64 prefix on its LAN interface.

The client is now able to communicate with the IPv6 Internet. Problem solved ... until someone figures out a single upstream connection is a single point of failure and orders a second Internet service.

Phase 2 – Two ISP uplinks

The second ISP uplink is configured almost identically to the first. Since you cannot have two RA-generated default routes in Cisco IOS release 15.1M, I had to use a floating static default route and hard-code the next-hop router’s IPv6 address in it.

Simple CPE router configuration – two uplinks

ipv6 unicast-routing
!
interface FastEthernet0/0
 description Inside interface
 ipv6 address ISPA ::1/64
 ipv6 address ISPB ::1/64
 ipv6 nd router-preference High
 ipv6 nd ra interval 10
!
interface FastEthernet1/0
 description ISP A uplink
 ipv6 address autoconfig default
 ipv6 dhcp client pd ISPA
!
interface FastEthernet1/1
 description ISP B uplink
 ipv6 address autoconfig
 ipv6 dhcp client pd ISPB
!
ipv6 route ::/0 FastEthernet1/1 FE80::2 30
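
The floating static route idea can be sketched in a few lines of Python. This is only an illustration of route selection by administrative distance, not IOS code: the distance of 30 comes from the configuration above, while `RA_DISTANCE` is a placeholder that I only assume to be lower than 30, and I also assume the RA-learned default route disappears when the primary uplink fails.

```python
from dataclasses import dataclass

# Placeholder value: only assumed to be lower than the static route's 30.
RA_DISTANCE = 2


@dataclass(frozen=True)
class DefaultRoute:
    uplink: str
    next_hop: str
    distance: int


ROUTES = [
    DefaultRoute("FastEthernet1/0", "RA-learned", RA_DISTANCE),  # ISP A
    DefaultRoute("FastEthernet1/1", "FE80::2", 30),              # ISP B, floating static
]


def best_default(routes, uplinks_up):
    """Return the usable default route with the lowest distance, or None."""
    usable = [route for route in routes if route.uplink in uplinks_up]
    return min(usable, key=lambda route: route.distance, default=None)


# Both uplinks up: the RA-learned default through ISP A wins.
print(best_default(ROUTES, {"FastEthernet1/0", "FastEthernet1/1"}))
# ISP A uplink down: the floating static route through ISP B takes over.
print(best_default(ROUTES, {"FastEthernet1/1"}))
```

Note that the selection never looks at the source address of the packet, which is exactly why the host's choice of source address matters later on.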

After the CPE router receives a delegated prefix from PE-B, it adds a /64 prefix from that address range to its LAN interface and starts advertising two /64 prefixes (one from each ISP) in its RA messages. The IPv6 client creates a second IPv6 address from the second advertised prefix – it now has two IPv6 addresses on its LAN interface.
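
To make the "two addresses on one interface" part concrete, here is a minimal Python sketch of how a host could derive one address per advertised prefix. It assumes the host builds its interface identifier from the MAC address (modified EUI-64); many host stacks use random or privacy identifiers instead. The MAC address and the two /64 prefixes are hypothetical values chosen to fit the test setup at the end of this post.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Build a modified EUI-64 interface identifier from a 48-bit MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                                # flip the universal/local bit
    eui = octets[:3] + b"\xff\xfe" + octets[3:]      # insert FFFE in the middle
    return int.from_bytes(eui, "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine an advertised /64 prefix with the interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 needs a /64 prefix")
    return ipaddress.IPv6Address(int(network.network_address) | eui64_interface_id(mac))


MAC = "c0:00:85:00:00:00"                            # hypothetical client MAC
for advertised in ("2001:db8:1::/64", "2001:db8:7::/64"):  # assumed ISP A / ISP B prefixes
    print(slaac_address(advertised, MAC))
```

The host ends up with the same interface identifier under both prefixes; nothing in the address itself tells it which uplink the prefix belongs to.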

The Problem

There are numerous problems associated with this setup, some of them architectural, many more due to suboptimal implementations, omissions, or strict adherence to RFCs in host and router stacks.

The easy one first: when an IPv6 client with multiple IPv6 addresses starts a new session, it chooses a source address that best matches the destination address (ULA address for ULA destination, global address for global destination ...) without any knowledge of the network topology.

The Distributing Address Selection Policy using DHCPv6 draft describes a potential solution ... but it has to be implemented in both routers and hosts, and it’s implemented nowhere at this moment.

For example, when the IPv6 client in our small network connects to the outside world, it might choose a source IPv6 address assigned by the wrong ISP.

You can see the source address used by the client with the netstat -n -p tcpv6 (Windows) or netstat -n -f inet6 -p tcp (OSX) command; Linux netstat uses different options, and netstat -n -t -6 shows the same information there. It seems that Windows picks the lowest IPv6 address while OSX picks the oldest IPv6 address when all interface IPv6 addresses are equivalent according to RFC 3484 rules.
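
Why are the two addresses "equivalent"? Here is a rough Python sketch of just the longest-matching-prefix step of RFC 3484 (the earlier rules on scope, deprecated addresses and labels are skipped, because two global addresses on the same interface tie on all of them). The client addresses and destinations are assumptions based on the test configurations at the end of this post, and real host stacks differ in the details.

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv6 addresses have in common."""
    diff = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    return 128 - diff.bit_length()


def best_sources(candidates, destination):
    """Return every candidate that ties for the longest match with the destination."""
    scores = {src: common_prefix_len(src, destination) for src in candidates}
    top = max(scores.values())
    return [src for src in candidates if scores[src] == top]


# Hypothetical client addresses, one from each ISP's prefix.
CLIENT_ADDRESSES = ["2001:db8:1::100", "2001:db8:7::100"]

# A destination somewhere else on the Internet: both sources match equally well.
print(best_sources(CLIENT_ADDRESSES, "2001:db8:cafe::1"))
# A destination inside ISP B's own address space: only then is there a clear winner.
print(best_sources(CLIENT_ADDRESSES, "2001:db8:7:ff01::1"))
```

For the first destination both addresses share the same 32 leading bits, so the function returns both of them and the host has to fall back on an implementation-specific tie-breaker. Nothing in this calculation knows which uplink is up or where the default route points.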

The best that could happen is asymmetrical routing:

In some rare cases, the ISP actually performs an RPF check and drops the packet with an unexpected source IPv6 address.
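
The ISP-side check boils down to a prefix membership test. A minimal Python sketch, assuming a strict check where each ISP accepts only sources from the prefix it delegated to this customer; the two /60 prefixes are my assumption (the first delegation out of each pool in the test setup below), and the function name is made up for this example.

```python
import ipaddress

# Assumed delegated prefixes, one per ISP.
DELEGATED_PREFIX = {
    "ISP A": ipaddress.IPv6Network("2001:db8:1::/60"),
    "ISP B": ipaddress.IPv6Network("2001:db8:7::/60"),
}


def isp_accepts(isp: str, source: str) -> bool:
    """True if the packet's source address belongs to the prefix this ISP delegated."""
    return ipaddress.IPv6Address(source) in DELEGATED_PREFIX[isp]


# The CPE sends everything over the ISP A uplink (primary default route).
print(isp_accepts("ISP A", "2001:db8:1::100"))  # source from ISP A's prefix
print(isp_accepts("ISP A", "2001:db8:7::100"))  # source from ISP B's prefix
```

The first packet passes; the second one is dropped by an ISP that does the check, even though both uplinks are perfectly healthy.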

The whole situation might have been survivable were this the only problem to solve (and the lack of RPF checks on the ISP side causes people to claim that IPv6 multihoming works). Unfortunately, there are many others, for example:

  • When the CPE router (Cisco IOS router running 15.1(4)M) loses an uplink, it does not stop advertising the delegated prefix it received through that uplink (implementation issue). One of the client IPv6 addresses is thus completely invalid without the client being aware of it.
  • If you clear the delegated prefix manually (or with an EEM applet) on the CPE router, it stops advertising the prefix in its RA messages ... but the prefix remains valid on the IPv6 hosts until it expires (architectural issue). Prefix expiration is based on its preferred lifetime, which is derived straight from DHCPv6 prefix delegation and is usually measured in weeks.
  • It might be possible to reduce the preferred lifetime in the RA messages to a very low number, but the lifetime of an interface prefix based on a delegated prefix is not configurable (implementation issue).
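
The lifetime problem in the last two bullets can be illustrated with a small Python sketch of the host-side address states (preferred, deprecated, invalid). The 7-day and 30-day lifetimes are hypothetical values picked only to show the effect, and the class is a simplification of what a real host stack tracks.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class AutoconfiguredAddress:
    prefix: str
    last_ra: float             # time (seconds) of the last RA carrying this prefix
    preferred_lifetime: float  # seconds, as advertised in that RA
    valid_lifetime: float      # seconds, as advertised in that RA

    def state(self, now: float) -> str:
        """Address state as seen by the host when no further RA refreshes the prefix."""
        age = now - self.last_ra
        if age < self.preferred_lifetime:
            return "preferred"    # still used as a source for new sessions
        if age < self.valid_lifetime:
            return "deprecated"   # avoided for new sessions, existing ones continue
        return "invalid"


# The uplink fails right after the last RA at t=0 and the prefix is never advertised again.
addr = AutoconfiguredAddress("2001:db8:1::/64", last_ra=0,
                             preferred_lifetime=7 * DAY, valid_lifetime=30 * DAY)
for label, now in (("1 hour", HOUR), ("8 days", 8 * DAY), ("31 days", 31 * DAY)):
    print(label, addr.state(now))
```

With lifetimes of this size the host happily keeps using the dead prefix for new sessions for days after the router has stopped advertising it.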

Please don’t try to tell me that the whole thing works if you use two CPE routers. It might work once the host stacks implement RFC 3484 bis, but we’re not there yet (and I’ll describe that scenario in an upcoming blog post).

More information and tested router configurations

Various IPv6 access- and core network designs and numerous sample configurations are included in the Building IPv6 Service Provider Core webinar (currently available only as a recording).

Do your own tests

If you want to test how your hosts behave in this scenario or try to fix my router configurations, use these configurations as a starting point:

CPE router configuration (cleaned up)

version 15.1
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname CPE
!
ipv6 unicast-routing
ipv6 cef
!
interface FastEthernet0/0
 description Inside interface
 ipv6 address ISPA ::1/64
 ipv6 address ISPB ::1/64
 ipv6 nd router-preference High
 ipv6 nd ra interval 10
!
interface FastEthernet1/0
 description ISP A uplink
 ipv6 address autoconfig default
 ipv6 dhcp client pd ISPA
!
interface FastEthernet1/1
 description ISP B uplink
 ipv6 address autoconfig
 ipv6 dhcp client pd ISPB
!
ipv6 route ::/0 FastEthernet1/1 FE80::2 30
!
line con 0
 exec-timeout 0 0
 privilege level 15
line vty 0 4
 exec-timeout 0 0
 privilege level 15
 no login
!
ntp logging
end

PE-router configuration (cleaned up)

hostname PE
!
ipv6 unicast-routing
ipv6 cef
ipv6 dhcp pool ISPA
 prefix-delegation pool ISPA
!
ipv6 dhcp pool ISPB
 prefix-delegation pool ISPB
!
interface Loopback0
 ipv6 address 2001:DB8:CAFE::1/64
!
interface FastEthernet0/0
 description ISP A interface
 ipv6 address FE80::1 link-local
 ipv6 address 2001:DB8:1:FF01::1/64
 ipv6 dhcp server ISPA
!
interface FastEthernet0/1
 description ISP B interface
 ipv6 address FE80::2 link-local
 ipv6 address 2001:DB8:7:FF01::1/64
 ipv6 dhcp server ISPB
!
ipv6 local pool ISPA 2001:DB8:1::/49 60
ipv6 local pool ISPB 2001:DB8:7::/49 60
!
line con 0
 exec-timeout 0 0
 privilege level 15
line vty 0 4
 exec-timeout 0 0
 privilege level 15
 no login
!
ntp logging
end

12 comments:

  1. In my mind, we can split the issues into 2 major problems:

    1) Host address selection.
    Can this be mitigated by initiating a connection from multiple addresses simultaneously and abandoning those not first-to-succeed? While inefficient, this appears to be a workable solution until smarter people take a crack at the problem.

    2) CPE next-hop selection.
    Can this be mitigated by an upstream affinity mechanism? Only send packets sourced from ISPA networks via ISPA, and B via B. Today, that means PBR source based next-hop but a technology to constrain topology based on this kind of setup should not be prohibitively complex. Things get worse with multiple CPE routers to multiple providers but we are talking small site for this specific example.

    Regarding DHCPv6 extensions to provide CPE additional information:
    If we are going to provide the CPE additional IPv6 specific information via DHCP, why not provide a list of valid source subnets as well to direct such a source-based forwarding mechanism and make the routing policy clear (IPv6 unicast reverse path forwarding, which most providers can't implement today due to well-known Cisco hardware deficiencies)?

    In my previous post I represented these as software deficiencies; to clarify, I'm including the OS network stack and IPv6 communication protocol extensions to DNS and DHCP in that camp as well. The network is doing exactly what it should, providing an end-to-end layer 3 path between a given set of endpoints.

    There truly has to be a better way to resolve our issues with an operationally new protocol suite than to try to jam it into the old model. My approach to IPv6 deployment at work has been to use it as an opportunity to kick off the shackles of legacy communications systems and force developers to take responsibility for their communication. It takes significantly less effort to manipulate how software communicates than to re-architect the network infrastructure around every perceived communications problem.

  2. Ole and Mark have described an elegant framework here:
    http://tools.ietf.org/html/draft-townsley-troan-ipv6-ce-transitioning-00
    If found both sufficient and flexible, this approach could form the basis for CPE multihoming of almost any kind.

  3. Now we're in perfect agreement ;)

    As for the CPE job - we could do something along the lines of what needs to be done with PBR today (assuming CPE has just two default routes), a full-blown implementation is described in the draft Frank mentioned - effectively it's a VRF per WAN link with source address-based VRF selection for egress traffic.

  4. It's great that an appreciated member of the IP world finally brings this topic into open discussion. Even the simple example presented here proves the current state of IPv6 capabilities plainly ridiculous compared to the flexibility that the all-evil IPv4 NAT is able to offer.

    We started thinking about IPv6 implementation in our enterprise network about a year ago. We have about 5000 IP hosts distributed across 100 sites in 40 countries around the world. The IPv4 implementation is a MPLS core network for corporate business applications like SAP and local Internet connectivity at each site for general youtubing and booking faces, that can also be used for backup VPN connectivity in case of MPLS failure. Private addressing inside the corporate network and IPv4 NAT when going to Internet. Works perfectly. Add to that L2L tunnels to partner organizations, centrally routed IP authenticated services etc., still no problem.

    So, with great enthusiasm I started familiarizing myself with IPv6 and googling for examples of enterprise addressing plans and configurations to use as a basis for the design. No match. What? Google more. Nothing. What is this? Can't be true.

    After endless googling and discussions with more experienced network experts the sad end result was what is described here and in the previous article. No can do. The greatest and extremely hyped protocol just is not capable of providing a reasonable solution for a simple low-end enterprise network.

    Fortunately we are big enough to get a /48 PI address space for the multihomed central site connectivity so that we are able to practice IPv6 in a small scale through the one exit point to Internet. Having 16 bits for subnetting allows us to have the same limited logical segmenting capability that we have with the IPv4 private 10/8 addressing currently in use. But at the same time nothing gained from the added address space. 64 bits wasted for on average 50 hosts (median 20) per site, and the network structure is still cramped into 16 bits. Great going, IPv6 is so brilliant and solves all our problems plus saves the whole world. Not.

    So what would be the options if IPv6 implementation would be an absolute necessity at this point?
    - Buy lots of expensive MPLS bandwidth to route all the world's social media through one central pipe?
    - Implement a strict corporate policy with punishment and threats to ban all generic Internet use?
    - Split the /48 PI space to /56 or even smaller subnets polluting the routing tables and probably soon discarded when the tables explode? Still excludes the possibility for policy routing based on whatever reason.
    - Buy a truckload of high-end routers with all fancy half-standard concoctions and still route traffic through some rare conversion points (LISP etc.)?

    Fortunately we don't have immediate need for IPv6 at the moment and can just do the fun-and-play part. Just hoping for NPT66 or similar to emerge before the doomsday dawns.

  5. Sounds like http://www.psg.com/lists/multi6/multi6.2002/msg00860.html and section 4.2 in http://tools.ietf.org/id/draft-huitema-multi6-hosts-01.txt from 2002.

  6. There are other even more stupid options (and no good ones). Here's a particularly harmful one:

    * Get an AS number
    * Ask for a /32 (with enough "positive thinking" you might get one)
    * Split the /32 into /48s (one per site) and start announcing /48s into the Internet from remote sites.

    On the other hand, assuming source address selection does work according to RFC 3484, ULA might be a good solution to your problem.

  7. Patrick Frejborg, 14 December, 2011 18:34

    Have a look at RFC 6306, and look for exit and approach routing in the long-term routing architecture: multi-homing would become multi-pathing and there would be a placeholder for an identifier to create a session layer in the stack.

  8. This sounds like a misuse of IPv6. Why not use an administratively scoped address for "internal communications" and provider assigned unicast global space for the local internet surfing? Wouldn't that solve all of your problems?

  9. hsxx could probably solve his problems with ULAs, but the moment you want to have two Internet connections for redundancy (from every site) you're stuck.

  10. Except for the fact that you can assign multiple PA blocks for every host network for internet access and the previously discussed host or network source-based provider affinity piece solved...

    Again, if you try to apply IPv4 thinking to IPv6, you will end up with the same problems. If you approach it from a new angle you will find new solutions that could potentially result in a far superior internet to what we have today.

  11. Павел Доронин, 06 February, 2012 08:31

    CPE configuration

    interface FastEthernet0/0
    description Inside interface
    ipv6 address 2001::1/64
    ipv6 address 2002::1/64
    ipv6 nd router-preference High
    ipv6 nd ra interval 300 msec
    ipv6 nd ra lifetime 1
    ! when the ISP goes down, the RA lifetime is only 1 second
    ipv6 nd prefix 2001::/64 5 2
    ipv6 nd prefix 2002::/64 5 2
    ! valid and preferred lifetime; not sure about the particular values of these timers

    Tested client (Cisco router with no ipv6 unicast-routing command) with CPE configuration above:
    Client port configured

    Router(config-if)#ipv6 address autoconfig

    Router#sh ipv6 int f0/0
    FastEthernet0/0 is up, line protocol is up
    IPv6 is enabled, link-local address is FE80::C200:85FF:FE00:0
    No Virtual link-local address(es):
    Global unicast address(es):
    2001::C200:85FF:FE00:0, subnet is 2001::/64 [EUI/CAL/PRE]
    valid lifetime 4 preferred lifetime 1
    2002::C200:85FF:FE00:0, subnet is 2002::/64 [EUI/CAL/PRE]
    valid lifetime 4 preferred lifetime 1
    Joined group address(es):
    FF02::1
    FF02::1:FF00:0
    MTU is 1500 bytes
    ICMP error messages limited to one every 100 milliseconds
    ICMP redirects are enabled
    ICMP unreachables are sent
    ND DAD is enabled, number of DAD attempts: 1
    ND reachable time is 30000 milliseconds
    Default router is FE80::C201:85FF:FE00:0 on FastEthernet0/0

    We can see that the client autoconfigures 2 different addresses from 2 different ISPs. So this can be any client (Windows, Linux etc.), and there is no need for a fully static IPv6 configuration; static configuration is needed only on the CPE router. (As far as I know DNS can also be configured from RA, but I have not tested it yet.)

    Router#sh ipv6 routers
    Router FE80::C201:85FF:FE00:0 on FastEthernet0/0, last update 0 min
    Hops 64, Lifetime 1 sec (not 1800 seconds), AddrFlag=0, OtherFlag=0, MTU=1500
    HomeAgentFlag=0, Preference=Medium
    Reachable time 0 msec, Retransmit time 0 msec
    Prefix 2001::/64 onlink autoconfig
    Valid lifetime 5, preferred lifetime 2
    Prefix 2002::/64 onlink autoconfig
    Valid lifetime 5, preferred lifetime 2

    “For example, when the IPv6 client in our small network connects to the outside world, it might choose a source IPv6 address assigned by the wrong ISP.” Use PBR: if the source address is from ISP1, use ISP1 as the next hop; if it is from ISP2, use ISP2. In this case we also do not need default routes.
    And what happens if the link to ISP1 is down and the end host chooses to use the source IP address belonging to ISP1?
    When ISP1 is down, we use IP SLA to discover this and EEM to withdraw ISP1's prefix 2001::/64 from the RA. This happens with a 300 ms interval and the RA lifetime is only 1 second in our case, so the end host can't use the source address from ISP1 anymore. Please correct me if I'm wrong.

  12. Павел Доронин, 07 February, 2012 03:24

    You can also use DHCPv6 to assign two or more IP addresses to clients.
    https://supportforums.cisco.com/message/3551460



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.