Small-site multihoming in IPv6: mission impossible?

Summary: I can’t figure out how to make small-site multihoming (without BGP or PI address space) work reliably and decently fast (failover in seconds, not hours) with IPv6. I’m probably not alone.

Problem: There are cases where a small site needs (or wants) to have Internet connectivity from two ISPs without going through the hassle of getting a BGP AS number and provider-independent address space, and running BGP with both upstream ISPs.

The primary/backup scenario is very easy to implement with multiple per-interface NAT rules in the IPv4 world. With a load-balancing trick you can use both links simultaneously, and if you really want to stretch the envelope, you can try to deploy publicly accessible servers (although I would try every hosting solution before pulling this stunt).
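As an illustration, the per-interface NAT rules might look like this on a Linux-based CPE (a minimal sketch; the interface names and addresses are hypothetical documentation values, not anything from a real deployment):

```shell
# NAT outgoing traffic to whichever uplink it leaves on
# (eth1/eth2 and the addresses are hypothetical)
iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source 192.0.2.10     # primary ISP
iptables -t nat -A POSTROUTING -o eth2 -j SNAT --to-source 198.51.100.10  # backup ISP

# Primary/backup default routes: the higher-metric backup route
# takes over when the primary default route goes away
ip route add default via 192.0.2.1 dev eth1 metric 100
ip route add default via 198.51.100.1 dev eth2 metric 200
```

Because the SNAT rule matches the egress interface, return traffic always comes back over whichever link the session left on, which is exactly the return-path control the article describes.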

Is this realistic? Sure it is, let me give you a personal example. I usually work from home and Internet is one of my indispensable tools; it’s totally unacceptable to have no Internet connectivity for a few hours (or days). I’m positive more and more individuals and small businesses will have similar requirements.

What’s the big deal with IPv6? The IPv4 approach to this problem relies heavily on NAT44, which allows us to control the return path (based on the source IP address in the outgoing packet). As of today, there’s no production-grade NAT66 (see the comments to this post), so the same principle cannot be deployed in the IPv6 world.

Worst case, if we can’t make small-site multihoming work reliably with IPv6, a lot of users will be forced to go down the PI/BGP path and the Internet routing tables will explode even faster than expected (I’m describing the Internet routing problems in my Upcoming Internet Challenges webinar – register here).

Alternative approaches? Multihoming was supposed to be an integral part of IPv6 (not really, a lot of details are missing – another topic of my Upcoming Internet Challenges webinar), but maybe the following trick would work for small sites. Please share your opinions in the comments.

Could this work?

A CPE router with two uplinks will get delegated prefixes from both ISPs through DHCPv6. You can assign both prefixes to the LAN interface, and your IPv6 hosts using stateless address autoconfiguration (SLAAC – RFC 4862) will get an address from each delegated prefix (having multiple IPv6 addresses per interface is a standard IPv6 feature). However, the address selection rules the IPv6 hosts are supposed to use (RFC 3484) don’t take path availability into account.
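If the CPE happens to run radvd, advertising both delegated prefixes on the LAN could look like the fragment below (a sketch; the documentation prefixes stand in for the real ISP-delegated ones):

```
interface eth0
{
    AdvSendAdvert on;
    prefix 2001:db8:a:1::/64    # prefix delegated by ISP A (hypothetical)
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
    prefix 2001:db8:b:1::/64    # prefix delegated by ISP B (hypothetical)
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
};
```

Every host on the LAN would then autoconfigure one address from each prefix.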

If one of the upstream links fails, your IPv6 hosts would continue using the IPv6 address from the now-unreachable address space. Although the outbound traffic would be forwarded over the remaining link, the return traffic would end up in the wrong AS (the one with the failed link to your site) and would be dropped.

Assuming DHCPv6 prefix delegation and DHCPv6 clients in CPE routers work as intended, it’s possible to detect link loss and subsequent delegated prefix loss, and revoke the IPv6 prefix from router advertisements sent to the LAN interfaces, but that might be a slow process. The minimum valid lifetime of an IPv6 prefix in ND messages used for stateless autoconfiguration is two hours to prevent denial-of-service attacks (see paragraph (e) of section 5.5.3 of RFC 4862), so it could take up to two hours for the IPv6 connectivity to be fully operational after a link loss. Not something I would be happy with.

Last but not least, unless you use some crazy EEM-triggered tricks, your IPv6 hosts will have addresses from both ISPs most of the time. Influencing the address selection rules is not trivial (this is how you can do it on Linux and this is the procedure for Windows), and unless you’re pretty experienced, your hosts will select one path or the other based on whatever internal decisions they make, not based on the primary/backup selection you’d like to have.
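For the record, tweaking the selection policy looks roughly like this (a sketch with hypothetical prefixes; the exact behavior depends on how faithfully the host implements RFC 3484 policy tables):

```
# Linux (/etc/gai.conf): give the two prefixes different labels so
# source-address selection can distinguish them
label 2001:db8:a::/48  10
label 2001:db8:b::/48  20

# Windows: add a policy entry raising the preference of ISP A's prefix
netsh interface ipv6 add prefixpolicy 2001:db8:a::/48 precedence=45 label=10
```

Even with such tweaks you get a static preference, not failover: the policy table has no idea whether a path is actually up.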

What do you think? Would the end-users who need redundant connectivity implement this kludge or would they request PI address space, BGP AS number and implement BGP (or just ask both ISPs to install static routes for their PI prefix) ... or shall we wait for NAT66?

10 comments:

  1. Multi-prefix multi-homing is somewhat tricky. See http://tools.ietf.org/html/draft-troan-multihoming-without-nat66-01

    You can get around sending traffic out the wrong link by using policy-based routing (routing on the source address). You then have to depend on the host being able to choose a working SA/DA pair, which really requires Happy Eyeballs. http://tools.ietf.org/html/draft-wing-v6ops-happy-eyeballs-ipv6-01
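    On a Linux-based CPE, the source-address policy routing mentioned in this comment might be sketched like this (hypothetical prefixes, interfaces, and link-local next hops):

    ```shell
    # Put traffic sourced from each ISP's delegated prefix into that
    # ISP's own routing table
    ip -6 rule add from 2001:db8:a::/48 table 100
    ip -6 rule add from 2001:db8:b::/48 table 200

    # Each table holds a default route via the matching uplink
    ip -6 route add default via fe80::1 dev eth1 table 100
    ip -6 route add default via fe80::2 dev eth2 table 200
    ```

    This guarantees packets leave through the uplink that matches their source address; picking a source address whose path is actually alive is still the host's problem.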

    There are alternatives: SHIM6, LISP, ILNP, NAT66...

    ReplyDelete
  2. @Ivan, why didn't you mention LISP? :-(

    Multihoming a v6 site without PI space & BGP is easy peasy with LISP. You get two uplinks to whatever provider by whatever means, and the CPE will tell the mapping system where its /48 (or /56 or whatever) is located. That's it, just 9 lines of configuration.

    This is real and working at this very moment.
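    On a Cisco CPE, those few lines might look roughly like this (a sketch, not the commenter's actual configuration; the EID prefix, locators, map-server address, and key are all hypothetical):

    ```
    router lisp
     database-mapping 2001:db8:1::/48 192.0.2.1 priority 1 weight 50     ! locator on ISP A uplink
     database-mapping 2001:db8:1::/48 198.51.100.1 priority 1 weight 50  ! locator on ISP B uplink
     ipv4 itr map-resolver 203.0.113.10
     ipv4 itr
     ipv4 etr map-server 203.0.113.10 key hypothetical-key
     ipv4 etr
    ```

    The two database-mapping lines register both uplink locators for the same EID prefix; the mapping system then steers ingress traffic to whichever locator is reachable.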

    ReplyDelete
  3. No doubt that you have it working (I know I need to do the same ;), but will it scale if we suddenly get millions of entries on top of what's already there?

    ReplyDelete
  4. LISP multi-homing works best if everyone else is also using LISP... Slight deployment problem.

    ReplyDelete
  5. @Ole,

    Not true, LISP was designed with incremental deployment in mind. Multi-homing works fine the moment you configure two proxy-routers and have 2 uplinks. That's all.

    ReplyDelete
  6. @Ivan

    Exactly! I still cannot ping you over LISP...

    The short version: Yes, LISP is being developed explicitly to address the scaling issues we face today and in a few years. Maintaining state in RAM is cheap. There can be many, many EID prefixes – and with LISP they can be aggregated heavily. The RLOC space (the current Internet) is a mess anyway; LISP hopes to slow down the growth of the routing table.

    ReplyDelete
  7. The problem with multihoming is the fact that IPv4/IPv6 paths are calculated to the points of attachment, not to the end nodes themselves. This effectively makes any additional layer of indirection, such as LISP, unscalable, as decoupling locators from IDs brings the problem of validating path liveness. Taking the LISP-ALT architecture, we may notice that due to high aggregation and logical separation from the RLOC topology, the destination EID would always be reported as reachable. Furthermore, the use of edge-aggregated (PA) space for RLOCs will also hide network failures at the RLOC layer. Thus, every ITR has to probe every EID->RLOC mapping for liveness, as the responding ETR has no idea whether the particular path is reachable from the querying ITR. This problem is described in depth in http://tools.ietf.org/html/draft-meyer-loc-id-implications-01 . That document also describes the site synchronization problem, which is a direct result of the caching and the existence of multiple ingress/egress points for every LISP site.

    Adding another encapsulation layer is a serious architectural move with major implications. As I mentioned previously, LISP edge nodes are unaware of the underlying RLOC topology (think of the layer of indirection). Every LISP site advertises mappings for its ingress entry points while being unaware of the paths to those points. As a result, traffic load-balancing that is optimal from the edge-site perspective may appear suboptimal from the underlying Internet perspective. In other words, the traffic matrix that LISP sites require may not fit well into the underlying Internet topology. Comparing this to single-ISP networks and MPLS/BGP VPNs, you may notice that MPLS TE or IGP TE could be used to optimize the "tunneled" traffic flows; however, there is no common traffic-engineering scheme for the Internet.

    Another problem from the set is the statement that RLOC space is poorly aggregatable because of PI prefixes. This is not the only reason. Optimum aggregation requires the network topology to be hierarchical, which is not the case for the Internet, which is more of a self-similar graph. The Internet is only hierarchical at the edge, where provider aggregation can be implemented. However, globally aggregating addresses in such topologies is not possible with hierarchical routing.

    To summarize, effective multihoming and mobility require changing the IP routing and addressing architecture. If we continue to remain within the limitations of hierarchical addressing and PoA addressing we'll result in moving the problem from one part of the network to another, but will never get a scalable solution.

    ReplyDelete
  8. You don't need to wait for the valid lifetime to expire, only the preferred lifetime, and you can set that to zero immediately. Valid just means it won't tear down existing connections using that address; once it's deprecated (not preferred), no new connections will be sourced from that prefix. At least that's how it's supposed to work; feel free to report bugs.
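    Deprecating a prefix the way this comment describes might look like the radvd fragment below (a sketch with a hypothetical prefix): preferred lifetime zero, valid lifetime left nonzero.

    ```
    # Deprecate the prefix from the failed uplink: hosts keep existing
    # connections (valid lifetime > 0) but stop sourcing new ones from
    # this prefix (preferred lifetime = 0)
    prefix 2001:db8:a:1::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
        AdvPreferredLifetime 0;
        AdvValidLifetime 7200;
    };
    ```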

    It's true that you don't get much control over which prefix your hosts use; there are some drafts in process to add this information to DHCPv6.

    I don't know about you, but I'm pretty sure neither my home-office nor my local small business network satisfies the current ARIN reqs for new v6 allocations: have v4 space already (nope, using v4 NAT multihoming) or satisfy the v4 requirements (25% utilization now, 50% within a year of a /24 if multihomed -- 64 hosts now, 128 hosts in a year). If you have 128 hosts in your home office, color me impressed.

    ReplyDelete
  9. I have this problem today. I have five Internet providers over three sites, and only two would do any BGP if I went that way. I have a unique local address (ULA) scheme ready to go, but I can't get any IPv6 from any of my providers today, even if there were NAT66.

    So I do nothing, because I can't live with unpredictable client behavior: if I deployed my ULA scheme, browsers would receive AAAA records from external recursive DNS servers while having no outbound IPv6 egress, and do all sorts of weird things.

    Not even getting into infrastructure with no IPv6 support like WAAS, which my business relies on. It's a mess and depressing.

    Is it too late to go CLNS?

    ReplyDelete
  10. @Paulie

    LOL good time to remember the whole epic saga with OSI stack in early 90s :)

    ReplyDelete


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.