Coming Full Circle on IPv6 Address Length

In the Future of Networking episode with Fred Baker, Fred mentioned an interesting IPv6 deployment scenario: give a /64 prefix to every server to support container deployment, and run a routing protocol between the servers and the ToR switches, preferably over link-local addresses, to advertise each server's /64 prefix to the data center fabric.
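
A minimal sketch of what the server side of such a setup could look like, assuming FRRouting on the server (the interface name, AS number, and the 2001:db8:0:42::/64 prefix are illustrative, not from the episode):

    router bgp 65042
     ! eBGP over the link-local address of the fabric-facing interface
     ! (FRR learns the ToR's link-local address from router advertisements)
     neighbor eth0 interface remote-as external
     !
     address-family ipv6 unicast
      ! advertise this server's /64 toward the ToR; depending on defaults,
      ! FRR may require the prefix to be present in the local routing table
      ! (e.g. assigned to the loopback) before announcing it
      network 2001:db8:0:42::/64
      neighbor eth0 activate

The unnumbered "neighbor … interface" form means no global addresses are needed on the server-to-ToR link, matching the link-local peering described above.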

Let’s recap:

  • We just turned 128-bit IPv6 addresses into 64-bit endpoint identifiers.

Do I have to mention that the original IPv6 proposal had 64-bit addresses, and they added the extra 64 bits to support IPX-style auto-configuration?

  • Endpoint address is assigned to a node, not to an interface.
  • Endpoints use a node-to-router protocol to advertise their endpoint address.
  • All routers within a domain advertise individual endpoint addresses.
  • Endpoint addresses are summarized into a larger prefix at the routing domain boundary.
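
As a sketch of that last step in the same FRR-style syntax (the /56 aggregate and AS number are mine, purely illustrative), the boundary router would suppress the per-server /64s behind an aggregate:

    router bgp 65100
     address-family ipv6 unicast
      ! inside the domain every router sees the individual per-server /64s;
      ! summary-only suppresses them, so only the /56 aggregate is
      ! advertised across the routing domain boundary
      aggregate-address 2001:db8::/56 summary-only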

Hooray, yet again we reinvented CLNP. We could have used it 25 years ago instead of inventing a new protocol.

Note: in case you’re still wondering what this IPv6 thing is all about, check out my IPv6 content.

10 comments:

  1. What a wonderful idea! Do all vendors support iBGP over link-local addresses already? Or are we supposed to use eBGP?
    Replies
    1. What about RIP?
      Simple RIPng should be enough for this purpose (advertise the /64, receive a default); see the sketch after this thread.
    2. Of course it would do, but RIP is so 1990s ;) ... aka the days when some server admins still understood networking.
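
      To make the RIPng suggestion concrete, a minimal sketch assuming FRRouting on the server (interface name and prefix are illustrative):

        router ripng
         ! run RIPng on the ToR-facing interface
         ! (the ToR would send a default route back)
         network eth0
         ! advertise this server's /64
         route 2001:db8:0:42::/64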
  2. Hi Ivan

    We run dynamic iBGP/eBGP peering with the downstream servers on the ToRs, for both IPv4 and IPv6:
    Dynamic BGP on IPv4: /26 range
    Dynamic BGP on IPv6: /64 range
    We don't use link-local addresses for this IPv6 peering; we use global addressing for the dynamic-range peering ("neighbor 2001::/64" sample config).

    Toward the servers: dynamic IPv6 iBGP peering on the /64, with the rest of the /64 in EUI-64 format. BGP peering on the servers runs on static configuration, because a dynamic range cannot form a neighborship on both sides.

    Link-local would be fine, but we need dynamic BGP peering and hence needed global addressing; the server addresses are in the same range as the VLAN's /64.

    It runs seamlessly over BGP.

    Link-local addresses are link-specific and may not be useful for dynamic peering.
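
    For illustration, the ToR side of such dynamic-range peering might look roughly like this in FRR-style syntax (the AS number and 2001:db8:0:42::/64 prefix are made up, not the commenter's actual config; the server side would use an ordinary static neighbor statement):

      router bgp 65000
       ! accept dynamic iBGP sessions from any server in the VLAN's /64
       bgp listen range 2001:db8:0:42::/64 peer-group SERVERS
       neighbor SERVERS peer-group
       neighbor SERVERS remote-as 65000
       !
       address-family ipv6 unicast
        neighbor SERVERS activate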

  3. It's called Identifier-Locator Addressing (ILA)
    https://www.youtube.com/watch?v=AZ1gRPUyklw
    Replies
    1. The 8+8 split is not unique to ILA, to be fair - the idea has been reused multiple times in a few proposals :)

      In the basic (and probably most common) scenario one does not even need any routing protocol: the ToR switches can be configured with /64 static routes pointing to the servers, which, in turn, have static link-local addresses. The ToR then summarizes the /64s into a shorter block, and so on (see the sketch after this thread).

      The major benefit is being able to allocate an IP address per process/container/etc. I think one of Google's papers openly admitted that going with an IP address per box for Borg, and then juggling the available ports per process/container, was a major pain.
    2. @petr static routes aren't scalable, right? The ToR can be configured with static routes pointing to the servers' link-local addresses, but if we have 20 servers downstream we need 20 static routes on the ToR, and the ToR has to load-share traffic downstream based on those 20 static routes.
      Does this affect dynamic BGP-based multipathing?
      Or do we have static routes toward the control servers and load-share across them, so that the actual data/content balancing happens between the control servers and the data servers?
    3. @petr downstream BGP peering also gives us multipathing toward the control servers, and the control servers take care of data-path forwarding.
      Also, static routes pointing to link-local addresses are a bit tedious, because they require ND cache population to learn the servers' link-local addresses, which sit in fe80::/10 and embed the 48-bit MAC with FFFE inserted (EUI-64 format). It's better to have dynamic BGP peering to advertise the content blocks upstream and to form BGP peering with the control servers.
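
      For reference, the static-route variant might look roughly like this on the ToR (FRR-style syntax; interface names, link-local addresses, and prefixes are illustrative). A link-local next hop is ambiguous on its own, so each route also names the outgoing interface:

        ! one static /64 per server, next hop = that server's static link-local
        ipv6 route 2001:db8:0:11::/64 fe80::11 eth1
        ipv6 route 2001:db8:0:12::/64 fe80::12 eth2
        ! ...and so on, one route per downstream server; only the
        ! rack-wide aggregate is advertised up into the fabric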
  4. Worked fine for us :) We considered using BGP injection, but that adds an additional component to deploy and monitor on every server, which is more overhead. There are no scaling issues at all: the state is static, pre-configured, and aggregated. The management churn was mainly on the provisioning side - making sure servers and switches are rebuilt with the proper configs - but that was mostly a one-time thing to solve. As for link-locals, those can also be statically configured, say by encoding the server number within a rack.
    Replies
    1. Sure petr, configuring static link-locals on all the servers is time consuming and, as you said, means manual provisioning and fault management rather than plug-and-play IPv6 link-locals.
      But that's a good point you mentioned about the BGP overhead, which is always a separate control-plane component.
      How many servers are there per rack, and how do you provision the manual link-locals on the servers - the address starts with fe80::/10, so how do you fill the rest with your own addressing?
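
      One illustrative way such a scheme could look (a sketch in Cisco-style syntax on the switch port, with a hypothetical rack/server numbering; the server side would be configured to match using whatever tooling it already has):

        interface Ethernet1
         ! rack 3, server 17 -> deterministic static link-local address
         ipv6 address FE80::3:17 link-local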