IPv6 Prefixes Longer Than /64 Might Be Harmful

A while ago I wrote a blog post about remote ND attacks, which included the idea of having /120 prefixes on server LANs. As it turns out, it was a bad idea, and as nosx pointed out in his comment: “there is quite a long list of caveats in all vendor camps regarding hardware in the last 6-8 years that has some potentially painful hardware issues regarding prefix length. Classic issues include ACL construction and TCAM specificity.”

One would hope that the newly released data center switches fare better. Fat chance!

There are several interesting issues you might encounter with IPv6-enabled data center switches (other devices performing hardware layer-3 switching might exhibit similar behavior):

  • IPv4 and IPv6 routing tables might share the same TCAM.
  • IPv6 prefixes are four times longer than IPv4 prefixes, so you’d expect a switch with shared TCAM to handle four times as many IPv4 prefixes as IPv6 prefixes. If that’s not the case, tread carefully and dig deeper into the documentation (see the sketch after this list).
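
To make the capacity math concrete, here’s a minimal Python sketch of a shared TCAM, assuming an IPv6 entry consumes four IPv4-sized slots (the slot count is made up, not any specific switch):

```python
# Minimal model of a shared TCAM: an IPv4 entry occupies one slot, an
# IPv6 entry four (the prefix is four times wider). The 12K slot count
# is an arbitrary assumption, not any vendor's actual table size.
TCAM_SLOTS = 12_000
IPV4_SLOTS_PER_ENTRY = 1
IPV6_SLOTS_PER_ENTRY = 4

def remaining_ipv6_entries(ipv4_routes: int) -> int:
    """IPv6 prefixes that still fit after installing ipv4_routes."""
    free_slots = TCAM_SLOTS - ipv4_routes * IPV4_SLOTS_PER_ENTRY
    return max(free_slots, 0) // IPV6_SLOTS_PER_ENTRY

print(remaining_ipv6_entries(0))      # 3000 -> a quarter of the IPv4 capacity
print(remaining_ipv6_entries(8_000))  # 1000 -> dual stack eats the table fast
```

If a data sheet claims identical IPv4 and IPv6 table sizes on shared TCAM, this simple model cannot be what the hardware is doing, and that’s exactly the hint to dig deeper.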

A few real-life examples

ToR switches were never known for their huge table sizes (to be honest, they don’t need them if you have a good design), and some of them have dismal IPv6 tables. The worst I found: Juniper’s EX4500 with 1K IPv6 entries and Arista’s 7500 (the core switch) with 2K IPv6 entries. Good luck with your large-scale dual-stack deployment!

Then there’s Cisco’s Nexus 5500 with an interesting TCAM architecture: it can hold up to 16K IPv4 or IPv6 routes, plus 128 longest-prefix-match (LPM) IPv6 entries. The fact that the number of IPv4 routes matches the number of IPv6 prefixes tells you a lot about the matching algorithm: either half the TCAM (or more) sits empty when it’s filled with IPv4 routes, or the hardware does an exact match on the top 64 bits of IPv6 addresses. The latter seems to be the case, as there’s a separate entry for IPv6 LPM routes in the configuration limits document.

To rephrase: Nexus 5500 can have up to 16K /64 IPv6 prefixes and up to 128 non-/64 IPv6 prefixes. It does make perfect sense, assuming your data center uses /64 prefixes internally and a few summary routes (or default routing) toward the outside world or DC core.
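
Here’s a minimal Python sketch of what such a split lookup might look like; the table layout and lookup order are my guesses based on the published limits, not Cisco’s documented implementation:

```python
import ipaddress

# Hypothetical split IPv6 lookup: a large exact-match table keyed on the
# top 64 bits of the destination (one entry per /64 route) plus a tiny
# LPM table for everything that isn't a /64.
slash64_table: dict[int, str] = {}                      # top 64 bits -> next hop
lpm_table: list[tuple[ipaddress.IPv6Network, str]] = []  # up to ~128 entries

def install(prefix: str, next_hop: str) -> None:
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == 64:
        slash64_table[int(net.network_address) >> 64] = next_hop
    else:
        # /127s, /120s, summaries and the default route all compete
        # for the small LPM table
        lpm_table.append((net, next_hop))
        lpm_table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)

def lookup(destination: str) -> str | None:
    addr = ipaddress.ip_address(destination)
    # a more specific LPM entry (e.g. a /127) must win over a covering /64
    for net, next_hop in lpm_table:
        if net.prefixlen > 64 and addr in net:
            return next_hop
    if (hit := slash64_table.get(int(addr) >> 64)) is not None:
        return hit
    # finally the less specific LPM entries (summaries, default route)
    for net, next_hop in lpm_table:
        if net.prefixlen < 64 and addr in net:
            return next_hop
    return None

install("2001:db8:0:1::/64", "leaf-1")       # cheap: exact-match table
install("2001:db8:ffff::/127", "core-link")  # burns one of the few LPM slots
install("::/0", "dc-edge")                   # ditto
print(lookup("2001:db8:0:1::42"))   # leaf-1
print(lookup("2001:db8:ffff::1"))   # core-link
print(lookup("2001:db8:beef::1"))   # dc-edge
```

In a model like this, every /120 or /127 you configure burns one of the scarce LPM slots, and anything that overflows them can only be handled in software.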

Finally, there are loads of DC switches where the maximum number of IPv6 prefixes is half the maximum number of IPv4 prefixes, probably indicating that their TCAM matches only the top half of IPv6 addresses (and that installing /120 or /127 prefixes into these devices might be a Really Bad Idea). Unfortunately, many vendors are not as open and straightforward as Cisco is, and forget to mention these tiny little details in their documentation.

More information

I have to mention that the Data Center Fabrics webinar contains IPv6, IPv4, MAC, ARP and ND table size information for data center switches from nine major vendors. You don’t want to know how many hours I’ve spent poring over datasheets, documentation and release notes.

The webinar is available as a recording or as part of the yearly subscription.

13 comments:

  1. I think I'd add something like "in L3 switches / the DC" to the title, because with /64 on the peering interfaces between two border routers, for example, you can hit some nasty IPv6 ND miss issues.
    Replies
    1. Absolutely agree ... but if those prefixes then leak into the DC, you have an interesting challenge ;)
  2. Aha, so that's why! Thanks a lot for the clarity; a great (and important) post.
  3. Hi
    So what about RFC 6164? P2P links SHOULD use a /127 mask!
    Thanx
    Replies
    1. Actually, according to the RFC, routers MUST support /127 on P2P links if they want to be compliant with RFC 6164.

      The security implications of /64s are legitimate, and it's pretty disappointing to see that the use of longer-than-/64 prefixes is harmful due to *hardware* implementations and TCAM resources rather than any actual harm in the protocol.

      I doubt there's any saving older hardware, so here's hoping that future gear is built with adequate lookup resources for both v4 AND v6.
    2. I'm positive (but have no hard data) that most of the switches I mentioned above support /127s with CPU-based switching. As long as the amount of traffic forwarded to those prefixes is low, you'll do just fine.
    3. "As long as the amount of traffic forwarded to those prefixes is low" -- doesn't that mean you're vulnerable to DoS by hosts able to send to those prefixes? Eliminating that risk was the point of using /127 in the first place...

      Plus, that would allow DoSing any router in the path from the host to the /127, as opposed to the ping-pong or ND-exhaustion attacks that only affect the routers directly attached to the PtP link.

      It still sounds like the only real solution is to put ACLs in place that restrict traffic to PtP subnets to management hosts only. Then you can use whatever prefix length you want (which, without these other concerns, should be /64). See the sketch below.
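
      For illustration, here's a minimal sketch of that policy in Python (the prefixes and the management-only rule are made up; a real deployment would use an infrastructure ACL on the boxes themselves):

      ```python
      import ipaddress

      # Hypothetical addressing plan: all PtP/infrastructure links live in
      # one aggregate, and only the management subnet may talk to them.
      PTP_AGGREGATE = ipaddress.ip_network("2001:db8:ffff::/48")
      MGMT_SUBNET = ipaddress.ip_network("2001:db8:0:10::/64")

      def permit(src: str, dst: str) -> bool:
          """Drop anything aimed at PtP space unless it comes from
          management; permit everything else."""
          if ipaddress.ip_address(dst) in PTP_AGGREGATE:
              return ipaddress.ip_address(src) in MGMT_SUBNET
          return True

      print(permit("2001:db8:0:10::5", "2001:db8:ffff::1"))  # True: mgmt -> PtP
      print(permit("2001:db8:0:20::5", "2001:db8:ffff::1"))  # False: host -> PtP
      print(permit("2001:db8:0:20::5", "2001:db8:0:30::7"))  # True: normal traffic
      ```

      With a filter like that at the edge, the prefix length on the link no longer matters for the DoS scenarios above.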
  4. Putting /64 or any other prefix length in the light of TCAM clockworks is an interesting point. Vendors need to somehow hack their way around their own hardware limitations; could this be one of the reasons for pushing /64s almost everywhere? (Weee, conspiracy theory! :-) )
  5. My understanding of /127s is that Juniper unfortunately has an ASIC issue that makes equipment using it susceptible to ping-pong attacks; /127s were the workaround. So for Juniper equipment with this issue, /127s make sense. However, if you're using other equipment I would consider sticking with /64s.

    Using all /64s makes things easier: no subnetting. Don't just think of network engineers, but of help desk and non-network people too. Also, if your "point-to-point" link gets migrated to a MetroEthernet-type setup, you may wish to add more nodes to it. Wouldn't it be nice if it was a /64, so that wasn't an issue? IPv6 address space is unimaginably big, so conservation is no longer necessary or desirable.

    I'm not saying you shouldn't use /127s, but I am saying think about the flexibility and simplicity you're giving up by using them. If you do use /127s, you may still want to reserve a /64 just in case you change your mind later.
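
    For context, here's what a /127 actually buys you on a vulnerable box (a quick check with Python's ipaddress module; the prefix is arbitrary):

    ```python
    import ipaddress

    # A /127 has exactly two addresses, one per router, so there is nothing
    # unassigned on the link for ping-pong or ND-exhaustion traffic to hit.
    # A /64 on the same link leaves 2**64 - 2 unassigned addresses.
    p2p = ipaddress.ip_network("2001:db8:cafe::/127")
    lan = ipaddress.ip_network("2001:db8:cafe::/64")
    print(p2p.num_addresses)       # 2
    print(lan.num_addresses)       # 18446744073709551616 (2**64)
    print(lan.num_addresses - 2)   # addresses an attacker can aim ND at
    ```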
  6. Why are people putting global unicast space on internal transit interfaces, exposing an attack surface and necessitating the /64 vs. /127 argument?

    I deploy exactly two IPv6 configurations: one is 6VPE, which is incredibly effective and reliable; the other is traditional BGP IPv6 unicast for host prefix information, with OSPFv3 for loopback reachability within the AS. Both setups function fantastically and expose zero internal infrastructure to attack through the use of iACLs and link-local addressing.

    IPv6 is not IPv4, and the way you go about building traditional IPv4 networks with big edge firewalls between the internet and the internal systems does NOT work well from a high-availability or effective-security-posture perspective.

    With IPv6, your network is now part of the Internet, like it or not. Many of the poor design decisions set to color the industry for the next decade or more will be the result of trying, unsuccessfully, to force IPv6 into the IPv4 paradigm.
  7. Do we really need to use these prefixes on the interconnects? In the internal network, the IGP will work fine with only link-local addresses, won't it? And the routing table will be smaller.
    Replies
    1. If you've got good OOB access to all your routers then that should be fine. Without good OOB it can be valuable to be able to access a router if routing is only working properly up to one hop away from it.
  8. I personally ran into a Cisco limitation on IPv6 access lists when investigating whether the RSP720 can filter on layer-4 information (TCP/UDP ports):

    http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/ipv6/command/ipv6-cr-book/ipv6-i5.html#wp1239692760