IPv6 Neighbor Discovery exhaustion attack and IPv6 subnet sizes

A few days ago I got an interesting question: “What’s your opinion on the IPv6 NDP exhaustion attack and the recommendation to use /120 instead of /64?”

I guess we've all heard the fundamentalist IPv6 mantra by now: “Every subnet gets a /64.” Being a good foot soldier, I included it in my Enterprise IPv6 webinar. Time to fix that slide and admit what we've also known for a long time: IPv6 is classless, and we have yet to see the mysterious device that dies in flames when sniffing a prefix longer than a /64.

Before you rush out and change all your /64 prefixes to /120s, a few words of caution:

  • You have to use /64 prefixes on subnets running SLAAC.
  • I wouldn’t be surprised if some host stacks were broken enough to die when faced with an on-link prefix other than /64. Conclusion: use /64 on all subnets to which workstations are attached.
  • Likewise, I wouldn’t expect consumer CPE vendors to understand that IPv6 can be classless. As above, use /64s in consumer environments.
  • If a layer-3 forwarding device breaks down when having a prefix longer than /64 in its IPv6 routing table, throw it away.

Jeff Wheeler proposes to use /120 on all (data center?) subnets. I never tested this idea in practice and have no clue whether common server operating systems (Linux, Windows) would work with static IPv6 addresses out of a /120 prefix. Real-life experience? Please write a comment!

As a precaution against yet-to-be-discovered bugs, you could decide to use a single /120 prefix out of a /64 prefix on server-facing subnets (if the /120 prefix fails, you can easily go back to a /64 prefix without renumbering anything else but the affected subnet).
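For illustration, here's a minimal Cisco IOS-style sketch of that approach (the prefix and VLAN number are made up, and I haven't verified this exact snippet on every platform):

    interface Vlan10
     description Server subnet - /120 carved out of 2001:DB8:C001:BABE::/64
     ipv6 address 2001:DB8:C001:BABE::1/120
     ! If the /120 causes problems, change the mask back to /64;
     ! the servers keep their addresses and nothing else is renumbered.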

Alternatively, you could decide to be on the safe side, use /64 prefixes on server subnets, assign servers static IPv6 addresses with the high-order bits of the interface ID set to zero (for example, use only 2001:DB8:C001:BABE::2/64 through 2001:DB8:C001:BABE::FE/64), and deploy inbound access lists on the L3 switches dropping packets sent to IPv6 addresses outside of that range.
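A rough sketch of what such an inbound filter might look like in Cisco IOS syntax (the interface name is hypothetical, and the /120 match is slightly wider than the ::2 through ::FE range, permitting the whole ::0 through ::FF block):

    ipv6 access-list SERVERS-LOW-RANGE
     permit ipv6 any 2001:DB8:C001:BABE::/120
     deny ipv6 any 2001:DB8:C001:BABE::/64
     permit ipv6 any any
    !
    interface TenGigabitEthernet1/1
     description Uplink toward the rest of the network
     ipv6 traffic-filter SERVERS-LOW-RANGE in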

Last but definitely not least, using /64 prefixes on point-to-point core links (and being exposed to script kiddies) is ridiculous. Juniper formalized this line of thinking with a standards-track RFC recommending /127 prefixes on point-to-point links. And once you leave the 64-everywhere dogma behind, you can make the final step and allocate /128s to loopback addresses (I’ve tested this in Cisco IOS – works like a charm). Welcome back to the VLSM world.
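To make that last paragraph more tangible, here's a minimal Cisco IOS-style sketch of the /127-and-/128 approach (documentation-prefix addresses and invented interface names; the other end of the point-to-point link gets the second address of the /127 pair, along the lines of RFC 6164):

    interface Loopback0
     ipv6 address 2001:DB8:C001:CAFE::1/128
    !
    interface GigabitEthernet0/1
     description Point-to-point core link
     ipv6 address 2001:DB8:C001:F00D::1/127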

38 comments:

  1. Hello, Ivan, I enjoy your blog. In my slides, I do suggest /120 as an alternative to /64 subnets, but my true intent is to make people realize that something other than /64 should be done, where appropriate, until vendors are able to deliver more router/switch knobs to protect against this attack vector. This may be larger or smaller than a /120; but I think it gets the point across.

    It must be understood that ACLs may not protect your device. There are major-vendor boxes that will still at least reach their NDP policer (if not actually learn ND entries) when receiving packets with new source addresses on a locally-attached subnet, even if the packet will ultimately be discarded due to an ingress ACL on that interface. Operators should test their routers, because vendors absolutely do not provide reliable answers in this area -- including, again, the "big vendors" who we generally expect to Do The Right Thing.

    It is also worth mentioning that this absolutely will break IPv4 on many dual-stack routers. Most people who think they may be ready for IPv6 today are not, and this is only one of many reasons why. We need to do a better job of asking our vendors for needed improvements before we are all forced to play catch-up.
  2. Hi Jeff! Nice to hear from you and thanks for the feedback!

    "ACL won't protect you" - amazing how broken things can get. I always assumed input ACL is the very first thing checked by a L3 device. Am I right in assuming that hitting this particular bug would require the attack to be an inside job (pwned server) ... or the attacker targeting your WAN link?

    "This will break IPv4 on many dual-stack routers" - just to clarify for everyone else reading the comments: I'm assuming you're saying NDP exhaustion attack also breaks IPv4 on those devices that use common v4/v6 L3 adjacency entries. Using /120 on a subnet should not impact IPv4 at all.
  3. It's a religious thing. /127 is OK for router-interconnect segments, but not all equipment supports it. /120 is useless; either use /127 or use /64.

    If you are running a dynamic routing protocol, routes point to the link-local addresses of neighbor interfaces anyway, so assigning a global IPv6 address to a router interface is for troubleshooting purposes only, and that's where loopbacks come in. The problem is that with loopbacks only you don't see the exact egress interface in a traceroute, just the loopback of the router. This might suck a bit in some corner cases.

    Use with caution (as everything else) if in doubt :)
  4. I wouldn't run BGP over LLA (although you can supposedly make it work).

    LLA are a pain if you're trying to figure out the exact path across the network with traceroute.

    Also, you might not be able to do hop-by-hop telnetting with LLA if your IGP breaks down (not that telnet to an LLA wouldn't work; it's just that sometimes you don't have your neighbor's LLA in your ND cache).
  5. Dear Ivan,
    In my opinion, implementing IPv6 must not become a burden for the network admin, so I try to implement IPv6 in the way that's easiest for me. Currently implementing a dual-stack network, I try to match the IPv6 address assignment to the existing IPv4 one, so it's easy for the admin to know which is which. Before I added IPv6 PTR records all traceroutes looked cryptic to me; by matching the addressing I can check whether my network is working properly.

    I also try to take advantage of IPv6: a /64 is huge, so there's no need to renumber or resize subnets the way we do in IPv4. I only need to remember three kinds of allocations: /32 or /48 for an organization, /64 for LAN subnets, and /128 for loopbacks. That's it.
  6. The biggest hurdle to getting this kind of thing changed will be the service providers. Even on private MPLS L3 VPNs there are prefix-size limits applied on the PE. The carrier won't carry or transport prefixes longer than /64. If you want to troubleshoot reachability of those p2p links, you will be disappointed.

    For Internet service it's even more draconian: they only permit /48s. There is a table on Wikipedia that's kept fairly up to date regarding the IPv6 routing policies of many large carriers.

    There is a lot of speculation about IPv6 but little hard documentation. There are no good reference designs for a global enterprise. There are routing-symmetry issues revolving around security and other services the network provides today. Other technologies like network-based IDS/IPS, data loss prevention, and web content filtering fall apart quickly as well.

    We are actively deploying IPv6 in a global infrastructure and facing many serious issues. Neither Cisco, nor AT&T, nor our other vendors and carriers (even with their professional services groups) have good answers to offer at this time for all of the core issues that still exist with IPv6, and they all disagree on what the best practices are. AT&T wants /64s while Sprint wants /127s. It's all just the tip of a very large iceberg.
  7. If we're talking about public Internet, I have nothing against enforcing /48 as the minimum prefix. Go to RIPE/ARIN/*, ask for a PI /32 and allocate a /48 to every site.

    Enforcing prefix lengths in an MPLS/VPN network is plain stupid (or maybe your SP bought those mysterious boxes that self-destruct on receiving a longer prefix). The SP should not enforce the content (including prefix lengths), but just the maximum number of prefixes accepted from a site or total # of prefixes in a VRF.

    Routing symmetry between private and public networks across firewalls ... nightmare! Right now we're working with a customer with similar issues and will probably make it work, but it will be way more complex than NAT would have been.
  8. Everybody is screaming that we are out of IPv4 and need to implement IPv6 (including me).
    But how are we supposed to implement it if, after 10+ years, there is still no rock-solid standard?
  9. FWIW, I labbed out a neighbor cache exhaustion attack on a remote /64 and wasn't able to get an 1811 (running IOS 12.4T) to hold more than a few dozen incomplete ND entries at a time. Not sure how this affects valid ND entries but it doesn't seem like a big deal.

    That said, there's no reason not to use /127s on point-to-point links. If a device doesn't support /127s, it's a vendor problem, not a design problem.
  10. I think it's important to remember that most providers aren't going to re-buy every PE in their network to support IPv6, and that there is quite a long list of caveats in all vendor camps: hardware from the last 6-8 years has some potentially painful issues regarding prefix length. Classic issues include ACL construction and TCAM specificity.

    Given the world they operate in today, until the hardware is completely refreshed over a period of years, those restrictions are probably going to remain in place. These are not small and stupid telcos either; they include AT&T.
  11. Wouldn't IPv6 best practices like iACLs and unadvertised PA space for p2p links mitigate the risk for now?
  12. BGP is another thing, but OSPFv3 points to the neighbor's LLA by default. I agree you need global IPv6 addresses as identifiers, if not only for jumping from router to router then also for troubleshooting purposes.
  13. @stretch: software-based platforms are not a problem. ASIC-based L3 switches are.

    @nosx: iACLs don't work on all platforms (see Jeff's comment above). Unadvertised PA space definitely helps.
  14. Just for the record, I was configuring /127 for point-to-point core links and /128 for loopbacks on JUNOS (8.x, 9.x, don't remember if I used 7.x too) some years ago and it worked as expected... in a VLSM world.

    Slowly we are refining the IPv6 myths and best practices. IMO this makes the IPv6 world saner (a /64 for a loopback...?!) and less fundamentalist.
  15. Is "man-in-the-middle" anything new!?
    If a "close friend" has layer-2 access to your switch,
    you are in trouble even without IPv6.

    All the layer-2/3 switch security features must be used and/or adapted for IPv6 too,
    just like in today's v4 networks: 802.1x / ARP inspection / DHCP snooping.
    RA Guard is still Cat6500-only ;(

    Moreover, I'd say SLAAC is good for PoC labs (or maybe you have a
    SLAAC/WinXP solution); otherwise, with the usual static & DHCP setups you should be able to
    protect your networks.
    BTW,
    since IOS 12.3T I have been using /128s for loopbacks.
  16. "have no clue whether common server operating systems (Linux, Windows) would work with static IPv6 addresses out of a /120 prefix. Real-life experience?"

    Not actually answering your request :-) I did a quick lab with a 2901 running 15.1(3)T and a Win7 host on the other end. The DHCP-server implementation at the Cisco end hands out addresses with a /120 defined in the address prefix section, with the managed-config flag set toward the Win7. ND/RD works fine, it pings normally, and a Wireshark capture at the W7 end shows everything going by the book.
    No idea what happens with W2003 or W2008, but I would suppose they'd work fine too.
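    For anyone wanting to reproduce it, here's a rough IOS-style sketch of the kind of setup described above (pool name, prefix and interface are made up, and the exact syntax may vary across releases):

        ipv6 dhcp pool LAB-V6
         address prefix 2001:DB8:0:1::/120
        !
        interface GigabitEthernet0/0
         ipv6 address 2001:DB8:0:1::1/120
         ipv6 nd managed-config-flag
         ipv6 dhcp server LAB-V6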

    Excellent blog, keep up the good work.
  17. None of the problems we have with IPv6 are new, but the vendors (and the IETF community at large) failed to retrofit the IPv4 security enhancements and features into their IPv6 stacks and products.

    Today, when we would have to start serious IPv6 deployments, we're faced with "what do you mean you don't have feature XXX in IPv6" revelations.
  18. You can statically use a /120 on Windows Server 2008 R2 and Win7.
  19. I just did a test similar to the one stretch described. I used a Juniper MX480, generating ~380 pps toward random hosts in a subnet on that router, resulting in lots of neighbor discovery packets.

    I wasn't able to get more than ~250 incomplete entries into the neighbor cache.
    At the same time, learning new, valid ND entries didn't seem to be an issue.

    So although it's a problem in theory, it seems that most implementations, e.g. Cisco (as described by stretch) and Juniper, limit the effects of such an attack.
  20. I like your thinking. I've been following this ever since a design where we mapped 10.x.y.0/24 subnets into IPv6 by mapping the 2nd and 3rd octets to the end of a /48. For infrastructure /30 links or /32 loopbacks, sticking a bunch of zeros after the /64 and making the last byte match IPv4 works for /126s and /128s. (We designed the IPv4 addressing from scratch to summarize, so the mapping was not importing a very legacy IPv4 addressing scheme into IPv6.)
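    To give one hedged example of what such a mapping could look like (the /48 is a documentation prefix and hex-encoding the octets is just one possible convention, not necessarily the one used in that design):

        ! IPv4 subnet 10.23.45.0/24 mapped into a hypothetical 2001:DB8:1::/48
        ! 2nd octet 23 = 0x17, 3rd octet 45 = 0x2D -> IPv6 subnet ID 172D
        interface Vlan45
         ip address 10.23.45.1 255.255.255.0
         ipv6 address 2001:DB8:1:172D::1/64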

    Note RFC 3627 http://tools.ietf.org/html/rfc3627 section 5, the part about the u/l bits being zero. I'm not sure how likely those bits are to get used; I'm tracking IPv6, but not super-closely.
  21. Windows Server 2008 and Win7 have a stack rewrite that treats IPv6 as a priority over IPv4.
    What about just using LLAs for the infrastructure links?
    Other open issues: possible fragmentation attacks / path-MTU poisoning at end nodes?
    Checksum-related attacks (IPv6 makes the UDP checksum mandatory, etc.), router resource issues?
    Header extension stacking and processing?
    Scaling from the application down to the ASIC level: IPv6 uses 128-bit addresses, which have to be split on 64-bit architectures = added CPU cycles, whereas IPv4 source/destination addresses fit nicely into one 64-bit word.
    And more to come.
  22. I also used /128s for loopbacks during the years I was testing 6PE and 6VPE...

    /128s are not announced as such by OSPFv3.

    If I remember correctly, the /128 loopbacks are advertised as stub networks by OSPFv3 with a /64 prefix length... I'm fairly (but not 100%) sure about the /64; that it's not /128 I am sure of!
  23. Loopbacks having /128 addresses work correctly with OSPFv3 in IOS release 15.0M. They are advertised as prefix link states with /128 prefix length.
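    A minimal IOS-style snippet matching that test (process ID and address are arbitrary; it assumes an OSPFv3 router ID is available):

        ipv6 unicast-routing
        !
        interface Loopback0
         ipv6 address 2001:DB8::1/128
         ipv6 ospf 1 area 0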
  24. I did a test similar to stretch's and Andree's on a c7609 and a CRS-1. Neither box seemed fussed about it. I could see a few dozen incomplete entries in the ND cache, which expired in a few seconds. My question is: has anyone managed to cause any sort of harm to an L3 device with that sort of attack?
  25. So, Jeff, have you actually carried out the attack successfully? If so, which box was vulnerable?
  26. How many spoofed packets per second were you sending?
  27. ~30 pps for ~4 minutes, which I can of course increase
  28. ~30pps for ~4minutes
  29. That's way too little. Try with a few thousand pps (duration doesn't matter that much). Each entry times out in three seconds (default values from the RFC), so at ~30 pps you never hold more than about a hundred incomplete entries at a time.
  30. I was going to try the 1k pps suggested by Jeff Wheeler, but I was expecting to see some "discomfort" in the switch at a much lower rate. I'd already gone up to 100 pps, but my PC started becoming unresponsive :D
  31. The switch works fine as long as it doesn't run out of TCAM. 100 pps is nothing even when CPU-switched. You might want, however, to either change your testing tool (as a last resort you could take my PERL flood program as the starting point) or your PC ;)
  32. I've modified my script and I can now send ~500 pps. The c7609 went from 1% to 10% CPU utilization. I was wondering though: sure, let's change all p2p links to /127 or whatever, but what about data center switches and their VLANs? Should we configure longer prefixes and forget about SLAAC?
  33. I don't think using SLAAC on server segments is the best idea there is.
  34. ok, that was supposed to be 2 different things :)
    a. in order to have a sane addressing plan, we've decided long ago to allocate /64 subnets
    b. SLAAC is desirable sometimes (workstation LANs etc)
  35. (A) Never understood why /64-everywhere would make anyone saner ... but who am I to judge that, seen too many weird things in my life to remain sane :-P

    (B) You mentioned SLAAC and data center switches in (almost) the same sentence ;)
  36. Cool! It's been a long time since I last tried this... long before 15.0M!

    Thanks, Ivan
  37. Hi Ivan,

    I am just curious about the real potential of such an attack.

    When a resolution is performed with the ND default values, an ND entry is created
    in the INCOMPLETE state and an NS is sent. If no NA reply is received after
    RetransTimer milliseconds (default: 1 second), the NS is retransmitted at most
    MAX_MULTICAST_SOLICIT (default: 3) times. Then the entry is cleared from the cache.

    So an entry will not stay in the table for more than 3 seconds before it is cleared.

    For sure, if an attacker keeps on scanning, he will fill the table faster than it
    is purged. But it will take some time to fill up the table, and the attack must be
    continuous, without interruption, or the entries will be deleted automatically.

    This means that it should not be difficult to detect and isolate the attacker.

    If it comes from the outside, it must pass through firewalls, which should be able
    to manage this and take appropriate action, at least mitigating the attack if they
    cannot block it, so it will not be able to do much harm.

    If it is local, an IDS capable of detecting port scans and other attacks should
    also be able to isolate the attacker.

    So is it really such a big threat?

    Fred
  38. The intruder can definitely cause short-term damage before you isolate him. Unless you have automatic detection/filtering mechanisms, it can take a while to figure out what's going on. Also, he can hit you from numerous source IPv6 addresses (admittedly limited to a single /64 if his ISP is doing a good job).

    Firewalls should be able to protect you if they allow access only to specific IPv6 addresses. If you use something along the lines of "permit tcp any any eq 80" you're toast.
    Replies
    1. Hi Ivan,

      One small doubt. Currently we have a device with an IPv6 /64 prefix and we are facing neighbor cache exhaustion while generating a TCP/IPv6 SYN attack from one of the devices (using the netwox simulation tool to send SYN packets from random sources).

      If we assign a /112 or /120 prefix instead, we can somehow resolve this cache exhaustion and legitimate users are able to access our device properly.

      So, is there any other way to resolve this neighbor cache exhaustion, other than reducing the subnet size from /64 to /112 or /120 and adding a router in front of our device that accepts specific source subnets and restricts the rest?

      Thanks, Kumar

    2. The easiest way to solve that challenge would be with ingress access lists on the switch/router.
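      For example (prefixes and interface purely illustrative, assuming the legitimate clients come from known prefixes), something along these lines in IOS syntax would drop the randomly-sourced packets at the ingress interface:

          ipv6 access-list TRUSTED-SOURCES
           permit ipv6 2001:DB8:AAAA::/48 any
           deny ipv6 any any
          !
          interface GigabitEthernet0/0
           description Upstream link
           ipv6 traffic-filter TRUSTED-SOURCES in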
