Response: NAT Traversal Mess
Let’s look at another part of the lengthy comment Bob left after listening to the Rise of NAT podcast. This one is focused on the NAT traversal mess:
You mentioned that only video-conferencing and BitTorrent use client-to-client connectivity (and they are indeed the main use cases), but hell, do they need to engineer complex systems to circumvent these NATs and firewalls: STUN, TURN, ICE, DHT…
Cleaning up the acronym list first: DHT is unlike the others and has nothing to do with NAT.
Now that we’re left with three acronyms, let’s try to figure out what they do:
- STUN detects NAT devices in the forwarding path and uses clever tricks to create the NAT translations that can ultimately be used to reach a peer node behind another NAT device.
- STUN does not work with all types of NAT. In particular, it does not work with symmetric NAT (NAT using 5-tuple NAT translations – what you’d see on Cisco IOS). TURN tries to fix that by relaying the traffic through a publicly reachable server.
- ICE seems to be an umbrella solution on top of the other two (please write a comment if I got it wrong)
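To make the STUN part less abstract, here is a minimal Python sketch of what a client does: it builds a Binding Request and decodes the XOR-MAPPED-ADDRESS attribute from the response to learn its public IP address and port. The message layout follows my reading of the STUN RFC; the function names are made up for this example, it handles IPv4 only, and the server address is whatever STUN server you choose to pass in (none is assumed here).

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by the STUN RFC
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """Return a 20-byte STUN header with no attributes."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(message: bytes, transaction_id: bytes):
    """Return (ip, port) from a Binding Success Response (IPv4 only)."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    if message[8:20] != transaction_id:
        raise ValueError("transaction ID mismatch")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XOR-ed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


def discover_public_endpoint(server, timeout=2.0):
    """Send one Binding Request to `server` (a (host, port) tuple you supply)."""
    tid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(tid), server)
        data, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(data, tid)
```

The address returned by `discover_public_endpoint` is only useful to a peer if the NAT reuses the same translation for traffic toward that peer, which is exactly what symmetric NAT does not do.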
Wouldn’t it be better if, instead of the above mess, the host could tell the NAT device it needs a public port? Of course, we have at least three protocols to do that, proving yet again the infinite wisdom of xkcd:
- Internet Gateway Device Protocol, part of Universal Plug and ~~Pray~~ Play
- Port Control Protocol
- NAT Port Mapping Protocol
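As an illustration of how simple the "just ask the NAT device" approach can be, here is a sketch of the NAT-PMP mapping request and response encoding in Python. The field layout and the UDP port 5351 follow my reading of the NAT-PMP RFC; the function names are invented for this example, and sending the request to your default gateway (plus retransmissions and error handling) is left out.

```python
import struct

NATPMP_PORT = 5351            # UDP port the gateway listens on
OP_MAP_UDP, OP_MAP_TCP = 1, 2


def build_map_request(internal_port, suggested_external_port=0,
                      lifetime=7200, opcode=OP_MAP_UDP):
    """Encode a 12-byte mapping request (version 0)."""
    return struct.pack("!BBHHHI", 0, opcode, 0,
                       internal_port, suggested_external_port, lifetime)


def parse_map_response(data):
    """Decode a 16-byte mapping response; raise on a non-zero result code."""
    version, opcode, result, epoch, internal, external, lifetime = \
        struct.unpack("!BBHIHHI", data[:16])
    if version != 0 or opcode not in (128 + OP_MAP_UDP, 128 + OP_MAP_TCP):
        raise ValueError("not a NAT-PMP mapping response")
    if result != 0:
        raise RuntimeError(f"gateway refused the mapping (result code {result})")
    return {"internal_port": internal, "external_port": external,
            "lifetime": lifetime, "gateway_epoch": epoch}
```

The host suggests an external port, and the gateway answers with the port it actually assigned and for how long, so there is nothing to guess or probe.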
Why do we need STUN/TURN/ICE if we have a (supposedly) working solution? It often comes down to “I don’t want to deal with those uncouth people from the IT basement (aka ‘networking engineers’),” a well-known strategy used by the likes of Novell and VMware, or to “I want to make this work even though some stupid security people think it should be blocked” (see also: using Signal to plan bomb strikes).
I never said NAT is a great solution (it’s not), but it’s still a necessary evil (even in the IPv6 world) that has to be dealt with. There are standard ways to do it, and fortunately, we have tons of libraries you can use to get the job done without going into the details. A quick search for “Python [STUN|TURN|ICE] NAT” turned up half a dozen GitHub/PyPI projects. Ideally, we’d have a single library that everyone uses to get the job done (like OpenSSL[1] and OpenSSH), but maybe we’re not at that stage yet.
Last but not least, networking engineers love to think that networking’s complexities are unique. Well, I can point you to numerous other IT disciplines full of complexities, but they managed to build layers of abstraction around them. For example, people stopped reinventing compilers, operating systems, and databases ages ago. I haven’t heard anyone (apart from a small circle of database developers) talking about the complexities of distributed databases that arise because they have to deal with byzantine faults and the consequences of the CAP theorem. Why should NAT traversal be any different?
But It Would Be Better in IPv6 World
Here’s the most common counter-argument to my “NAT is a necessary evil” rants, this time made by Daryll Swer:
All these problems don’t exist on native routed (and static) IPv6.
TL&DR: Bollocks.
Most hosts connected to the public IPv6 Internet over a LAN or WiFi sit behind a stateful firewall[2] (for various reasons[3]). Punching holes through that firewall is equivalent to establishing NAT translations.
Oh, but dealing with firewalls is so much simpler in the IPv6 world:
Firewall hole punching only involves STUN, and that’s it. We move on with our lives.
Sort of[4]. Decent stateful firewalls match on the full 5-tuple, so their filtering is as strict as that of a symmetric NAT. However, they don’t change the UDP port numbers when packets traverse them (which makes them behave like a port-restricted cone NAT), so it’s easier to discover what hole your peer punched in their firewall.
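A toy simulation might make the hole-punching sequence easier to follow. It assumes two hosts, each behind its own stateful firewall that tracks UDP flows by source and destination address/port and does not rewrite anything; the class, the function names, and the addresses are made up for this sketch.

```python
class StatefulFirewall:
    """Tracks outbound UDP flows and admits only their exact return traffic."""

    def __init__(self):
        self.flows = set()   # {(inside endpoint, outside endpoint)}

    def record_outbound(self, src, dst):
        self.flows.add((src, dst))

    def allows_inbound(self, src, dst):
        # Inbound traffic must mirror a flow the inside host already opened.
        return (dst, src) in self.flows


def send(src, dst, src_fw, dst_fw):
    """Send one packet; return True if the receiver's firewall lets it in."""
    src_fw.record_outbound(src, dst)
    return dst_fw.allows_inbound(src, dst)


def simulate_hole_punch():
    alice = ("2001:db8:a::10", 40000)
    bob = ("2001:db8:b::20", 50000)
    fw_a, fw_b = StatefulFirewall(), StatefulFirewall()
    return [
        send(alice, bob, fw_a, fw_b),   # dropped by Bob's firewall, opens Alice's
        send(bob, alice, fw_b, fw_a),   # passes: Alice already sent to Bob
        send(alice, bob, fw_a, fw_b),   # passes: Bob has now sent to Alice
    ]
```

Because the ports are not rewritten, each peer only has to learn the other's address and port (for example, through a rendezvous server) before both start sending.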
On a more practical note, even the Cisco router[5] between me and the global Internet seems to be using port-restricted cone NAT (another term for the same behavior seems to be Endpoint-Independent Mapping – EIM), and I can’t remember the last time a VoIP call or a video conferencing app did not work. Yes, things are unnecessarily complex (from the perspective of IPv6 fans), but they work. It seems the NAT-induced complexity is still not expensive enough to make migration to IPv6 cost-effective.
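The difference between endpoint-independent mapping and symmetric NAT is easy to show with a toy model. The `Nat` class below is a made-up illustration, not any vendor's implementation: with endpoint-independent mapping, the translation depends only on the inside address and port, while a symmetric NAT allocates a new outside port for every destination.

```python
class Nat:
    """Toy NAPT that allocates outside ports sequentially."""

    def __init__(self, public_ip, endpoint_independent, first_port=20000):
        self.public_ip = public_ip
        self.endpoint_independent = endpoint_independent
        self.next_port = first_port
        self.mappings = {}

    def translate(self, src, dst):
        """Return the outside (ip, port) used for packets from src to dst."""
        # EIM keys the mapping on the inside endpoint only;
        # symmetric NAT keys it on the destination as well.
        key = src if self.endpoint_independent else (src, dst)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.mappings[key])


host = ("192.168.1.10", 40000)
stun_server = ("203.0.113.1", 3478)
peer = ("198.51.100.99", 50000)

eim = Nat("192.0.2.1", endpoint_independent=True)
symmetric = Nat("192.0.2.1", endpoint_independent=False)

# What the STUN server sees vs. what the peer sees
eim_views = (eim.translate(host, stun_server), eim.translate(host, peer))
symmetric_views = (symmetric.translate(host, stun_server),
                   symmetric.translate(host, peer))
```

In `eim_views` both entries are identical, so the address learned through STUN is the one the peer has to use. In `symmetric_views` they differ, which is why STUN alone cannot traverse a symmetric NAT.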
However, to be fair, CG-NAT in the IPv4 world does introduce a whole new level of evilness not present in the IPv6 world. For example, if two devices in the same CG-NAT cone[6] want to communicate but happen to use a server outside of the NAT cone[7] to find their respective IP addresses, the traffic has to go through the NAT device (hairpinning).
[1] In other news, that’s not always a good idea.
[2] As one would expect, there’s an RFC describing the details.
[3] Copiously described in the Local Network Protection for IPv6 RFC with the added “you don’t need NAT for that” slant.
[4] As always, leave a comment with enough technical details, and I’ll fix the blog post.
[5] EIM-NAT seems to be the default on Cisco IOS XE and requires a nerd knob on Cisco IOS Classic.
[6] A fancy name for the inside NAT interface(s).
[7] A fancy name for the outside NAT interface(s).
> Punching holes through that firewall is equivalent to establishing NAT translations.
I would not say outright 'equivalent'. There is more than one form of NAT, clearly, and not all forms of NAT (PAT) are as simple as firewall punching, the worst offender often being called “symmetric NAT”. You yourself stated: “Decent stateful firewalls match on the full 5-tuple, which is functionally equivalent to symmetric NAT, but don’t change the UDP port numbers when packets traverse them (making them equivalent to port-restricted cone NAT), so it’s easier to discover what hole your peer punched in their firewall.”
Every host in a native IPv6 LAN has a unique /128 at a minimum (it could also have a /64 IA_PD routed to the host, but that's not common (yet? RFC 9663) in LAN/Wi-Fi networks). The punching is per unique /128 address (or host); no NAT/PAT (a single IP address shared across N nodes) occurs here, which prevents any kind of port-exhaustion issues and “symmetric NAT-like” issues (symmetric NAT being the majority of default NAT configurations out there, even on CGNAT products, with exceptions like IOS-XE, A10 CGNAT, etc.). It is behaviourally equivalent to port-restricted cone NAT, which I do not dispute, but I disagree that it's a problem for 99% of IPv6 users. STUN has no problems dealing with port-restricted cone NAT behaviour on legacy IPv4, and certainly no problems with native IPv6 (where port exhaustion/rewrite will never happen anyway) behind a firewall.
I'd know, because I've spent a lot of hours over the years testing popular end-user applications that support IPv6 P2P (usually VoIP/video calling software, because how much more P2P can we get if not telephony software of sorts!). PCAPs showed successful src/dst IPs matching each peer's local endpoint, i.e. they ruled out TURN; behind stateful firewalls, of course, STUN worked fine. However, I cannot replicate this on most CGNAT deployments that I don't control, because the majority of ISPs never enabled EIM+EIF NAT on the CGNAT device, and it breaks further for intra-CGNAT traffic because they refuse to enable hairpinning.
STUN/TURN/ICE/WebRTC aren't a joke. The people who wrote the code behind these tools spent decades tweaking and fixing it just to make it work with CGNAT deployments and the like, simply because people fear native routed IPv6. Yes, there are people doing /128 IPv6 and NATting /64 ULAs because NAT is a firewall technology for security compliance according to some (yes, sarcasm): https://www.f5.com/resources/white-papers/the-myth-of-network-address-translation-as-security
Let's not start talking about NAT Slipstreaming attacks and the like now; people need their /128s and NAT66es!
I should note that EIM-NAT alone isn't sufficient: you need it hand-in-hand with EIF-NAT, plus hairpinning as discussed, for the 'ideal' P2P-friendly NAT/CGNAT deployments. Fortinet has a nice explanation for most people, if the RFCs are a bit confusing. https://docs.fortinet.com/document/fortigate/7.4.6/fortinet-carrier-grade-nat-field-reference-architecture-guide/920625/endpoint-independent-mapping https://docs.fortinet.com/document/fortigate/7.4.6/fortinet-carrier-grade-nat-field-reference-architecture-guide/921514/endpoint-independent-filtering
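For readers who prefer code to RFC prose, here is a small Python sketch of the three inbound filtering behaviours a NAT can apply once a mapping exists. The terminology follows the NAT behavioural requirements RFC for UDP; the enum and function names are made up for this illustration.

```python
from enum import Enum


class Filtering(Enum):
    ENDPOINT_INDEPENDENT = "EIF"          # anyone may use the mapping
    ADDRESS_DEPENDENT = "ADF"             # only IP addresses already contacted
    ADDRESS_AND_PORT_DEPENDENT = "APDF"   # only exact endpoints already contacted


def inbound_allowed(filtering, contacted, remote):
    """Decide whether a packet from `remote` may use an existing mapping.

    `contacted` is the set of (ip, port) endpoints the inside host has
    already sent traffic to through this mapping.
    """
    if not contacted:
        return False                       # no outbound traffic, no mapping
    if filtering is Filtering.ENDPOINT_INDEPENDENT:
        return True
    if filtering is Filtering.ADDRESS_DEPENDENT:
        return any(ip == remote[0] for ip, _ in contacted)
    return remote in contacted


# The inside host has only talked to a STUN server so far.
contacted = {("203.0.113.1", 3478)}
peer = ("198.51.100.99", 50000)
results = {f.value: inbound_allowed(f, contacted, peer) for f in Filtering}
```

With EIM but without EIF, the peer's first packet is dropped until the inside host has sent something to that peer, which is the port-restricted cone behaviour discussed in the post.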
I don't know whether EIF works on IOS-XE, probably does, but if you have a Windows PC, you can test for RFC conformance using this tool (someone probably can port it to Linux/macOS): https://github.com/HMBSbige/NatTypeTester
Long story short: life's easier with native routed IPv6, no matter how you (as in, anyone who's pro-NAT) try to spin NAT as the saviour of IP networking. It might have been a different reality had EIM+EIF+hairpinning been the default on all popular NAT software/OSes.
Yes, IPv6 multihoming is a pain. BGP is great (routed IPv6 over BGP!), but we can't do BGP everywhere, and there's no good solution here, NAT66/NPTv6 or not. Maybe some source routing on the LAN could handle this bit, but I'm not sure how load balancing from the local endpoint would work on a source-address-selection basis. For example, you have two ISPs, each gave you a unique /48, and your VLAN has two /64s configured for SLAAC/RAs. The endpoint now has two /128s from two separate per-ISP /64s; how would it know which prefix to use when for load balancing? That introduces a complexity of its own.
An ISP owner friend of mine recently discussed DHCPv6 server HA complexity across N BNGs with me, and we joked that life would be easier if every residential CPE supported BGP: we could just BGP everything, call it a day, and not worry about next-hop failures or DHCPv6 state sync issues. But such is life; I'd still take IPv6 any day over NAT. Particularly in ISP networks, when you have P2P gamers on an Xbox or PS, intelligent CGNAT (EIF+EIM+hairpinning) with dual-stack IPv6 ensures you don't get support tickets about P2P or port exhaustion issues because there are 1000 CPEs behind a /32.
No matter which way you look at it, IPv6 brings less complexity to any of these peer-to-peer connections. Even if firewall hole punching is needed, there is usually just one firewall to deal with, unlike the CG-NAT boxes most users sit behind in addition to their on-site NAT.
You also don't have complications with brand-new protocols such as SIP or FTP, where they embed IPs in the protocol and people have these god-awful "application layer gateways" on their NAT boxes to mangle them.
It might not amount to a commercial justification to run IPv6. But I think from an engineering perspective it's hard to make the case that IPv4+NPAT doesn't add complexity.
Thank you both!
I totally agree with everything you're saying, but unfortunately, I also have this weird drive to point out the state of the Emperor's clothes every now and then.