Reusing Underlay Network for Infrastructure Services
Boris Lazarov sent me an excellent question:
Does it make sense, and are there any inherent problems from a design perspective, to use the underlay not only for transport of overlay packets but also for some services? For example: VMware cluster, vMotion, VXLAN traffic, and some basic infrastructure services that are a prerequisite for the rest (DNS).
Before answering it, let’s define some terminology, which will inevitably lead us to the it’s tunnels all the way down end state.
One of the most common ways of implementing a complex service on top of a simple transport network is to use tunnels. We could endlessly argue whether MPLS labels or virtual circuits are tunnels (or whether seven or nine angels can dance on the head of a pin), but let’s not go there.
In every network using tunnels (or some such technology) you have the transport part of the network – called the underlay – that the edge nodes use to send tunnel-encapsulated user traffic to other edge nodes, implementing the customer (or services) network called the overlay.
Let’s take VMware NSX-T as an example. The vSphere hypervisors use the IP connectivity provided by the physical data center fabric to exchange Geneve-encapsulated user traffic. The virtual networks are clearly the overlay; the physical data center fabric is the underlay.
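To make the split concrete, here’s a minimal Scapy sketch (all addresses and the VNI are made up, and it assumes Scapy’s contrib Geneve module) of what such a frame looks like on the wire: everything up to and including the UDP header belongs to the underlay, everything behind the Geneve header belongs to the overlay.

```python
# Minimal sketch (hypothetical addresses and VNI) of a Geneve-encapsulated overlay frame.
from scapy.all import Ether, IP, UDP
from scapy.contrib.geneve import GENEVE

underlay = (
    Ether(src="02:00:00:00:00:01", dst="02:00:00:00:00:02")   # hypervisor/ToR NICs
    / IP(src="10.0.1.11", dst="10.0.2.22")                    # tunnel endpoint (TEP) addresses
    / UDP(sport=49152, dport=6081)                            # Geneve runs over UDP port 6081
)
overlay = (
    GENEVE(vni=0x1234)                                        # overlay segment identifier
    / Ether(src="00:50:56:00:00:aa", dst="00:50:56:00:00:bb") # VM MAC addresses
    / IP(src="192.168.10.5", dst="192.168.10.6")              # VM IP addresses
)

(underlay / overlay).show()   # dump the complete header stack
```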
Unfortunately, VMware never mastered the art of using simple transport networks; they love to push the complexity onto others (and then blame them for being overly complex). In the NSX-V and NSX-T case, a redundantly connected vSphere server requires a single IP subnet spanning all the ToR switches it’s connected to (usually a pair of switches) [1]. It also sends traffic belonging to multiple security domains – administration, storage, vMotion, customer – and VMware recommended designs rightfully suggest implementing four different forwarding domains in the physical network.
What’s the easiest way to implement multiple forwarding domains in a modern data center fabric? Ask any vendor and they’ll immediately reply EVPN with VXLAN (not a bad answer, I might skip the EVPN part but that’s just me). The underlay network used by VMware NSX servers all of a sudden becomes an overlay fabric network that has to be implemented with another underlay network – simple IP connectivity within the data center fabric. Does that mean you’ll be running Geneve over VXLAN [2]? Of course. Welcome to the modern world of infinite abstractions where bandwidth and CPU cycles seem to be free… or at least someone else is paying for them (hint: that would be you).
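To see what footnote [2] translates into on the wire, here’s a rough Scapy sketch (hypothetical addresses and VNIs, assuming Scapy’s VXLAN layer and contrib Geneve module) that nests the NSX Geneve envelope inside the fabric VXLAN envelope and prints how many header bytes are spent before the first byte of VM traffic:

```python
# Rough sketch of the Geneve-over-VXLAN header stack (hypothetical values).
from scapy.all import Ether, IP, UDP
from scapy.layers.vxlan import VXLAN
from scapy.contrib.geneve import GENEVE

fabric_tunnel = (   # leaf-to-leaf VXLAN tunnel built by the EVPN fabric
    Ether() / IP(src="10.255.0.1", dst="10.255.0.2") / UDP(dport=4789) / VXLAN(vni=20001)
)
nsx_tunnel = (      # what NSX believes is its underlay
    Ether() / IP(src="10.0.1.11", dst="10.0.2.22") / UDP(dport=6081) / GENEVE(vni=5001)
)
vm_traffic = Ether() / IP(src="192.168.10.5", dst="192.168.10.6")

frame = fabric_tunnel / nsx_tunnel / vm_traffic
print(f"encapsulation overhead: {len(frame) - len(vm_traffic)} bytes per frame")
```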
Could we bypass that second layer of abstraction and connect all servers straight to the physical IP fabric? Be extremely careful. VXLAN and Geneve are simple data-plane encapsulation schemes that have absolutely no security [3]. The moment someone gains access to the underlay network, they can inject any traffic into any overlay network [4].
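Footnotes [3] and [4] are easy to verify for yourself: nothing in the VXLAN (or Geneve) header authenticates the sender, so anyone who can deliver UDP packets to a tunnel endpoint can claim membership of any overlay segment simply by writing the desired VNI into the header. A hedged Scapy illustration (made-up addresses, VNI, and interface name); keep it in the lab:

```python
# Illustration only: a forged VXLAN frame claiming membership of VNI 20001.
# Nothing in the header authenticates the sender -- if the underlay delivers
# the packet to a tunnel endpoint, it gets decapsulated into the overlay segment.
from scapy.all import Ether, IP, UDP, sendp
from scapy.layers.vxlan import VXLAN

forged = (
    Ether()
    / IP(src="10.0.66.66", dst="10.0.2.22")        # any address that can reach a VTEP
    / UDP(sport=12345, dport=4789)                 # standard VXLAN destination port
    / VXLAN(vni=20001)                             # pick whatever overlay segment you like
    / Ether(dst="00:50:56:00:00:bb")               # forged inner frame
    / IP(src="192.168.10.66", dst="192.168.10.6")
)

# sendp(forged, iface="eth0")   # commented out on purpose -- lab use only
```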
To Recap
You should always start your design with “What problem am I trying to solve?” followed by “What is the best tool for the job?” Sometimes a stack of tunnels happens to be the least horrible option.
For more details, you might want to watch Overlay Virtual Networking, VMware NSX Technical Deep Dive, EVPN Technical Deep Dive and Leaf-and-Spine Fabric Architectures webinars.
Revision History
- 2021-09-30
- Added a detailed explanation of why a single IP subnet has to span multiple ToR switches.
1. When a server uplink fails, the vSwitch activates a backup interface for the virtual switch the overlay kernel interface is attached to. The unchanged underlay tunnel IP address thus appears on another server uplink, and if that uplink happens to be connected to a different ToR switch, you need the same subnet spanning multiple switches (more details). That does NOT mean you need a VLAN spanning those switches; you could use tricks like host routing. More details…
2. To be precise, it will be Geneve-over-UDP-over-IP-over-Ethernet-over-VXLAN-over-UDP-over-IP.
3. A fact promoted as an astonishing discovery to bedazzled attendees of security conferences… even though it’s clearly stated in the VXLAN RFC.
4. Most tunneling mechanisms (apart from IPsec, for obvious reasons) have the same limitations. MPLS is no better.
All networking professionals should learn ITU-T G.805 and G.809; then they would not be surprised. However, a lot of them stopped at the OSI reference model, and then they have difficulties understanding the real world of infinitely embedded overlays and underlays... For an old telco guy, this is nothing new. Just think about a hierarchical TDM network...
Hi Ivan!
I appreciate you writing a dedicated blog post on my question.
I totally agree with “What problem am I trying to solve?” and “What is the best tool for the job?”.
My context is vSphere with vSAN and a 100% custom-made automation backend to provision tenants and services in EVPN, EdgeFW, and ADC, alongside a commercial off-the-shelf self-service "cloud management" frontend for end users.
So far I have seen that the Nexus 9k BGP EVPN control plane is far from 100% reliable, and it is not hard at all to make it "misbehave" by, for example, pushing wrong or too much config, or deleting objects in the wrong order. In this project this part is dynamic, because it is based on front-end end-user self-service. So in this respect, a misbehaving tenant VRF is one problem, but a misbehaving vSAN, vSphere cluster, or other fundamental infrastructure service is a totally different problem to have. In my view none of the "benefits" that VXLAN/EVPN gives are needed for the infrastructure services, hence skipping further abstraction layers and, most importantly, dependencies is the best decision.
Does it make sense?
@Boris: I totally understand that someone painted you into a corner, and that you're trying to get out, but we're still walking in a circle in that tiny part of the room around the corner.
Unfortunately, there's no simple way to change the laws of physics or pull a rabbit out of a hat. You could either ignore VMware recommendations and connect all management traffic to the underlay, or implement tenant networks with NSX-T, or live with the consequences of the decision to implement tenant networks on ToR switches.
On a tangential topic, I've been ranting against the idea of implementing tenant networks in hardware for over a decade, but of course nobody listens.
Which VMware recommendations are you referring to?
@Boris, just a thought, in line with Ivan's point about simplicity here. Since engineering is all about intuitive simplification of problems, and you've already had an intuition that VXLAN/EVPN isn't needed for infrastructure services, why not use the simple physical IP fabric? Occam's razor is time-tested wisdom, or RFC 1925 rule 12 if you want a modern-day version.
As for security, Ivan blogged about it here:
https://blog.ipspace.net/2018/11/omg-vxlan-is-still-insecure.html
Sprint, Verizon, and other big ISPs have been running huge MPLS networks for decades, and they have the same physical security issue -- or non-issue, depending on your perspective -- here. It's probably beneficial to go in that direction and research how they physically secure their underlay.
Speaking of MPLS, there's one group that successfully leveraged it for their Cloud services as well, here, if you want to take a deeper look:
https://www.youtube.com/watch?v=TCtR_cujulk&t=30s