Boris Lazarov sent me an excellent question:
Does it make sense, and are there any inherent problems from a design perspective, to use the underlay not only for transport of overlay packets, but also for some services? For example: a VMware cluster, vMotion, VXLAN traffic, and some basic infrastructure services that are a prerequisite for everything else (DNS).
Before answering it, let’s define some terminology, which will inevitably lead us to the it’s tunnels all the way down end state.
One of the most common ways of implementing a complex service on top of a simple transport network is to use tunnels. We could endlessly argue whether MPLS labels or virtual circuits are tunnels (or whether seven or nine angels can dance on the head of a pin), but let’s not go there.
In every network using tunnels (or a similar technology), you have the transport part of the network – called the underlay – that the edge nodes use to send tunnel-encapsulated user traffic to other edge nodes, implementing a customer (or services) network called the overlay.
Let’s take VMware NSX-T as an example. The vSphere hypervisors use the IP connectivity provided by the physical data center fabric to exchange Geneve-encapsulated user traffic. Virtual networks are clearly the overlay, physical data center fabric is the underlay.
Unfortunately, VMware never mastered the art of using simple transport networks; they love to push the complexity onto others (and then blame them for being overly complex). In the NSX-V and NSX-T case, a redundantly connected vSphere server requires a single IP subnet spanning all the ToR switches it’s connected to (usually a pair of switches)1. It also sends traffic belonging to multiple security domains – administration, storage, vMotion, customer – and VMware’s recommended designs rightfully suggest implementing four different forwarding domains in the physical network.
What’s the easiest way to implement multiple forwarding domains in a modern data center fabric? Ask any vendor and they’ll immediately reply EVPN with VXLAN (not a bad answer, I might skip the EVPN part but that’s just me). The underlay network used by VMware NSX servers all of a sudden becomes an overlay fabric network that has to be implemented with another underlay network – simple IP connectivity within the data center fabric. Does that mean you’ll be running Geneve over VXLAN?2 Of course. Welcome to the modern world of infinite abstractions where bandwidth and CPU cycles seem to be free… or at least someone else is paying for them (hint: that would be you).
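The cost of that double encapsulation is easy to underestimate. Here’s a back-of-the-envelope sketch of the per-packet overhead of the Geneve-over-VXLAN stack described above; the header sizes assume IPv4 transport, a base Geneve header with no options, and untagged Ethernet – your mileage will vary with options, tags, or IPv6.

```python
# Per-packet overhead of running Geneve (NSX) on top of a VXLAN fabric.
# A rough sketch: IPv4 underlays, no Geneve options, no 802.1Q tags.
HEADERS = {
    "Geneve base header":     8,
    "UDP (Geneve)":           8,
    "IPv4 (NSX underlay)":   20,
    "Ethernet (NSX underlay)": 14,
    "VXLAN header":           8,
    "UDP (VXLAN)":            8,
    "IPv4 (fabric underlay)": 20,
}

overhead = sum(HEADERS.values())
print(f"Encapsulation overhead: {overhead} bytes per packet")
# A 1500-byte tenant payload thus needs fabric links that can carry
print(f"Required fabric IP MTU: {1500 + overhead} bytes")
```

That’s 86 bytes of headers before the outer Ethernet frame is even counted – one of the reasons data center fabrics routinely run jumbo frames in the underlay.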
Could we bypass that second layer of abstraction and connect all servers straight to the physical IP fabric? Be extremely careful. VXLAN and Geneve are simple data-plane encapsulation schemes that have absolutely no security3. The moment someone gains access to the underlay network, they can inject any traffic into any overlay network4.
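To see why, it’s enough to look at what a VXLAN header actually contains. The sketch below builds the eight-byte header defined in RFC 7348 (the function name and VNI value are mine, for illustration only) – flags, a VNI, and reserved fields. There is no key, no cookie, and no authentication of any kind:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header per RFC 7348:
    flags (1 byte), reserved (3 bytes), VNI (3 bytes), reserved (1 byte).

    Note what is NOT here: no authentication, no key, nothing to prove
    the sender is a legitimate tunnel endpoint. Anyone who can deliver
    UDP packets to a VTEP can claim membership in any VNI.
    """
    flags = 0x08                                  # I-flag: VNI field is valid
    return struct.pack("!B3s", flags, b"\x00" * 3) + struct.pack("!I", vni << 8)

hdr = vxlan_header(vni=5001)    # 5001 is an arbitrary example VNI
print(hdr.hex())
```

Geneve is no different in this respect: its base header adds option TLVs, but nothing that authenticates the sender. Security has to come from keeping untrusted hosts out of the underlay, not from the encapsulation itself.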
You should always start your design with “What problem am I trying to solve?” followed by “What is the best tool for the job?” Sometimes a stack of tunnels happens to be the least horrible option.
- Added a detailed explanation of the need for a single IP subnet spanning multiple ToR switches.
When a server uplink fails, the vSwitch activates a backup interface for the virtual switch the overlay kernel interface is attached to. The unchanged underlay tunnel IP address thus appears on another server uplink, and if that uplink happens to be connected to a different ToR switch, you need the same subnet spanning multiple switches (more details). That does NOT mean you need a VLAN spanning those switches; you could use tricks like host routing. More details… ↩︎
To be precise, it will be Geneve-over-UDP-over-IP-over-Ethernet-over-VXLAN-over-UDP-over-IP. ↩︎
A fact promoted as an astonishing discovery to bedazzled attendees of security conferences… even though it’s clearly stated in the VXLAN RFC. ↩︎
Most tunneling mechanisms (apart from IPsec for obvious reasons) have the same limitations. MPLS is no better. ↩︎