Could MPLS-over-IP replace VXLAN or NVGRE?
A lot of engineers are concerned about what seems to be a frivolous proliferation of new encapsulation formats for virtual networks. While STT makes technical sense (it allows soft switches to use existing NIC TCP offload functionality), it’s harder to see the benefits of VXLAN and NVGRE. Scott Lowe recently wrote a great blog post asking a very valid question: “Couldn’t we use MPLS over GRE or IP?” We could, but we wouldn’t gain anything by doing so.
RFC 4023 specifies two methods of MPLS-in-IP encapsulation: an MPLS label stack on top of IP (using IP protocol 137) and an MPLS label stack on top of GRE (using the MPLS protocol type in the GRE header). We could take either of them and either keep the traditional MPLS semantics or misuse the MPLS label as a virtual network identifier (VNI). Let’s analyze both options.
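To make the two encapsulations concrete, here’s a minimal Scapy sketch (assuming Scapy’s contrib MPLS layer is available; the addresses and label values are made up) that builds both header stacks:

```python
# Build the two RFC 4023 encapsulations with Scapy (illustration only).
from scapy.contrib.mpls import MPLS
from scapy.layers.inet import IP, UDP
from scapy.layers.l2 import GRE

# A labeled payload - a single-entry label stack in front of an IP packet
labeled_payload = MPLS(label=100, s=1) / IP(dst="10.0.0.1") / UDP()

# Option 1: MPLS-in-IP - the label stack rides directly on IP protocol 137
mpls_in_ip = IP(dst="192.0.2.1", proto=137) / labeled_payload

# Option 2: MPLS-in-GRE - GRE header with protocol type 0x8847 (MPLS unicast)
mpls_in_gre = IP(dst="192.0.2.1") / GRE(proto=0x8847) / labeled_payload

mpls_in_ip.show2()
mpls_in_gre.show2()
```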
Misusing MPLS label as VNI
In theory, one could use MPLS-over-IP or MPLS-over-GRE instead of VXLAN (or NVGRE) and use the first MPLS label as the VNI. While this might work (after all, NVGRE reuses the GRE key as the VNI), it wouldn’t gain us anything. Existing equipment would not recognize this “creative” use of MPLS labels, we still wouldn’t have a control plane, and we’d still have to rely on IP multicast to emulate L2 flooding within a virtual network.
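For illustration only, here’s a hedged sketch (hypothetical addresses, VNI and multicast group) of what that misuse would look like on the wire: a tenant broadcast frame pushed behind a single label carrying the VNI and flooded to a per-segment IP multicast group, exactly the way VXLAN handles BUM traffic today:

```python
# Sketch of "MPLS label as VNI" flooding over MPLS-in-GRE (not any product's real format).
from scapy.contrib.mpls import MPLS
from scapy.layers.inet import IP
from scapy.layers.l2 import ARP, Ether, GRE

VNI = 10123                    # hypothetical tenant segment ID carried in the label
FLOOD_GROUP = "239.1.1.23"     # hypothetical per-segment multicast group

# Tenant broadcast frame (an ARP request) that has to be flooded
tenant_arp = Ether(src="00:50:56:00:00:01", dst="ff:ff:ff:ff:ff:ff") / ARP(pdst="10.0.0.2")

flooded = (IP(src="192.0.2.11", dst=FLOOD_GROUP) /   # hypervisor -> multicast group
           GRE(proto=0x8847) /                       # GRE protocol type = MPLS
           MPLS(label=VNI, s=1) /                    # label misused as the VNI
           tenant_arp)

flooded.show2()
```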
The MPLS label = VNI approach would be totally incompatible with existing MPLS stacks and would thus require new software in virtual-to-physical gateways. It would also go against the gist of MPLS – labels should have local significance (whereas a VNI has network-wide significance) and should be assigned independently by individual MPLS nodes (by the egress PE-routers in the MPLS/VPN case).
It’s also questionable whether existing hardware could process MAC-in-MPLS-in-GRE-in-IP packets – and hardware support would be the only potential benefit of this approach. I know some (expensive) linecards in the Catalyst 6500 can process IP-in-MPLS-in-GRE packets (as can some switches from Juniper and HP), but can they process MAC-in-MPLS-in-GRE? Who knows.
Finally, like NVGRE, MPLS-over-GRE or MPLS-over-IP framing with the MPLS label used as the VNI lacks the entropy needed for load balancing; existing switches would not be able to load-balance traffic between two hypervisor hosts unless each hypervisor host used multiple IP addresses.
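A toy model (not any vendor’s actual hashing algorithm) shows why that matters: a switch doing 5-tuple ECMP sees one and the same outer header for all MPLS-over-GRE traffic between a pair of hypervisors, while VXLAN’s per-flow outer UDP source port spreads the flows across the available uplinks:

```python
# Toy 5-tuple ECMP model illustrating the entropy difference (illustration only).
import random
import zlib

def ecmp_uplink(src_ip, dst_ip, proto, sport, dport, uplinks=4):
    """Pick an uplink by hashing the outer 5-tuple."""
    key = f"{src_ip} {dst_ip} {proto} {sport} {dport}".encode()
    return zlib.crc32(key) % uplinks

# MPLS-over-GRE: the outer header never changes, so every flow lands on one uplink
gre_uplinks = {ecmp_uplink("10.1.1.1", "10.1.1.2", 47, 0, 0) for _ in range(1000)}

# VXLAN: the outer source port is derived from the inner flow, spreading the load
vxlan_uplinks = {ecmp_uplink("10.1.1.1", "10.1.1.2", 17,
                             random.randint(49152, 65535), 4789)
                 for _ in range(1000)}

print(len(gre_uplinks), len(vxlan_uplinks))   # typically prints: 1 4
```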
Reusing existing MPLS protocol stack
Reusing the MPLS label as a VNI thus buys us nothing; we’re better off using STT or VXLAN (with which at least equal-cost load balancing works decently well). How about using MPLS-over-GRE the way it was intended to be used – as part of the MPLS protocol stack? Here we run into several major roadblocks:
- No hypervisor vendor is willing to stop supporting L2 virtual networks because they just might be required for “mission-critical” craplications running over Microsoft’s Network Load Balancing, so we can’t use L3 MPLS VPN.
- There’s no usable Ethernet-over-MPLS standard. VPLS is a kludge (= full mesh of pseudowires) and alternate approaches (draft-raggarwa-mac-vpn and draft-ietf-l2vpn-evpn) are still on the drawing board.
- MPLS-based VPNs require a decent control plane, including control-plane protocols like BGP, and that would require some real work on hypervisor soft switches. Implementing an ad-hoc solution like VXLAN based on a doing-more-with-less approach (= let’s push the problem into someone else’s lap and require IP multicast in the network core) is cheaper and faster.
Summary
Using MPLS-over-IP/GRE to implement virtual networks makes marginal sense, does not solve the load balancing problems NVGRE is facing, and requires significant investment in the hypervisor-side control plane if you want to do it right. I don’t expect to see it implemented any time soon (although Nicira could do it pretty quickly should they find a customer who would be willing to pay for it).
Interesting points. I agree mLDP can potentially perform a similar role with an L3 VPN setup. The VMkernel in a host can act as a PE router, with the IP network running LDP. However, it would also require the VMkernel to run BGP. I think even PBB-VPLS would be a viable option; we are running PBB-VPLS in a service provider environment.
Also, you don't have to run LDP or VPNv4 on the hypervisor – a controller could do that.
A very good add-on discussion to the post would be looking at LISP, and the abstracted control-plane it offers. Once multicast is supported, there is no reason why L2-over-IP could not be leveraged. It natively uses IP-over-IP (UDP), and has the control-plane to scale (analogous to DNS). An interesting topic for sure as LISP evolves.
Furthermore, if edge virtual firewalls are inevitably required to do basic network-level segmentation, then I see no reason why a private cloud with no overlapping address space needs anything more than a single all-spanning virtual network with edge virtual firewalls implementing all the segmentation (with network- and application-level granularity).
It's a rather interesting time, watching network technology progress and mature to create new solutions for virtualized data centers.
http://blog.ine.com/2012/08/17/otv-decoded-a-fancy-gre-tunnel/
..."From a high level overview, OTV is basically a layer 2 over layer 3 tunneling protocol. In essence OTV accomplishes the same goal as other L2 tunneling protocols such as L2TPv3, Any Transport over MPLS (AToM), or Virtual Private LAN Services (VPLS). For OTV specifically this goal is to take Ethernet frames from an end station, like a virtual machine, encapsulate them inside IPv4, transport them over the Data Center Interconnect (DCI) network, decapsulate them on the other side, and out pops your original Ethernet frame."
1/ Offload the network function from the hypervisor to the physical access switch, as VM-FEX does;
2/ have the physical access switch support VXLAN;
3/ decouple the control plane that manages VXLAN.
This would handle VM-to-VM, VM-to-physical and physical-to-physical traffic, address the VLAN limitations for the whole data center, and reduce the number of VXLAN switches needed.
Cooper Wu/http://www.linkedin.com/pub/cooper-wu/4b/79a/bb
From a cloud perspective you'll probably disagree with my option – I believe you prefer putting all services within the virtualization framework, with nothing tied to the physical network infrastructure.
Network admins will be going mad: all they can see are tunnel packets, which are of no help when there's a problem.
It's true that at the current stage there are limitations, and it's complex to orchestrate and tightly couple the hypervisor with the ToR switch. :)
If combined with OpenFlow, the hypervisor would be tightly coupled with control-plane servers/clusters rather than with the ToR switch, which would make more sense.
Whatever the reason, the whole industry has standardized on VXLAN (even Microsoft will support it in the next Hyper-V release... or maybe it's already shipping?), so we better get used to it ;)