A lot of engineers are concerned with what seems to be frivolous creation of new encapsulation formats supporting virtual networks. While STT makes technical sense (it allows soft switches to use existing NIC TCP offload functionality), it’s harder to figure out the benefits of VXLAN and NVGRE. Scott Lowe wrote a great blog post recently where he asked a very valid question: “Couldn’t we use MPLS over GRE or IP?” We could, but we wouldn’t gain anything by doing that.
RFC 4023 specifies two methods of MPLS-in-IP encapsulation: the MPLS label stack directly on top of IP (using IP protocol 137) and the MPLS label stack on top of GRE (using the MPLS protocol type in the GRE header). We could take either one and combine it with traditional MPLS semantics, or misuse the MPLS label as a virtual network identifier (VNI). Let’s analyze both options.
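The two encapsulations differ only in the shim sitting between the outer IP header and the label stack. Here’s a minimal sketch of the on-the-wire framing using Python’s struct module (the constant and function names are mine, not from the RFC):

```python
import struct

MPLS_IN_IP_PROTO = 137      # RFC 4023: MPLS label stack directly over IP
GRE_PROTO_MPLS = 0x8847     # MPLS unicast protocol type in the GRE header

def mpls_label_entry(label, tc=0, bottom=True, ttl=64):
    """Pack one 32-bit MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    assert 0 <= label < 2**20
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def gre_header(proto=GRE_PROTO_MPLS):
    """Minimal GRE header (RFC 2784): no checksum/key/sequence, version 0."""
    return struct.pack("!HH", 0, proto)

# MPLS-over-GRE: GRE shim followed by the label stack, then the payload.
# MPLS-over-IP would skip the GRE shim and set IP protocol = 137 instead.
mpls_over_gre = gre_header() + mpls_label_entry(12345)
```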
Misusing MPLS label as VNI
In theory, one could use MPLS-over-IP or MPLS-over-GRE instead of VXLAN (or NVGRE) and use the first MPLS label as the VNI. While this might work (after all, NVGRE reuses the GRE key as the VNI), it would not gain us anything. Existing equipment would not recognize this “creative” use of MPLS labels, we still wouldn’t have a control plane, and we’d have to rely on IP multicast to emulate virtual network L2 flooding.
The MPLS label = VNI approach would be totally incompatible with existing MPLS stacks and would thus require new software in virtual-to-physical gateways. It would also go against the gist of MPLS – labels should have local significance (whereas a VNI has network-wide significance) and should be assigned independently by individual MPLS nodes (by egress PE-routers in the MPLS/VPN case).
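Quite apart from the semantic mismatch, there’s a simple size mismatch: a VXLAN- or NVGRE-style 24-bit VNI doesn’t even fit into the 20-bit MPLS label field. A trivial sketch (the names are mine):

```python
# 20-bit MPLS label field vs. 24-bit VNI field (VXLAN VNI / NVGRE VSID)
MAX_MPLS_LABEL = 2**20 - 1   # 1,048,575 (and some low values are reserved)
MAX_VXLAN_VNI = 2**24 - 1    # 16,777,215 virtual networks

def vni_fits_in_label(vni):
    """Could this network-wide VNI be carried in a single MPLS label?"""
    return 0 <= vni <= MAX_MPLS_LABEL

# vni_fits_in_label(1_500_000) is False - a perfectly valid VNI
# that no single MPLS label could carry.
```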
It’s also questionable whether the existing hardware would be able to process MAC-in-MPLS-in-GRE-in-IP packets – hardware termination being the only potential benefit of this approach. I know that some (expensive) linecards in Catalyst 6500 can process IP-in-MPLS-in-GRE packets (as can some switches from Juniper and HP), but can they process MAC-in-MPLS-in-GRE? Who knows.
Finally, like NVGRE, MPLS-over-GRE or MPLS-over-IP framing with the MPLS label used as the VNI lacks the entropy needed for load balancing; existing switches would not be able to load-balance traffic between two hypervisor hosts unless each hypervisor host used multiple IP addresses.
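For comparison, this is roughly where VXLAN gets its entropy: the hypervisor hashes the inner frame’s headers into the outer UDP source port, which existing switches already feed into their ECMP/LAG hash. A hedged sketch (the function and the exact fields hashed are mine; RFC 7348 only recommends deriving the source port from the inner headers):

```python
import zlib

def vxlan_source_port(inner_src_mac: bytes, inner_dst_mac: bytes,
                      inner_ethertype: int) -> int:
    """Derive the outer UDP source port from the inner flow so that
    per-flow ECMP hashing in the core spreads tunneled traffic.
    Sketch only - real implementations typically hash more fields."""
    h = zlib.crc32(inner_src_mac + inner_dst_mac
                   + inner_ethertype.to_bytes(2, "big"))
    # RFC 7348 recommends the dynamic/private port range 49152-65535
    return 49152 + (h % 16384)
```

Two different inner flows between the same pair of hypervisors thus get different outer 5-tuples and can take different equal-cost paths – exactly what a fixed GRE/MPLS-over-IP header between two tunnel endpoints cannot provide.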
Reusing existing MPLS protocol stack
Misusing the MPLS label as a VNI thus buys us nothing; we’re better off using STT or VXLAN (where at least equal-cost load balancing works decently well). How about using MPLS-over-GRE the way it was intended to be used – as part of the MPLS protocol stack? Here we stumble upon several major roadblocks:
- No hypervisor vendor is willing to stop supporting L2 virtual networks because they just might be required for “mission-critical” craplications running over Microsoft’s Network Load Balancing, so we can’t use L3 MPLS VPN.
- There’s no usable Ethernet-over-MPLS standard. VPLS is a kludge (= full mesh of pseudowires) and alternate approaches (draft-raggarwa-mac-vpn and draft-ietf-l2vpn-evpn) are still on the drawing board.
- MPLS-based VPNs require a decent control plane, including control-plane protocols like BGP, and that would require some real work on hypervisor soft switches. Implementing an ad-hoc solution like VXLAN based on a do-more-with-less approach (= push the problem into someone else’s lap and require IP multicast in the network core) is cheaper and faster.
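To make the “push the problem into the network core” point concrete: absent a control plane, a VXLAN-style scheme simply maps every VNI to an IP multicast group and floods BUM (broadcast, unknown unicast, multicast) traffic there, expecting the core to provide IP multicast. An MPLS VPN would instead distribute reachability via BGP before any flooding is needed. A minimal sketch (the VNI-to-group mapping values are made up):

```python
import ipaddress

# Static per-VNI multicast groups, typically configured per segment -
# this table IS the entire "control plane" of multicast-based VXLAN.
vni_to_group = {
    5000: ipaddress.ip_address("239.1.1.1"),
    5001: ipaddress.ip_address("239.1.1.2"),
}

def flood_destination(vni: int) -> ipaddress.IPv4Address:
    """Where a hypervisor sends a frame whose destination MAC it hasn't
    learned yet: the segment's multicast group, not a BGP-learned VTEP."""
    return vni_to_group[vni]
```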
Using MPLS-over-IP/GRE to implement virtual networks makes marginal sense, does not solve the load balancing problems NVGRE is facing, and requires significant investment in the hypervisor-side control plane if you want to do it right. I don’t expect to see it implemented any time soon (although Nicira could do it pretty quickly should they find a customer who would be willing to pay for it).