Using MPLS+EVPN in Data Center Fabrics
Here’s a question I got from someone attending the Building Next-Generation Data Center online course:
Cisco NCS5000 is positioned as a building block for a data center MPLS fabric – a leaf-and-spine fabric with MPLS and EVPN control plane. This raised a question regarding MPLS vs VXLAN: why would one choose to build an MPLS-based fabric instead of a VXLAN-based one assuming hardware costs are similar?
There’s a fundamental difference between MPLS- and VXLAN-based transport: the amount of coupling between edge and core devices.
MPLS-based VPN solutions require an end-to-end LSP (virtual circuit) between edge devices that has to be set up at every hop along the way and coordinated between edge and core devices with whatever control-plane protocol you use for MPLS. The LSP also has to be kept operational across various network failures, with the changes signaled to the edge devices.
Long story short: MPLS requires very tight coupling between edge and core nodes. Furthermore, the need for end-to-end LSPs installed in every core node prevents any form of state summarization in the core unless you add another layer of complexity like Carrier's Carrier MPLS/VPN.
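To put rough numbers on that, here's a back-of-envelope Python sketch (hypothetical fabric size, full mesh of LSPs spread evenly across the spines) comparing the transit state a spine node carries in an MPLS fabric with the routes it needs for VXLAN transport:

```python
# Back-of-envelope comparison of per-spine forwarding state.
# Hypothetical fabric: every leaf talks to every other leaf,
# and LSPs are spread evenly across the spines.
LEAVES = 64
SPINES = 4

# MPLS: full mesh of unidirectional LSPs between leaf pairs; every LSP
# needs a label-swap entry on each transit (spine) node it traverses.
lsps = LEAVES * (LEAVES - 1)             # 4032 LSPs
mpls_entries_per_spine = lsps // SPINES  # ~1008 transit entries per spine

# VXLAN: a spine needs nothing more than an IP route to each leaf
# loopback (VTEP); the number of communicating leaf pairs is irrelevant.
vxlan_routes_per_spine = LEAVES          # 64 routes per spine

print(f"MPLS transit entries per spine: {mpls_entries_per_spine}")
print(f"VXLAN routes per spine:         {vxlan_routes_per_spine}")
```

The MPLS state grows with the square of the number of edge nodes; the VXLAN state grows linearly with them.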
VXLAN-based VPN solutions require nothing more than IP connectivity between edge devices. The edge devices don’t have to participate in the core control-plane protocol (apart from using ARP) and the changes in the transport core are not signaled to the edge.
There is almost no state sharing between edge and core nodes in VXLAN, and no per-edge-node state in the core (unless you’re running VXLAN between loopback interfaces of edge nodes), making VXLAN more robust and easier to scale.
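To illustrate how thin that contract is, here's a minimal Scapy sketch (made-up addresses, ports, and VNI) of a VXLAN-encapsulated frame; the only thing the edge device needs from the core is IP reachability to the remote VTEP loopback:

```python
# Minimal VXLAN encapsulation sketch (Scapy). All addresses, ports,
# and the VNI are made up for illustration.
from scapy.all import Ether, IP, UDP
from scapy.layers.vxlan import VXLAN

# Outer headers: plain IP between VTEP loopbacks. IP reachability to
# 10.0.0.2 is ALL the edge needs from the transport core.
outer = (
    Ether()
    / IP(src="10.0.0.1", dst="10.0.0.2")  # local / remote VTEP loopback
    / UDP(sport=49321, dport=4789)        # 4789 = IANA-assigned VXLAN port
    / VXLAN(vni=10100)                    # tenant segment ID
)

# Inner frame: the tenant's original Ethernet frame, opaque to the core.
inner = (
    Ether(src="00:aa:00:00:00:01", dst="00:aa:00:00:00:02")
    / IP(src="172.16.1.10", dst="172.16.1.20")
)

(outer / inner).show()
```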
Also keep in mind though that mature full-blown MPLS control plane (using LDP or SR in IS-IS/OSPF) is not available from all vendors (a cynical mind would say that might be the reason Cisco is pushing MPLS based fabric).
However, there's no free lunch. Edge nodes can easily detect transport failures when they're tightly coupled with the fabric core (traditional MPLS, or VXLAN between loopback interfaces). Reliable detection of transport failures in loosely-coupled systems requires an end-to-end monitoring layer... something that solutions like VMware NSX never implemented, because it's easier to offload hard problems to the network and expect MLAG support on the fabric edge (as I explained in the logical switches part of the NSX Technical Deep Dive webinar).
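For illustration, here's a minimal sketch of the kind of end-to-end liveness probing a loosely-coupled overlay edge would have to implement itself (hypothetical port number and thresholds; a real implementation would use BFD or a vendor OAM mechanism, not this toy echo exchange):

```python
# Toy end-to-end transport liveness probe between VTEP loopbacks.
# Port number and thresholds are made up for illustration.
import socket

PROBE_PORT = 39999  # hypothetical port; the far side must run responder()

def responder(bind_ip: str) -> None:
    """Run on the remote edge node: echo every probe back to the sender."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, PROBE_PORT))
    while True:
        data, peer = sock.recvfrom(64)
        sock.sendto(data, peer)

def transport_alive(remote_vtep: str, probes: int = 3,
                    timeout: float = 0.5) -> bool:
    """Declare the transport path usable if any probe is echoed in time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    for seq in range(probes):
        try:
            sock.sendto(f"probe-{seq}".encode(), (remote_vtep, PROBE_PORT))
            sock.recvfrom(64)
            return True           # got an echo -> path is up
        except socket.timeout:
            continue              # lost probe, try the next one
    return False                  # all probes lost -> assume transport down

# Example: stop using a remote VTEP as a next hop once probing fails,
# e.g. if not transport_alive("10.0.0.2"): ... (hypothetical reaction)
```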
Finally, the tight coupling of edge and core nodes in MPLS gives you the ability to do traffic engineering between fabric edges. Whether you need that in a data center fabric, and whether the gains justify the operational effort (as opposed to simply buying more bandwidth), is another interesting question that I never got a clear answer to when I asked it in the RIFT and OpenFabric episodes of Software Gone Wild.
Which one should I use?
As long as you're doing encapsulation on the fabric edge in a reasonably small fabric, the number of prefixes in your fabric doesn't exceed the MPLS forwarding capacity of your switches, and you don't plan to build a multi-vendor fabric, it doesn't matter much whether you use MPLS or VXLAN encapsulation.
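Encapsulation overhead won't tip the scales either; a quick back-of-envelope calculation using the standard header sizes (IPv4 outer headers, two-label EVPN stack):

```python
# Per-packet encapsulation overhead, standard header sizes (IPv4 outer).
mpls_overhead = 2 * 4                # transport label + EVPN service label
vxlan_overhead = 14 + 20 + 8 + 8     # outer Ethernet + IPv4 + UDP + VXLAN

print(f"MPLS:  {mpls_overhead} bytes per packet")   # 8
print(f"VXLAN: {vxlan_overhead} bytes per packet")  # 50
```

Even the 50-byte VXLAN overhead rarely matters in practice beyond making sure the core MTU is large enough.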
Doing MPLS in the hypervisor makes less sense due to the tight coupling required between edge and core devices to synchronize labels. Juniper Contrail is the only product I'm aware of that used MPLS between hypervisors, and even they moved to VXLAN.
Am I missing something? Please write a comment!
Obviously you could get around that limitation by using a controller that would install labels (or label stacks) in edge nodes... but the question remains: what would you get with that?
You could use BGP to distribute SIDs (draft-ietf-idr-bgp-prefix-sid); there's an XR implementation, and I believe Juniper should have one too... I'm in no way advocating running MPLS in the DC, especially not for TE reasons as some of our Cisco friends may suggest :) There are a number of rather large DCs that run MPLS, so mileage may vary. I'm a big believer in the SR-in-UDP architecture (based on RFC 7510) that allows you to programmatically interact with the WAN for an end-to-end DC-WAN(with TE)-DC path. More info in the MPLS Segment Routing in IP Networks draft (draft-bryant-mpls-unified-ip-sr-03).
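In case you haven't seen RFC 7510 encapsulation yet, here's a minimal Scapy sketch (made-up addresses and label value) of an MPLS-over-UDP packet; the label stack rides in a UDP payload, so the transport core needs nothing beyond IP:

```python
# MPLS-over-UDP (RFC 7510) encapsulation sketch (Scapy). Addresses and
# the label value are made up for illustration.
from scapy.all import Ether, IP, UDP
from scapy.contrib.mpls import MPLS

pkt = (
    Ether()
    / IP(src="10.0.0.1", dst="192.0.2.1")       # ingress -> egress node
    / UDP(sport=52117, dport=6635)              # 6635 = IANA MPLS-in-UDP port
    / MPLS(label=16001, s=1, ttl=64)            # hypothetical SR prefix SID
    / IP(src="172.16.1.10", dst="172.16.2.20")  # the transported packet
)
pkt.show()
```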
Speaking of "large DCs running MPLS" - do they have MPLS at the WAN edge to select the egress WAN interface on HTTP(S) proxy servers, or do they use MPLS as the core fabric transport?
Thank you!
MPLSoGRE not being very ECMP-friendly is probably the motivation for moving to a UDP-based transport. In that sense there's not much of a difference between MPLSoUDP and VXLAN, except perhaps for resource overheads in hardware/software.
Xiaohu
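To make Xiaohu's point concrete: GRE carries no port numbers, so core routers hashing on the outer 5-tuple see all MPLSoGRE traffic between two tunnel endpoints as a single flow. UDP-based encapsulations (VXLAN, MPLSoUDP) avoid that by deriving the outer UDP source port from a hash of the inner flow; a minimal sketch (hash choice is illustrative, port range is the usual dynamic/private range):

```python
# How UDP-based encapsulations stay ECMP-friendly: the outer UDP source
# port is derived from a hash of the inner flow, so core routers hashing
# on the outer 5-tuple spread flows across parallel paths.
import hashlib

def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        sport: int, dport: int) -> int:
    """Map an inner 5-tuple to a stable UDP source port in 49152-65535."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()   # illustrative hash choice
    return 49152 + int.from_bytes(digest[:2], "big") % (65536 - 49152)

# Two inner flows between the same endpoints get different outer source
# ports and therefore (usually) different ECMP paths through the core:
print(entropy_source_port("172.16.1.10", "172.16.2.20", 6, 33000, 443))
print(entropy_source_port("172.16.1.10", "172.16.2.20", 6, 33001, 443))
```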