Using MPLS+EVPN in Data Center Fabrics

Here’s a question I got from someone attending the Building Next-Generation Data Center online course:

Cisco NCS5000 is positioned as a building block for a data center MPLS fabric – a leaf-and-spine fabric with MPLS transport and an EVPN control plane. This raised a question regarding MPLS vs. VXLAN: why would one choose to build an MPLS-based fabric instead of a VXLAN-based one, assuming hardware costs are similar?

There’s a fundamental difference between MPLS- and VXLAN-based transport: the amount of coupling between edge and core devices.

MPLS-based VPN solutions require an end-to-end LSP (virtual circuit) between edge devices. That LSP has to be set up on every hop along the way and coordinated between edge and core devices using whatever control-plane protocol you use for MPLS. The LSP also has to be kept operational across network failures, with the changes signaled to the edge devices.

Long story short: MPLS requires very tight coupling between edge and core nodes. Furthermore, the need to install end-to-end LSP state in every core node prevents any form of state summarization in the core unless you add another layer of complexity like Carrier's Carrier MPLS/VPN.
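To make that coupling concrete, here's a toy model (made-up node names and label values, not any vendor's data structures) of the per-LSP forwarding state that every hop along a single LSP has to hold:

# Toy model of the per-LSP state carried by every hop along an end-to-end
# MPLS LSP. A missing entry on any node breaks the path, which is why edge
# and core nodes stay tightly coupled.

lfib = {
    # node:   {incoming label: (action, outgoing label, next hop)}
    "leaf1":  {None:  ("push", 24011, "spine1")},   # ingress edge node
    "spine1": {24011: ("swap", 24012, "leaf2")},    # core node: per-LSP entry
    "leaf2":  {24012: ("pop",  None,  None)},       # egress edge node
}

def forward(node, label):
    """Walk the LSP hop by hop; a KeyError means the LSP state is gone."""
    while node is not None:
        action, out_label, next_hop = lfib[node][label]
        print(f"{node}: {action} {out_label}")
        node, label = next_hop, out_label

forward("leaf1", None)   # leaf1: push 24011 / spine1: swap 24012 / leaf2: pop

Every additional pair of edge nodes adds entries like these to every core node on the path, and the core has no way to summarize them away.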

VXLAN-based VPN solutions require nothing more than IP connectivity between edge devices. The edge devices don't have to participate in the core control-plane protocol (apart from using ARP), and changes in the transport core are not signaled to the edge.
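For illustration only (pure Python following the RFC 7348 header layout, with a made-up VNI), this is roughly what a VXLAN edge node prepends to a tenant frame before shipping it in an ordinary UDP/IP packet; the core routes on the outer IP header and never sees any of it:

# Minimal sketch of VXLAN encapsulation on the edge node. The result travels
# as UDP payload (destination port 4789) between edge-node addresses, so the
# core forwards on the outer IP header alone and keeps no overlay state.

import struct

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    # 8-byte VXLAN header: flags (I bit set), 24 reserved bits,
    # 24-bit VNI, 8 reserved bits
    vxlan_header = struct.pack("!BBHI", 0x08, 0, 0, vni << 8)
    return vxlan_header + inner_frame

payload = vxlan_encap(b"\x00" * 64, vni=10100)
print(len(payload), "bytes of UDP payload")   # 72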

There is almost no state sharing between edge and core nodes in VXLAN, and no per-edge-node state in the core (unless you’re running VXLAN between loopback interfaces of edge nodes), making VXLAN more robust and easier to scale.

Also keep in mind that a mature, full-blown MPLS control plane (using LDP or SR with IS-IS/OSPF) is not available from all vendors (a cynical mind would say that might be the reason Cisco is pushing an MPLS-based fabric).

However, there’s no free lunch. Edge nodes can easily detect transport failures when they’re tightly coupled with the fabric core (using traditional MPLS, or VXLAN between loopback interfaces). Reliable detection of transport failures in loosely-coupled systems requires an end-to-end monitoring layer… something that solutions like VMware NSX never implemented, because it’s easier to offload hard problems to the network and expect MLAG support on the fabric edge (as I explained in the logical switches part of the NSX Technical Deep Dive webinar).
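To make the "end-to-end monitoring layer" idea a bit more concrete, here's a minimal sketch (made-up addresses, port and timeout; not BFD and not any product's implementation) of the probing a loosely-coupled edge node would have to run toward every remote tunnel endpoint to detect transport failures on its own:

# Minimal edge-to-edge liveness probe. A loosely-coupled edge node learns
# nothing from the core, so it has to probe every remote tunnel endpoint
# itself to notice that the transport path is broken.

import socket

def probe(remote_vtep: str, port: int = 49999, timeout: float = 0.3) -> bool:
    """Send one probe toward a remote tunnel endpoint and wait for an echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(b"are-you-there", (remote_vtep, port))
        try:
            s.recvfrom(64)
            return True
        except socket.timeout:
            return False

remote_vteps = ["192.0.2.11", "192.0.2.12"]          # every remote edge node
print({vtep: probe(vtep) for vtep in remote_vteps})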

Finally, the tight coupling of edge and core nodes in MPLS gives you the ability to do traffic engineering between fabric edges. Whether you need that in a data center fabric, and whether the gains justify the operational effort (as opposed to just buying more bandwidth), is another interesting question that I never got a clear answer to when I asked it in the RIFT and OpenFabric episodes of Software Gone Wild.

Which one should I use?

As long as you’re doing encapsulation on the fabric edge in a reasonably small fabric, the number of prefixes in your fabric doesn’t exceed the MPLS forwarding capacity of your switches, and you don’t plan to build a multi-vendor fabric, it doesn’t matter that much whether you use MPLS or VXLAN encapsulation.

Doing MPLS in the hypervisor makes less sense due to the tight coupling required between edge and core devices to synchronize labels. Juniper Contrail is the only product I’m aware of that used MPLS between hypervisors, and even they moved to VXLAN.

Am I missing something? Please write a comment!

11 comments:

  1. If I'm not mistaken, with Segment Routing you'd have MPLS domain-wide labels, as the labels are carried in the IGP. Wouldn't that allow you to simplify your core network (in reference to your CSC remark)? I know you can have controllers in SR networks determining the path by stacking labels, but it can also be a bit more "hands-off": as the labels are already in the IGP, a controller is not a hard requirement to create a label stack.
    Replies
    1. Segment routing using OSPF or IS-IS as the control plane carries labels in the IGP's link-state advertisements, so the whole MPLS domain has to be a single OSPF/IS-IS area.

      Obviously you could get around that limitation by using a controller that would install labels (or label stacks) in edge nodes... but the question remains: what would you get with that?
  2. If you want to be a unicorned locked-in snowflake, go with MPLS.
  3. Ivan,

    You could use BGP to distribute SIDs - draft-ietf-idr-bgp-prefix-sid; there's an XR implementation, and I believe Juniper should have one too... I'm in no way advocating running MPLS in the DC, especially not for TE reasons as some of our Cisco friends may suggest :) There are a number of rather large DCs that run MPLS, so mileage may vary. I'm a big believer in the SR-in-UDP architecture (based on RFC 7510) that allows you to program (programmatically interact with) the WAN for an end-to-end DC-WAN (with TE)-DC path. More info in MPLS Segment Routing in IP Networks, draft-bryant-mpls-unified-ip-sr-03.
    Replies
    1. How could I have missed that? The answer is BGP... what exactly was the question ;)) But yeah, that definitely increases the scalability by an order of magnitude.

      Speaking of "large DCs running MPLS" - do they have MPLS at the WAN edge to select the egress WAN interface on HTTP(S) proxy servers, or do they use MPLS as the core fabric transport?

      Thank you!
  4. On hypervisors both Contrail and Nuage support(ed) MPLSoGRE data planes, with Contrail additionally supporting MPLSoUDP.
    MPLSoGRE not being very ECMP-friendly is probably the motivation for moving to a UDP-based transport (see the sketch after the comments). In that sense there's not much of a difference between MPLSoUDP and VXLAN, except perhaps for resource overheads in hardware/software.
  5. Not a fan at all of the NCS 5K platforms. Currently evaluating the 5502 and it's beaten in every conceivable way by the competing Juniper platform, including cost.
  6. One could also consider VXLAN and MPLS coexistence: the east-west traffic domain over VXLAN and the north-south traffic domain (across the WAN) over MPLS. CEs attach in VXLAN contexts. Local FE and external routes are selectively exported between the VXLAN VRF and the MPLS VRF (or global table) at the leaf for north-south flows. BE-BE and BE-FE flows stay on VXLAN. Only the FE leaves need the MPLS+VXLAN capabilities. Best of both worlds.
    Replies
    1. Effectively it's distributed service chaining of a VXLAN tenant through local MPLS service functions for north-south flows.
  7. Yes, I'm sure you have missed something important: MPLS/BGP VPN doesn't require the underlay to be an MPLS fabric, thanks to MPLS-in-IP tunneling technologies such as MPLS-in-UDP [RFC 7510].

    Xiaohu
    Replies
    1. While you're technically correct, there's a gap between theory and practice: how many leading data center switching vendors implemented MPLS-in-UDP?
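On the ECMP point in comment 4 (and the RFC 7510 references above): MPLS-over-GRE gives the core little to hash on beyond the outer IP addresses, while UDP-based encapsulations let the ingress node copy inner-flow entropy into the outer UDP source port. A rough sketch of that idea, with all values (hash, port range, flow tuples) made up for illustration:

# Why UDP-based encapsulation is ECMP-friendly: the ingress node hashes the
# inner flow into the outer UDP source port, so core nodes doing an ordinary
# 5-tuple hash spread different inner flows across equal-cost paths.

import hashlib

def entropy_source_port(inner_flow: tuple) -> int:
    """Map an inner 5-tuple onto an outer UDP source port in 49152-65535."""
    digest = hashlib.sha256(repr(inner_flow).encode()).digest()
    return 49152 + int.from_bytes(digest[:2], "big") % 16384

flow_a = ("10.1.1.10", "10.2.2.20", 6, 33000, 443)
flow_b = ("10.1.1.10", "10.2.2.20", 6, 33001, 443)
print(entropy_source_port(flow_a), entropy_source_port(flow_b))   # very likely different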