EVPN With MPLS Data Plane in Data Centers
Mr. Anonymous (my most loyal reader and commentator) sent me this question as a comment to one of my blog posts:
Is there any use case for running EVPN (or PBB EVPN) in a DC with an MPLS data plane? Most vendors seem to be implementing only NVO, to my understanding.
Sure there is: you already have MPLS control plane and want to leverage the investment.
You might also want to persuade your customers that this is a must-have feature because your competitors don’t have it in their products.
Sarcasm aside:
- EVPN uses the BGP Tunnel Encapsulation Attribute (RFC 5512, with a new version in draft-ietf-idr-tunnel-encaps) to indicate the encapsulation to use to reach the egress PE device (we shouldn’t call them routers anymore, right?).
- The values of this attribute can be found in the IANA registry.
- Almost all of them rely on something-over-IP tunneling.
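To see just how lopsided that list is, here’s a quick sketch using an illustrative subset of the tunnel-type codepoints (treat the IANA registry itself as the authoritative, current list); MPLS is the lone entry that isn’t something-over-IP:

```python
# Illustrative subset of the IANA "BGP Tunnel Encapsulation Attribute
# Tunnel Types" registry -- check the registry for the authoritative list.
# Second field: does the encapsulation ride on top of IP?
TUNNEL_TYPES = {
    1:  ("L2TPv3 over IP", True),
    2:  ("GRE",            True),
    7:  ("IP in IP",       True),
    8:  ("VXLAN",          True),
    9:  ("NVGRE",          True),
    10: ("MPLS",           False),   # the odd one out
    11: ("MPLS in GRE",    True),
    12: ("VXLAN GPE",      True),
    13: ("MPLS in UDP",    True),
}

over_ip = [name for name, ip_based in TUNNEL_TYPES.values() if ip_based]
print(f"{len(over_ip)} of {len(TUNNEL_TYPES)} listed tunnel types are X-over-IP")
```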
Maybe the real question should be “why is that the case?” In a word: decoupling.
The Benefits of Decoupling
When using whatever-over-IP encapsulation technology with EVPN, the transport fabric remains simple and clean. It’s a pure IP fabric using a single encapsulation (IP) and running a single routing protocol.
MPLS encapsulation used with the EVPN control plane requires end-to-end LSPs between PE devices. You’ll have to deal with two encapsulations (IP and MPLS), two sets of forwarding tables (FIB and LFIB), and additional control-plane protocols – LDP, MPLS-TE, BGP IPv4+labels, or segment routing.
Each LSP has to exist in the forwarding table of every node it traverses. That introduces extra state in the transport fabric - not relevant if you have 10 switches, highly relevant if you have thousands of switches (example: WAN deployment) and the core switches use high-speed merchant silicon.
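A back-of-the-envelope sketch of that state argument (illustrative numbers only; actual per-node state depends on topology and label allocation mode):

```python
# Rough MPLS transport-state growth with fabric size.
def rsvp_te_lsps(pe_count: int) -> int:
    # full mesh of unidirectional RSVP-TE LSPs between all PE devices
    return pe_count * (pe_count - 1)

def ldp_fecs(pe_count: int) -> int:
    # with LDP, each node needs roughly one label entry per PE loopback FEC
    return pe_count

for pes in (8, 64, 512):
    print(f"{pes:4d} PEs: {rsvp_te_lsps(pes):8,d} LSPs (RSVP-TE full mesh), "
          f"~{ldp_fecs(pes):4d} LFIB entries per node (LDP)")
```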
Admittedly, the MPLS encapsulation introduces lower overhead than whatever-over-IP encapsulation. That overhead becomes relevant when the bandwidth becomes expensive: in WAN networks, not in data centers.
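To put rough numbers on that claim, here’s a small sketch assuming an IPv4 outer header for VXLAN and a two-label stack (transport + EVPN service label) for MPLS; the per-link Ethernet header is needed in both cases and left out of the comparison:

```python
# Per-packet overhead added on top of the tenant frame (bytes).
VXLAN_OVERHEAD = 20 + 8 + 8    # outer IPv4 + UDP + VXLAN header = 36
MPLS_OVERHEAD = 4 + 4          # transport label + EVPN service label = 8

for payload in (64, 512, 1500):
    vxlan_pct = VXLAN_OVERHEAD / (payload + VXLAN_OVERHEAD) * 100
    mpls_pct = MPLS_OVERHEAD / (payload + MPLS_OVERHEAD) * 100
    print(f"{payload:4d}-byte frame: VXLAN ~{vxlan_pct:.1f}% vs MPLS ~{mpls_pct:.1f}% overhead")
```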
To summarize: there’s no free lunch. You have to accept higher encapsulation overhead or more complex transport fabric. I know what I would do when designing data center infrastructure.
Bandwidth within a data center is usually cheaper than the operational complexity of managing MPLS TE.
As a bonus, using MPLS in this case avoids any translation at the DC edge.
Then again, the next trend is moving to SRv6 because you can more easily extend the traffic engineering part into the virtual machine (VNF).
So RFC1925, Rule 11 is very much alive :)
I think there is a use case for the TE.
If you think about micro-services in networking, and you buy cheap boxes but a lot of them (instead of two 40G links, eight 10G links), and you use TE to not oversubscribe your links...
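That argument is easier to see with a toy example (purely hypothetical numbers): hash-based ECMP can stack several elephant flows on the same thin link, while TE-style explicit placement spreads them out.

```python
import random

LINKS, CAPACITY_GBPS = 8, 10.0
FLOWS = [6.0, 6.0, 6.0, 6.0]           # four elephant flows

def ecmp_worst_link(trials: int = 1000) -> float:
    # per-flow hashing modeled as a random link choice; report worst case seen
    worst = 0.0
    for _ in range(trials):
        load = [0.0] * LINKS
        for f in FLOWS:
            load[random.randrange(LINKS)] += f
        worst = max(worst, max(load))
    return worst

def te_worst_link() -> float:
    # TE-style explicit placement: put each flow on the least-loaded link
    load = [0.0] * LINKS
    for f in FLOWS:
        load[load.index(min(load))] += f
    return max(load)

print("ECMP worst link over 1000 trials:", ecmp_worst_link(), "Gbps")  # typically 12+
print("TE-placed worst link:", te_worst_link(), "Gbps")                # 6.0
```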
We're in agreement that if you need to add LDP, RSVP-TE, etc. to your datacenter routing stack, it not only adds complexity but also limits your platform choices to OSs that support those protocols well.
However, I feel that a lot of the confusion, complexity, & "woo, scary" surrounding MPLS is due to the complexity & scarcity of LDP & RSVP implementations.
I use RFC 3107 BGP-LU in the lab to teach people MPLS without them having to learn LDP or RSVP.
If we look at a different (& hopefully more typical) scenario of a datacenter leaf/spine fabric running e.g. BGP-only IP-unnumbered commodity silicon boxes, then you add BGP-LU & whatever BGP VPN address families you need to all your sessions, & you're done (there's a toy label-stack sketch after the list below).
The highlights of this scenario are:
- BFD on your BGP sessions will get you "fast-enough" convergence.
- if your platform(s) support it, you get L3VPN, L2VPN, VPLS, EVPN, & possibly other BGP VPN techs
- knobs like "vpn-unequal-cost" make load-balancing/multipath easy
- interop with other encapsulations like LDP pseudowires & EVPN VXLAN at your borders if you need it elsewhere
- the P routers ("spine" boxes) don't need to support any of that stuff, just BGP-LU, & you can use "horses for courses" heterogeneous PE routers ("leaf" boxes) where you need them too.
- it's just BGP, which you're going to need to learn anyway if you have any hope of scaling
- & all your VPNs are routed to your edge boxes in the same way, & you get communities, route-targets (& constrained RT filtering), RRs, MEDs, localpref, etc.
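Here’s a toy sketch of how those pieces fit together on an ingress leaf, with made-up label values and names (no real BGP involved): BGP-LU provides the transport label toward the egress PE loopback, the VPN address family provides the service label, and the spines only ever switch on the top label.

```python
# Hypothetical forwarding state on an ingress leaf in a BGP-only fabric.
transport_labels = {"leaf2-loopback": 1001}   # learned via BGP-LU
service_labels = {"tenant-A": 2001}           # learned via the EVPN/L3VPN AF

def ingress_label_stack(egress_pe: str, vpn: str) -> list[int]:
    # outermost label first: the spines swap it and never see the VPN label
    return [transport_labels[egress_pe], service_labels[vpn]]

print(ingress_label_stack("leaf2-loopback", "tenant-A"))   # [1001, 2001]
```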
From the digging I've done, if you're not using a bog-standard tunneling protocol like MPLS, VXLAN, NVGRE, & maybe Geneve (e.g. I've had a heck of a time finding modern switch chips that just support plain-old GRE, though there are a few "smart-NICs" that do it), you're not going to find support for such tunneling-encapsulations-du-jour in commodity silicon.
That said, IIRC Juniper does support MPLS over UDP on commodity silicon boxes like ACX5K for interop with their Contrail VRouter. But I haven't seen many others. I think most of those encapsulations are limited to implementation in software on a hypervisor/VM.
(BTW just to be clear, my original post advocates a BGP-only fabric without an IGP or LDP/RSVP)
& honestly, I find MPLS itself to be more straightforward to deal with on the wire than any of those X-over-IP protocols (again, I'm not talking about MPLS signaling, I'm talking about the data-plane implementation). It's the only one that gives you per-hop visibility into the path of your tunnels (i.e. you can log into a P router & see what labels the ingress & egress PE are sending you) & comes with a full suite of carrier-grade OAM protocols.
I do agree with Blake's last point that MPLS is more straightforward (but that could be subjective due to my networking past and love for MPLS). I also believe it is too early to say that something-over-IP tunneling has won everything.
There are still use cases for MPLS even in the DC that become much more realistic with MPLS support in the recent Linux kernel (so on commodity HW sooner or later): simplification with the methods mentioned above (SR, BGP-LU or centralization), the Egress Peer Engineering case, or some others which are still waiting for their publicity and hype :)