Why Are We Using EVPN Instead of SPB or TRILL?
Dan left an interesting comment on one of my previous blog posts:
> It strikes me that the entire industry lost out when we didn’t do SPB or TRILL. Specifically, I like how Avaya did SPB.
Oh, we did TRILL. Three vendors did it in different proprietary ways, but I’m digressing.
TRILL was a zombie the moment VXLAN appeared. TRILL required new chipsets on every switch in the path, whereas VXLAN runs over IP. Apart from slightly shorter packet headers, it offered nothing that IP hadn’t had for decades.
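Just to put the “slightly shorter packet headers” into perspective, here’s a back-of-the-envelope comparison of the encapsulation overhead (a rough sketch assuming an untagged outer Ethernet header and an IPv4 underlay; VLAN tags, IPv6 transport, and TRILL options change the exact numbers):

```python
# Rough per-packet encapsulation overhead, in bytes (typical values, no VLAN
# tags or optional headers -- adjust for your environment).

# VXLAN: the tenant Ethernet frame is wrapped in outer Ethernet + IP + UDP + VXLAN
OUTER_ETHERNET = 14   # outer MAC header, untagged
OUTER_IPV4     = 20   # outer IPv4 header (an IPv6 underlay would use 40)
OUTER_UDP      = 8
VXLAN_HEADER   = 8    # flags + 24-bit VNI
vxlan_overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER

# TRILL: the tenant Ethernet frame is wrapped in outer Ethernet + TRILL header
TRILL_HEADER = 6      # version/hop count + egress and ingress RBridge nicknames
trill_overhead = OUTER_ETHERNET + TRILL_HEADER

print(f"VXLAN overhead: {vxlan_overhead} bytes")   # 50
print(f"TRILL overhead: {trill_overhead} bytes")   # 20
```

Those ~30 bytes per packet are the “slightly shorter packet headers”; everything else TRILL brought, IP transport already had.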
SPB was a mess. They couldn’t decide whether to use VLAN tags or the PBB header for the extra encapsulation, resulting in two flavors (SPBV and SPBM). In the end, only SPBM remained, but the damage had already been done.
SPB’s development in IEEE (instead of IETF) didn’t help either. The IEEE standardization process reeks of CCITT (remember the guys who created X.25?): it was hard or impossible to access recent standards (let alone drafts), and they thought everything they created had to look like Ethernet. IEEE went as far as overcomplicating IS-IS to prevent micro-loops because they refused to add a hop count to the data-plane encapsulation¹. In real life, some vendors skipped that part when implementing SPBM, assuming that the networks wouldn’t be bothered (too much) by an occasional forwarding loop (yeah, that sounds reassuring).
The original EVPN required an MPLS data plane, which was a perfect match for service providers already running MPLS. Since the early days of MPLS, you could also run MPLS over GRE, enabling a gradual deployment of MPLS over IP networks. In any case, EVPN was just another service using MPLS LSPs. Adding EVPN to an existing MPLS network was utterly non-disruptive.
Adding VXLAN encapsulation to EVPN was the watershed moment. Finally, you could decouple the end-user services from the transport core network and implement the virtual networks on edge nodes (hypervisors) with no control-plane interaction with the physical network.
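To illustrate the decoupling: as far as the transport network is concerned, a VXLAN-encapsulated frame is just another UDP datagram between two VTEP addresses; only the VTEPs ever look at the 8-byte VXLAN header and the tenant frame behind it. A minimal, purely illustrative Python sketch (the remote VTEP address and the VNI are made up):

```python
import socket
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned VXLAN port (RFC 7348)

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header to a tenant Ethernet frame."""
    flags = 0x08 << 24         # I bit set: the VNI field is valid
    return struct.pack("!II", flags, vni << 8) + inner_frame

tenant_frame = bytes(64)       # stand-in for a tenant Ethernet frame

# From the fabric's point of view this is nothing but a UDP packet between two
# IP endpoints -- no tenant MAC addresses or VLANs ever touch the core.
payload = vxlan_encapsulate(tenant_frame, vni=10100)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(payload, ("192.0.2.2", VXLAN_UDP_PORT))   # hypothetical remote VTEP
```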
SPBM required end-to-end PBB encapsulation and tight integration of edge- and core switches (every switch had to run IS-IS). You could not deploy it gradually on top of an existing bridged network; the network core had to run SPBM. While Avaya eventually added PBB-over-IP (or was it GRE?), that addition was proprietary and too late to matter. VXLAN had already won.
Now for the technical details:
> IS-IS, as an interior routing protocol, can handle 1000s of routers. We don’t need anything more scalable like BGP unless you’re AWS/Microsoft/Google/Facebook.
I’ve been saying the same thing for years, and intelligent people (including those who designed AWS fabric with OSPF) know that. Only the EBGP-as-better-IGP cargo cult followers think you need² BGP as an underlay routing protocol in a data center fabric.
> IS-IS doesn’t need addressing because it’s an ISO protocol. As long as the interface can run Ethernet, an adjacency can form. No IPv4 or IPv6 addresses needed, link-local or otherwise.
IS-IS needs CLNS node addresses (NETs) to run, but it does not need layer-3 interface addresses. However, if you want to run your transport services on top of IP (see above), you need IPv6 link-local addresses and (at a minimum) IPv4 loopback addresses.
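For illustration, a NET is just an area address, a 6-byte system ID, and an NSEL; nothing in it is tied to an interface, which is why IS-IS can happily form adjacencies on unnumbered links. A quick sketch of how a typical NET breaks down (the sample value is made up):

```python
def parse_net(net: str) -> dict:
    """Split an IS-IS NET (e.g. 49.0001.1921.6800.1001.00) into its components."""
    digits = net.replace(".", "")
    return {
        "area": digits[:-14],          # everything in front of the system ID
        "system_id": digits[-14:-2],   # 6 octets identifying the node
        "nsel": digits[-2:],           # 0x00 for a router (network entity)
    }

print(parse_net("49.0001.1921.6800.1001.00"))
# {'area': '490001', 'system_id': '192168001001', 'nsel': '00'}
```

A common convention is to embed the IPv4 loopback address in the system ID (192.168.1.1 becomes 1921.6800.1001), but that’s purely a naming trick, not an addressing requirement.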
> Keep in mind, all this “Interface EBGP Session” stuff is needed to bootstrap all the other stuff we will need: multi-protocol BGP, adjusting the NLRI in BGP, VXLAN-GPO, loopbacks for the VTEPs, routing protocols to coordinate with the devices in the overlay (e.g., firewalls), etc.
The “interface EBGP session” stuff is there only to please the cargo cult followers (or hyperscalers buying Arista and Cisco gear). EVPN using IBGP between loopback interfaces works just fine.
However, you need something to propagate endpoint reachability information between fabric edge devices. My Avaya (now Extreme) friends tell me IS-IS works fine for typical enterprise use cases. Still, I’m pretty sure we would eventually hit some limits of IS-IS LSPs (I wrote about the details almost exactly a decade ago).
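Those limits aren’t hard to sketch: an IS-IS node can originate at most 256 LSP fragments, and each fragment is capped by originatingLSPBufferSize (1492 bytes by default). A back-of-the-envelope estimate follows; the per-endpoint TLV cost is a guess, as real encodings vary:

```python
# Rough estimate of how much endpoint information a single IS-IS node can flood.
MAX_FRAGMENTS      = 256     # the LSP number is an 8-bit field
LSP_BUFFER_SIZE    = 1492    # default originatingLSPBufferSize (bytes)
LSP_FIXED_HEADER   = 27      # fixed LSP header (bytes)
BYTES_PER_ENDPOINT = 12      # assumed per-endpoint TLV cost -- a guess

usable_bytes = MAX_FRAGMENTS * (LSP_BUFFER_SIZE - LSP_FIXED_HEADER)
print(f"Usable LSP space per node: {usable_bytes} bytes")                  # 375,040
print(f"Rough endpoint entries:    {usable_bytes // BYTES_PER_ENDPOINT}")  # ~31,000
```

Tens of thousands of entries per node is plenty for a typical enterprise fabric, but nowhere near what a large multi-tenant environment might need, which is roughly the scalability concern mentioned above.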
Finally, EVPN advertises endpoint MAC addresses in BGP updates, which allows load balancing and ESI-based multihoming. SPBM (like traditional bridging) relies on datapath MAC learning. Comparing EVPN to SPBM is a bit like comparing a boring 4WD family sedan to a roadster³.
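The difference is visible in the routes themselves: an EVPN Type-2 (MAC/IP advertisement) route ties a MAC address to an Ethernet Segment Identifier and a VNI or MPLS label (RFC 7432), which is what makes control-plane learning and all-active multihoming possible. A rough sketch of the relevant fields (the sample values are invented):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvpnMacIpRoute:
    """Key fields of an EVPN Type-2 (MAC/IP advertisement) route per RFC 7432."""
    route_distinguisher: str       # e.g. "<loopback>:<service id>"
    esi: str                       # Ethernet Segment Identifier (10 bytes)
    ethernet_tag: int
    mac_address: str
    ip_address: Optional[str]      # optional MAC-to-IP binding
    label: int                     # MPLS label or VXLAN VNI

# Invented example: a server dual-homed to two leaf switches. Both leaves
# advertise the same non-zero ESI, so remote VTEPs can load-balance toward
# both of them -- something data-plane MAC learning cannot express.
route = EvpnMacIpRoute(
    route_distinguisher="10.0.0.1:10100",
    esi="00:11:22:33:44:55:66:77:88:99",
    ethernet_tag=0,
    mac_address="02:42:ac:11:00:02",
    ip_address="172.17.0.2",
    label=10100,
)
print(route)
```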
As for “routing protocols to coordinate with the devices in the overlay,” that’s the bane of every “disruptive” greenfield solution (including SD-WAN). Life is simple if you limit yourself to VLANs and directly connected IP prefixes. Once you want to integrate your solution with the existing customer network, you must implement routing protocols and two-way redistribution, and your solution becomes as complex as MPLS/VPN or EVPN. For example, (according to my vague recollection of a lunch discussion) Avaya’s SPBM solution had a lovely (proprietary) IP Multicast implementation that worked great as long as the multicast domain was limited to the Avaya fabric and you did not insist on using PIM.
1. TRILL was designed by people with layer-3 packet forwarding experience and had a hop count field in the TRILL header. ↩︎
2. As opposed to “it’s cool and looks great on my resume” or “I have this cool gear that treats EBGP like OSPF, so let’s use it.” I have nothing against EBGP-only designs as long as they result in a simpler overall design. IBGP-over-EBGP or EBGP-over-EBGP is (from my perspective) not in that category. Also, I wanted to point out that you don’t need EBGP to run a data center fabric. ↩︎
3. Roadsters are great unless you’re going on holiday with three kids or have a snowstorm arriving in a few hours. ↩︎
> IBGP-over-EBGP or EBGP-over-EBGP
What? There are people doing that? I wasn't even aware of it, let alone seen it before. Crazy.
BTW, I found the EBGP-over-EBGP approach too abhorrent to describe.
Also: https://blog.ipspace.net/tag/evpn.html#rants
All of this nonsensical complexity wouldn't exist if we just stopped spanning and stretching Ethernet and instead either developed a successor to Ethernet or moved to layer-3-only hardware and networking. STP wasn't enough, so along came SPB and TRILL; those weren't enough, so along came VXLAN/EVPN (sigh). Next up: Ethernet over HTTPS on port 443. Half-joking, but it will probably become reality, given that the industry is moving everything to port 443 (SSH3, for example).
I'm currently trying to push an IPv6-native, layer-3-only approach for Docker/Swarm at least: https://github.com/docker/docs/issues/19556
K8s apparently has some eBPF-based "DSR" and similar mechanisms that remove the need for layer-2 adjacency (VXLAN/EVPN encapsulation). However, note that I'm no expert on K8s; you can find some info on this in that Docker issue, where other users discussed it.
SPB/TRILL were (are?) both great ideas that should have seen more market acceptance. Simple, reliable, scalable. What's not to like?
So why did the industry buck those technologies and pivot to VXLAN and EVPN? I blame software. The SDN movement has only ever wanted IP connectivity ("We'll take it from here" was their mantra). They had painfully little interest in layer-2 design, efficiency be damned.
The hardware folks could have doubled down on L2, as Avaya did. But many saw existential risk, as if a few Stanford Linux nerds would somehow make hardware irrelevant (or a commodity). So, faster than you can say "OpenFlow", the HW companies embraced IP overlays, imagining a world of HW and SW VTEPs passing packets around while humming kumbaya. (To date, HW/SW overlay integration is extremely rare.)
Who cares if the configuration is complex and individualized? Why aren't you using a controller or automation? Who cares if MTU blows up? That's a software problem. Who cares if TCP offloads fail? That's a NIC problem. Who cares if traffic flow is incomprehensibly complex with the rabbit hole of nested overlays, distributed gateways, and multiple routing tiers? That's job security.
@Bogdan Golab
I am speechless…
I even checked the release date of this RFC. It's not April 1st...