Multivendor EVPN Just Works
Shipping netlab release 1.9.0 included running 36 hours of integration tests, including fifteen VXLAN/EVPN tests covering:
- Bridging multiple VLANs
- Asymmetric IRB, symmetric IRB, central routing, and running OSPF within an IRB VRF.
- Layer-3 only VPN, including routing protocols (OSPF and BGP) between PE-routers and CE-routers
- All designs evangelized by the vendors: IBGP+OSPF, EBGP-only (including reusing the same BGP AS number on leaves), EBGP over interface (unnumbered) BGP sessions, IBGP-over-EBGP, and EBGP-over-EBGP.
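Two of the EBGP designs above are where the footnoted fixes (BGP next-hop handling and allowas-in) come into play. The following is a minimal Python sketch of why, assuming a simplified model of EBGP route propagation; it is not any vendor's implementation. The names `EvpnRoute`, `advertise_ebgp`, and `accepts`, as well as the AS numbers and addresses, are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    prefix: str                      # illustrative label, not a real EVPN NLRI
    next_hop: str                    # address the receiver would use as the remote VTEP
    as_path: tuple[int, ...] = ()    # leftmost AS is the most recent sender


def advertise_ebgp(route, sender_as, sender_ip, next_hop_unchanged):
    """Model an EBGP speaker sending a route to a neighbor.

    The sender always prepends its own AS. Unless told to leave the
    next hop unchanged, it also rewrites the next hop to its own address.
    """
    return replace(
        route,
        as_path=(sender_as, *route.as_path),
        next_hop=route.next_hop if next_hop_unchanged else sender_ip,
    )


def accepts(route, local_as, allowas_in=0):
    """Simplified AS-path loop check.

    The route is accepted only if the local AS appears in the AS path
    at most `allowas_in` times (zero by default).
    """
    return route.as_path.count(local_as) <= allowas_in


if __name__ == "__main__":
    # Hypothetical fabric: both leaves share AS 65001, the spine uses AS 65100.
    origin = EvpnRoute("host-a", next_hop="10.0.0.1")
    at_spine = advertise_ebgp(origin, 65001, "10.0.0.1", next_hop_unchanged=False)
    at_leaf2 = advertise_ebgp(at_spine, 65100, "10.0.0.10", next_hop_unchanged=True)

    print("AS path seen by leaf 2:", at_leaf2.as_path)
    print("Next hop seen by leaf 2:", at_leaf2.next_hop)
    print("Accepted without allowas-in:", accepts(at_leaf2, 65001))
    print("Accepted with allowas-in 1:", accepts(at_leaf2, 65001, allowas_in=1))
```

In this model, the second leaf drops the route unless it tolerates one occurrence of its own AS number in the AS path, and it can build a tunnel to the originating leaf only if the spine leaves the next hop untouched. Real implementations differ in defaults and knob names.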
All tests included one or two devices under test and one or more FRR containers[1] running EVPN/VXLAN with the devices under test. The results were phenomenal; apart from a few exceptions, everything Just Worked™️.
The only caveats[2] we identified in the process[3] were:
- An ArubaCX quirk (probably an artifact of software-based packet forwarding) prevented it from interoperating with the Linux VXLAN driver as a VXLAN-to-VXLAN router.
- A weird bug in the FRRouting OSPF daemon causes OSPF hello packets to have an incorrect MTU when sent over the VXLAN segment.
- Centralized VXLAN-to-VXLAN routing didn’t work on SR Linux, but it might have been a configuration issue.
I know vendors have made interoperability claims for years, but we all know what intellectual capital they bring to the interoperability tests. This time, we were running publicly available images (sometimes not even the newest ones[4]) using the configurations we could scramble together from vendor documentation.
Admittedly, we did not do any-to-any tests but used FRRouting and the Linux VXLAN driver as the baseline, and we couldn’t test the quality of hardware programming. Still, even considering all that, I was amazed at how well it all worked.
Want to repeat the tests? Everything is open-source ;) All you have to do is install the necessary software and jump over the hurdles created by vendors’ image download process.
Want to see the device configurations we used? You can find them in the test logs:
- Open the test results
- Click on the device name
- Click on the checkmark in the Devices Configured column in the relevant test results row.
[1] I’m not rich enough to buy enough RAM to run multiple instances of some vendors’ bloatware. I also don’t have enough time left in this life to wait for all of them to boot.
[2] After fixing the BGP next-hop handling on EBGP EVPN address family on numerous platforms and figuring out where to apply the allowas-in keyword.
[3] Ignoring Dell OS10 boot failures due to the random inaccessibility of their SSH server.
[4] I’m not going to waste 12GB of RAM for a single switch instance.
Hi Ivan, great work.
EVPN has come a long way. For unicast services, there is now a level of maturity and feature consistency across vendors for the most common deployment models (VLAN-based, A-A, and symmetric IRB) that ensures successful interop. This is illustrated in the EANTC test reports of the past few years, with the focus now moving to more complex deployment models involving dual-stack, OISM, and EVPN GWs.
I don’t think we should make light of the EANTC work; it’s an independent test event which all the major vendors attend, with the sole aim of validating interoperability through compliance with the IETF standards. So the focus is not just on making things work but on ensuring compliance with the relevant standard(s), with the results independently verified.
Happy to discuss further if you have any questions.
Alex
Thanks for the feedback Alex!
Just a minor detail: I absolutely didn't want to make light of EANTC work; I just wanted to say that the environment in which those tests are executed often differs slightly from what one might encounter in a typical deployment. I would suspect that the software images are not exactly the LTS releases, and the people configuring the boxes might know a bit more about the nerd knobs and inner workings of the devices than an average networking engineer.
Best, Ivan
Ivan, very interesting, thanks for this.
A couple of things caught my attention for SR Linux:
- Central routing
- Unnumbered EBGP
Both are fully supported in SR Linux. For central routing, proxy-arp is needed on the layer-2 leaf node, since SR Linux assumes the central IRB MAC/IP will be distributed via EVPN.
In case it helps. Thanks, Jorge
Thanks a million for the feedback! We solved the unnumbered EBGP issue in the meantime (it works now).
For the central routing case, everyone else works with pure layer-2 switching at the edge, so I'm not going to change the setup but will add a caveat to the test results.
Thanks again, Ivan