As I explained in a previous blog post, most leaf-and-spine best-practices (as in: what to do if you have no clue) use BGP as the IGP routing protocol (regardless of whether it’s needed) with the same AS number shared across all spine switches to implement valley-free routing.
This design has an interesting consequence: when a link between a leaf and a spine switch fails, those two switches can no longer communicate with each other.
For example, when the link between L1 and C1 in the following diagram fails, there’s no connectivity between L1 and C1 as there’s no valley-free path between them.
No big deal, right? After all, we built the data center fabric to exchange traffic between external devices attached to leaf nodes. Well, I hope you haven’t connected a firewall or load balancer straight to the spine switches using MLAG or a similar trick.
Lesson learned: connect all external devices (including network services devices) to leaf switches. Spine switches should provide nothing more than intra-fabric connectivity.
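If you want to convince yourself that the L1-C1 failure really does isolate the two switches from each other, here is a minimal sketch of EBGP route propagation with AS-path loop prevention. It is a toy model, not a BGP implementation: the four-node topology (L1, L2, C1, C2), the AS numbers, and the `learned_prefixes` helper are all assumptions made for illustration, and the only best-path rule modeled is "shortest AS path wins".

```python
def learned_prefixes(asn, links):
    """Return {node: set of nodes whose loopback prefix it has learned}.

    Toy model: every node originates one prefix, advertises its best
    routes to every neighbor, and drops any update whose AS path
    already contains its own AS number (EBGP loop prevention).
    """
    # best[node][origin] = AS path (tuple) of the best known route
    best = {node: {node: ()} for node in asn}
    changed = True
    while changed:
        changed = False
        for a, b in links:
            for sender, receiver in ((a, b), (b, a)):
                for origin, path in list(best[sender].items()):
                    new_path = (asn[sender],) + path
                    if asn[receiver] in new_path:
                        continue  # receiver sees its own AS: update rejected
                    old = best[receiver].get(origin)
                    if old is None or len(new_path) < len(old):
                        best[receiver][origin] = new_path
                        changed = True
    return {node: set(routes) for node, routes in best.items()}


# Hypothetical AS numbers; both spines (C1, C2) share AS 65000.
asn = {"L1": 65001, "L2": 65002, "C1": 65000, "C2": 65000}
all_links = [("L1", "C1"), ("L1", "C2"), ("L2", "C1"), ("L2", "C2")]

healthy = learned_prefixes(asn, all_links)
failed = learned_prefixes(asn, [l for l in all_links if l != ("L1", "C1")])

print("L1 reaches C1 (all links up):  ", "C1" in healthy["L1"])
print("L1 reaches C1 (L1-C1 link down):", "C1" in failed["L1"])
print("L1 reaches L2 (L1-C1 link down):", "L2" in failed["L1"])
```

In this model L1 still learns L2's prefix through C2 after the failure, but the only remaining path toward C1 (L1-C2-L2-C1) would have to carry C1's prefix through C2, which rejects it because the AS path already contains the shared spine AS number.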
There’s another interesting consequence. As you might know, some vendors love designs that use IBGP or EBGP EVPN sessions between loopback interfaces of leaf and spine switches on top of EBGP underlay.
Guess what happens after the L1-C1 link failure: the EVPN session between loopback interfaces (regardless of whether it’s an IBGP or EBGP session) is lost no matter what because L1 cannot reach C1 (and vice versa) anyway.
Inevitable conclusion: all the grand posturing explaining how EVPN sessions between loopback interfaces running on top of underlay EBGP are so much better than EVPN running as an additional address family on the directly-connected EBGP session in a typical data center leaf-and-spine fabric is plain merde (pardon my French).
Even worse, I’ve seen a vendor-produced design that used:
- EBGP in a small fabric that would work well enough with OSPF for the foreseeable future;
- IBGP EVPN sessions between loopback interfaces of leaf and spine switches;
- Different AS numbers on spine switches to make it all work, turning underlay EBGP into a TCP version of RIPv2 using AS path length as the hop count.
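To see what the last bullet does to the underlay, here is a minimal sketch of the same kind of toy EBGP model, this time with a different AS number on each spine switch. The topology, the AS numbers, and the `best_as_paths` helper are assumptions made for illustration; the only best-path rule modeled is "shortest AS path wins", which is exactly what makes the AS path length behave like a hop count.

```python
def best_as_paths(asn, links):
    """Return best[node][origin] = AS path (tuple) of the selected route.

    Toy model: shortest AS path wins, and an update is rejected when
    the receiver finds its own AS number in the AS path.
    """
    best = {node: {node: ()} for node in asn}
    changed = True
    while changed:
        changed = False
        for a, b in links:
            for sender, receiver in ((a, b), (b, a)):
                for origin, path in list(best[sender].items()):
                    new_path = (asn[sender],) + path
                    if asn[receiver] in new_path:
                        continue  # EBGP loop prevention
                    old = best[receiver].get(origin)
                    if old is None or len(new_path) < len(old):
                        best[receiver][origin] = new_path
                        changed = True
    return best


# Hypothetical AS numbers; every switch, spines included, has its own AS.
asn = {"L1": 65001, "L2": 65002, "C1": 65101, "C2": 65102}
all_links = [("L1", "C1"), ("L1", "C2"), ("L2", "C1"), ("L2", "C2")]

healthy = best_as_paths(asn, all_links)
failed = best_as_paths(asn, [l for l in all_links if l != ("L1", "C1")])

print("L1 -> C1, all links up:    ", healthy["L1"]["C1"])
print("L1 -> C1, L1-C1 link down: ", failed["L1"]["C1"])
print("AS path length (hop count):", len(failed["L1"]["C1"]))
```

With unique spine AS numbers nothing stops the valley path anymore: after the L1-C1 failure L1 reaches C1 through C2 and L2, and the three-entry AS path is nothing more than a count of the switches the update traversed.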
As I said years ago: the road to broken design is paved with great recipes.
Lesson learned: whenever evaluating a design, consider all possible failure scenarios.
Don’t get me wrong. There might be valid reasons to use IBGP EVPN sessions on top of EBGP underlay. There are valid reasons to use IBGP route reflectors implemented as VNF appliances for scalability… but the designs promoted by most networking vendors these days make little sense once you figure out how routing really works.
For the few people interested in the red pill
If you want to know more about leaf-and-spine fabrics (and be able to figure out where exactly the vendor marketers cross the line between unicorn-colored reality and plain bullshit), start with the Leaf-and-Spine Fabric Architectures and EVPN Technical Deep Dive webinars (both are part of Standard ipSpace.net subscription).
You can take one step further and enroll in the Designing and Building Data Center Fabrics online course, which includes three design assignments reviewed by a member of the ipSpace.net ExpertExpress team.
Finally, when you want to be able to design more than just the data center fabrics, check out the Building Next-Generation Data Center online course.