Category: design
Leaf-and-Spine Fabric Myths (Part 2)
The next set of Leaf-and-Spine Fabric Myths listed by Evil CCIE focused on BGP:
BGP is the best choice for leaf-and-spine fabrics.
I wrote about this particular one here. If you’re not a BGP guru, don’t overcomplicate your network. OSPF, IS-IS, and EIGRP are good enough for most environments. Also, don’t ever turn BGP into RIP with AS-path length serving as hop count.
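To make the RIP analogy concrete, here’s a tiny Python sketch (all ASNs invented for illustration): once every switch gets its own AS number and no other routing policy is applied, BGP best-path selection degenerates into “prefer the shortest AS path” – AS-path length acting as hop count, exactly like RIP.

```python
# Toy model of BGP best-path selection with no routing policy applied:
# with unique per-switch ASNs, the deciding attribute becomes AS-path
# length, which is nothing more than a hop count (RIP reinvented).
candidate_as_paths = [
    [65001, 65100, 65002],                # leaf -> spine -> leaf
    [65001, 65100, 65003, 65101, 65002],  # a longer detour
]

best = min(candidate_as_paths, key=len)
print(f"preferred path: {best} ({len(best)} AS hops)")
```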
Don't Make a Total Mess When Dealing with Exceptions
A while ago I had the dubious “privilege” of observing how my “beloved” airline Adria Airways deals with exceptions. A third-party incoming flight was 2.5 hours late and in their infinite wisdom (most probably to avoid financial impact) they decided to delay a half-dozen outgoing flights for 20-30 minutes while waiting for the transfer passengers.
Not surprisingly, when that weird thingy landed and they started boarding the outgoing flights (now all at the same time), the result was a total mess, with buses blocking each other (this same airline loves to avoid jet bridges).
Implications of Valley-Free Routing in Data Center Fabrics
As I explained in a previous blog post, most leaf-and-spine best practices (as in: what to do if you have no clue) use BGP as the IGP (regardless of whether it’s needed), with the same AS number shared across all spine switches to implement valley-free routing.
This design has an interesting consequence: when the link between a leaf and a spine switch fails, the two switches can no longer communicate at all, because EBGP loop prevention makes every spine reject any alternate path that already carries the shared spine AS number.
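Here’s a minimal Python sketch of that failure mode (ASNs invented for illustration): once the direct link is gone, every remaining path between the leaf and the spine leads through another leaf and another spine, and those paths already contain the shared spine AS number.

```python
SPINE_AS = 65100                  # shared by all spine switches
LEAF1_AS, LEAF2_AS = 65001, 65002

def ebgp_accepts(as_path, receiver_as):
    """Standard EBGP loop prevention: a router discards any route
    whose AS path already contains its own AS number."""
    return receiver_as not in as_path

# The leaf1-spine1 link fails. The only way leaf1's prefix could still
# reach spine1 is via another leaf: leaf1 -> spine2 -> leaf2 -> spine1.
as_path_offered_to_spine1 = [LEAF2_AS, SPINE_AS, LEAF1_AS]

print(ebgp_accepts(as_path_offered_to_spine1, SPINE_AS))
# False: spine1 finds its own AS in the path and drops the route,
# leaving spine1 and leaf1 without a usable path to each other.
```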
Valley-Free Routing in Data Center Fabrics
You might have noticed that almost every BGP as Data Center IGP design uses the same AS number on all spine switches (there are exceptions coming from people who use BGP as RIP with AS-path length serving as hop count… but let’s not go there).
There are two reasons for that design choice:
Routing in Data Center: What Problem Are You Trying to Solve?
Here’s a question I got from an attendee of my Building Next-Generation Data Center online course:
As far as I understood […] it is obsolete nowadays to build a new DC fabric with routing on the host using BGP, the proper way to go is to use IGP + SDN overlay. Is my understanding correct?
Ignoring for the moment the fact that nothing is ever obsolete in IT, the right answer is it depends… this time on the answer(s) to two seemingly simple questions: “what services are we offering?” and “what connectivity problem are we trying to solve?”
Repost: Tony Przygienda on Valley-Free (or Non-ZigZag) Routing
Most blog posts generate the usual noise from the anonymous peanut gallery (if only they had at least a sliver of Statler and Waldorf in them), but every now and then there’s a comment that’s pure gold. The one made by Tony Przygienda (of RIFT fame) on the Valley-Free Routing post is so good and relevant that I decided to republish it as a separate blog post. Enjoy!
Valley-Free Routing
Reading academic articles about Internet-wide routing challenges, you might stumble upon valley-free routing – a pretty important concept with applications in WAN and data center routing design.
If you’re interested in the academic discussions, you’ll find a pretty exhaustive list of papers on this topic in the Informative References section of RFC 7908; here’s the over-simplified version.
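If you prefer code to prose, here’s a small Python sketch of the rule (my illustration, not taken from any of those papers): a valley-free path climbs over zero or more customer-to-provider links, crosses at most one peering link, and then only descends.

```python
def is_valley_free(link_types):
    """link_types lists the relationship of every AS-level hop along a
    path: 'up' (customer to provider), 'peer', or 'down' (provider to
    customer). After the path crosses a peering link or starts going
    downhill, any further uphill step creates a "valley"."""
    going_down = False
    peer_links = 0
    for link in link_types:
        if link == 'up':
            if going_down:
                return False     # uphill after downhill: a valley
        elif link == 'peer':
            peer_links += 1
            if peer_links > 1:
                return False     # at most one peering link allowed
            going_down = True
        else:                    # 'down'
            going_down = True
    return True

print(is_valley_free(['up', 'peer', 'down', 'down']))  # True
print(is_valley_free(['up', 'down', 'up', 'down']))    # False
```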
Do We Need Leaf-and-Spine Fabrics?
Evil CCIE left a lengthy comment on one of my blog posts including this interesting observation:
It's always interesting to hear all kind of reasons from people to deploy CLOS fabrics in DC in Enterprise segment typically that I deal with while they mostly don't have clue about why they should be doing it in first place. […] Usually a good justification is DC to support high amount of East-West Traffic....but really? […] Ask them if they even have any benchmarks or tools to measure that in first place :)
What he wrote proves that most networking practitioners never move beyond regurgitating vendor marketing (because that’s so much easier than making the first step toward becoming an engineer by figuring out how technology really works).
Traditional Leaf-and-Spine Fabric Versus Cisco ACI
One of my subscribers wondered whether it would make sense to build a traditional leaf-and-spine fabric or go for Cisco ACI. He started his email with:
One option is a "standalone" Spine/Leaf VXLAN-with-EVPN deployment based on Nexus equipment. This approach could probably be accompanied by some kind of automation like Ansible to ease operation/maintenance of the network.
This is what I would do these days if the customer feels comfortable investing at least the minimum amount of work into an automation solution. Having simpler technology + a well-understood automation solution is (in my biased opinion) better than having a complex black box.
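As a back-of-the-envelope illustration of what that minimum amount of work might look like, here’s a Python sketch that generates per-leaf EBGP underlay snippets from a tiny fabric data model (names, ASNs, and addresses are all invented; a real deployment would use Ansible templates against an actual device inventory):

```python
# Generate predictable per-leaf EBGP underlay configuration from a single
# source-of-truth data model -- the essence of "simple technology plus
# well-understood automation". All values below are made up.
SPINE_AS = 65100
fabric = {
    "leaf1": {"asn": 65001, "spine_peers": ["10.0.1.0", "10.0.2.0"]},
    "leaf2": {"asn": 65002, "spine_peers": ["10.0.1.2", "10.0.2.2"]},
}

for leaf, params in fabric.items():
    print(f"! {leaf}")
    print(f"router bgp {params['asn']}")
    for peer_ip in params["spine_peers"]:
        print(f" neighbor {peer_ip} remote-as {SPINE_AS}")
    print()
```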
Schneier’s Law Applied to Networking
A while ago I stumbled upon Schneier’s law (must-read):
Any person can invent a security system so clever that she or he can't think of how to break it.
I’m pretty sure there’s a networking equivalent:
Any person can create a clever network design that is so complex that she or he can't figure out how it will fail in production.
I know I’ve been there with my early OSPF network designs.
Start with Business Requirements, not Technology
This is the feedback I got from someone who used ExpertExpress to discuss the evolution of their data center:
The session has greatly simplified what had appeared to be a complex and difficult undertaking for us. Great to get fresh ideas on how we could best approach our requirements and with the existing equipment we have. Very much looking forward to putting into practice what we discussed.
And here’s what Nicola Modena (the expert working with the customer) replied:
As I told you, the problem is usually to map the architectures and solutions found in books, whitepapers, and validated designs into the customer’s own reality, then to divide the architecture into independent functional layers, and most importantly to always start from requirements and not technology.
A really good summary of what ipSpace.net is all about ;) Thank you, Nicola!
Avoid Summarization in Leaf-and-Spine Fabrics
I got this design improvement suggestion after publishing the When Is BGP No Better than OSPF blog post:
Putting all the leafs in the same ASN and filtering routes sent down to the leafs (sending just a default) are potential enhancements that make BGP a nice option.
Tony Przygienda quickly wrote a one-line rebuttal: “unless links break ;-)”
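He’s right, and it’s easy to see why: if the leaves receive nothing but a default route, a spine that loses its link to a leaf keeps attracting traffic for that leaf with no way to deliver it. A quick Python sketch of the resulting blackhole (topology invented for illustration):

```python
# Two-spine fabric where leaves get only a default route and therefore
# load-balance all traffic across both spines -- even towards a spine
# that just lost its link to the destination leaf.
spine_links = {
    "spine1": {"leaf2"},             # spine1-leaf1 link just failed
    "spine2": {"leaf1", "leaf2"},
}

for spine, attached_leaves in spine_links.items():
    outcome = "delivered" if "leaf1" in attached_leaves else "blackholed"
    print(f"leaf2 -> {spine} -> leaf1: {outcome}")
```

With unique leaf AS numbers and full routes propagated down to the leaves, leaf2 would have removed spine1 from the ECMP set for leaf1’s prefix the moment the route was withdrawn.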
Is EBGP Really Better than OSPF in Leaf-and-Spine Fabrics?
Using EBGP instead of an IGP (OSPF or IS-IS) in leaf-and-spine data center fabrics is becoming a best practice (read: the thing to do when you have no clue what you’re doing).
The usual argument defending this design choice is “BGP scales better than OSPF or IS-IS”. That’s usually true (see also: Internet), and so far, EBGP is the only reasonable choice in very large leaf-and-spine fabrics… but does it really scale better than a link-state IGP in smaller fabrics?
Scaling EVPN BGP Routing Designs
As discussed in a previous blog post, the IETF designed EVPN to be a next-generation BGP-based VPN technology providing scalable layer-2 and layer-3 VPN functionality. EVPN was initially designed for the MPLS data plane and was later extended to support numerous data-plane encapsulations, VXLAN being the most common one.
Design Requirements
Like any other BGP-based solution, EVPN uses BGP to transport endpoint reachability information (customer MAC and IP addresses and prefixes, flooding trees, and multi-attached segments), and relies on an underlying routing protocol to provide BGP next-hop reachability information.
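For readers who like cheat sheets, here’s a rough mapping (my summary, not part of the original text) between that reachability information and the standard EVPN route types:

```python
# EVPN route types from RFC 7432 (types 1-4) and RFC 9136 (type 5),
# matched to the kinds of reachability information mentioned above.
evpn_route_types = {
    1: "Ethernet Auto-Discovery -- per-segment reachability (multihoming)",
    2: "MAC/IP Advertisement -- customer MAC and IP addresses",
    3: "Inclusive Multicast Ethernet Tag -- BUM flooding trees",
    4: "Ethernet Segment -- discovery of multi-attached segments",
    5: "IP Prefix -- layer-3 prefixes",
}

for route_type, role in evpn_route_types.items():
    print(f"Type {route_type}: {role}")
```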
Dissecting IBGP+EBGP Junos Configuration
Networking engineers familiar with Junos love to tell me how easy it is to configure and operate an IBGP EVPN overlay on top of an EBGP IP underlay. Krzysztof Szarkowicz was kind enough to send me the (probably) simplest possible configuration (here’s another one by Alexander Grigorenko).