Category: design
Valley-Free Routing in Data Center Fabrics
You might have noticed that almost every BGP-as-a-data-center-IGP design uses the same AS number on all spine switches (there are exceptions coming from people who use BGP as RIP, with AS-path length serving as hop count… but let’s not go there).
There are two reasons for that design choice:
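The full post spells the reasons out; the underlying mechanic is easy to demonstrate, though. An EBGP speaker rejects any update that already carries its own AS number in the AS path, so a shared spine AS guarantees a leaf can never become transit between two spines. Here’s a minimal Python sketch of that loop-prevention check (my own illustration with made-up AS numbers, not a BGP implementation):

```python
# Minimal sketch of EBGP AS-path loop prevention. AS numbers are
# illustrative: leaves get unique ASNs, both spines share AS 65000.

def accept_update(receiver_asn: int, as_path: list[int]) -> bool:
    """An EBGP speaker rejects any update whose AS path already
    contains its own AS number."""
    return receiver_asn not in as_path

LEAF1, LEAF2, SPINE_AS = 65001, 65002, 65000

# Leaf-1 advertises a prefix to spine-1; spine-1 prepends its AS
# and sends the update to leaf-2:
path_at_leaf2 = [SPINE_AS, LEAF1]

# If leaf-2 tried to re-advertise the route back up, spine-2 would
# see its own (shared) AS in the path and drop the update -- no leaf
# can ever become transit between two spines:
path_offered_to_spine2 = [LEAF2] + path_at_leaf2
print(accept_update(SPINE_AS, path_offered_to_spine2))  # False
```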
Routing in Data Center: What Problem Are You Trying to Solve?
Here’s a question I got from an attendee of my Building Next-Generation Data Center online course:
As far as I understood […] it is obsolete nowadays to build a new DC fabric with routing on the host using BGP, the proper way to go is to use IGP + SDN overlay. Is my understanding correct?
Ignoring for the moment the fact that nothing is ever obsolete in IT, the right answer is it depends… this time on the answers to two seemingly simple questions: “what services are we offering?” and “what connectivity problem are we trying to solve?”
Repost: Tony Przygienda on Valley-Free (or Non-ZigZag) Routing
Most blog posts generate the usual noise from the anonymous peanut gallery (if only they had at least a sliver of Statler and Waldorf in them), but every now and then there’s a comment that’s pure gold. The one made by Tony Przygienda (of RIFT fame) on the Valley-Free Routing post is so good and relevant that I decided to republish it as a separate blog post. Enjoy!
Valley-Free Routing
Reading academic articles about Internet-wide routing challenges, you might stumble upon valley-free routing – a pretty important concept with applications in WAN and data center routing design.
If you’re interested in the academic discussions, you’ll find a pretty exhaustive list of papers on this topic in the Informative References section of RFC 7908; here’s the over-simplified version.
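To make the over-simplified version concrete, here’s a short Python sketch (my own illustration, not from the original post or RFC 7908) of the classic Gao-Rexford validity check: a valley-free path climbs zero or more customer-to-provider links, crosses at most one peer link, and then only descends provider-to-customer links.

```python
import re

# Link labels are relative to the direction of travel:
# "up" = customer->provider, "peer" = peer->peer,
# "down" = provider->customer.

def is_valley_free(links: list[str]) -> bool:
    """A path is valley-free if it consists of zero or more 'up'
    links, at most one 'peer' link, then zero or more 'down' links."""
    # Encode each link by its first letter and match the u*p?d* pattern.
    return re.fullmatch(r"u*p?d*", "".join(l[0] for l in links)) is not None

print(is_valley_free(["up", "up", "peer", "down"]))  # True
print(is_valley_free(["down", "up"]))                # False: a valley
```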
Do We Need Leaf-and-Spine Fabrics?
Evil CCIE left a lengthy comment on one of my blog posts including this interesting observation:
It's always interesting to hear all kind of reasons from people to deploy CLOS fabrics in DC in Enterprise segment typically that I deal with while they mostly don't have clue about why they should be doing it in first place. […] Usually a good justification is DC to support high amount of East-West Traffic....but really? […] Ask them if they even have any benchmarks or tools to measure that in first place :)
What he wrote proves that most networking practitioners never move beyond regurgitating vendor marketing (because that’s so much easier than making the first step toward becoming an engineer by figuring out how technology really works).
Traditional Leaf-and-Spine Fabric Versus Cisco ACI
One of my subscribers wondered whether it would make sense to build a traditional leaf-and-spine fabric or go for Cisco ACI. He started his email with:
One option is a "standalone" Spine/Leaf VXLAN-with EVPN deployment based on Nexus equipment. This approach could probably be accompanied by some kind of automation like Ansible to ease operation/maintenance of the network.
This is what I would do these days if the customer feels comfortable investing at least a minimal amount of work into an automation solution. Having simpler technology + a well-understood automation solution is (in my biased opinion) better than having a complex black box.
Schneier’s Law Applied to Networking
A while ago I stumbled upon Schneier’s law (must-read):
Any person can invent a security system so clever that she or he can't think of how to break it.
I’m pretty sure there’s a networking equivalent:
Any person can create a clever network design that is so complex that she or he can't figure out how it will fail in production.
I know I’ve been there with my early OSPF network designs.
Start with Business Requirements, not Technology
This is the feedback I got from someone who used ExpertExpress to discuss the evolution of their data center:
The session has greatly simplified what had appeared to be a complex and difficult undertaking for us. Great to get fresh ideas on how we could best approach our requirements and with the existing equipment we have. Very much looking forward to putting into practice what we discussed.
And here’s what Nicola Modena (the expert working with the customer) replied:
As I told you, the problem is usually to map the architectures and solutions that are found in books, whitepapers, and validated designs into customer’s own reality, then to divide the architecture into independent functional layers, and most importantly to always start from requirements and not technology.
A really good summary of what ipSpace.net is all about ;) Thank you, Nicola!
Avoid Summarization in Leaf-and-Spine Fabrics
I got this design improvement suggestion after publishing the When Is BGP No Better than OSPF blog post:
Putting all the leafs in the same ASN and filtering routes sent down to the leafs (sending just a default) are potential enhancements that make BGP a nice option.
Tony Przygienda quickly wrote a one-line rebuttal: “unless links break ;-)”
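That rebuttal deserves one extra reasoning step: when leaves hold nothing but an ECMP default route toward all spines, they cannot detect that a particular spine lost its link to the destination leaf, and keep hashing flows into a black hole. A toy Python simulation (an assumed two-spine topology of my own making, not from either blog post) makes the failure mode obvious:

```python
# Toy model: leaves ECMP a default route across all spines; each
# spine knows only the leaves it still has a working link to.

SPINES = ["spine1", "spine2"]
spine_links = {  # which leaves each spine can still reach
    "spine1": {"leaf1"},            # spine1 -> leaf2 link is down
    "spine2": {"leaf1", "leaf2"},
}

def forward(flow_hash: int, dst_leaf: str) -> str:
    spine = SPINES[flow_hash % len(SPINES)]  # leaf hashes on the default
    return "delivered" if dst_leaf in spine_links[spine] else "blackholed"

# Half the flows toward leaf2 die, because the sending leaf has no
# specific routes and thus no way to route around the failure:
print([forward(h, "leaf2") for h in range(4)])
# ['blackholed', 'delivered', 'blackholed', 'delivered']
```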
Is EBGP Really Better than OSPF in Leaf-and-Spine Fabrics?
Using EBGP instead of an IGP (OSPF or IS-IS) in leaf-and-spine data center fabrics is becoming a best practice (read: thing to do when you have no clue what you’re doing).
The usual argument defending this design choice is “BGP scales better than OSPF or IS-IS”. That’s usually true (see also: Internet), and so far, EBGP is the only reasonable choice in very large leaf-and-spine fabrics… but does it really scale better than a link-state IGP in smaller fabrics?
Scaling EVPN BGP Routing Designs
As discussed in a previous blog post, IETF designed EVPN to be a next-generation BGP-based VPN technology providing scalable layer-2 and layer-3 VPN functionality. EVPN was initially designed to be used with an MPLS data plane and was later extended to support numerous data-plane encapsulations, VXLAN being the most common one.
Design Requirements
Like any other BGP-based solution, EVPN uses BGP to transport endpoint reachability information (customer MAC and IP addresses and prefixes, flooding trees, and multi-attached segments), and relies on an underlying routing protocol to provide BGP next-hop reachability information.
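As a quick (non-exhaustive) reference for how that reachability information is encoded, here’s how the items listed above map onto the EVPN route types from RFC 7432 and the type-5 extension in RFC 9136, expressed as a simple Python lookup table of my own making:

```python
# EVPN route types and what they carry (RFC 7432; type 5 per RFC 9136).
EVPN_ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery (multi-attached segments, fast convergence)",
    2: "MAC/IP Advertisement (customer MAC and IP addresses)",
    3: "Inclusive Multicast Ethernet Tag (BUM flooding trees)",
    4: "Ethernet Segment (multihoming designated-forwarder election)",
    5: "IP Prefix (customer IP prefixes)",
}

for rtype, purpose in sorted(EVPN_ROUTE_TYPES.items()):
    print(f"Type {rtype}: {purpose}")
```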
Dissecting IBGP+EBGP Junos Configuration
Networking engineers familiar with Junos love to tell me how easy it is to configure and operate an IBGP EVPN overlay on top of an EBGP IP underlay. Krzysztof Szarkowicz was kind enough to send me the (probably) simplest possible configuration (here’s another one by Alexander Grigorenko).
Get Familiar with Leaf-and-Spine Fabrics
An attendee of my Building Next-Generation Data Center online course asked me what the best learning path might be for a total (data center) beginner who has to design and install a small leaf-and-spine fabric in the near future.
This blog post was written for ipSpace.net subscribers who want to get the most out of ipSpace.net content. If you’re only interested in free stuff, you might feel it’s a waste of your time. You’ve been warned ;)
Is OSPF or IS-IS Good Enough for My Data Center?
Our good friend Mr. Anonymous has too many buzzwords and opinions in his repertoire, at least based on this comment he left on my Using 4-byte AS Numbers with EVPN blog post:
But IGPs don't scale well (as you might have heard) except for RIFT and Openfabric. The others are trying to do ECMP based on BGP.
Should you be worried about OSPF or IS-IS scalability when building your data center fabric? Short answer: most probably not. Before diving into a lengthy explanation, let’s give our dear friend some homework.
Pragmatic Data Center Fabrics
I always love to read the practical advice by Andrew Lerner. Here’s another gem that matches what Brad Hedlund, Dinesh Dutt, and I (plus numerous others) have been saying for ages:
One specific recommendation we make in the research is to “Build a rightsized physical infrastructure by using a leaf/spine design with fixed-form factor switches and 25/100G capable interfaces (that are reverse-compatible with 10G).”
There’s a slight gotcha in that advice: it trades the implicit complexity of chassis switches for the explicit complexity of fixed-form-factor switches.