Category: data center
When I still cared about CCIE certification, I was always tripped up by the weird scenario with (A) mismatched ARP and MAC timeouts and (B) default gateway outside of the forwarding path. When done just right you could get persistent unicast flooding, and I’ve met someone who reported average unicast flooding reaching ~1 Gbps in his data center fabric.
One would hope that we wouldn’t experience similar problems in modern leaf-and-spine fabrics, but one of my readers managed to reproduce the problem within a single subnet in FabricPath with anycast gateway on spine switches when someone misconfigured a subnet mask on one of the servers.
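To see why a wrong subnet mask can push traffic onto a different path, here’s a minimal Python sketch of the on-link check a host performs before sending a packet. The addresses, the gateway, and the `next_hop` helper are made up for illustration; the reader’s actual setup isn’t described here, so treat this as one plausible way the asymmetry could arise, not a reconstruction of that incident.

```python
import ipaddress

def next_hop(src_ip: str, prefix_len: int, dst_ip: str, gateway: str) -> str:
    """Return the IP address the host would ARP for when sending to dst_ip.

    A host compares the destination with its own (configured) subnet:
    on-link destinations are reached directly, everything else goes
    to the default gateway.
    """
    iface = ipaddress.ip_interface(f"{src_ip}/{prefix_len}")
    dst = ipaddress.ip_address(dst_ip)
    return dst_ip if dst in iface.network else gateway

# Hypothetical values: the real subnet is 10.0.0.0/24
GATEWAY = "10.0.0.1"
SERVER = "10.0.0.10"
PEER = "10.0.0.200"

# Correct mask: the peer is on-link, traffic goes directly to it
correct = next_hop(SERVER, 24, PEER, GATEWAY)

# Mask mistyped as /25: the server believes the peer is in another
# subnet and sends the traffic to the default gateway instead
wrong = next_hop(SERVER, 25, PEER, GATEWAY)

print(f"/24 mask -> next hop {correct}")
print(f"/25 mask -> next hop {wrong}")
```

With the wrong mask, the server reaches the peer through the gateway, while the correctly configured peer replies directly. Traffic in the two directions takes different paths, which is the kind of asymmetry that lets MAC table entries age out on some switches.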
A few weeks ago we published an interesting discussion on network operating system details based on an excellent set of questions by James Miles.
- How hard is it to virtualize network devices?
- What is the expected performance degradation?
- Does it make sense to use containers to do that?
- What are the operational implications of running virtual network devices?
- What will be the impact on hardware vendors and networking engineers?
And of course we couldn’t avoid the famous last question: “Should network engineers program network devices?”
The designers of Cumulus Linux CLI were always focused on simplifying network device configurations. One of the first features along these lines was BGP across unnumbered interfaces, then they introduced simplified EVPN configurations, and recently auto-MLAG and auto-BGP.
You can watch a short description of these features by Dinesh Dutt and Pete Lumbis in the Simplify Network Configuration with Cumulus Linux and Smart Datacenter Defaults videos (part of the Cumulus Linux section of the Data Center Fabrics webinar).
Before we start: if you’re new to my blog (or stumbled upon this blog post by accident) you might want to read the Considerations for Host-Based Firewalls for a brief overview of the challenge, and my explanation of why flow-tracking tools cannot be used to auto-generate firewall policies.
As expected, the “you cannot do it” post on LinkedIn generated numerous comments, ranging from good ideas to borderline ridiculous attempts to fix a problem that has been proven to be unfixable (see also: perpetual motion).
In mid-September, Carl Buchmann, Fred Hsu, and Thomas Grimonet had an excellent presentation describing Arista’s Ansible roles and collections. They focused on two collections: CloudVision integration, and Arista Validated Designs. All the videos from that presentation are available with a free ipSpace.net subscription.
Want to know even more about Ansible and network automation? Join our 2-day automation event featuring network automation experts from around the globe talking about their production-grade automation solutions or tools they created, and get immediate access to automation course materials and reviewed hands-on exercises.
While I keep telling you that Google-sized solutions aren’t necessarily the best fit for your environment, some of the hyperscaler presentations contain nuggets that apply to any environment no matter how small it is.
One of those must-watch presentations is Fault Tolerance through Optimal Workload Placement together with a wonderful TL;DR summary by the one-and-only Todd Hoff of High Scalability fame.
James Miles got tons of really interesting questions while watching the Network Operating System Models webinar by Dinesh Dutt, and the only reasonable thing to do when he sent them over was to schedule a Q&A session with Dinesh to discuss them.
We got together last week and planned to spend an hour or two discussing the questions, but (not exactly unexpectedly) we got only halfway through the list in the time we had, so we’re continuing next week.
Earlier this year, Pete Lumbis returned as an ipSpace.net webinar guest speaker with a great presentation describing data center switching ASICs from the perspective of networking engineers. After a brief intro, he started with ASIC Basics… a topic which generated a 25-minute Q&A session.
Several engineers formerly working for a large virtualization vendor were pretty upset with me when I claimed that the virtualization consultants promote “disaster recovery using stretched VLANs” designs instead of alternatives that would implement proper separation of failure domains.
Guess what… it’s even worse than I thought.
Here’s a sequence of comments I received after reposting one of my “disaster recovery doesn’t need stretched VLANs” blog posts on LinkedIn sometime in late 2019:
Justin Pietsch published another must-read article, this time dealing with operational complexity of load balancers and IP multicast. Here are just a few choice quotes to get you started:
- A critical lesson I learned is that running out of capacity is the worst thing you can do in networking
- You can prevent a lot of problems if you can deep dive into an architecture and understand its tradeoffs and limitations
- Magic infrastructure is often extremely hard to troubleshoot and debug
You might find what he learned useful the next time you’re facing a unicorn-colored slide deck from your favorite software-defined or intent-based vendor ;))
In June 2020 I published the first part of the Redundant Server Connectivity in Layer-3-Only Fabrics article describing the target design and application-layer requirements.
One of the readers commenting on the ideas in my Disaster Recovery and Failure Domains blog post effectively said “In an active/passive DR scenario, having L3 DCI separation doesn’t protect you from STP loop/flood in your active DC, so why do you care?”
He’s absolutely right - if you have a cold disaster recovery site, it doesn’t matter if it’s bombarded by a gazillion flooded packets per second… but how often do you have a cold recovery site?
A long while ago I decided to write an article explaining how you could run VMware NSX on ESXi servers with redundant connections to two top-of-rack switches on top of a layer-3-only fabric (a fabric with IP subnets and VLANs limited to a single top-of-rack switch). Turns out that’s Mission Impossible, so I put the article on the back burner and slowly forgot about it.
Well, not exactly. Every now and then my subconscious would kick it up and I’d figure out yet another reason why it’s REALLY hard to do it right. After a while, I decided to try again, and completely rewrote the article. The first part is already online, more details coming (hopefully) soon.
Pete Lumbis started his Cumulus Linux 4.0 update with an overview of differences between Cumulus Linux on hardware switches and Cumulus VX, and continued with an in-depth list of ASIC families supported by Cumulus Linux.
You can watch his presentation, as well as the more in-depth overview of Cumulus Linux concepts by Dinesh Dutt, in the recently-updated What Is Cumulus Linux All About video.
I got this question about the use of AS numbers on data center leaf switches participating in an MLAG cluster:
In the Leaf-and-Spine Fabric Architectures you made the recommendation to have the same AS number on all members of an MLAG cluster and run iBGP between them. In the Autonomous Systems and AS Numbers article you discuss the option of having a different AS number per leaf. Which one should I use… and do I still need the EBGP peering between the leaf pair?
As always, there’s a bit of a gap between theory and practice ;), but let’s start with a leaf-and-spine fabric diagram illustrating both concepts: