Category: data center
In June 2020 I published the first part of Redundant Server Connectivity in Layer-3-Only Fabrics article describing the target design and application-layer requirements.
One of the readers commenting the ideas in my Disaster Recovery and Failure Domains blog post effectively said “In an active/passive DR scenario, having L3 DCI separation doesn’t protect you from STP loop/flood in your active DC, so why do you care?”
He’s absolutely right - if you have a cold disaster recovery site, it doesn’t matter if it’s bombarded by a gazillion flooded packets per second… but how often do you have a cold recovery site?
A long while ago I decided to write an article explaining how you could run VMware NSX on ESXi servers with redundant connections to two top-of-rack switches on top of a layer-3-only fabric (a fabric with IP subnets and VLANs limited to a single top-of-rack switch). Turns out that’s Mission Impossible, so I put the article on the back burner and slowly forgot about it.
Well, not exactly. Every now and then my subconsciousness would kick it up and I’d figure out yet-another reason why it’s REALLY hard to do it right. After a while, I decided to try again, and completely rewrote the article. The first part is already online, more details coming (hopefully) soon.
Pete Lumbis started his Cumulus Linux 4.0 update with an overview of differences between Cumulus Linux on hardware switches and Cumulus VX, and continued with an in-depth list of ASIC families supported by Cumulus Linux.
You can watch his presentation, as well as the more in-depth overview of Cumulus Linux concepts by Dinesh Dutt, in the recently-updated What Is Cumulus Linux All About video.
I got this question about the use of AS numbers on data center leaf switches participating in an MLAG cluster:
In the Leaf-and-Spine Fabric Architectures you made the recommendation to have the same AS number on all members of an MLAG cluster and run iBGP between them. In the Autonomous Systems and AS Numbers article you discuss the option of having different AS number per leaf. Which one should I use… and do I still need the EBGP peering between the leaf pair?
As always, there’s a bit of a gap between theory and practice ;), but let’s start with a leaf-and-spine fabric diagram illustrating both concepts:
When I started designing Data Center Infrastructure for Networking Engineers webinar I wanted to create something that would allow someone fluent in networking but not in adjacent fields like servers or storage to grasp the fundamentals of data center technologies, from server virtualization and containers to data center fabrics and storage protocols.
Here’s what a network architect said about the webinar:
Whenever I was comparing VMware NSX and Cisco ACI a few years ago (in late 2010s in case you’re reading this in a far-away future), someone would inevitably ask “and how would you connect a bare metal server to a VMware NSX environment?”
While NSX-T has that capability since release 2.5 (more about that in a later blog post), let’s start with the big question: why would you need to?
Got mentioned in this tweet a while ago:
Watching @ApstraInc youtube stream regarding BGP in the DC with @doyleassoc and @jtantsura.Maybe BGP is getting bigger and bigger traction from big enterprise data centers but I still see an IGP being used frequently. I am eager to have @ioshints opinion on that hot subject.
Maybe I’ve missed some breaking news, but assuming I haven’t my opinion on that subject hasn’t changed.
Another interesting question I got from an ipSpace.net subscriber:
Assuming we can simplify the physical network when using overlay virtual network solutions like VMware NSX, do we really need datacenter switches (example: Cisco Nexus instead of Catalyst product line) to implement the underlay?
Let’s recap what we really need to run VMware NSX:
TL&DR: It’s 2020, and VXLAN with EVPN is all the rage. Thank you, you can stop reading.
On a more serious note, I got this questions from an Johannes Spanier after he read my do we need complex data center switches for NSX underlay blog post:
Would you agree that for smaller NSX designs (~100 hypervisors) a much simpler Layer2 based access-distribution design with MLAGs is feasible? One would have two distribution switches and redundant access switches MLAGed together.
I would still prefer VXLAN for a number of reasons:
Every now and then someone tries to justify the “wisdom” of migrating VMs from on-premises data center into a public cloud (without renumbering them) with the idea of “scaling out into the public cloud” aka “cloud bursting”. My usual response: this is another vendor marketing myth that works only in PowerPoint.
To be honest, that statement is too harsh. You can easily scale your application into a public cloud assuming that:
Dinesh Dutt, a pragmatic IP routing guru, the mastermind behind great concepts like simplified BGP configuration, and one of the best ipSpace.net authors, finally decided to start blogging. His first article: describing the impact of having 256 100GE ports in a single ASIC (Tomahawk 4). Hope you’ll enjoy his musings as much as I did ;)
Every now and then I find an IT professional claiming we should not be worried about split-brain scenarios because you have redundant links.
I might understand that sentiment coming from software developers, but I also encountered it when discussing stretched clusters or even SDN controllers deployed across multiple data centers.
Finally I found a great analogy you might find useful. A reader of my blog pointed me to the awesome Why Must Systems Be Operated blog post explaining the same problem from the storage perspective, so the next time you might want to use this one: “so you’re saying you don’t need backup because you have RAID disks”. If someone agrees with that, don’t walk away… RUN!
Got this question from one of ipSpace.net subscribers:
Do we really need those intelligent datacenter switches for underlay now that we have NSX in our datacenter? Now that we have taken a lot of the intelligence out of our underlying network, what must the underlying network really provide?
Reading the marketing white papers the answer would be IP connectivity… but keep in mind that building your infrastructure based on information from vendor white papers usually gives you the results your gullibility deserves.
I always tell networking engineers attending our Building Network Automation Solutions online course to create minimalistic data models with (preferably) no redundant information. Not surprisingly, that’s a really hard task (see this article for an example) - using a simple automation tool like Ansible you end with either a messy and redundant data model or Jinja2 templates (or Ansible playbooks) full of hard-to-understand and impossible-to-maintain business logic.
Stephen Harding solved this problem the right way: his data center fabric deployment solution uses a dynamic inventory script that translates operator-friendly fabric description (data model) into template-friendly set of device variables.