Category: data center
Pete Lumbis started his Cumulus Linux 4.0 update with an overview of differences between Cumulus Linux on hardware switches and Cumulus VX, and continued with an in-depth list of ASIC families supported by Cumulus Linux.
You can watch his presentation, as well as the more in-depth overview of Cumulus Linux concepts by Dinesh Dutt, in the recently-updated What Is Cumulus Linux All About video.
I got this question about the use of AS numbers on data center leaf switches participating in an MLAG cluster:
In the Leaf-and-Spine Fabric Architectures you made the recommendation to have the same AS number on all members of an MLAG cluster and run iBGP between them. In the Autonomous Systems and AS Numbers article you discuss the option of having different AS number per leaf. Which one should I use… and do I still need the EBGP peering between the leaf pair?
As always, there’s a bit of a gap between theory and practice ;), but let’s start with a leaf-and-spine fabric diagram illustrating both concepts:
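The two numbering options from the question can also be sketched in a few lines of Python. This is only an illustration: the function names, the leaf names and the starting private AS number (65001) are assumptions made for this example, not something taken from the webinar or the article.

```python
def assign_as_numbers(mlag_pairs, base_as=65001, per_pair=True):
    """Return a {leaf: AS number} mapping for a list of MLAG pairs.

    per_pair=True  -> both members of an MLAG pair share one AS number
    per_pair=False -> every leaf gets its own AS number
    (base_as is a hypothetical private AS number chosen for this sketch)
    """
    asn = {}
    next_as = base_as
    for pair in mlag_pairs:
        for leaf in pair:
            asn[leaf] = next_as
            if not per_pair:
                next_as += 1      # unique AS number per leaf
        if per_pair:
            next_as += 1          # one AS number per MLAG pair
    return asn


def peer_link_session(asn, leaf_a, leaf_b):
    """A BGP session within one AS is IBGP, between different ASes EBGP."""
    return "ibgp" if asn[leaf_a] == asn[leaf_b] else "ebgp"


pairs = [("leaf1", "leaf2"), ("leaf3", "leaf4")]
shared = assign_as_numbers(pairs, per_pair=True)
unique = assign_as_numbers(pairs, per_pair=False)
print(shared, peer_link_session(shared, "leaf1", "leaf2"))
print(unique, peer_link_session(unique, "leaf1", "leaf2"))
```

The sketch only shows how the choice of numbering scheme determines the type of the session between the two MLAG members; it says nothing about which option is preferable.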
When I started designing the Data Center Infrastructure for Networking Engineers webinar, I wanted to create something that would allow someone fluent in networking but not in adjacent fields like servers or storage to grasp the fundamentals of data center technologies, from server virtualization and containers to data center fabrics and storage protocols.
Here’s what a network architect said about the webinar:
Whenever I was comparing VMware NSX and Cisco ACI a few years ago (in late 2010s in case you’re reading this in a far-away future), someone would inevitably ask “and how would you connect a bare metal server to a VMware NSX environment?”
While NSX-T has had that capability since release 2.5 (more about that in a later blog post), let’s start with the big question: why would you need to?
Got mentioned in this tweet a while ago:
Watching @ApstraInc youtube stream regarding BGP in the DC with @doyleassoc and @jtantsura.Maybe BGP is getting bigger and bigger traction from big enterprise data centers but I still see an IGP being used frequently. I am eager to have @ioshints opinion on that hot subject.
Maybe I’ve missed some breaking news, but assuming I haven’t, my opinion on that subject hasn’t changed.
Another interesting question I got from an ipSpace.net subscriber:
Assuming we can simplify the physical network when using overlay virtual network solutions like VMware NSX, do we really need datacenter switches (example: Cisco Nexus instead of Catalyst product line) to implement the underlay?
Let’s recap what we really need to run VMware NSX:
TL&DR: It’s 2020, and VXLAN with EVPN is all the rage. Thank you, you can stop reading.
On a more serious note, I got this question from Johannes Spanier after he read my “do we need complex data center switches for NSX underlay” blog post:
Would you agree that for smaller NSX designs (~100 hypervisors) a much simpler Layer2 based access-distribution design with MLAGs is feasible? One would have two distribution switches and redundant access switches MLAGed together.
I would still prefer VXLAN for a number of reasons:
Every now and then someone tries to justify the “wisdom” of migrating VMs from an on-premises data center into a public cloud (without renumbering them) with the idea of “scaling out into the public cloud” aka “cloud bursting”. My usual response: this is another vendor marketing myth that works only in PowerPoint.
To be honest, that statement is too harsh. You can easily scale your application into a public cloud assuming that:
Dinesh Dutt, a pragmatic IP routing guru, the mastermind behind great concepts like simplified BGP configuration, and one of the best ipSpace.net authors, finally decided to start blogging. His first article: describing the impact of having 256 100GE ports in a single ASIC (Tomahawk 4). Hope you’ll enjoy his musings as much as I did ;)
Every now and then I find an IT professional claiming we should not be worried about split-brain scenarios because we have redundant links.
I might understand that sentiment coming from software developers, but I also encountered it when discussing stretched clusters or even SDN controllers deployed across multiple data centers.
Finally I found a great analogy you might find useful. A reader of my blog pointed me to the awesome Why Must Systems Be Operated blog post explaining the same problem from the storage perspective, so the next time you might want to use this one: “so you’re saying you don’t need backup because you have RAID disks”. If someone agrees with that, don’t walk away… RUN!
Got this question from one of the ipSpace.net subscribers:
Do we really need those intelligent datacenter switches for underlay now that we have NSX in our datacenter? Now that we have taken a lot of the intelligence out of our underlying network, what must the underlying network really provide?
Reading the marketing white papers the answer would be IP connectivity… but keep in mind that building your infrastructure based on information from vendor white papers usually gives you the results your gullibility deserves.
I always tell networking engineers attending our Building Network Automation Solutions online course to create minimalistic data models with (preferably) no redundant information. Not surprisingly, that’s a really hard task (see this article for an example) - using a simple automation tool like Ansible you end up with either a messy and redundant data model or Jinja2 templates (or Ansible playbooks) full of hard-to-understand and impossible-to-maintain business logic.
Stephen Harding solved this problem the right way: his data center fabric deployment solution uses a dynamic inventory script that translates an operator-friendly fabric description (data model) into a template-friendly set of device variables.
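To make the idea more tangible, here is a minimal sketch of such a translation. It is not Stephen’s script: the fabric description, the variable names (`bgp_as`, `loopback`, `neighbors`) and the addressing scheme are all hypothetical and chosen only for this example. The only thing taken from Ansible is the JSON layout a dynamic inventory script is expected to print (groups with `hosts` lists plus per-host variables under `_meta.hostvars`).

```python
import json

# Hypothetical operator-friendly fabric description: no per-device details,
# no redundant information (all names and values are assumptions).
FABRIC = {
    "asn_base": 65000,
    "loopback_prefix": "10.0.0.",
    "spines": ["s1", "s2"],
    "leaves": ["l1", "l2", "l3"],
}


def build_inventory(fabric):
    """Expand the compact fabric description into per-device variables."""
    hostvars = {}
    nodes = fabric["spines"] + fabric["leaves"]
    for index, name in enumerate(nodes, start=1):
        is_spine = name in fabric["spines"]
        if is_spine:
            bgp_as = fabric["asn_base"]          # all spines share one AS
            neighbors = list(fabric["leaves"])   # spines peer with every leaf
        else:
            # every leaf gets its own AS: base + position in the leaf list
            bgp_as = fabric["asn_base"] + fabric["leaves"].index(name) + 1
            neighbors = list(fabric["spines"])   # leaves peer with every spine
        hostvars[name] = {
            "role": "spine" if is_spine else "leaf",
            "loopback": f"{fabric['loopback_prefix']}{index}/32",
            "bgp_as": bgp_as,
            "neighbors": neighbors,
        }
    return {
        "spines": {"hosts": list(fabric["spines"])},
        "leaves": {"hosts": list(fabric["leaves"])},
        "_meta": {"hostvars": hostvars},
    }


if __name__ == "__main__":
    # Ansible calls a dynamic inventory script with --list and reads JSON
    print(json.dumps(build_inventory(FABRIC), indent=2))
```

The business logic (who peers with whom, which AS number a device gets) lives in one Python function, so the Jinja2 templates can stay trivial and the operator-facing data model stays free of redundancy.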
Here’s an interesting tidbit from the “Last Week in AWS” blog:
From a philosophical point of view, AWS fundamentally considers an API to be a promise. Services that aren’t promoted anymore are still available […] Think about that for a second - a service launched 13 years ago is still actively supported to the point where you can use it today.
This is a common objection I get when trying to persuade network architects they don’t need stretched VLANs (and IP subnets) to implement data center disaster recovery:
Changing IP addresses when activating DR is hard. You’d have to weigh the manageability of stretching L2 and protecting it, with the added complexity of breaking the two sites into separate domains [and subnets]. We all have apps with hardcoded IP’s, outdated IPAM’s, Firewall rules that need updating, etc.
Let’s get one thing straight: when you’re doing disaster recovery there are no live subnets, IP addresses or anything else along those lines. The disaster has struck, and your data center infrastructure is gone.
One of the responses to my Disaster Recovery Faking blog post focused on failure domains:
What is the difference between supporting L2 stretched between two pods in your DC (which everyone does for seamless vMotion), and having a 30ms link between these two pods because they happen to be in different buildings?
I hope you agree that a single broadcast domain is a single failure domain. If not, let’s agree to disagree and move on - my life is too short to argue about obvious stuff.