Another interesting question I got from an ipSpace.net subscriber:
Assuming we can simplify the physical network when using overlay virtual network solutions like VMware NSX, do we really need datacenter switches (example: Cisco Nexus instead of Catalyst product line) to implement the underlay?
Let’s recap what we really need to run VMware NSX:
TL&DR: It’s 2020, and VXLAN with EVPN is all the rage. Thank you, you can stop reading.
On a more serious note, I got this questions from an Johannes Spanier after he read my do we need complex data center switches for NSX underlay blog post:
Would you agree that for smaller NSX designs (~100 hypervisors) a much simpler Layer2 based access-distribution design with MLAGs is feasible? One would have two distribution switches and redundant access switches MLAGed together.
I would still prefer VXLAN for a number of reasons:
Dinesh Dutt, a pragmatic IP routing guru, the mastermind behind great concepts like simplified BGP configuration, and one of the best ipSpace.net authors, finally decided to start blogging. His first article: describing the impact of having 256 100GE ports in a single ASIC (Tomahawk 4). Hope you’ll enjoy his musings as much as I did ;)
Got this question from one of ipSpace.net subscribers:
Do we really need those intelligent datacenter switches for underlay now that we have NSX in our datacenter? Now that we have taken a lot of the intelligence out of our underlying network, what must the underlying network really provide?
Reading the marketing white papers the answer would be IP connectivity… but keep in mind that building your infrastructure based on information from vendor white papers usually gives you the results your gullibility deserves.
A little while ago I explained why you can’t use more than 4K VXLAN segments on a ToR switch (at least with most ASICs out there). Does that mean that you’re limited to a total of 4K virtual ethernet segments?
Of course not.
You could implement overlay virtual networks in software (on hypervisors or container hosts), although even there the enterprise products rarely give you more than a few thousand logical switches (to use NSX terminology)… but that’s a product, not technology limitation. Large public cloud providers use the same (or similar) technology to run gazillions of tenant segments.
I always tell attendees in the Building Network Automation Solutions to create minimalistic data models with (preferably) no redundant information. Not surprisingly, that’s a really hard task (see this article for an example) - using a simple automation tool like Ansible you end with either a messy and redundant data model or Jinja2 templates (or Ansible playbooks) full of hard-to-understand and impossible-to-maintain business logic.
An attendee in my Building Next-Generation Data Center online course was asked to deploy numerous relatively small OpenStack cloud instances and wanted select the optimum virtual networking technology. Not surprisingly, every $vendor had just the right answer, including Arista:
We’re considering moving from hypervisor-based overlays to ToR-based overlays using Arista’s CVX for approximately 2000 VLANs.
As I explained in Overlay Virtual Networking, Networking in Private and Public Clouds and Designing Private Cloud Infrastructure (plus several presentations) you have three options to implement virtual networking in private clouds:
One of my readers sent me this email after reading my Loop Avoidance in VXLAN Networks blog post:
Not much has changed really! It’s still a flood/learn bridged network, at least in parts. We count 2019 and talk a lot about “fabrics” but have 1980’s networks still.
The networking fundamentals haven’t changed in the last 40 years. We still use IP (sometimes with larger addresses and augmentations that make it harder to use and more vulnerable), stream-based transport protocol on top of that, leak addresses up and down the protocol stack, and rely on technology that was designed to run on 500 meters of thick yellow cable.
One of the attendees of the Building Next-Generation Data Center online course solved the build small data center fabric challenge with Virtual Chassis Fabric (VCF). I pointed out that I would prefer not to use VCF as it uses centralized control plane and is thus a single failure domain.
Here are his arguments for using VCF:
In the market overview section of the introductory part of data center fabric architectures webinar I made a recommendation to use larger number of fixed-configuration spine switches instead of two chassis-based spines when building a medium-sized leaf-and-spine fabric, and explained the reasoning behind it (increased availability, reduced impact of spine failure).
One of the attendees wondered about the “right” number of spine switches – does it has to be four, or could you have three or five spines. In his words:
Layer 2 Fabrics can't be extended beyond 2 Spine switches. I had a long argument with a $vendor guys on this. They don't even count SPB as Layer 2 fabric and so forth.
The root cause of this myth is the lack of understanding of what layer-2, layer-3, bridging and routing means. You might want to revisit a few of my very old blog posts before moving on: part 1, part 2, what is switching, layer-3 switches and routers.
BGP is the best choice for leaf-and-spine fabrics.
I wrote about this particular one here. If you’re not a BGP guru don’t overcomplicate your network. OSPF, IS-IS, and EIGRP are good enough for most environments. Also, don’t ever turn BGP into RIP with AS-path length serving as hop count.
Apart from the “they have no clue what they’re talking about” observation, Evil CCIE left a long list of leaf-and-spine fabric myths he encountered in the wild in a comment on one of my blog posts. He started with:
Clos fabric (aka Leaf And Spine fabric) is a non-blocking fabric
That was obviously true in the days when Mr. Clos designed the voice switching solution that still bears his name. In the original Clos network every voice call would get a dedicated path across the fabric, and the number of voice calls supported by the fabric equaled the number of alternate end-to-end paths.
As I explained in a previous blog post, most leaf-and-spine best-practices (as in: what to do if you have no clue) use BGP as the IGP routing protocol (regardless of whether it’s needed) with the same AS number shared across all spine switches to implement valley-free routing.
This design has an interesting consequence: when a link between a leaf and a spine switch fails, they can no longer communicate.
For example, when the link between L1 and C1 in the following diagram fails, there’s no connectivity between L1 and C1 as there’s no valley-free path between them.
One of the attendees of my Building Next-Generation Data Center online course tried to figure out whether you can build larger broadcast domains with VXLAN than you could with VLANs. Here’s what he sent me:
I'm trying to understand differences or similarities between VLAN and VXLAN technologies in a view of (*cast) domain limitation.