Category: data center
A few years ago, I was asked to deliver a What Is SDDC presentation that later became a webinar. I forgot about that webinar until I received feedback from one of the viewers a week ago:
If you like to learn from the teachers with the “straight to the point” approach and complement the theory with many “real-life” scenarios, then ipSpace.net is the right place for you.
Roman Pomazanov documented his thoughts on the beauties of large layer-2 domains in a LinkedIn article and allowed me to repost it on ipSpace.net blog to ensure it doesn’t disappear
First of all: “L2 is a single failure domain”, a problem at one point can easily spread to the entire datacenter.
TL&DR: Installing an Ethernet NIC with two uplinks in a server is easy1. Connecting those uplinks to two edge switches is common sense2. Detecting physical link failure is trivial in Gigabit Ethernet world. Deciding between two independent uplinks or a link aggregation group is interesting. Detecting path failure and disabling the useless uplink that causes traffic blackholing is a living hell (more details in this Design Clinic question).
Want to know more? Let’s dive into the gory details.
Did you know most chassis switches look like leaf-and-spine fabrics1 from the inside? If you didn’t, you might want to watch the short Chassis Architectures video by Pete Lumbis (author of ASICs for Networking Engineers part of the Data Center Fabric Architectures webinar).
One of the topics he couldn’t possibly skip was the question of how many packet buffers one needs in a data center switch.
One of my readers asked for my opinion about the provocative “It’s Time to Replace TCP in the Datacenter” article by prof. John Ousterhout. I started reading it, found too many things that didn’t make sense, and decided to ignore it as another attempt of a proverbial physicist solving hard problems in someone else’s field.
However, pointers to that article kept popping up, and I eventually realized it was a position paper in a long-term process that included conference talks, interviews and keynote speeches, so I decided to take another look at the technical details.
An attendee in the Building Next-Generation Data Center online course sent me an interesting dilemma:
Some customers don’t like EVPN because of complexity (it is required knowledge BGP, symmetric/asymmetric IRB, ARP suppression, VRF, RT/RD, etc). They agree, that EVPN gives more stability and broadcast traffic optimization, but still, it will not save DC from broadcast storms, because protections methods are the same for both solutions (minimize L2 segments, storm-control).
We’ll deal with the unnecessary EVPN-induced complexity some other time, today let’s start with a few intro-level details.
Almost exactly a decade ago I wrote that VXLAN isn’t a data center interconnect technology. That’s still true, but you can make it a bit better with EVPN – at the very minimum you’ll get an ARP proxy and anycast gateway. Even this combo does not address the other requirements I listed a decade ago, but maybe I’m too demanding and good enough works well enough.
However, there is one other bit that was missing from most VXLAN implementations: LAN-to-WAN VXLAN-to-VXLAN bridging. Sounds weird? Supposedly a picture is worth a thousand words, so here we go.
Last week I described some of the data center switching ASIC design tradeoffs and the ASIC families Broadcom created to fit somewhere in that multi-dimensional space.
Next step: how could you design your data center fabric to make the most out of them? To keep things simple, we’ll build a typical leaf-and-spine fabric with a WAN edge layer (sometimes called border leaf switches).
A brief mention of Broadcom ASIC families in the Networking Hardware/Software Disaggregation in 2022 blog post triggered an interesting discussion of ASIC features and where one should use different ASIC families.
Like so many things in life, ASIC design is all about tradeoffs. Usually you’re faced with a decision to either implement X (whatever X happens to be), or have high-performance product, or have a reasonably-priced product. It’s very hard to get two out of three, and getting all three is beyond Mission Impossible.
Are FabricPath, TRILL or SPB still alive, or has everyone moved to VXLAN? Are they worth studying?
TL&DR: Barely. Yes. No.
Layer-2 Fabric craziness exploded in 2010 with vendors playing the usual misinformation games that eventually resulted in totally fragmented market full of partial- or proprietary solutions. At one point in time, some HP data center switches supported only TRILL, and other data center switches from the same company supported only SPB.
Now for individual technologies:
One of my readers sent me an intriguing challenge based on the following design:
- He has a data center with two core switches (C1 and C2) and two Cisco Nexus edge switches (E1 and E2).
- He’s using static default routing from core to edge switches with HSRP on the edge switches.
- E1 is the active HSRP gateway connected to the primary WAN link.
The following picture shows the simplified network diagram:
Got this question from one of my readers:
When adopting the BGP on the VM model (say, a Kubernetes worker node on top of vSphere or KVM or Openstack), how do you deal with VM migration to another host (same data center, of course) for maintenance purposes? Do you keep peering with the old ToR even after the migration, or do you use some BGP trickery to allow the VM to peer with whatever ToR it’s closest to?
Short answer: you don’t.
Kubernetes was designed in a way that made worker nodes expendable. The Kubernetes cluster (and all properly designed applications) should recover automatically after a worker node restart. From the purely academic perspective, there’s no reason to migrate VMs running Kubernetes.