Category: fabric
OpenFabric with Russ White on Software Gone Wild
Continuing the series of data center routing protocol podcasts, we sat down with Russ White (of the CCDE fame), author of another proposal: OpenFabric.
As always, we started with the “what’s wrong with what we have right now, like using BGP as a better IGP” question, resulting in “BGP is becoming the trash can of the Internet”.
Pragmatic Data Center Fabrics
I always love to read the practical advice by Andrew Lerner. Here’s another gem that matches what Brad Hedlund, Dinesh Dutt and I (plus numerous others) have been saying for ages:
One specific recommendation we make in the research is to “Build a rightsized physical infrastructure by using a leaf/spine design with fixed-form factor switches and 25/100G capable interfaces (that are reverse-compatible with 10G).”
There’s a slight gotcha in that advice: it trades the implicit complexity of chassis switches for the explicit complexity of fixed-form switches.
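To get a feeling for that explicit complexity, here’s a back-of-the-envelope sketch that counts the boxes and cables you’d have to manage in a two-tier leaf-and-spine fabric. The function name and all port counts and speeds are illustrative assumptions (48 x 25G server ports and 4 x 100G uplinks per leaf, 32-port spines), not numbers from the research, and the sketch assumes every leaf has exactly one uplink to every spine.

```python
import math


def leaf_spine_size(server_ports, leaf_downlinks=48, leaf_uplinks=4,
                    spine_ports=32, downlink_gbps=25, uplink_gbps=100):
    """Estimate the size of a two-tier leaf-and-spine fabric.

    All defaults are hypothetical. Each leaf has one uplink to every
    spine, so the number of spines equals the uplinks per leaf.
    """
    leaves = math.ceil(server_ports / leaf_downlinks)
    spines = leaf_uplinks
    if leaves > spine_ports:
        raise ValueError("more leaves than spine ports; add another tier")
    return {
        "leaves": leaves,
        "spines": spines,
        "devices": leaves + spines,          # boxes to configure and upgrade
        "fabric_links": leaves * spines,     # leaf-to-spine cables
        "oversubscription": (leaf_downlinks * downlink_gbps)
                            / (leaf_uplinks * uplink_gbps),
    }


if __name__ == "__main__":
    # 480 server-facing ports with the default (assumed) switch models
    print(leaf_spine_size(480))
```

With these assumptions, 480 server ports turn into 14 devices and 40 fabric links that you have to configure, cable and monitor yourself, instead of one or two chassis hiding the same structure behind a backplane.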
Video: Automated Data Center Fabric Deployment Demo
I was focused on network automation this week, starting with a 2-day workshop and continuing with an overview of real-life automation wins. Let’s end the week with another automation story: automated data center fabric deployment demonstrated by Dinesh Dutt during his part of the Network Automation Use Cases webinar.
You’ll need at least a free ipSpace.net subscription to watch the video.
How Self-Sufficient Do You Want to Be?
The first car I got decades ago was a simple mechanical beast – you’d push something, and a cable would make sure something else moved somewhere. I could also fix 80% of the problems, and people who were willing to change spark plugs and similar stuff could get to 90+%.
Today the cars are distributed computer systems that nobody can fix once they get a quirk that is not discoverable with level-1 diagnostic tools.
Using EVPN in Very Small Data Center Fabrics
I had an interesting “how do you build a small fabric without throwing every technology in the mix” discussion with Nicola Modena and mentioned that I don’t see a reason to use EVPN in fabrics with just a few switches. He disagreed and gave me a few good scenarios where EVPN might be handy. Before discussing them let’s establish a baseline.
The Setup
Assume you’re building two small data center fabrics (small because you have only a few hundred VMs and two because of redundancy and IT auditors).
Video: Big- or Small-Buffer Switches
After describing the basics of internal data center switch architectures, JR Rivers focused on the crux of the problem the vendors copiously exploit to create a confusopoly: is it better to use big- or small-buffer switches?
BGP in EVPN-Based Data Center Fabrics
EVPN is one of the major reasons we’re seeing BGP used in small and mid-sized data center fabrics. In theory, EVPN is just a BGP address family and shouldn’t impact your BGP design. However, suboptimal implementations might invalidate that assumption.
I’ve described a few EVPN-related BGP gotchas in BGP in EVPN-Based Data Center Fabrics, a section of the Using BGP in Data Center Leaf-and-Spine Fabrics article.
Video: Avaya [now Extreme] Data Center Solutions
I haven’t done an update on what Avaya was doing in the data center space for years, so I asked my good friend Roger Lapuh to do a short presentation on:
- Avaya’s data center switches and their Shortest Path Bridging (SPB) fabric;
- SPB fabric features;
- Interesting use cases enabled by SPB fabric.
The videos are now available to everyone with a valid ipSpace.net account – the easiest way to get it is a trial subscription.
Video: Switch Buffer Architectures
A while ago (in the time of big-versus-small buffers brouhaha), I asked JR Rivers to do a short presentation focusing on buffering requirements of data center switches. He started by describing typical buffer architectures you might find in data center switches.
BGP as a Better IGP? When and Where?
A while ago I helped a large enterprise redesign their data center fabric. They did a wonderful job optimizing their infrastructure, so all they really needed were two switches in each location.
Some vendors couldn’t fathom that. One of them proposed to build a “future-proof” (and twice as expensive) leaf-and-spine fabric with two leaves and two spines. On top of that they proposed to use EBGP as the only routing protocol because draft-lapukhov-bgp-routing-large-dc – a clear case of missing the customer needs.
Security or Convenience, That’s the Question
One of my readers was so delighted that something finally happened after I wrote about an NX-OS bug that he sent me a pointer to another one that has been pending for a long while, and is now officially terminated as FAD (Functions-as-Designed… even documented in the Further Problem Description).
Here’s what he wrote (slightly reworded)…
Let’s Pretend We Run Distributed Storage over a Thick Yellow Cable
One of my friends wanted to design a nice-and-easy layer-3 leaf-and-spine fabric for a new data center, and got blindsided by a hyperconverged vendor. Here’s what he wrote:
We wanted to have a spine/leaf L3 topology for an NSX deployment but can’t do that because the Nutanix servers require L2 between their nodes so they can be in the same cluster.
I wanted to check his claims, but Nutanix doesn’t publish their documentation (I would consider that a red flag), so I’m assuming he’s right until someone proves otherwise (note: a whitepaper is not proof of anything ;).
Optimize Data Center Infrastructure: Build an Optimized Fabric
I published the last part of my Optimize Data Center Infrastructure series: build an optimized data center fabric.
To learn more about data center fabric designs, check the new online course or enroll in the Spring 2018 session of the Building Next-Generation Data Center course.
Pluribus Networks… 2 Years Later
I first met Pluribus Networks 2.5 years ago during their Networking Field Day 9 presentation, which turned controversial enough that I was advised not to wear the same sweater during NFD16 to avoid jinxing another presentation (I also admit to having been a bit biased in those days based on marketing deja-moo from a Pluribus sales guy I’d been exposed to during a customer engagement).
Pluribus NFD16 presentations were better; here’s what I got from them:
Another Reason to Run Linux on Your Data Center Switches
Arista’s OpenFlow implementation doesn’t support TLS encryption. Usually that’s not a big deal, as there aren’t that many customers using OpenFlow anyway, and those that do hopefully do it over a well-protected management network.
However, lack of OpenFlow TLS encryption might become an RFP showstopper… not because the customer would really need it but because the customer is in CYA mode (we don’t know what this feature is or why we’d use it, but it might be handy in a decade, so we must have it now) or because someone wants to eliminate certain vendors based on some obscure missing feature.