If you’re working solely with IP-based networks, you’re probably quick to assume that hop-by-hop destination-only forwarding is the only packet forwarding paradigm that makes sense. Not true, even today’s networks use a variety of forwarding mechanisms, most of them called some variant of routing or switching.
What exactly is the difference between the two, and what is bridging? I’m answering these questions (and a few others like what’s the difference between data-, control- and management planes) in the Bridging, Routing and Switching Terminology video.
CPU is not designed for the purpose of packet forwarding. One example is packet order retaining. It is impossible for a multicore CPU to retain the packet order as is received after parallel processing by multiple cores. Another example is scheduling. Yes CPU can do scheduling, but at a very high tax of CPU cycles.
We did a number of Software Gone Wild podcasts trying to figure out whether smart NICs address a real need or whether it’s just another vendor attempt to explore all potential markets. As expected, we got opposing views from Luke Gorrie claiming a NIC should be as simple as possible to Silvano Gai explaining how dedicated hardware performs the same operations at lower cost, lower power consumption and way higher speeds.
In theory, there’s no doubt that Silvano is right. Just look at how expensive some router line cards are, and try to figure out how much it would cost to get 25.6 Tbps of forwarding performance that we’ll get in a single ASIC (Tomahawk-4) in software (assuming ~10 Gbps per CPU core). High-speed core packet forwarding has to be done in dedicated hardware.
Greg Ferro is back with some great technical content, this time explaining why hardware-based packet capture might return unexpected results.
Here’s one of the weirdest ideas I’ve found recently: patch together two dangling ends of virtual Ethernet cables with PBR.
To be fair, Jon Langemak used that example to demonstrate how powerful tc could be. It’s always fun to see a totally-unexpected aspect of Linux networking… even though it looks like the creators of those tools believed in Perl mentality of creating a gazillion variants of line noise to get the job done.
Pete Lumbis started his Cumulus Linux 4.0 update with an overview of differences between Cumulus Linux on hardware switches and Cumulus VX, and continued with an in-depth list of ASIC families supported by Cumulus Linux.
You can watch his presentation, as well as the more in-depth overview of Cumulus Linux concepts by Dinesh Dutt, in the recently-updated What Is Cumulus Linux All About video.
A few weeks ago I described the basics of AWS networking, now it’s time to describe how different Azure is.
As always, it would be best to watch my Azure Networking webinar to get the details. This blog post is the abridged CliffsNotes version of the webinar (and here’s the reason I won’t write a similar blog post for other public clouds ;).
A while ago we discussed a software-focused view of Network Interface Cards (NICs) with Luke Gorrie, and a hardware-focused view of them with Or Gerlitz (Mellanox), Andy Gospodarek (Broadcom) and Jiri Pirko (Mellanox).
Why would anyone want to implement features in hardware and not in software, and what would be the best hardware implementation? We discussed these dilemmas with Silvano Gai in Episode 110 of Software Gone Wild podcast.
There was an obvious invisible elephant in the virtual Cloud Field Day 7 (CFD7v) event I attended in late April 2020. Most everyone was talking about AWS, how their stuff runs on AWS, how it integrates with AWS, or how it will help others leapfrog AWS (yeah, sure…).
Although you REALLY SHOULD watch my AWS Networking webinar (or something equivalent) to understand what problems vendors like VMWare or Pensando are facing or solving, I’m pretty sure a lot of people think they can get away with CliffsNotes version of it, so here they are ;)
In my quest to understand how much buffer space we really need in high-speed switches I encountered an interesting phenomenon: we no longer have the gut feeling of what makes sense, sometimes going as far as assuming that 16 MB (or 32MB) of buffer space per 10GE/25GE data center ToR switch is another $vendor shenanigan focused on cutting cost. Time for another set of Fermi estimates.
Let’s take a recent data center switch using Trident II+ chipset and having 16 MB of buffer space (source: awesome packet buffers page by Jim Warner). Most of switches using this chipset have 48 10GE ports and 4-6 uplinks (40GE or 100GE).
You haven’t mentioned Intel's Omni-Path at all. Should I be surprised?
While Omni-Path looks like a cool technology (at least at the whitepaper level), nobody ever mentioned it (or Intel) in any data center switching discussion I was involved in.
A while ago we did a podcast with Luke Gorrie in which he explained why he’d love to have simple, dumb, and easy-to-work-with Ethernet NICs. What about the other side of the coin – smart NICs with their own CPU, RAM and operating system? Do they make sense, when and why would you use them, and how would you integrate them with Linux kernel?
In Episode 98 we focused on another interesting application developed by Max Rottenkolber: high-speed VPN gateway using IPsec on top of Snabb Switch (details). Enjoy!
Craig Weinhold sent me his thoughts on using Cisco ACI to implement cross-data-center L4-7 services. While we both believe this is not the way to do things (because you should start with proper application architecture), you might find his insights useful if you have to deal with legacy environments that believe in Santa Claus and solving application problems with networking infrastructure.
An “easy button” for multi-DC is like the quest for the holy grail. I explain to my clients that the answer is right in front of them – local IP addressing, L3 routing, and DNS. But they refuse to accept that, draw their swords, and engage in a fruitless war against common sense. Asymmetry, stateful inspection, ingress routing, split-brain, quorums, host mobility, cache coherency, non-RFC complaint ARP, etc.
In 2014, we did a series of podcasts on Snabb Switch (Snabb Switch and OpenStack, Deep Dive), a software-only switch delivering 10-20 Gbps of forwarded bandwidth per x86 core. In the meantime, Snabb community slowly expanded, optimized the switching code, built a number of solutions on top of the packet forwarding core, and even forked a just-in-time Lua compiler to get better performance.