Roman Pomazanov documented his thoughts on the beauties of large layer-2 domains in a LinkedIn article and allowed me to repost it on the ipSpace.net blog to ensure it doesn’t disappear.
First of all, “L2 is a single failure domain”: a problem at one point can easily spread to the entire data center.
Got this question from a networking engineer attending the Building Next-Generation Data Center online course:
Does anyone have advice on LACP fast rate? When and why should you use it instead of normal LACP?
Apart from forming link aggregation groups, you can use LACP to detect link and node failures (more details). However:
TL&DR: Installing an Ethernet NIC with two uplinks in a server is easy. Connecting those uplinks to two edge switches is common sense. Detecting physical link failure is trivial in the Gigabit Ethernet world. Deciding between two independent uplinks or a link aggregation group is interesting. Detecting path failure and disabling the useless uplink that causes traffic blackholing is a living hell (more details in this Design Clinic question).
Want to know more? Let’s dive into the gory details.
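Starting with the timescales: 802.3ad/802.1AX uses a 30-second LACPDU interval for the normal (slow) rate and a one-second interval for the fast rate, declaring the partner dead after three missed LACPDUs. Here’s a quick Python back-of-the-envelope (the timer values come from the standard; the 10GE uplink and the blackholed-traffic figure are my illustrative assumptions):

```python
# LACP failure-detection timescales (IEEE 802.3ad/802.1AX timers).
LACP_RATES = {
    "slow": 30,  # LACPDU sent every 30 seconds
    "fast": 1,   # LACPDU sent every second
}
MISSED_PDUS = 3  # partner declared dead after three missed LACPDUs

for rate, interval in LACP_RATES.items():
    timeout = interval * MISSED_PDUS
    # Hypothetical 10GE uplink: traffic potentially blackholed while
    # waiting for LACP to declare the link dead.
    blackholed_gbytes = 10e9 * timeout / 8 / 1e9
    print(f"LACP {rate} rate: worst-case detection in {timeout}s, "
          f"up to {blackholed_gbytes:.1f} GB blackholed on a 10GE uplink")
```

Ninety seconds of blackholing at the slow rate is why the fast rate exists; three seconds is still an eternity compared to the sub-second detection you get from loss of physical link.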
Sometime last autumn, I was asked to create a short “network security challenges” presentation. Eventually, I turned it into a webinar, resulting in almost four hours of content describing the interesting gotchas I encountered in the past (plus a few recent vulnerabilities like turning WiFi into a thick yellow cable).
Each webinar section started with a short “This is why we have to deal with these stupidities” introduction. You’ll find all of them collected in the Root Causes video starting the Network Security Fallacies part of the How Networks Really Work webinar.
With the widespread deployment of Ethernet-over-something technologies, it became possible to build MLAG clusters without a physical peer link, replacing it with a virtual link across the core fabric. Avaya was one of the first vendors to implement virtual peer links with Provider Backbone Bridging (PBB) transport, and some data center switching vendors (example: Cisco) offer similar functionality with VXLAN transport.
Pim van Pelt built an x86/Linux-based router using the Vector Packet Processor (VPP) that can forward IP traffic at 150 Mpps/180 Gbps on a two-CPU Dell server with E5-2660 (8-core) CPUs.
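In case you’re wondering how those two headline numbers fit together, a one-line sanity check shows the test must have used small packets:

```python
# 180 Gbps at 150 Mpps implies an average packet size of roughly
# 150 bytes (ignoring L1 overhead) -- a typical small-packet test.
avg_packet_bytes = 180e9 / 150e6 / 8
print(f"{avg_packet_bytes:.0f} bytes per packet")  # -> 150
```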
People keep telling me how well large language models like ChatGPT work for them, so now and then, I give it another try, most often resulting in another disappointment. It might be that I suck at writing prompts, or it could be that I have a knack for looking in the wrong places.
This time I tried to “figure out” why we need iSCSI checksums when iSCSI runs over Ethernet, which already has checksums. Enjoy the (ChatGPT) circular arguments and hallucinations with plenty of platitudes and no clear answer.
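For the record, here’s the answer I was looking for: the Ethernet FCS only protects individual links; it’s verified on ingress and regenerated wherever the frame is modified (every router hop at the very least), so corruption inside a forwarding device or NIC buffer sails through undetected. The end-to-end iSCSI header/data digests catch exactly that. A toy Python illustration, using zlib.crc32 as a stand-in for both checksums (iSCSI actually uses CRC32C, available through the third-party crc32c package):

```python
import zlib

def send_over_link(payload: bytes) -> bytes:
    """Toy model of one hop: outbound FCS computed, inbound FCS verified."""
    fcs = zlib.crc32(payload)                 # outbound FCS
    assert zlib.crc32(payload) == fcs         # next hop's inbound check passes
    return payload

data = b"iSCSI data block"
digest = zlib.crc32(data)                     # end-to-end iSCSI data digest

data = send_over_link(data)                   # hop 1: FCS fine

# Corruption inside a forwarding device (bad buffer RAM, DMA bug, ...)
# happens AFTER the inbound FCS check and BEFORE the outbound FCS is
# recomputed, so the next link carries a perfectly valid FCS over bad data.
buf = bytearray(data)
buf[6] ^= 0x20                                # flip one bit: 'd' -> 'D'
data = send_over_link(bytes(buf))             # hop 2: FCS still fine

print("End-to-end digest caught the corruption:",
      zlib.crc32(data) != digest)             # -> True
```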
The “beauty” (from an attacker perspective) of the original shared-media Ethernet was the ability to see all traffic sent to other hosts. While it’s trivial to steal someone else’s IPv4 address, the ability to see their traffic allowed you to hijack their TCP sessions without the victim being any wiser (apart from the obvious session timeout). Really smart attackers could go a step further, insert themselves into the forwarding path, and inject extra payload into unencrypted sessions.
A recently discovered WiFi vulnerability brought us back to that wonderful world.
Did you know that most chassis switches look like leaf-and-spine fabrics from the inside? If you didn’t, you might want to watch the short Chassis Architectures video by Pete Lumbis (author of the ASICs for Networking Engineers part of the Data Center Fabric Architectures webinar).
An ipSpace.net subscriber sent me a question along the lines of “does it matter that EVPN uses BGP to implement dynamic MAC learning whereas in traditional switching that’s done in hardware?” Before going into those details, I wanted to establish the baseline: is dynamic MAC learning really implemented in hardware?
Hardware-based switching solutions usually use a hash table to implement MAC address lookups. The above question should thus be rephrased as: is it possible to update the MAC hash table in hardware without punting the packet to the CPU? One would expect high-end (expensive) hardware to be able to do it, while low-cost hardware would depend on the CPU. It turns out the reality is way more complex than that.
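To make the two answers concrete, here’s a deliberately naive Python model of the learning path (all names are made up for illustration, not any vendor’s API; real switches punt asynchronously and rate-limit the punted packets):

```python
# Hypothetical model of hardware vs. CPU-assisted dynamic MAC learning.

class MacTable:
    def __init__(self, hw_learning: bool):
        self.entries = {}          # Python dict standing in for the HW hash table
        self.hw_learning = hw_learning
        self.cpu_punts = 0

    def packet_in(self, src_mac: str, port: int) -> None:
        if src_mac in self.entries:
            return                 # known source MAC: nothing to learn
        if self.hw_learning:
            # High-end ASIC: the forwarding pipeline inserts the entry itself.
            self.entries[src_mac] = port
        else:
            # Low-cost hardware: a copy of the packet is punted to the CPU,
            # and the control plane programs the hash table entry.
            self.cpu_punts += 1
            self.entries[src_mac] = port

sw = MacTable(hw_learning=False)
for _ in range(3):
    sw.packet_in("02:00:00:00:00:01", port=7)
print(sw.entries, "CPU punts:", sw.cpu_punts)   # learned once, punted once
```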
One of the topics he couldn’t possibly skip was the question of how many packet buffers one needs in a data center switch.
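The classic starting point for that discussion is the bandwidth-delay product rule of thumb (one RTT’s worth of buffering per port), later refined by the Stanford buffer-sizing work to BDP divided by the square root of the number of concurrent long-lived TCP flows. A worked example with assumed data-center numbers:

```python
from math import sqrt

port_speed_bps = 100e9   # assumption: 100GE port
rtt = 50e-6              # assumption: 50 microsecond intra-fabric RTT
flows = 2500             # assumption: concurrent long-lived TCP flows

bdp = port_speed_bps * rtt / 8          # bandwidth-delay product in bytes
stanford = bdp / sqrt(flows)            # Appenzeller et al. refinement

print(f"Classic BDP rule: {bdp / 1e3:.0f} KB per port")        # 625 KB
print(f"BDP/sqrt(N) rule: {stanford / 1e3:.1f} KB per port")   # 12.5 KB
```

The two rules differ by a factor of fifty, which is a decent summary of why the deep-buffer debate never ends.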
It’s easy to get excited about what seems to be a new technology and conclude that it will forever change the way we do things. For example, I’ve seen claims that SmartNICs (also known as Data Processing Units – DPU) will forever change the network.
TL&DR: Of course they won’t.
Before we start discussing the details, it’s worth remembering what a DPU is: it’s another server with its own CPU, memory, and network interface card (NIC), whose PCI hardware emulates the host’s interface cards. It might also have dedicated FPGAs or ASICs.
One of the most exciting announcements from the last AWS re:Invent was the Elastic Network Adapter (ENA) Express functionality that uses the Scalable Reliable Datagram (SRD) protocol as the transport protocol for overlay virtual networks. AWS claims ENA Express can push 25 Gbps over a single TCP flow and that SRD improves the tail latency (99.9th percentile) for high-throughput workloads by 85%.
Ignoring the “DPUs could change the network forever” blogosphere reactions (hint: they won’t), let’s see what could be happening behind the scenes and why SRD improves TCP throughput and tail latency.
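Here’s my intuition for the throughput part, expressed as a toy simulation (a made-up model, not AWS’s implementation): per-flow ECMP hashes every flow onto a single path, so hash collisions leave some flows starved while other paths sit idle, and no single flow can ever exceed one path’s capacity. SRD-style per-packet spraying with receiver-side reordering lets every flow use the aggregate capacity of all paths:

```python
import random
from collections import Counter

random.seed(1)
PATHS, PATH_GBPS, FLOWS = 8, 10.0, 16   # assumed fabric: 8 equal-cost 10G paths

# Per-flow ECMP: the 5-tuple hash pins each flow to a single path;
# collisions overload some paths while others stay idle.
placement = [random.randrange(PATHS) for _ in range(FLOWS)]
load = Counter(placement)
ecmp_rates = [PATH_GBPS / load[p] for p in placement]

# SRD-style spraying: every flow spreads packets across all paths
# (the receiver reorders), so the fabric behaves like one big pipe.
spray_rate = PATHS * PATH_GBPS / FLOWS

print(f"ECMP:  worst flow {min(ecmp_rates):.1f} Gbps, "
      f"best flow {max(ecmp_rates):.1f} Gbps")
print(f"Spray: every flow {spray_rate:.1f} Gbps")
```

The same mechanism plausibly explains the tail-latency claim: a flow that ECMP happens to pin to a congested path suffers for its entire lifetime, while a sprayed flow only pays for a congested path on the fraction of packets that traverse it.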