Category: switching

MLAG Deep Dive: LAG Member Failures in VXLAN Fabrics

In the Dealing with LAG Member Failures blog post, we figured out how easy it is to deal with a LAG member failure in a traditional MLAG cluster. The failover could happen in hardware, and even if it’s software-driven, it does not depend on the control plane.

Let’s add a bit of complexity and replace a traditional layer-2 fabric with a VXLAN fabric. The MLAG cluster members still use an MLAG peer link and an anycast VTEP IP address (more details).

read more see 2 comments

MLAG Deep Dive: Dealing with LAG Member Failures

Craig Weinhold pointed me to a complex topic I managed to ignore in my MLAG Deep Dive series: how does an MLAG cluster reroute around a failure of a LAG member link?

In this blog post, we’ll focus on traditional MLAG cluster implementations using a peer link; another blog post will explore the implications of using VXLAN and EVPN to implement MLAG clusters.

We’ll also ignore the interesting question of “how is the LAG member link failure detected?1 and focus on “what happens next?” using the sample MLAG topology:

read more add comment

AMS-IX Outage: Layer-2 Strikes Again

On November 22nd, 2023, AMS-IX, one of the largest Internet exchanges in Europe, experienced a significant performance drop lasting more than four hours. While its peak performance is around 10 Tbps, it dropped to about 2.1 Tbps during the outage.

AMS-IX published a very sanitized and diplomatic post-mortem incident summary in which they explained the outage was caused by LACP leakage. That phrase should be a red flag, but let’s dig deeper into the details.

read more see 5 comments

Path Failure Detection on Multi-Homed Servers

TL&DR: Installing an Ethernet NIC with two uplinks in a server is easy1. Connecting those uplinks to two edge switches is common sense2. Detecting physical link failure is trivial in Gigabit Ethernet world. Deciding between two independent uplinks or a link aggregation group is interesting. Detecting path failure and disabling the useless uplink that causes traffic blackholing is a living hell (more details in this Design Clinic question).

Want to know more? Let’s dive into the gory details.

read more see 5 comments

Network Security Vulnerabilities: the Root Causes

Sometime last autumn, I was asked to create a short “network security challenges” presentation. Eventually, I turned it into a webinar, resulting in almost four hours of content describing the interesting gotchas I encountered in the past (plus a few recent vulnerabilities like turning WiFi into a thick yellow cable).

Each webinar section started with a short “This is why we have to deal with these stupidities” introduction. You’ll find all of them collected in the Root Causes video starting the Network Security Fallacies part of the How Networks Really Work webinar.

You need Free Subscription to watch the video.
add comment

MLAG Clusters without a Physical Peer Link

With the widespread deployment of Ethernet-over-something technologies, it became possible to build MLAG clusters without a physical peer link, replacing it with a virtual link across the core fabric. Avaya was one of the first vendors to implement virtual peer links with Provider Backbone Bridging (PBB) transport, and some data center switching vendors (example: Cisco) offer similar functionality with VXLAN transport.

read more see 1 comments

ChatGPT Explaining the Need for iSCSI CRC

People keep telling me how well large language models like ChatGPT work for them, so now and then, I give it another try, most often resulting in another disappointment1. It might be that I suck at writing prompts2, or it could be that I have a knack for looking in the wrong places3.

This time4 I tried to “figure out5” why we need iSCSI checksums if we have iSCSI running over Ethernet which already has checksums. Enjoy the (ChatGPT) circular arguments and hallucinations with plenty of platitudes and no clear answer.

read more see 2 comments