
Category: switching

VLAN Interfaces and Subinterfaces

Early bridges implemented a single bridging domain across all ports. Within a few years, we got multiple bridging domains within a single device (including bridging implementation in Cisco IOS). The capability to have multiple bridging domains stretched across several devices was still missing… until the modern-day Pandora opened the VLAN box and forever swamped us in the complexities of large-scale bridging.


How Routers Became Bridges

Network terminology was easy in the 1980s: bridges forwarded frames between Ethernet segments based on MAC addresses, and routers forwarded network-layer packets between network segments. That nirvana couldn’t last long; eventually, a big-enough customer told Cisco: “I don’t want to buy another box if I already have your too-expensive router. I want your router to be a bridge.”

Turning a router into a bridge is easier than going the other way round: add a MAC table and dynamic MAC learning, and spend an evening implementing STP.
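
The MAC table and dynamic MAC learning part can be sketched in a few lines of Python. This is a toy model, not how any router implements bridging: the LearningBridge class, its method names, and the shortened MAC addresses are made up for illustration, and it ignores entry aging, VLANs, and STP.

```python
class LearningBridge:
    """Toy transparent bridge with dynamic MAC learning (no aging, no STP)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port where it was last seen

    def receive(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame is sent out of."""
        # Learning: remember which port the source MAC lives behind.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination sits on the ingress segment: filter (drop) the frame.
            return set()
        return {out_port}


bridge = LearningBridge(ports=[1, 2, 3])
first = bridge.receive(1, "aa:aa", "bb:bb")  # bb:bb is unknown, frame is flooded
reply = bridge.receive(2, "bb:bb", "aa:aa")  # aa:aa was learned on port 1
again = bridge.receive(1, "aa:aa", "bb:bb")  # bb:bb is now known on port 2
```

The first frame is flooded because the destination is unknown; once both hosts have sent a frame, the bridge forwards traffic between them out of a single port.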


Router Interfaces and Switch Ports

When I started implementing the netlab VLAN module, I encountered (at least) three different ways of configuring physical interfaces and bridging domains even though the underlying packet forwarding operations (and sometimes even the forwarding hardware) are the same. That confusopoly is guaranteed to make your head spin for years, and the only way to figure out what’s going on behind the scenes is to go back to the fundamentals.


Repost: Buffers, Congestion, Jitter, and Shapers

Béla Várkonyi left a great comment on a blog post discussing (among other things) whether we need large buffers on spine switches. I don’t know how many people read the comments; this one is too valuable to be lost somewhere below the fold:


You might want to add another consideration. If you have a lot of traffic aggregation even when the ingress and egress port are roughly at the same speed or when the egress port has more capacity, you could still have congestion. Then you have two strategies, buffer and suffer jitter and delay, or drop and hope that the upper layers will detect it and reduce the sending by shaping.


MLAG Deep Dive: Layer-3 Forwarding

The layer-2 forwarding and flooding in an MLAG cluster are intricate but still reasonably easy to understand. Layer-3 gets more interesting; its quirks depend heavily on layer-2 implementation. While most MLAG implementations exhibit similar bridging behavior, expect interesting differences in routing behavior.

We’ll have to expand the by-now familiar network topology to cover layer-3 edge cases. We’ll still work with two switches in an MLAG cluster, but we’ll have an external router attached to both of them. The hosts connected to the switches belong to two subnets (red and blue).


MLAG Deep Dive: Layer-2 Flooding

In the previous blog post of the MLAG Technology Deep Dive series, we explored the intricacies of layer-2 unicast forwarding. Now let’s focus on the layer-2 BUM (broadcast, unknown unicast, and multicast) flooding functionality of an MLAG system.

Our network topology will have two switches and five hosts, some connected to a single switch. That’s not a good idea in an MLAG environment, but even if you have a picture-perfect design with everything redundantly connected, you will have to deal with it after a single link failure.


Select the Best Switching ASIC For the Job

Last week I described some of the data center switching ASIC design tradeoffs and the ASIC families Broadcom created to fit somewhere in that multi-dimensional space.

Next step: how could you design your data center fabric to make the most out of them? To keep things simple, we’ll build a typical leaf-and-spine fabric with a WAN edge layer (sometimes called border leaf switches).


MLAG Deep Dive: Dynamic MAC Learning

In the first blog post of the MLAG Technology Deep Dive series, we explored the components of an MLAG system and the fundamental control plane requirements.

This post focuses on a major building block of the layer-2 data plane functionality: MAC learning. We’ll keep using the same network topology with two switches and five hosts, and assume our system tries its best to implement hot-potato switching (sending the frames toward the destination MAC address on the shortest possible path).


Data Center Switching ASICs Tradeoffs

A brief mention of Broadcom ASIC families in the Networking Hardware/Software Disaggregation in 2022 blog post triggered an interesting discussion of ASIC features and where one should use different ASIC families.

Like so many things in life, ASIC design is all about tradeoffs. Usually you’re faced with a decision to either implement X (whatever X happens to be), or have a high-performance product, or have a reasonably-priced product. It’s very hard to get two out of three, and getting all three is beyond Mission Impossible.


MLAG Deep Dive: System Overview

Multi-Chassis Link Aggregation (MLAG) – the ability to terminate a Port Channel/Link Aggregation Group on multiple switches – is one of the more convoluted bridging technologies. After all, it’s not trivial to persuade two boxes to behave like one and handle the myriad corner cases correctly.

In this series of deep dive blog posts, we’ll explore the intricacies of MLAG, starting with the data plane considerations and the control plane requirements resulting from the data plane quirks. If you wonder why we need all that complexity, remember that Ethernet networks still try to emulate the ancient thick yellow cable that could lose some packets but could never reorder packets or deliver duplicate packets.


Living with Small Forwarding Tables

A friend of mine working for a mid-sized networking vendor sent me an intriguing question:

We have a product using an old ASIC that has 12K forwarding entries, and would like to extend its lifetime. I know you were mentioning some useful tricks, would you happen to remember what they were?

This challenge has no perfect solution, but there are at least three tricks I’ve encountered so far (as always, comments are most welcome):


Flow-Based Packet Forwarding

In the Cache-Based Packet Forwarding blog post I described what happens when someone tries to bypass the complexities of IP routing table lookup with a forwarding cache.

Now imagine you want to implement full-featured fast packet forwarding including ingress and egress ACLs, NAT, QoS… but find the required hardware (TCAM) too expensive. Wouldn’t it be nice if we could send the first packet of every flow to a CPU to figure out what to do with it, and download the results into a high-speed flow cache where they could be used to switch the subsequent packets of the same flow? Welcome to flow-based packet forwarding.
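
The idea can be illustrated with a short Python sketch. Everything in it is hypothetical: FlowKey, FlowForwarder, and slow_path are illustrative names, the policy in slow_path is a stand-in for the real ACL/NAT/QoS/routing evaluation, and a real flow cache would also need entry aging and a size limit.

```python
from collections import namedtuple

# A flow is identified here by the classic 5-tuple.
FlowKey = namedtuple("FlowKey", "src dst proto sport dport")


def slow_path(key):
    """Stand-in for the CPU evaluating ACLs, NAT, QoS, and routing."""
    if key.proto == "tcp" and key.dport == 23:
        return ("drop",)
    return ("forward", "eth1")


class FlowForwarder:
    def __init__(self, slow_path_fn):
        self.slow_path_fn = slow_path_fn
        self.flow_cache = {}  # FlowKey -> action computed by the slow path
        self.punted = 0       # number of packets sent to the CPU

    def forward(self, key):
        action = self.flow_cache.get(key)
        if action is None:
            # First packet of the flow: punt to the CPU, then cache the result.
            self.punted += 1
            action = self.slow_path_fn(key)
            self.flow_cache[key] = action
        return action


fwd = FlowForwarder(slow_path)
web = FlowKey("10.0.0.1", "192.0.2.10", "tcp", 40000, 443)
telnet = FlowKey("10.0.0.1", "192.0.2.10", "tcp", 40001, 23)
web_actions = [fwd.forward(web) for _ in range(3)]
telnet_action = fwd.forward(telnet)
```

Three packets of the same flow result in a single trip to the CPU; every new flow costs one more, which is exactly where this approach starts to hurt when there are many short-lived flows.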


Cache-Based Packet Forwarding

In the previous blog post in this series I described how convoluted routing table lookups could become when you have to deal with numerous layers of indirection (BGP prefix ⇨ BGP next hop ⇨ IGP next hop ⇨ link bundle ⇨ outgoing interface). Modern high-end hardware can deal with the resulting complexity; decades ago we had to use the router CPU to do multiple (potentially recursive) lookups in the IP routing table (there was no FIB at that time).

Network devices were always pushed to the bleeding edge of performance, and smart programmers always tried to optimize the CPU-intensive processes. One of the obvious packet forwarding optimizations relied on the fact that within a short timeframe most packets have to be forwarded to a small set of destinations. Welcome to the wonderful world of cache-based forwarding.
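
Here is a minimal Python sketch of the concept, assuming a per-destination cache in front of a slow longest-prefix-match lookup. CachedRouter and its methods are illustrative names, not a description of any vendor implementation, and the sketch ignores cache invalidation after routing table changes as well as cache size limits.

```python
import ipaddress


class CachedRouter:
    def __init__(self, routes):
        # routes: {prefix string: next hop}, stored longest prefix first
        self.routes = sorted(
            ((ipaddress.ip_network(prefix), nh) for prefix, nh in routes.items()),
            key=lambda item: item[0].prefixlen,
            reverse=True,
        )
        self.cache = {}        # destination address -> next hop
        self.slow_lookups = 0  # how often we walked the routing table

    def slow_lookup(self, dst):
        """Longest-prefix match by linear search (the expensive part)."""
        self.slow_lookups += 1
        addr = ipaddress.ip_address(dst)
        for network, next_hop in self.routes:
            if addr in network:
                return next_hop
        return None

    def forward(self, dst):
        # Fast path: reuse the result of an earlier lookup for this destination.
        if dst not in self.cache:
            self.cache[dst] = self.slow_lookup(dst)
        return self.cache[dst]


router = CachedRouter({
    "0.0.0.0/0": "isp",
    "10.0.0.0/8": "core",
    "10.1.0.0/16": "branch",
})
hops = [router.forward("10.1.2.3") for _ in range(1000)]
other = router.forward("192.0.2.1")
```

A thousand packets toward the same destination trigger a single routing table lookup; the cache pays off only as long as the set of active destinations stays small.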


The Impact of Jumbo Maximum Frame Size on Data Center Switches

Sander Steffann sent me an intriguing question a long while ago:

I was wondering if there are any downsides to setting “system mtu jumbo 9198” by default on every switch? I mean, if all connected devices have MTU 1500 they won’t notice that the switch could support longer frames, right?

That’s absolutely correct, and unless the end hosts get into UDP fights things will always work out (aka TCP MSS saves the day)… but there must be a reason switching vendors don’t use maximum frame sizes larger than 1514 by default (Cumulus Linux seems to be an exception, and according to Sébastien Keller Arista’s default maximum frame size is between 9214 and 10178 depending on the platform).
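
The arithmetic behind the 1514-byte default and the “TCP MSS saves the day” remark can be checked with a few lines of Python. The sketch assumes IPv4 and TCP headers without options, untagged Ethernet frames, and a frame size that does not include the FCS; the function names are made up for this example.

```python
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
IPV4_HEADER = 20  # no IP options
TCP_HEADER = 20   # no TCP options


def tcp_mss(mtu):
    """MSS a host advertises for a given IP MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def frame_size(ip_packet_length):
    """Ethernet frame size (without FCS) for an IP packet."""
    return ip_packet_length + ETH_HEADER


def largest_tcp_frame(mtu_a, mtu_b):
    """Largest frame of a TCP session between hosts with the given MTUs."""
    # Neither side sends segments larger than the smaller advertised MSS.
    mss = min(tcp_mss(mtu_a), tcp_mss(mtu_b))
    return frame_size(mss + IPV4_HEADER + TCP_HEADER)


default_frame = frame_size(1500)
mixed_session = largest_tcp_frame(1500, 9000)
jumbo_session = largest_tcp_frame(9000, 9000)
```

A TCP session between a host with MTU 1500 and a host with MTU 9000 never produces frames longer than 1514 bytes, regardless of what the switches in between could carry. UDP has no such negotiation, which is where the trouble could start.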
