Category: switching

Tuesday, June 7, 2022 06:38 +0000

MLAG Deep Dive: Dynamic MAC Learning

In the first blog post of the MLAG Technology Deep Dive series, we explored the components of an MLAG system and the fundamental control plane requirements.

This post focuses on a major building block of the layer-2 data plane functionality: MAC learning. We’ll keep using the same network topology with two switches and five hosts, and assume our system tries its best to implement hot-potato switching (sending the frames toward the destination MAC address on the shortest possible path).

Data Center Switching ASICs Tradeoffs

A brief mention of Broadcom ASIC families in the Networking Hardware/Software Disaggregation in 2022 blog post triggered an interesting discussion of ASIC features and where one should use different ASIC families.

Like so many things in life, ASIC design is all about tradeoffs. Usually you’re faced with a decision to either implement X (whatever X happens to be), or have high-performance product, or have a reasonably-priced product. It’s very hard to get two out of three, and getting all three is beyond Mission Impossible.

read more see 2 comments

Wednesday, June 1, 2022 06:11 +0000

MLAG Deep Dive: System Overview

Multi-Chassis Link Aggregation (MLAG) – the ability to terminate a Port Channel/Link Aggregation Group on multiple switches – is one of the more convoluted¹ bridging technologies². After all, it’s not trivial to persuade two boxes to behave like one and handle the myriad corner cases correctly.

In this series of deep dive blog posts, we’ll explore the intricacies of MLAG, starting with the data plane considerations and the control plane requirements resulting from the data plane quirks. If you wonder why we need all that complexity, remember that Ethernet networks still try to emulate the ancient thick yellow cable that could lose some packets but could never reorder packets or deliver duplicate packets.

read more see 1 comments

switching

Monday, May 9, 2022 06:46 UTC

Living with Small Forwarding Tables

A friend of mine working for a mid-sized networking vendor sent me an intriguing question:

We have a product using an old ASIC that has 12K forwarding entries, and would like to extend its lifetime. I know you were mentioning some useful tricks, would you happen to remember what they were?

This challenge has no perfect solution, but there are at least three tricks I’ve encountered so far (as always, comments are most welcome):

Flow-Based Packet Forwarding

In the Cache-Based Packet Forwarding blog post I described what happens when someone tries to bypass the complexities of IP routing table lookup with a forwarding cache.

Now imagine you want to implement full-featured fast packet forwarding including ingress- and egress ACL, NAT, QoS… but find the required hardware (TCAM) too expensive. Wouldn’t it be nice if we could send the first packet of every flow to a CPU to figure out what to do with it, and download the results into a high-speed flow cache where they could be used to switch the subsequent packets of the same flow. Welcome to flow-based packet forwarding.

read more see 2 comments

Thursday, February 24, 2022 08:57 +0000… updated on Monday, February 28, 2022 16:07 UTC

Cache-Based Packet Forwarding

In the previous blog post in this series I described how convoluted routing table lookups could become when you have to deal with numerous layers of indirection (BGP prefix ⇨ BGP next hop ⇨ IGP next hop ⇨ link bundle ⇨ outgoing interface). Modern high-end hardware can deal with the resulting complexity; decades ago we had to use router CPU to do multiple (potentially recursive) lookups in the IP routing table (there was no FIB at that time).

Network devices were always pushed to the bleeding edge of performance, and smart programmers always tried to optimize the CPU-intensive processes. One of the obvious packet forwarding optimizations relied on the fact that within a short timeframe most packets have to be forwarded to a small set of destinations. Welcome to the wonderful world of cache-based forwarding.

The Impact of Jumbo Maximum Frame Size on Data Center Switches

Sander Steffann sent me an intriguing question a long while ago:

I was wondering if there are any downsides to setting “system mtu jumbo 9198” by default on every switch? I mean, if all connected devices have MTU 1500 they won’t notice that the switch could support longer frames, right?

That’s absolutely correct, and unless the end hosts get into UDP fights things will always work out (aka TCP MSS saves the day)… but there must be a reason switching vendors don’t use maximum frame sizes larger than 1514 by default (Cumulus Linux seems to be an exception, and according to Sébastien Keller Arista’s default maximum frame size is between 9214 and 10178 depending on the platform).

Packet Forwarding 101: Header Lookups

Whenever someone asks me about LISP, I answer, “it’s a nice idea, but cache-based forwarding never worked well.” Oldtimers familiar with the spectacular failures of fast switching and various incarnations of flow switching usually need no further explanation. Unfortunately, that lore is quickly dying out, so let’s start with the fundamentals: how does packet forwarding work?

Packet forwarding used by bridges and routers (or Layer-2/3 switches if you believe in marketing terminology) is just a particular case of statistical multiplexing – a mechanism where many communication streams share the network resources by slicing the data into packets that are sent across the network. The packets are usually forwarded independently; every one of them must contain enough information to be propagated by each intermediate device it encounters on its way across the network.

read more see 2 comments

Wednesday, February 9, 2022 09:35 UTC

OMG: VTP Is Insecure

One of my readers sent me an interesting pointer:

I just watched a YouTube video by a security researcher showing how a five line python script can be used to unilaterally configure a Cisco switch port connected to a host computer into a trunk port. It does this by forging a single virtual trunk protocol (VTP) packet. The host can then eavesdrop on broadcast traffic on all VLANs on the network, as well as prosecute man-in-the-middle of attacks.

I’d say that’s a “startling revelation” along the lines of “OMG, VXLAN is insecure” – a wonderful way for a security researcher to gain instant visibility. From a more pragmatic perspective, if you enable an insecure protocol on a user-facing port, you get the results you deserve¹.

While I could end this blog post with the above flippant remark, it’s more fun considering two fundamental questions.

More: Hardware Differences between Routers and Switches

Aaron Glenn sent me his thoughts on hardware differences between routers and switches based on the last paragraph of Dmytro Shypovalov’s views on the topic

To conclude, what is the difference between routers and switches in my opinion? I have absolutely no idea.

read more see 1 comments

switching

Thursday, December 9, 2021 07:48 UTC

Response: Hardware Differences between Routers and Switches

Dmytro Shypovalov sent me his views on the hardware differences between routers and switches. Enjoy!

So, a long time ago routers were L3 with CPU forwarding and switches were L2 with ASIC. Then they had invented TCAM and L3 switches, and since then ASICs have evolved to support more features (QoS, encapsulations etc) and store more routes, while CPU-based architectures have evolved to specialised NPU and parallel processing (e.g. Cisco QFX) to handle more traffic, while supporting all features of CPU forwarding.

Hardware Differences between Routers and Switches

One of my readers sent me this age-old question:

Is there a real difference in the underlying hardware of switches and routers in terms of the traffic processing chips and their capabilities in terms of routing and switching (or should I say only switching)?

Let’s get the terminology straight. Router is a technical term for a device that forwards packets based on network layer information. Switch is a marketing term for a device that does something with packets.

Rephrasing the question: is there a hardware difference between a box marketed as a router and another box marketed as a layer-3 switch?

TL&DR: Yes.

read more see 1 comments

switching

Tuesday, September 14, 2021 06:51 +0000

Stateful Switchover (SSO) 101

Stateful Switchover (SSO) is another seemingly awesome technology that can help you implement high availability when facing a ~~broken~~ non-redundant network design. Here’s how it’s supposed to work:

A network device runs two copies of the control plane (primary and backup);
Primary control plane continuously synchronizes its state with the backup control plane;
When the primary control plane crashes, the backup control plane already has all the required state and is ready to take over in moments.

Delighted? You might be disappointed once you start digging into the details.

read more see 1 comments

Tuesday, September 7, 2021 06:47 +0000

Non-Stop Forwarding (NSF) 101

Non-Stop Forwarding (NSF) is one of those ideas that look great in a slide deck and marketing collaterals, but might turn into a giant can of worms once you try to implement them properly (see also: stackable switches or VMware Fault Tolerance).

NSF has been around for at least 15 years, so I’m positive at least some vendors got most of the details right; I’m also pretty sure a few people have scars to prove they’ve been around the non-optimal implementations.

read more see 2 comments

Monday, September 6, 2021 06:34 UTC

Comparing Forwarding Performance of Data Center Switches

One of my subscribers is trying to decide whether to buy an -EX or an -FX version of a Cisco Nexus data center switch:

I was comparing Cisco Nexus 93180YC-FX and Nexus 93180YC-EX. They have the same port distribution (48x 10/25G + 6x40/100G), 3.6 Tbps switching capacity, but the -FX version has just 1200 Mpps forwarding rate while EX version goes up to 2600 Mpps. What could be the reason for the difference in forwarding performance?

Both switches are single-ASIC switches. They have the same total switching bandwidth, thus it must take longer for the FX switch to forward a packet, resulting in reduced packet-per-seconds figure. It looks like the ASIC in the -FX switch is configured in more complex way: more functionality results in more complexity which results in either reduced performance or higher cost.