Category: switching

Stateful Switchover (SSO) 101

Stateful Switchover (SSO) is another seemingly awesome technology that can help you implement high availability when facing a broken non-redundant network design. Here’s how it’s supposed to work:

  • A network device runs two copies of the control plane (primary and backup);
  • The primary control plane continuously synchronizes its state with the backup control plane (see the sketch after this list);
  • When the primary control plane crashes, the backup control plane already has all the required state and is ready to take over in moments.
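To make the division of labor a bit more tangible, here’s a minimal Python sketch of the checkpointing idea (my illustration, not any vendor’s implementation): every state change on the primary is immediately replicated to the backup, so the backup never has to rebuild its tables from scratch.

# Minimal sketch of SSO-style state checkpointing (illustration only)
class ControlPlane:
    def __init__(self, name):
        self.name = name
        self.state = {}                     # interface, neighbor, session state...

    def update(self, key, value, backup=None):
        self.state[key] = value
        if backup:                          # replicate every change to the backup
            backup.state[key] = value

primary = ControlPlane("primary")
backup = ControlPlane("backup")

primary.update("GigabitEthernet0/1", "up", backup)
primary.update("OSPF neighbor 10.0.0.2", "Full", backup)

# The primary crashes; the backup already has the state and takes over
active = backup
print(active.name, active.state)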

Delighted? You might be disappointed once you start digging into the details.

read more…

Non-Stop Forwarding (NSF) 101

Non-Stop Forwarding (NSF) is one of those ideas that look great in a slide deck and marketing collateral, but might turn into a giant can of worms once you try to implement them properly (see also: stackable switches or VMware Fault Tolerance).

NSF has been around for at least 15 years, so I’m positive at least some vendors got most of the details right; I’m also pretty sure a few people have scars to prove they’ve been around the non-optimal implementations.
read more…

Comparing Forwarding Performance of Data Center Switches

One of my subscribers is trying to decide whether to buy an -EX or an -FX version of a Cisco Nexus data center switch:

I was comparing the Cisco Nexus 93180YC-FX and Nexus 93180YC-EX. They have the same port distribution (48x 10/25G + 6x 40/100G) and 3.6 Tbps switching capacity, but the -FX version has just a 1200 Mpps forwarding rate, while the -EX version goes up to 2600 Mpps. What could be the reason for the difference in forwarding performance?

Both switches are single-ASIC switches. They have the same total switching bandwidth, so it must take longer for the -FX switch to forward a packet, resulting in a reduced packets-per-second figure. It looks like the ASIC in the -FX switch is configured in a more complex way: more functionality results in more complexity, which results in either reduced performance or higher cost.
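Here’s a back-of-the-envelope calculation showing what those numbers mean in practice (assuming the 3.6 Tbps figure is full duplex, i.e. 1.8 Tbps of unidirectional throughput, and counting 20 bytes of preamble and inter-frame gap per packet):

# Per-packet processing budget and break-even frame size for both switches,
# assuming 3.6 Tbps full duplex = 1.8 Tbps of unidirectional throughput
throughput_bps = 1.8e12
overhead_bytes = 20                         # preamble + inter-frame gap

for model, pps in [("93180YC-EX", 2600e6), ("93180YC-FX", 1200e6)]:
    per_packet_ns = 1e9 / pps
    frame_size = throughput_bps / pps / 8 - overhead_bytes
    print(f"{model}: {per_packet_ns:.2f} ns per packet, "
          f"line rate with frames larger than ~{frame_size:.0f} bytes")

# 93180YC-EX: 0.38 ns per packet, line rate with frames larger than ~67 bytes
# 93180YC-FX: 0.83 ns per packet, line rate with frames larger than ~168 bytes

In other words, the -EX ASIC can deal with (almost) worst-case streams of minimum-size packets, while the -FX switch needs an average packet size of roughly 170 bytes or more to keep all ports busy.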

read more…

Worth Reading: Azure Datacenter Switch Failures

Microsoft engineers published an analysis of switch failures in 130 Azure regions (review of the article, The Next Platform summary):

  • A data center switch has a 2% chance of failing in 3 months, which works out to less than 10% per year (see the arithmetic after this list);
  • ~60% of the failures are caused by hardware faults or power failures, and another 17% by software bugs;
  • 50% of the failures lasted less than 6 minutes (obviously crashes or power glitches followed by a reboot);
  • Switches running SONiC had a lower failure rate than switches running a vendor NOS on the same hardware. Looks like bloatware results in more bugs, and taking months to fix bugs results in more crashes. Who would have thought…
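In case you’re wondering how the quarterly figure translates into the annual one, here’s the quick conversion (assuming independent three-month intervals):

# Converting a 2% three-month failure probability into an annual figure,
# assuming independent three-month intervals
p_quarter = 0.02
p_year = 1 - (1 - p_quarter) ** 4
print(f"annual failure probability: {p_year:.1%}")   # 7.8%, less than 10%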

Packet Bursts in Data Center Fabrics

When I wrote about the (non)impact of switching latency, I was (also) thinking about packet bursts jamming core data center fabric links when I mentioned the elephants in the room… but when I started writing about them, I realized they might be yet another red herring (together with the supposed need for large buffers in data center switches).

Here’s how it looks from my ignorant perspective when considering a simple leaf-and-spine network like the one in the following diagram. Please feel free to set me straight; I honestly can’t figure out where I went astray.

read more…

Does Small Packet Forwarding Performance Matter in Data Center Switches?

TL&DR: No.

Here’s another never-ending vi-versus-emacs-type discussion: merchant silicon like Broadcom Trident cannot forward small (64-byte) packets at line rate. Does that matter, or is it yet another stimulating academic talking point and/or red herring used by vendor marketing teams to justify their high prices?
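For reference, here’s the classic wire-speed arithmetic: every 64-byte frame occupies 84 bytes on the wire once you add the preamble and the inter-frame gap, which translates into the following per-port packet rates:

# How many packets per second is "line rate" with 64-byte frames?
# Each frame occupies 64 + 8 (preamble) + 12 (inter-frame gap) bytes on the wire
wire_bits_per_frame = (64 + 8 + 12) * 8     # 672 bits

for speed_gbps in (10, 25, 100):
    pps = speed_gbps * 1e9 / wire_bits_per_frame
    print(f"{speed_gbps} GbE: {pps / 1e6:.2f} Mpps")

# 10 GbE: 14.88 Mpps
# 25 GbE: 37.20 Mpps
# 100 GbE: 148.81 Mpps

Whether any real application ever generates a sustained stream of minimum-size packets is, of course, the whole point of the discussion.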

Here’s what I wrote about that topic a few weeks ago:

read more…

Response: Is Switching Latency Relevant?

Minh Ha left another extensive comment on my Is Switching Latency Relevant blog post. As is usually the case, it’s well worth reading, so I’m making sure it doesn’t stay in the small print (this time interspersed with a few comments of mine in gray boxes).


I found that Cisco apparently manages to scale port-to-port latency down to 250 ns for L3 switching, which is astonishing, and way less (sub-100 ns) for L1 and L2.

I don’t know where FPGAs fit into this ultra-low-latency picture, because an FPGA, compared to an ASIC, is bigger and a few times slower due to the use of lookup tables in place of gate arrays, and programmable interconnects.

read more…

Fundamentals: Is Switching Latency Relevant?

One of my readers wondered whether it makes sense to buy low-latency switches from Cisco or Juniper instead of switches based on merchant silicon like Trident-3 or Jericho (regardless of whether they are running NX-OS, Junos, EOS, or Linux).

As always, the answer is it depends, but before getting into the details, let’s revisit what latency really is. We’ll start with a simple two-node network.

Diagram: The simplest possible network
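To put the numbers in perspective, here’s a back-of-the-envelope comparison of the major latency components for a single 1500-byte packet crossing one switch over 10GE links and 100 meters of fiber (the ~1 µs store-and-forward switching latency is an assumed ballpark figure, not any specific product’s data sheet value):

# Major latency components: 1500-byte packet, one switch, 10GE links, 100 m fiber
frame_bits = (1500 + 38) * 8           # payload + header, FCS, preamble, IFG
serialization = frame_bits / 10e9      # time to put the frame on a 10GE wire
propagation = 100 / 2e8                # signal travels at roughly 2/3 c in fiber
switching = 1e-6                       # assumed store-and-forward ASIC latency

for name, value in [("serialization (per hop)", serialization),
                    ("propagation (100 m)", propagation),
                    ("switching (per hop)", switching)]:
    print(f"{name}: {value * 1e9:.0f} ns")

# serialization (per hop): 1230 ns
# propagation (100 m): 500 ns
# switching (per hop): 1000 ns

Shaving a few hundred nanoseconds off the switching latency matters a lot less once the serialization delays, the propagation delays, and (most of all) the application stack enter the picture.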

read more…

Implementing Layer-2 Networks in a Public Cloud

A few weeks ago I got an excited tweet from someone working at Oracle Cloud Infrastructure: they launched full-blown layer-2 virtual networks in their public cloud to support customers migrating existing enterprise spaghetti mess into the cloud.

Let’s skip the usual does everyone using the applications now have to pay for Oracle licenses and I wonder what the lock-in might be when I migrate my workloads into an Oracle cloud jokes and focus on the technical aspects of what they claim they implemented. Here’s my immediate reaction (limited to the usual 280 characters, because that’s the absolute upper limit of consumable content these days):

read more…

Routing in Stretched VLAN Designs

One of my readers was “blessed” with the stretched VLANs requirement combined with the need for inter-VLAN routing and sub-par equipment from a vendor not exactly known for their data center switching products. Before going on, you might want to read his description of the challenge he’s facing and what I had to say about the idea of building stackable switches across multiple locations.

Of course it’s possible that my reader failed to explain the challenge in enough detail to get good advice from the vendor SE, that he had to deal with a clueless SE, that he’s using ancient gear, or that the stars just weren’t aligned… but I don’t think anyone should ever be painted into the corner he found himself in.

Here’s an overview diagram of what my reader was facing. The core switches in each location work as a single device (virtual chassis), and there’s MLAG between core and edge switches. The early 2000s just called and they were proud of the design (but to be honest, sometimes one has to work with the tools his boss bought, so…).

read more…

Repost: On the Importance of Line-Rate Switching of Small Packets

I made a flippant remark in a blog comment

While it’s academically stimulating to think about forwarding small packets (and applicable to large-scale VoIP networks), most environments don’t have to deal with those. It looks like it’s such a non-issue that I couldn’t find any recent data; in the good old days, ~50% of the packets were 1500 bytes long.

… and Minh Ha (by now a regular contributor to my blog) quickly set me straight with a lengthy comment that’s too good to be hidden somewhere at the bottom of a page. Here it is (slightly edited). Also, you might want to read other comments to the original blog post for context.

read more…

Video: Finding Paths Across the Network

Regardless of the technology used to get packets across the network, someone has to know how to get from sender to receiver(s), and as always, you have multiple options:

  • Almighty controller
  • On-demand dynamic path discovery (example: probing)
  • Participation in a routing protocol

For more details, watch the Finding Paths Across the Network video.

The video is part of the How Networks Really Work webinar and is available with a Free ipSpace.net Subscription.

Rant: Broadcom and Network Operating System Vendors

Minh Ha left the following rant as a comment on my 5-year-old What Are The Problems with Broadcom Tomahawk? blog post. It’s too good to be left gathering dust there. Counterarguments and other perspectives are highly welcome.


So basically a lot of vendors these days are just glorified Broadcom resellers :p. It’s funny how some of them try to up themselves by saying they differentiate their offerings with their Network OS.

read more…

Making LLDP Work with Linux Bridge

Last week I described how I configured PVLAN on a Linux bridge. After checking the desired partial connectivity with ios_ping, I wanted to verify it with LLDP neighbors. The Ansible ios_facts module collects LLDP neighbor information, and it should be really easy to use those facts to check whether port isolation works as expected.

Ansible playbook displaying LLDP neighbors on selected interface
---
- name: Display LLDP neighbors on selected interface
  hosts: all
  gather_facts: true
  vars:
    target_interface: GigabitEthernet0/1
  tasks:
  - name: Display neighbors gathered with ios_facts
    debug:
      var: ansible_net_neighbors[target_interface]

Alas, none of the routers saw any neighbors on the target interface.
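In case you want to guess along before reading the rest: a likely culprit (my assumption, not necessarily the conclusion of the full article) is that a Linux bridge does not forward link-local multicast frames like LLDP (destination MAC 01-80-C2-00-00-0E) unless the corresponding bit is set in its group_fwd_mask. A minimal sketch of that workaround, assuming a hypothetical bridge named br0 (run as root):

# Allow a Linux bridge to forward LLDP frames (destination 01-80-C2-00-00-0E)
# by setting bit 0x0E in the bridge's group_fwd_mask.
# Equivalent CLI: ip link set dev br0 type bridge group_fwd_mask 16384
BRIDGE = "br0"                              # hypothetical bridge name

with open(f"/sys/class/net/{BRIDGE}/bridge/group_fwd_mask", "w") as f:
    f.write("16384")                        # 0x4000 = 1 << 0x0E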

read more…

Repost: Drawbacks and Pitfalls of Cut-Through Switching

Minh Ha left a great comment describing additional pitfalls of cut-through switching on my Chasing CRC Errors blog post. Here it is (lightly edited).


Ivan, I don’t know about you, but I think cut-through and deep buffers are nothing but scams, and it’s subtle problems like this [fabric-wide CRC errors] that open one’s eyes to the difference between reality and academia. Cut-through switching might improve nominal device latency a little bit compared to store-and-forward (SAF), but when one puts it into the bigger end-to-end context, it’s mostly useless.
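A quick sanity check of the numbers behind that argument (assuming 1500-byte frames; smaller frames make the difference even smaller): the only thing store-and-forward adds per hop is the time needed to receive the entire frame before starting to transmit it.

# Extra per-hop latency of store-and-forward compared to cut-through:
# the time it takes to receive the whole frame before forwarding it
frame_bits = 1500 * 8

for speed_gbps in (10, 25, 100, 400):
    extra_ns = frame_bits / (speed_gbps * 1e9) * 1e9
    print(f"{speed_gbps} GbE: {extra_ns:.0f} ns per hop")

# 10 GbE: 1200 ns per hop
# 25 GbE: 480 ns per hop
# 100 GbE: 120 ns per hop
# 400 GbE: 30 ns per hop

A few hundred nanoseconds to a few microseconds across a whole fabric quickly disappear in the noise of end-to-end application latency, which is exactly the point Minh Ha is making.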

read more…