Comparing Forwarding Performance of Data Center Switches
One of my subscribers is trying to decide whether to buy an -EX or an -FX version of a Cisco Nexus data center switch:
I was comparing Cisco Nexus 93180YC-FX and Nexus 93180YC-EX. They have the same port distribution (48x 10/25G + 6x40/100G), 3.6 Tbps switching capacity, but the -FX version has just 1200 Mpps forwarding rate while EX version goes up to 2600 Mpps. What could be the reason for the difference in forwarding performance?
Both switches are single-ASIC switches with the same total switching bandwidth, so the -FX switch must take longer to forward each packet, which results in a lower packets-per-second figure. It looks like the ASIC in the -FX switch is configured in a more complex way: more functionality results in more complexity, which results in either reduced performance or higher cost.
I don’t know if that means that the -FX variant can not give all the bandwidth to all ports at the same time, or if it is not a non-blocking switch.
Whether a switch is non-blocking depends on its internal architecture (whatever “non-blocking” means for a single-ASIC design). Can it saturate all links? Most probably it can, assuming the forwarded packets are big enough. For more details, see Does Small Packet Forwarding Performance Matter in Data Center Switches? and follow the links.
Lukas Krattiger confirmed the numbers and did the math for me:
The minimum packet sizes for line rate are close to each other (EX = 72 bytes, FX = 166 bytes) and the bandwidth of the ASIC is the same.
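The arithmetic behind figures like these can be sketched in a few lines of Python. This is a rough illustration and not necessarily the calculation Lukas used: it assumes the 3.6 Tbps data-sheet number counts both directions and that every frame carries 20 bytes of Ethernet overhead on the wire (preamble, start-of-frame delimiter, inter-packet gap). The function name is made up for this example.

```python
# Assumption: 7-byte preamble + 1-byte SFD + 12-byte inter-packet gap per frame
ETHERNET_OVERHEAD_BYTES = 20


def min_line_rate_packet_size(bandwidth_bps: float,
                              forwarding_pps: float,
                              overhead_bytes: int = ETHERNET_OVERHEAD_BYTES) -> float:
    """Smallest frame size (in bytes) at which bandwidth, not pps, is the limit."""
    # Bytes each packet may occupy on the wire if the ASIC forwards
    # exactly forwarding_pps packets while filling bandwidth_bps.
    bytes_on_wire = bandwidth_bps / 8 / forwarding_pps
    return bytes_on_wire - overhead_bytes


# Assumption: 3.6 Tbps is a full-duplex figure, so halve it for one direction
UNIDIRECTIONAL_BPS = 3.6e12 / 2

if __name__ == "__main__":
    for model, mpps in (("93180YC-EX", 2600), ("93180YC-FX", 1200)):
        size = min_line_rate_packet_size(UNIDIRECTIONAL_BPS, mpps * 1e6)
        print(f"{model}: line rate from roughly {size:.0f}-byte packets upward")
```

Under these assumptions the results land in the same neighborhood as the quoted values without matching them exactly; the difference depends on how the overhead and the usable bandwidth are counted.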
I would love to see an environment generating over a billion 100-byte packets per second on a single switch[1]. However, my reader was still worried:
How can I be sure that the -FX forwarding rate is high enough for me?
Using the golden rule of good design – Know Your Requirements:
- Figure out how much traffic you have in your data center (total bandwidth and packets-per-second). It’s as easy as collecting port statistics over a reasonable period of time.
- Figure out how much traffic you could realistically get in a few years, or multiply existing traffic by whatever fudge factor you feel comfortable with.
- Check whether the hardware you’re planning to buy supports that much traffic.
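The three steps above could be turned into a quick sanity check along these lines. It is a minimal sketch, assuming you already have two readings of cumulative ingress byte and packet counters per port (for example from SNMP objects such as ifHCInOctets and ifHCInUcastPkts); the class, the function names, and all the numbers are hypothetical. Counter wraps are not handled, and averages over a long interval hide short peaks.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    timestamp_s: float  # when the counters were read, in seconds
    octets: int         # cumulative ingress bytes on the port
    packets: int        # cumulative ingress packets on the port


def rates(first: CounterSample, last: CounterSample) -> tuple[float, float]:
    """Average bits-per-second and packets-per-second between two samples."""
    interval = last.timestamp_s - first.timestamp_s
    if interval <= 0:
        raise ValueError("samples must be in chronological order")
    bps = (last.octets - first.octets) * 8 / interval
    pps = (last.packets - first.packets) / interval
    return bps, pps


def fits_capacity(samples_per_port, capacity_bps: float, capacity_pps: float,
                  fudge_factor: float) -> bool:
    """True if current traffic times the fudge factor stays within both limits."""
    total_bps = total_pps = 0.0
    for first, last in samples_per_port:
        bps, pps = rates(first, last)
        total_bps += bps
        total_pps += pps
    return (total_bps * fudge_factor <= capacity_bps
            and total_pps * fudge_factor <= capacity_pps)


if __name__ == "__main__":
    # Hypothetical measurement: 48 ports, each averaging 10 Gbps and 2 Mpps
    # of ingress traffic over a five-minute interval.
    port = (CounterSample(0.0, 0, 0),
            CounterSample(300.0, 375_000_000_000, 600_000_000))
    ports = [port] * 48
    # Capacity figures assumed from the article: 3.6 Tbps (halved for one
    # direction) and 1200 Mpps for the -FX switch.
    for factor in (3.0, 10.0):
        ok = fits_capacity(ports, 1.8e12, 1.2e9, factor)
        print(f"fudge factor {factor}: {'fits' if ok else 'does not fit'}")
```

Summing ingress counters across all ports counts every packet once, which is why the sketch compares the total with the one-direction bandwidth figure.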
Unless you’re doing something very specific, you’ll probably find that most modern data center switches easily handle an order of magnitude more than what you could reasonably need.
[1] And as often as I’m mentioning it, there are still no takers.
The subscriber should also take into account the recent End-of-Life announcement from Cisco for the Nexus 93180YC-EX models. The replacement product is the N9K-C93180YC-FX3, which also has a 1200 Mpps forwarding rate. That should be sufficient for most environments.
Last month, I ran a lab test with the FX version using Spirent with 48x25Gbps. We got a maximum of ~68% of throughput without loss. If I’m not wrong, it represents almost 1600 Mpps. As for the main focus of the blog post, I agree that knowing the requirements is essential to spending money properly on a project. 😅
... which refers to the following slide deck from Cisco Live: https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2020/pdf/BRKDCN-3222.pdf
Just thinking out loud here, but if your measure is the current environment, wouldn’t the current switches’ maximum pps be the limiting factor? I say this because, going back to requirements, if the goal is simply to replace the current implementation with no less than what it can do, this seems like a reasonable limitation. I would think it is a reasonable assumption that most environments will grow, require more traffic, faster delivery, etc. As the source-to-destination interface SERDES ratio continues to increase, timely delivery becomes more and more important. From experience, lower latency can definitely decrease buffer utilization during bursts. So the question really becomes: what features can we live without to benefit packet delivery? And at what cost financially? It always comes down to economics, doesn’t it? Anyways, great article as always!
"... but if your measure is the current environment, wouldn’t the current switches maximum pps be the limiting factor" -- it depends on whether the network is the weakest link (= bottleneck).
It rarely is... unless you have an ancient 3-tier fabric built with Nexus 7000s and Fabric Extenders ;))
'I don’t know if that means that the -FX variant can not give all the bandwidth to all ports at the same time, or if it is not a non-blocking switch.'
There's no non-blocking switch at this level of bandwidth; I stand by this statement until I am proven wrong. The fabric scheduler is the limiting factor, and just like what we learned with OpenFlow, centralized anything doesn't scale. Nick McKeown, inventor of the beautifully simple iSLIP scheduler -- used widely thanks to its simplicity -- admitted the limitation of the scheduler; that's why he loved the idea of a load-balanced switch, which does away with painful fabric scheduling.
Also, I'd take the pps figures mentioned with a grain of salt, because many of these platforms are not deterministic. Their pps performance tends to degrade as more and more features are activated in the forwarding pipeline, due to various nonlinear effects. That's why a switch with better small-packet performance should perform better than another one as more features, like traffic classification, are enabled. So 2600 Mpps of basic Layer 2 or Layer 3 switching might halve as you pack on more features.
Correlation of traffic also degrades pps, so a figure like 2600 Mpps, for example, needs context. Is 2600 Mpps achieved under Markovian traffic, or more realistic fractal and correlated traffic? Real life is almost always non-Markovian, non-linear, non-equilibrium... so those numbers specified in manuals should only be used as a rough guideline, just like models.
That said, unless you run hard real-time or other specialized systems with strict timing requirements, most switches today are more than powerful enough. You only need to understand the finer details to avoid being fooled by vendor marketing :)).
The -EX has a 2-slice ASIC. Each slice contributes half of the PPS forwarding capacity. The -FX has a single slice ASIC. A slice is a self-contained packet forwarding engine with its own buffer memory. In a multi-slice design there is some sort of internal interconnect to bridge all the slices together. It is hard for me to think of the EX as having twice the performance of the FX since in the real world, some significant portion of the traffic will need to pass between the slices. But this may be a personality defect on my part.
I prefer single slice designs because the full packet memory is available to absorb microbursts.
"It is hard for me to think of the EX as having twice the performance of the FX since in the real world, some significant portion of the traffic will need to pass between the slices. But this may be a personality defect on my part." << I guess the real questions are:
I haven't seen the answers in the Cisco Live slide deck (or maybe I missed them).