Where Do We Need Smart NICs? « ipSpace.net blog

Wednesday, September 9, 2020 06:44 UTC

Where Do We Need Smart NICs?

We did a number of Software Gone Wild podcasts trying to figure out whether smart NICs address a real need or whether they’re just another vendor attempt to explore all potential markets. As expected, we got opposing views, from Luke Gorrie claiming a NIC should be as simple as possible to Silvano Gai explaining how dedicated hardware performs the same operations at lower cost, lower power consumption, and way higher speeds.

In theory, there’s no doubt that Silvano is right. Just look at how expensive some router line cards are, and try to figure out how much it would cost to get in software the 25.6 Tbps of forwarding performance that we’ll get from a single ASIC (Tomahawk-4), assuming ~10 Gbps per CPU core. High-speed core packet forwarding has to be done in dedicated hardware.
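To put a number on that comparison, here is a back-of-the-envelope sketch. It uses only the two figures quoted above (25.6 Tbps per ASIC, ~10 Gbps per core); the function name is made up for illustration, and the result ignores everything else a real design would have to account for (NIC ports, PCIe lanes, memory bandwidth, servers to house the cores).

```python
def cores_needed(asic_gbps: int, gbps_per_core: int) -> int:
    """Number of CPU cores needed to match an ASIC's throughput (rounded up)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-asic_gbps // gbps_per_core)

TOMAHAWK4_GBPS = 25_600   # 25.6 Tbps, the figure quoted in the text
GBPS_PER_CORE = 10        # rough per-core software forwarding rate assumed in the text

print(cores_needed(TOMAHAWK4_GBPS, GBPS_PER_CORE))   # 2560
```

Even at the 20 Gbps per core mentioned later in this post, `cores_needed(25_600, 20)` still comes out at 1280 cores to replace a single chip.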

Back to the network edge. In practice, one has to balance the tradeoffs of increased software complexity caused by smart NICs against the cost of the CPU cores needed for software-based packet forwarding. While software developers yearn for simplicity, NIC vendors would love you to believe you cannot reach the performance you need with software-based packet processing. Even worse, there are still people justifying smart NICs with ancient performance myths. Here’s a sample LinkedIn comment I got in June 2020:

I think you are forgetting one of the major reasons for the rise of smart NICs; that being the ability to process high speed networking packet streams at line rates and to perform operations on the packet streams. Your old x86 processor with an average PCIe 4-lane dumb NIC card is not up to the task for 25 Gbps networks or higher.

How about a few facts:

x86 server architecture hasn’t been a limiting factor for ages. Luke Gorrie demonstrated how to push 200 Gbps from an off-the-shelf x86 server in 2013, managed to do that with two CPU cores and 64-byte packets in 2016, and explained his ideas in detail in the very first episode of Software Gone Wild.

Luke “cheated” by creating a fixed transmit ring for each NIC and just pointing the NIC to the packets to be sent. He did demonstrate that the architecture of a typical x86-based server is NOT a performance bottleneck though.

In the meantime, we did several Software Gone Wild episodes with people pushing the performance envelope of software-based forwarding, including an IPv4-over-IPv6 tunnel headend delivering 20 Gbps per x86 core… in March 2016.

I stopped tracking how far they got in the meantime; it was pretty obvious that the “we need hardware switching in NICs” argument was already bogus at that point. If you want slightly more recent performance figures, check out the fd.io VPP performance tests and Andree Toonk’s blog posts on high-performance packet forwarding with VPP and XDP.

TL&DR: Just because you can’t figure out how to do it doesn’t mean it can’t be done. Do some more research…

So where do we really need smart NICs? There are (large) niche use cases like support for bare-metal servers in public clouds or preprocessing stock quotes in High Frequency Trading (HFT) environments. NetApp is also using a Pensando NIC as a generic offload engine, but in that case it just happens that the offload hardware also has an Ethernet port (for a minimal amount of additional detail, please check the Pensando presentations from CFD7). Anything else from real-life production as opposed to conference-talk-generating proof-of-concept? Please let me know!

switching

6 comments:

Sanjay Padubidri 09 September 2020 02:13

Ivan, as always the answer for "SmartNIC or CPU?" is "it depends", and the devil is in the details when talking about high throughput achieved in CPU. If you have many independent flows then of course the packets can be distributed to cores (either using the NIC or a load balancing core). However there are real deployments that have traffic coming on a single IPSec tunnel, so there is no easy way to take advantage of parallelism. If packets are fragmented that causes problems too for distributing the packets. Many protocols are also stateful, and if this shared state is accessed by many cores the throughput reduces significantly.
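The single-tunnel problem is easy to see in a small sketch. The code below is a simplified illustration, not how any particular NIC works: CRC32 stands in for a real receive-side scaling hash, and the core count, addresses, and function names are all made up for the example. Packets of many independent flows hash to different cores, while every packet of one IPsec (ESP) tunnel carries the same outer header fields and lands on the same core.

```python
import zlib
from collections import Counter

NUM_CORES = 8  # hypothetical number of forwarding cores

def pick_core(src_ip, dst_ip, proto, src_port=0, dst_port=0, num_cores=NUM_CORES):
    """Map a packet to a core by hashing its header fields.

    CRC32 is only a stand-in for a NIC's RSS hash (an assumption of this sketch).
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_cores

def core_load(packets):
    """Count how many packets each core receives."""
    return Counter(pick_core(*pkt) for pkt in packets)

# 1000 independent TCP flows (protocol 6), each with its own source port.
many_flows = [("10.0.0.1", "10.0.1.1", 6, 1024 + i, 443) for i in range(1000)]

# 1000 packets of one ESP tunnel (protocol 50): same outer addresses, no ports.
single_tunnel = [("192.0.2.1", "198.51.100.1", 50)] * 1000

print("cores used, many flows:   ", len(core_load(many_flows)))
print("cores used, single tunnel:", len(core_load(single_tunnel)))  # 1
```

No matter how many cores the server has, the tunnel's throughput in this model is capped by what one core can do, unless something (a load-balancing core or smarter hardware) looks deeper into the packet.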

Yet another aspect is that many cloud providers treat a lot of the packet processing as "infrastructure overhead", and would rather not do it on cores since the cores can be rented out to customers to make money. So even if SmartNICs have higher power/price they may prefer to offload this work and free up all cores to be rented to customers - this is the approach taken by AWS with Nitro.

Bela Varkonyi 09 September 2020 04:12

Relying on the CPU means nondeterministic behavior as more and more features are activated. Even when some CPU cores are dedicated to specific tasks, as in the ISR4k family, performance does not stay the same as features are added beyond a certain saturation point. With a hardware-based packet processing pipeline you could get more deterministic behavior even at close to wire speed. Of course, only if the architecture is designed properly. Better determinism might be important for safety-critical networks.

In most applications this is not a real issue, but there are some real-time needs that would be better fulfilled with SmartNICs. A generic example is high-accuracy, high-precision time synchronization. Another example is one-way delay and delay variation measurements with sub-millisecond precision. Critical voice and video streams might also be a use case.

Dipjyoti Saikia 10 September 2020 03:50

While it is possible to do 100 Gbps with the CPU alone, there are other things to consider, like form factor, power footprint, and latency. Maybe this article will shed more light on why and when smartNICs are needed:

https://www.eweek.com/networking/how-f5-networks-uses-smartnics-to-ease-transition-to-software

A.A 10 September 2020 03:58

For me, as a network/security engineer, software-based packet forwarding has been a pain: there are too many moving parts, and high-speed packet forwarding seems to really work only in blogs and forums. Last month I was tasked with putting a simple L2 bridge (Ubuntu kernel 5.4 with VLAN tagging/filtering) on top of a VMware DVS with Private VLAN enabled to limit some VM-to-VM traffic within the same host/port group/VLAN (we couldn't use NSX for reasons). After trying many fancy offloading methods (VMXNET3, SR-IOV, LRO/TSO, OVS+DPDK), the performance was unpredictable, some solutions were hard or impossible to implement, and it wasn't worth that much effort to build an unstable solution. I spent one month of my working time to make it work.

Ivan says that we need to do more research and put in more effort. You are right. But my main job (and that of other guys like me) is to work as a network/security engineer, so how can I spend my entire life doing research on things that are mostly experimental, with no real solution for enterprise production networks? I am not talking about big cloud or telecom SPs. I am talking about enterprise organizations with limited resources, like mine. I think packet processing and NFV solutions need to be simplified for guys like me. I spent 20 years learning networking and security technology, and I am not going to spend my remaining career learning how DPDK works with some NIC/library and troubleshooting those things while every new release breaks something else.

I think the vanilla Linux kernel data path should be optimized more and more for packet forwarding, as doing things in user space is hard and breaks many things. If the in-kernel data path improved, no one would need to compile that experimental code. I know that I am complaining too much, but if NFV is going to have broad deployment (not just for cloud SPs and telcos), it needs to be simplified. If people like me are not the right guys to mess with NFV, please stop telling others that the FUTURE is NFV. If the performance of NFV is not going to improve (OVS+DPDK is 7 years old; VPP is out, but there are no real, ready-to-use solutions), it is better to do things the old way. A.A

junhui liu 11 September 2020 12:53

A CPU is not designed for packet forwarding. One example is preserving packet order: it is impossible for a multicore CPU to retain the order in which packets were received after they have been processed in parallel by multiple cores. Another example is scheduling. Yes, a CPU can do scheduling, but at a very high tax in CPU cycles.

Silvano Gai 19 September 2020 01:42

There is not much technical discussion on SmartNICs, I welcome Ivan’s post, and I want to provide my perspective in this blog post:

https://silvanogai.github.io/posts/smartnic/

Thanks Ivan

-- Silvano Gai
