After VMware launched DPU-based acceleration for VMware NSX, marketing-focused websites frantically started discussing the benefits of DPUs. Although I’ve been writing about SmartNICs and DPUs for years, it’s time for another closer look at the emperor’s clothes.
What Is a DPU
DPU (Data Processing Unit) is a fancier name for a network adapter formerly known as SmartNIC – a server repackaged into an interface card form factor. We’ve had them for decades (anyone remember iSCSI offload adapters?).
A DPU has a CPU (these days, usually based on ARM architecture), memory, storage, network interface, and a PCI interface that behaves like a network interface, making the host operating system think it’s dealing with an interface card. Every DPU is running an operating system (often Linux) that cannot be configured from the attached host if you want to retain any semblance of a security boundary – another wonderful upgrade nightmare and attack vector (FWIW, check out Broadpwn). Just ask anyone who had to deal with ILOBleed or other IPMI vulnerabilities.
DPU and Bare-Metal Servers
Sarcasm aside, you must use DPUs to offer bare-metal compute resources in a public cloud; there’s no other way to implement a security boundary between a bare-metal server and a virtual network. That’s how the AWS Nitro project started, and once they solved that particular challenge, they decided to offload networking to a DPU anyway [1].
Not surprisingly, that’s not how the marketers are selling DPUs to the unsuspecting CxOs. VMware talks about DPU acceleration but does not yet support NSX running on a DPU attached to a bare-metal server.
Improved Packet Forwarding Performance
What else is there? Improved performance and reduced power utilization are the usual claims. Now let’s assume that the network interface in a DPU has no secret sauce that wouldn’t be available in a regular network interface card [2]. Where would the perceived performance improvement come from? There are only a few possible answers:
- The networking stack of the host operating system sucks so badly [3] that it makes sense to offload packet processing to a more streamlined implementation. Please note we’re talking about bloatware, not hardware limitations – when done right, a 2-socket Xeon server can handle 1 Tbps of encrypted traffic (or so fd.io claims).
- The ARM CPU used in the DPU can do the same amount of work while transforming less electricity into heat than the host x86 CPU. While that might be true, I wouldn’t expect drastic savings assuming the comparable quality of software packet forwarding implementations.
- You’re hitting the bandwidth limitations of the server PCI/memory bus, and reducing the number of times the main CPU has to look at a byte in transit (for example, by offloading encryption to the DPU) helps you reach the 700 Gbps-per-server goal. Offloading encryption to a DPU is thus a fantastic feature if you’re Netflix (talk, slides); everyone else probably doesn’t care.
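The memory-bus argument in the last bullet can be sketched as simple arithmetic. This is a back-of-the-envelope model, not a measurement: the function names, the 1600 Gbps memory bandwidth figure, and the pass counts (four memory crossings per byte with software encryption, two with encryption offloaded) are all assumptions made up for illustration.

```python
def required_memory_bandwidth_gbps(wire_rate_gbps: float, memory_passes: int) -> float:
    """Memory bandwidth consumed when every byte in transit crosses
    the memory bus `memory_passes` times."""
    return wire_rate_gbps * memory_passes


def max_wire_rate_gbps(memory_bandwidth_gbps: float, memory_passes: int) -> float:
    """Highest wire rate the memory bus can sustain for a given number
    of memory crossings per byte."""
    return memory_bandwidth_gbps / memory_passes


# Hypothetical server: 1600 Gbps of usable memory bandwidth (assumption).
MEMORY_BANDWIDTH_GBPS = 1600.0

# Assumed pass counts, for illustration only:
#   software encryption: storage DMA write, CPU read, CPU write, NIC DMA read
#   offloaded encryption: storage DMA write, NIC DMA read
SOFTWARE_CRYPTO_PASSES = 4
OFFLOADED_CRYPTO_PASSES = 2

if __name__ == "__main__":
    for label, passes in (
        ("software encryption", SOFTWARE_CRYPTO_PASSES),
        ("offloaded encryption", OFFLOADED_CRYPTO_PASSES),
    ):
        ceiling = max_wire_rate_gbps(MEMORY_BANDWIDTH_GBPS, passes)
        print(f"{label}: at most {ceiling:.0f} Gbps on the wire")
```

With these made-up inputs, halving the number of memory crossings doubles the ceiling. The same arithmetic shows why the offload is irrelevant when your traffic is nowhere near that ceiling.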
Not surprisingly, NVIDIA loves comparing its DPU performance with OVS. It’s easy to excel when you start with a pretty low bar ;), and reading their reports it looks like the primary role of the DPUs is to add another layer of abstraction to hide how much the software packet forwarding of the virtual switch you believed in sucks [4].
But wait, there’s more. DPU vendors love to point out that DPUs reduce the number of host CPU cores needed to perform a specific task. What a revelation: you add CPU cores on DPUs to the data center, so you need fewer CPU cores in servers to get the same amount of work done. I would say that’s a classic attempt to shift revenue away from Intel [5].
In the end, DPUs aren’t magic. DPUs are additional servers inserted between existing servers and the network. Unless you’re using them as a front-end to bare-metal servers, you’re just shifting the workload and squashing the complexity sausage.
Do they make sense if you’re not at the absolute bleeding edge of packet forwarding performance? Do the math (ignoring for the moment the increased complexity and exciting new bugs):
- How much does a DPU cost, and how many CPU cores will it free up for other work?
- How much will you save on core-based licensing [6] if you deploy DPUs, and how much will the DPU licenses cost?
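A minimal sketch of that math, assuming you can put a number on each of the questions above. The class name, the field names, and every figure in the example are hypothetical; plug in your own quotes.

```python
from dataclasses import dataclass


@dataclass
class DpuBusinessCase:
    """Per-server inputs; all values are whatever your vendors quote you."""
    dpu_price: float           # purchase price of one DPU
    dpu_license_cost: float    # DPU-related software licensing
    cores_freed: int           # host cores no longer spent on packet forwarding
    core_hardware_cost: float  # hardware cost of one host CPU core
    core_license_cost: float   # core-based software licensing, per core

    def cost(self) -> float:
        """What deploying the DPU adds to the bill."""
        return self.dpu_price + self.dpu_license_cost

    def savings(self) -> float:
        """What the freed-up host cores are worth."""
        return self.cores_freed * (self.core_hardware_cost + self.core_license_cost)

    def net_benefit(self) -> float:
        """Positive: the DPU pays for itself. Negative: it does not."""
        return self.savings() - self.cost()


if __name__ == "__main__":
    # Made-up numbers, purely to show the calculation.
    case = DpuBusinessCase(
        dpu_price=2000.0,
        dpu_license_cost=500.0,
        cores_freed=4,
        core_hardware_cost=100.0,
        core_license_cost=400.0,
    )
    print(f"cost={case.cost()}, savings={case.savings()}, net={case.net_benefit()}")
```

Note that the sketch ignores exactly what the text tells you to set aside for the moment: the increased complexity and the exciting new bugs.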
[1] If you have a hammer, use it whenever you see a nearby nail.
[2] I have yet to see a vendor with a magic unicorn-smelling bit of silicon that is not packaged in every possible form factor.
[3] One of the Linux virtual switch implementations managed to push 1 Gbps of traffic when a VMware virtual switch effortlessly saturated multiple 10 Gbps uplinks. No wonder one can use DPU offload to increase its performance.
[4] RFC 1925 Rule 6a is proud of those efforts ;)
[5] … and vendors licensing their software based on CPU cores.
[6] There’s a reason I’m mentioning CPU core-based licensing, DPUs, and VMware NSX in the same blog post. Caveat emptor.