
DPU Hype Considered Harmful

The hype generated by the “VMware supports DPU offload” announcement has already resulted in fascinating misunderstandings. Here’s what I got from a System Architect:

We are dealing with an interesting scenario where a customer had limited data center space, but applications demand more resources. We are evaluating whether we could offload ESXi processing to DPUs (Pensando) to use existing servers as bare-metal servers. Would it be a use case for DPU?

First of all, congratulations to whichever vendor marketer managed to put that guy in that state of mind. Well done, sir, well done. Now for a dose of reality.

They might be trying to solve the wrong problem. Unless their customer’s workload is a bunch of ginormous VMs running in a one-VM-per-server environment, or a Kubernetes cluster, one cannot replace ESX hosts with bare-metal servers. In most cases, you still need a mechanism to share a single server’s memory and CPU resources across multiple workloads, and you can usually choose between VMs (requiring a hypervisor) or containers.

You cannot move the ESXi hypervisor to a DPU. AWS managed to do that with their Nitro cards, but they had to rewrite a hypervisor from scratch to get there. I would be extremely (pleasantly) surprised if VMware manages to get anywhere close to that in the future – it’s usually impossible to start a clean-slate project in a large company focused on quarterly results and singing the “doing more with less” jingles.

In any case, most of the ESXi hypervisor still runs on the primary server; the only task you can offload to a DPU in vSphere release 8 is the vSphere Distributed Switch¹. Whether that’s a significant improvement depends on how network-intensive your applications are.

Finally, VMware does not support DPU offload with bare-metal servers in the initial vSphere 8/NSX 4.0.1 release anyway. The whole idea was a non-starter.

Now for an off-topic thought. This particular instance of hype got me thinking about how deeply System Architects need to understand the technologies underlying their solutions. It’s pretty clear one cannot trust vendor marketing or the industry press, which often does a great job cluelessly rephrasing vendor press releases². Depending on in-house experts would be the obvious solution, but we all know how well that works. Unfortunately, I have no good answer and would appreciate your comments.

  1. … and lose Network IO Control, traffic shaping policies, and security intercept at the NIC level (DV filter) while doing that. ↩︎

  2. We’ll ignore sponsored podcasts with technically competent hosts politely avoiding pointed questions for the moment. ↩︎


  1. People should follow your example and read the docs. If the docs don't say you can do it, you can't do it. And if the docs say you can do it, you still need to do a POC to flush out any bugs.

  2. I utterly fail to understand, on a deeper level, "how" and "what" a DPU actually offloads and accelerates. If I read the VMware KB articles, they only ever mention mysterious "infrastructure" and "networking" functions that are accelerated.

    Let's follow a frame from the wire to the application socket in a VM:

    slow and low, traditional: HW RX_RING -> NIC MEM -> DMA to Kernel RX_RING -> via CPU to VM RX_RING -> via CPU to VM Userspace. (shorter and DMA if NIC has HW queues)

    PMD approach: HW RX_RING -> NIC MEM -> via CPU poll to Userspace (if NIC supports HW queues)

    SR-IOV Passthrough: HW RX_RING -> NIC MEM (queues per VF) -> DMA to VM Kernel RX_RING -> via CPU to VM RX_RING -> via CPU to VM Userspace. (PMD also possible)

    DPU? HW RX_RING -> NIC MEM -> {mysterious DPU things} -> via DMA to VM RX_RING (???)

    Especially that last bit (the copy via CPU from VM kernel space to VM user space, or PMD'ing it) obviously cannot be avoided in any scenario, even with DPUs. And that is the most CPU-intensive task in all the processing of a frame. So, how do DPUs accelerate frame processing, or even offload it significantly?
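    As a back-of-the-envelope illustration of the point, the receive paths above can be reduced to a tally of which steps burn host-CPU cycles per frame (the step names and the assumption that DMA and NIC-internal transfers are "free" for the host CPU are mine, not measurements):

```python
# Hypothetical per-frame step lists, transcribed from the paths above.
# DMA and NIC/DPU-internal transfers don't consume host-CPU cycles;
# only steps the host CPU drives (prefixed "cpu_") are counted.
PATHS = {
    "traditional": ["dma_to_kernel_ring", "cpu_to_vm_ring", "cpu_to_vm_userspace"],
    "pmd":         ["cpu_poll_to_userspace"],
    "sriov":       ["dma_to_vm_kernel_ring", "cpu_to_vm_ring", "cpu_to_vm_userspace"],
    "dpu":         ["dpu_processing", "dma_to_vm_ring", "cpu_to_vm_userspace"],
}

def host_cpu_copies(path: str) -> int:
    """Count the steps where the host CPU has to touch the frame."""
    return sum(step.startswith("cpu") for step in PATHS[path])

for name in PATHS:
    print(f"{name}: {host_cpu_copies(name)} host-CPU copies per frame")
```

    Even in this crude model, every path keeps at least one host-CPU copy (or PMD poll) into VM user space, which is exactly the "last bit" that no offload can remove.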

    And another thing: the dVS offloaded to a DPU? So inter-VM traffic on the same host has to pass through the DPU??? I reckon that is a significant overhead.

  3. > I utterly fail to understand on a deeper level the "how" and "what" a DPU actually offloads and accelerates

    There was a pretty good article describing how it works that I can't find anymore, but it said pretty much what this blog is saying:

    You could either use SR-IOV or some mechanism that looks like software (kernel) patch cables between VMs and DPU. The "heavy lifting" (VXLAN, DFW) would be done on the DPU.
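    To make "heavy lifting" concrete: VXLAN encapsulation means prepending an 8-byte header (layout per RFC 7348) plus outer UDP/IP/Ethernet headers to every frame. The sketch below shows just the VXLAN-header step; it is an illustration of the kind of per-frame work a DPU could take off the host CPU, not VMware's actual implementation (outer headers omitted for brevity):

```python
import struct

VXLAN_FLAGS_VNI_VALID = 0x08  # "I" flag set: the VNI field is valid (RFC 7348)

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame.

    Header: 1 byte flags, 3 reserved bytes, 3-byte VNI, 1 reserved byte.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    header = struct.pack("!II", VXLAN_FLAGS_VNI_VALID << 24, vni << 8)
    return header + inner_frame

# Usage: wrap a dummy 60-byte inner frame into VXLAN segment 5001.
pkt = vxlan_encap(b"\x00" * 60, vni=5001)
```

    Doing this (plus the outer headers, plus a distributed-firewall lookup) on the DPU saves host-CPU cycles per frame, but as noted above, it does nothing for the final copy into VM user space.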

    > So, how do DPUs accelerate VM frame processing or even offload it significantly?

    Of course they do not. There's no magic.

    > So inter-VM traffic on the same host has to pass through the DPU? I reckon that is a significant overhead.

    Of course there's overhead. I have no idea how significant it is.

  4. Thanks. That link led me to two comprehensible videos that actually explain DPUs at a deeper technical level without the usual marketing kerfuffle. Many mysteries remain, but I am getting closer to understanding DPUs.
