Most people casually involved with virtual appliances and network function virtualization (NFV) believe that replacing the Linux TCP/IP stack with user-mode packet forwarding (example: Intel's DPDK) boosts performance from a meager 1 Gbps to tens of gigabits (and thus makes hardware forwarding obsolete).
Having data points is always better than having opinions; today let's look at the Receiving 1 Mpps with Linux TCP/IP Stack blog post.
2015-07-18: The blog post was updated based on feedback by Kristian Larsson.
Long story short: it’s always been possible to get good packet forwarding performance on Linux, and the solutions have been well known for years.
Before we start
You might miss the bigger picture by focusing solely on packet forwarding performance.
In many cases, 1 Gbps of forwarding performance is more than good enough. In others, you cannot use hardware forwarding anyway because the problem cannot be solved in dedicated hardware at reasonable cost (example: large-scale TCP optimization).
Finally, sometimes the amount of processing done on a single packet limits the throughput (example: deep packet inspection), and there’s not much you can do apart from throwing more cores at the problem (Palo Alto has a firewall with 100 Gbps throughput … using 400 cores).
And now let's see how badly the Linux TCP/IP stack did
The author of the blog post I mentioned above used several tricks to achieve the target performance:
- Sending and receiving multiple messages with a single system call instead of one message per call, which got him to 350 kpps;
- Using multi-queue NICs to spread the load across multiple CPU cores, which increased the throughput to 440 kpps;
- Using a multi-threaded application, which finally got him to 1 Mpps (or 1.4 Mpps with a fine-tuned memory architecture).
Where’s the problem?
With all this being said, why don’t we see better forwarding performance in virtual appliances doing simple packet processing?
In most cases, the answer is surprisingly simple: the vendors packaged their existing code as a VM and replaced direct access to dedicated hardware with calls to the Linux kernel (making every possible mistake they could along the way). Vendors that spent time optimizing the code (Vyatta, Juniper) got the performance you'd expect (Juniper managed to push 160 Gbps through their vMX).
So what was I trying to say?
Kristian Larsson left a nice comment saying "OK, so what exactly were you trying to say?" Let me try to organize my thoughts at least a bit:
(A) Contrary to what some Software Defined Evangelists think, there’s no magic bullet or universal culprit.
(B) The tricks that people reinvent all the time have been well known (if not exactly well documented) for years. See also the Scaling in the Linux Networking Stack document from kernel.org.
(C) There’s always the weakest link and if you don’t know what it is, you’ll have performance problems no matter what.
(D) Once you work around the weakest link, there’s another one waiting for you.
(E) If you don’t know what you’re doing, you’ll get the results you deserve.
(F) And finally, sometimes good enough is good enough.
And all I need now is a hacker misunderstanding my post and telling me how stupid I am ;)
Interested in virtual forwarding performance?
You'll find tons of useful information in the Software Gone Wild podcast:
- Snabb Switch and NFV on OpenStack;
- Snabb Switch deep dive;
- Ntopng and PF_RING;
- Palo Alto virtual firewall;
- Large-scale TCP optimization;
- L2VPN over IPv6 with Snabb Switch.