Fast Linux Packet Forwarding with Thomas Graf on Software Gone Wild
We did several podcasts describing how one could get stellar packet forwarding performance on x86 servers by reimplementing the whole forwarding stack outside of the kernel (Snabb Switch) or by bypassing the Linux kernel and moving the packet processing into userspace (PF_RING).
Now let’s see if it’s possible to improve the packet forwarding performance of the Linux kernel itself. Thomas Graf, one of the authors of Cilium, claims it can be done, and he explained the intricate details in Episode 64 of Software Gone Wild.
We started with the basics:
- Are the Linux packet forwarding performance numbers tossed around realistic or biased?
- What can one reasonably expect from a Linux kernel?
- Why is the Linux kernel suboptimal when it comes to packet forwarding performance?
Just a few minutes into our talk we slid down a rabbit hole into the wonderland of BPF. We started with “what is BPF,” which turned into a minute of acronyms, so we stepped back and did a one-step-at-a-time controlled descent:
- What is bytecode?
- Why would you use bytecode (and BPF) instead of writing a kernel module?
- Why is BPF better (or not) than userspace packet forwarding?
- What are the BPF limitations?
- How would you write programs that generate BPF code that is then used to process packets? (see the sketch after this list)
- What is P4 and who would use it?
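To make the “bytecode generated from a higher-level language” idea concrete, here’s a minimal sketch (my illustration, not code from the show) of an eBPF program written in restricted C. Clang/LLVM compiles it into BPF bytecode, the in-kernel verifier checks it, and tc attaches it to an interface. File names, section names, and the command lines are illustrative assumptions.

```c
/* toy_filter.c -- illustrative only, not code from the show.
 * Build restricted C into BPF bytecode, e.g.:
 *   clang -O2 -target bpf -c toy_filter.c -o toy_filter.o
 * Attach it as a tc classifier (recent iproute2), e.g.:
 *   tc qdisc add dev eth0 clsact
 *   tc filter add dev eth0 ingress bpf da obj toy_filter.o sec classifier
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/pkt_cls.h>

__attribute__((section("classifier"), used))
int drop_icmp(struct __sk_buff *skb)
{
    void *data     = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;
    struct ethhdr *eth = data;
    struct iphdr  *ip  = data + sizeof(*eth);

    /* the in-kernel verifier rejects the program unless every packet
     * access is preceded by an explicit bounds check like this one */
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_OK;                   /* too short, let it pass */

    if (eth->h_proto == __constant_htons(ETH_P_IP) &&
        ip->protocol == IPPROTO_ICMP)
        return TC_ACT_SHOT;                 /* drop ICMPv4 */

    return TC_ACT_OK;                       /* everything else passes */
}
```

The point of the exercise: the kernel never sees the C source, only the verified bytecode, which is why this is safer than loading an equivalent kernel module.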
BPF sounds like fun, but where would you use it? Thomas is working on an interesting use case (project Cilium) - using BPF to implement container networking - and obviously we had to explore its details:
- How would you use BPF to implement container networking?
- Upgrading networking behavior while the containers are running
- How do you glue namespaces together with BPF? (see the toy sketch after this list)
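Cilium’s real datapath is considerably more sophisticated, but as a toy illustration of “gluing namespaces together with BPF” (my example, not Thomas’): a tc eBPF program on the host-side veth of one container can shortcut packets directly to another container’s veth with the bpf_redirect() helper, bypassing the host bridge and routing code. The interface index is hardcoded here; a real implementation would look it up in a BPF map maintained by a userspace agent.

```c
/* veth_stitch.c -- toy example, not Cilium code.
 * Attached with tc to the host-side veth of container A; every packet A
 * sends is redirected straight to the host-side veth of container B and
 * therefore pops out inside B's network namespace. */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>

#define PEER_IFINDEX 42    /* hypothetical ifindex of B's host-side veth */

/* classic samples-style helper declaration */
static int (*bpf_redirect)(int ifindex, __u32 flags) =
    (void *) BPF_FUNC_redirect;

__attribute__((section("classifier"), used))
int stitch(struct __sk_buff *skb)
{
    /* bpf_redirect() queues the packet for transmission on the target
     * interface and returns TC_ACT_REDIRECT to tell tc what happened */
    return bpf_redirect(PEER_IFINDEX, 0);
}
```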
Next on our acronym list was XDP (eXpress Data Path), a project with roots in CloudFlare’s work on improving packet dropping performance when defending against DDoS attacks. XDP is a packet processing mechanism implemented with BPF within the device drivers, and we went through details like:
- Why would you prefer XDP in software instead of programming the TCAM available on Intel NICs?
- Why would you batch packet processing, and why would you do it in a driver instead of the Linux kernel?
- How would you bypass the kernel packet forwarding with XDP?
- What hardware could I use with XDP and when can I expect to have support for more hardware?
- Where can I get XDP and how do I get it running?
- How easy would it be to get communication between a userspace control plane (or telemetry) and a BPF program? (see the sketch after this list)
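To tie the XDP questions together, here’s a minimal sketch (my illustration, not the code discussed in the show) of the DDoS-style use case: an XDP program that drops UDP packets aimed at one port while they’re still in the driver, and counts the drops in a BPF map so a userspace control plane or telemetry agent can read them. The port number, file/section names, and the legacy samples-style map definition are all assumptions; exact loader conventions differ between libbpf and iproute2.

```c
/* xdp_drop_udp.c -- illustrative sketch only.
 * Build:  clang -O2 -target bpf -c xdp_drop_udp.c -o xdp_drop_udp.o
 * Attach (driver or generic XDP, depending on NIC support):
 *   ip link set dev eth0 xdp obj xdp_drop_udp.o sec xdp */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <linux/in.h>

#define DROP_PORT 9999            /* made-up port under attack */

/* legacy samples-style map definition; a userspace agent can read the
 * counter through the bpf() syscall or bpftool */
struct bpf_map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    unsigned int map_flags;
};

__attribute__((section("maps"), used))
struct bpf_map_def drop_count = {
    .type        = BPF_MAP_TYPE_PERCPU_ARRAY,
    .key_size    = sizeof(__u32),
    .value_size  = sizeof(__u64),
    .max_entries = 1,
};

static void *(*bpf_map_lookup_elem)(void *map, void *key) =
    (void *) BPF_FUNC_map_lookup_elem;

__attribute__((section("xdp"), used))
int xdp_drop_udp(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    struct iphdr  *ip  = data + sizeof(*eth);      /* assumes no IP options */
    struct udphdr *udp = data + sizeof(*eth) + sizeof(*ip);

    /* bounds check required by the verifier before touching the headers */
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto != __constant_htons(ETH_P_IP) ||
        ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    if (udp->dest == __constant_htons(DROP_PORT)) {
        __u32 key = 0;
        __u64 *count = bpf_map_lookup_elem(&drop_count, &key);
        if (count)
            (*count)++;           /* per-CPU drop counter for telemetry */
        return XDP_DROP;          /* packet is discarded in the driver */
    }
    return XDP_PASS;              /* everything else continues up the stack */
}
```

Because the counter lives in a BPF map, the control plane never has to touch the fast path: it just reads (or updates) map entries while the XDP program keeps running.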
Finally, we turned to more down-to-earth aspects:
- Who is using these technologies?
- What is the Cilium project and where can I get it?
- How is Cilium enforcing security between containers across multiple hosts?
- Is Cilium ready for production? Is anyone using it today?
Want even more information about BPF and who's using it? Watch Thomas' presentation from KubeCon 2018 and read his blog post about BPF replacing the kernel part of iptables.
- https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf
- https://lwn.net/Articles/682538/
In a few words: eBPF and XDP are kinda hot in Linux networking right now, and Cilium is definitely using a lot of the new machinery :)
Regarding origin: I meant to say that the origins of the XDP discussion come from a CloudFlare preso at NetDev 0.1 which illustrated the problem and the use of BPF to determine how to program the NIC tuple filters to drop traffic. This led to discussions that evolved into XDP and XDP_DROP. As usual, many have been involved, and it wouldn't be fair to say that this is entirely a Facebook development either.