Here’s the question I got from one of my readers:
Do you have any data available to show the benefits of jumbo frames in 40GE/100GE networks?
In case you’re wondering why he went down this path, here’s the underlying problem:
I have a large DC customer, and they complain of slow network performance after hours (when batch jobs and backups run) compared to daytime (when the traffic is mostly transactional). They've been told by their vendors that jumbo MTU will not give them any benefit.
The vendors are (mostly) right. Jason Boche measured NFS, iSCSI and vMotion performance with jumbo frames a while ago and got (mostly) the expected results: apart from a minor decrease in overhead, it's not worth the effort.
Michael Webster got slightly more positive results and is thus recommending jumbo frames. Is a few percent increase in performance worth the hassle? Your choice.
This article discusses the viability of jumbo frames from the data center TCP performance perspective. Using jumbo frames to avoid transport network fragmentation or PMTUD problems in tunneling scenarios is probably the least horrible way of solving those issues.
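To see why jumbo frames help in tunneling scenarios, here's a back-of-the-envelope sketch (using the standard VXLAN encapsulation sizes; IPv4 outer header, no options) of the transport-network MTU you need to carry a full-size 1500-byte inner frame without fragmentation:

```python
# Transport MTU needed to carry a full 1500-byte inner IP packet
# inside a VXLAN tunnel without fragmenting the outer packet.
INNER_IP_MTU = 1500   # inner IP packet (standard Ethernet MTU)
INNER_ETH = 14        # inner Ethernet header carried in the tunnel
VXLAN_HDR = 8         # VXLAN header
OUTER_UDP = 8         # outer UDP header
OUTER_IPV4 = 20       # outer IPv4 header, no options

transport_mtu = INNER_IP_MTU + INNER_ETH + VXLAN_HDR + OUTER_UDP + OUTER_IPV4
print(transport_mtu)  # 1550
```

Any transport MTU at or above that value (jumbo frames give you plenty of headroom) sidesteps the fragmentation and PMTUD mess entirely.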
Also, keep in mind that no two TCP stacks are created equal, and there are tons of parameters to tweak (hint: don't, unless you absolutely know what you're doing). TCP performance depends heavily on the quality of the TCP stack, which might explain why some stacks work significantly better with jumbo frames while others show little difference beyond the obvious ~3% reduction in TCP/IP header overhead (a single-vCPU Linux VM I tested years ago pushed more than 10 Gbps at 50% CPU using 1500-byte frames).
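The ~3% figure is easy to verify. Assuming 40 bytes of TCP+IPv4 headers (no TCP options) and 18 bytes of Ethernet header plus FCS (ignoring preamble and inter-frame gap), the goodput efficiency works out like this:

```python
def goodput_efficiency(mtu, tcpip_hdr=40, eth_overhead=18):
    """Fraction of on-the-wire bytes that are TCP payload.

    Assumes IPv4 + TCP with no options (40 bytes) and Ethernet
    header + FCS (18 bytes); preamble/IFG are ignored.
    """
    mss = mtu - tcpip_hdr                # TCP payload per segment
    return mss / (mtu + eth_overhead)    # payload / bytes on the wire

for mtu in (1500, 9000):
    print(mtu, round(goodput_efficiency(mtu) * 100, 1))
# 1500 -> 96.2% efficiency, 9000 -> 99.4%: roughly a 3% gain
```

In other words, even a perfect TCP stack can't gain more than a few percent of goodput from jumbo frames; anything beyond that comes from reduced per-packet processing, not from the headers.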
Does your experience with jumbo frames differ from what I’ve seen? Write a comment!
What about slow network performance?
Anyway, coming back to slow network performance, you have to figure out what the problem is before randomly tweaking parameters and hoping for a miracle.
Simplest case: the network is overloaded (which is easy to measure), and the only solutions are to decrease the load, decide which traffic is more important than the rest (aka QoS), or buy more bandwidth.
If you’re experiencing continuously saturated links, then you might be able to squeeze a bit more out of the network with jumbo frames… but make sure you’re facing this particular problem before twiddling the knobs.
More interesting: the average utilization is low, but you see plenty of packet drops. You might be dealing with an incast problem (many sources sending to a single destination, with queued packets being dropped at the ToR switch) or with microbursts overloading the switch buffers. At that point you're getting close to black-magic la-la-land, with vendors telling you crazily disparate stories based on what their gear can do (and don't even think about huge buffers unless you love bufferbloat).
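A toy model shows why incast hurts even when the links are mostly idle. In this sketch (all numbers are made up for illustration) a few dozen senders release a synchronized burst toward one receiver; the ToR output port drains at line rate, but the shared buffer is shallow, so the burst overflows it while the average utilization stays negligible:

```python
def incast_drops(senders, burst, buffer_pkts, drain_per_tick=10, ticks=100):
    """Count drops when `senders` hosts each send `burst` packets
    toward the same output port in the same instant (tick 0).

    buffer_pkts    -- shallow shared output buffer, in packets
    drain_per_tick -- packets the port transmits per tick (line rate)
    """
    queue = drops = 0
    for t in range(ticks):
        arrivals = senders * burst if t == 0 else 0  # one synchronized burst
        for _ in range(arrivals):
            if queue < buffer_pkts:
                queue += 1          # packet queued
            else:
                drops += 1          # tail drop at the ToR port
        queue = max(0, queue - drain_per_tick)
    return drops

print(incast_drops(senders=32, burst=8, buffer_pkts=100))  # 156
```

256 packets arrive at once, 100 fit in the buffer, and 156 are tail-dropped, yet over the whole interval the port carried a trickle of traffic. That's why averaged utilization graphs never show the problem.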
Buffer sizing and the tolerable amount of drops is a topic I can’t even start to address in a short blog post, but some ipSpace guests did a wonderful job in the past:
- Juho Snellman talked about TCP performance and impact of packet drops;
- I discussed data center TCP with Thomas Graf;
- JR Rivers (CTO at Cumulus Networks) talked about buffers and drops on a free webinar in autumn 2016;
- Terry Slattery (CCIE#1026) talked about network sizing in May 2017.