To Jumbo or Not to Jumbo?
Here’s the question I got from one of my readers:
Do you have any data available to show the benefits of jumbo frames in 40GE/100GE networks?
In case you’re wondering why he went down this path, here’s the underlying problem:
I have a large DC customer, and they complain of slow network performance after hours (when batch jobs and backups run), compared to daytime traffic (which is mostly transactional). They've been told by their vendors that jumbo MTU will not give them any benefit.
The vendors are (mostly) right. Jason Boche measured NFS, iSCSI and vMotion performance with jumbo frames a while ago, and got (mostly) the expected results: apart from a minor decrease in overhead, jumbo frames are not worth the effort.
Michael Webster got slightly more positive results and is thus recommending jumbo frames. Is a few percent increase in performance worth the hassle? Your choice.
This article discusses the viability of jumbo frames from the data center TCP performance perspective. Using jumbo frames to avoid transport network fragmentation or PMTUD problems in tunneling scenarios is probably the least horrible way of solving those issues.
Also, keep in mind that no two TCP stacks are created equal, and there are tons of parameters to tweak (hint: don't, unless you absolutely know what you're doing). TCP performance depends heavily on the quality of the TCP stack, which might explain why some stacks work significantly better with jumbo frames while others show little difference beyond the obvious ~3% reduction in TCP/IP header overhead (a single-vCPU Linux VM I tested years ago pushed more than 10Gbps at 50% CPU using 1500-byte frames).
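Here's the back-of-the-envelope arithmetic behind that figure, as a minimal sketch assuming plain IPv4 + TCP headers (20 + 20 bytes, no options) and standard Ethernet framing overhead:

```python
# Back-of-the-envelope goodput calculation for 1500-byte vs. 9000-byte MTU.
# Assumes plain IPv4 + TCP headers (20 + 20 bytes, no options) and standard
# Ethernet overhead (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte IFG).

IP_TCP_HEADERS = 40                     # bytes of L3/L4 headers per packet
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12     # header + FCS + preamble + inter-frame gap

def goodput_ratio(mtu: int) -> float:
    """Fraction of on-wire bits that carry TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {goodput_ratio(mtu):.1%} of wire capacity is TCP payload")

# MTU 1500: ~94.9% of wire capacity is TCP payload
# MTU 9000: ~99.1% of wire capacity is TCP payload
```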
Does your experience with jumbo frames differ from what I’ve seen? Write a comment!
What about slow network performance?
Anyway, coming back to slow network performance, you have to figure out what the problem is before randomly tweaking parameters and hoping for a miracle.
Simplest case: the network is overloaded (which is easy to measure), and the only solutions are to decrease the load, decide which traffic is more important than the rest (aka QoS), or buy more bandwidth.
If you’re experiencing continuously saturated links, then you might be able to squeeze a bit more out of the network with jumbo frames… but make sure you’re facing this particular problem before twiddling the knobs.
More interesting: the average utilization is low, but you see plenty of packet drops. You might be dealing with an incast problem (many sources sending to a single destination, and queued packets being dropped at the ToR switch), or microbursts overloading the switch buffers… at which point you’re getting close to the black magic la-la-land with vendors telling you crazily disparate stories based on what their gear can do (and don’t even think about huge buffers unless you love bufferbloat).
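Before blaming buffers or frame sizes, it's worth sampling the interface counters. Here's a minimal sketch for a Linux host; the interface name and link speed are assumptions, and production monitoring would obviously poll the switches instead:

```python
# Sample Linux interface counters to distinguish "link is saturated" from
# "utilization is low but drops keep climbing" (incast / microbursts).
# Interface name and link speed are assumptions -- adjust for your environment.

import time

IFACE = "eth0"            # hypothetical interface name
LINK_SPEED_BPS = 10e9     # assume a 10GE link
INTERVAL = 10             # sampling interval in seconds

def read_counter(name: str) -> int:
    with open(f"/sys/class/net/{IFACE}/statistics/{name}") as f:
        return int(f.read())

def snapshot() -> dict:
    return {c: read_counter(c)
            for c in ("tx_bytes", "rx_bytes", "rx_dropped", "tx_dropped")}

before = snapshot()
time.sleep(INTERVAL)
after = snapshot()

utilization = (after["tx_bytes"] - before["tx_bytes"]) * 8 / INTERVAL / LINK_SPEED_BPS
drops = (after["rx_dropped"] - before["rx_dropped"]) + (after["tx_dropped"] - before["tx_dropped"])

print(f"{IFACE}: average TX utilization {utilization:.1%}, {drops} drops in {INTERVAL}s")
```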
Buffer sizing and the tolerable amount of drops is a topic I can’t even start to address in a short blog post, but some ipSpace guests did a wonderful job in the past:
- Juho Snellman talked about TCP performance and impact of packet drops;
- I discussed data center TCP with Thomas Graf;
- JR Rivers (CTO at Cumulus Networks) talked about buffers and drops on a free webinar in autumn 2016;
- Terry Slattery (CCIE#1026) talked about network sizing in May 2017.
If you do not use jumbo frames, you need to do fragmentation on the routers terminating the tunnels, or make the source reduce the frame size (PMTUD, etc.). Do you see any benefits here?
Jumbo frames are probably the least horrible option in transport networks to avoid customer traffic fragmentation (or worse). Will update the blog post.
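The arithmetic is trivial. Here's a sketch assuming VXLAN-style encapsulation (roughly 50 bytes of outer headers); adjust the overhead for whatever tunnel encapsulation you use:

```python
# Will an encapsulated 1500-byte customer packet fit into the transport MTU
# without fragmentation? Overhead assumes VXLAN: outer IPv4 (20) + UDP (8) +
# VXLAN (8) + inner Ethernet (14) = 50 bytes.

OVERHEAD = 50

def fits(customer_mtu: int, transport_mtu: int) -> bool:
    return customer_mtu + OVERHEAD <= transport_mtu

print(fits(1500, 1500))   # False: fragmentation, PMTUD games, or silent drops
print(fits(1500, 9216))   # True: jumbo transport MTU absorbs the overhead
```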
- Transport networks: go-go-go! The lowest common denominator I could identify is 9150/9164 (NCS5000).
- Hosts: if you have a good reason to do it, don't go beyond 9000.
As for the reasons to have jumbo on hosts, heavy transfers on networks disjoint from the Internet would be the only decent reason... and maybe still not decent enough...
That said, with modern CPUs (...call it Nehalem generation and newer on the x86 side) the benefits of jumbos on 10GE tend to be marginal at best. For much the same reasons as above, though, at 40- or 100-gigabit it's absolutely worthwhile as, again, most implementations end up being bottlenecked by interrupt handling on a single core, and a 5x-6x reduction in packets processed can represent a lot more useful throughput on the wire (...especially in bulk-data situations, like storage/backup).
It is, of course, likely that CPU speeds will eventually catch up - or, perhaps more realistically, we'll see practical PCIe limitations, host economics and app methodologies start to drive 25G and 50G (...thus reducing the aggregate need for jumbos from the point of view of CPU capacity). Either way, being very conservative about the use of jumbos is a good idea, as is being incredibly strict about making sure their use is both consistent between host and network and fully manageable/repeatable in operation.
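A quick packets-per-second estimate shows where that 5x-6x figure comes from (a simplified calculation that ignores Ethernet preamble, FCS and inter-frame gap):

```python
# Rough packets-per-second at line rate for 1500-byte vs. 9000-byte frames
# (ignoring Ethernet preamble, FCS and inter-frame gap for simplicity).

for speed_gbps in (10, 40, 100):
    for mtu in (1500, 9000):
        pps = speed_gbps * 1e9 / 8 / mtu
        print(f"{speed_gbps}GE @ MTU {mtu}: ~{pps / 1e6:.2f} Mpps")

# 100GE drops from roughly 8.3 Mpps at MTU 1500 to about 1.4 Mpps at MTU 9000,
# i.e. the 5x-6x reduction in per-packet (interrupt) work mentioned above.
```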
Now the Layer 3 (IP) MTU is something entirely different, and I agree that that should be 1500 by default.
But what's the problem with enabling jumbo frames by default on a switch and leaving it up to the end hosts/administrators/operators to decide whether they want to enable them on their storage or server NICs?
I can't think of any myself, and I've never heard anyone come up with a reason why having jumbo frames on by default wouldn't be the most flexible way of setting up a switch. As far as I know there are no adverse effects from having a smaller MTU on the end hosts than on the switch in the middle, nor from enabling jumbo frames but not using them. But there certainly are ill effects of enabling jumbo frames on end hosts but not on the switches and L2 links in between them.
Defaults: it would make sense to make jumbo frames the default on L2 switches in 2017, but having different platforms (or different software releases) with different defaults is a recipe for disaster.
Enabling jumbo frames by default: it's one more parameter that has to be consistent across all devices, and it's easy to miss. Not a problem if you've automated configuration deployment. Hope you already did. Many others didn't.
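Here's a minimal sketch of what such an automated check could look like; the device inventory and the get_interface_mtu() collector are hypothetical placeholders for whatever source of truth and management API (SNMP, NETCONF, gNMI...) you use:

```python
# Minimal sketch of an automated MTU consistency check. The inventory and the
# get_interface_mtu() collector are hypothetical placeholders -- plug in your
# own source of truth and whatever management API your environment provides.

EXPECTED_MTU = 9216                        # assumed fabric-wide L2 MTU
INVENTORY = ["leaf1", "leaf2", "spine1"]   # hypothetical device list

def get_interface_mtu(device: str) -> dict[str, int]:
    """Placeholder: return {interface: configured MTU} for a device."""
    raise NotImplementedError("hook up your own collector here")

def audit() -> None:
    """Report every interface whose MTU deviates from the expected value."""
    for device in INVENTORY:
        for interface, mtu in get_interface_mtu(device).items():
            if mtu != EXPECTED_MTU:
                print(f"{device} {interface}: MTU {mtu}, expected {EXPECTED_MTU}")
```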
Having jumbo frames in the network, but not on end devices: what happens when a non-jumbo host OS receives a jumbo frame? Sometimes it depends on the protocol (v4 versus v6). Do you really want to know?
Cranking up the MTU a bit solved this issue, but we could also have decreased the message size, which would have fixed the reordering as well at the cost of an increase in pps. The box did NOT support changing the multicast hash.