To Jumbo or Not to Jumbo?

Here’s the question I got from one of my readers:

Do you have any data available to show the benefits of jumbo frames in 40GE/100GE networks?

In case you’re wondering why he went down this path, here’s the underlying problem:

I have a large DC customer, and they complain of slow network performance after hours (when batch jobs and backups run), compared to daytime traffic (when it is more transactional traffic). They have been told by their vendors that jumbo MTU will not give them any benefit.

The vendors are (mostly) right. Jason Boche measured NFS, iSCSI and vMotion performance with jumbo frames a while ago, and got (mostly) the expected results: apart from the minor decrease of overhead it’s not worth the effort.

Michael Webster got slightly more positive results and is thus recommending jumbo frames. Is a few percent increase in performance worth the hassle? Your choice.

This article discusses the viability of jumbo frames from the data center TCP performance perspective. Using jumbo frames to avoid transport network fragmentation or PMTUD problems in tunneling scenarios is probably the least horrible way of solving those issues.
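
For the tunneling case, the arithmetic is simple: every encapsulation adds a fixed number of bytes to the customer packet, and the transport MTU has to absorb them. Here's a minimal sketch assuming an IPv4 transport without IP options, basic GRE (no key or sequence number), and the commonly quoted 50-byte VXLAN overhead; the function names are purely illustrative.

```python
# Assumed encapsulation overhead (bytes added to the customer IP packet),
# IPv4 transport without options:
#   IPIP:  outer IPv4 header (20)
#   GRE:   outer IPv4 header (20) + basic GRE header, no key/sequence (4)
#   VXLAN: inner Ethernet header (14) + VXLAN (8) + UDP (8) + outer IPv4 (20)
TUNNEL_OVERHEAD = {"ipip": 20, "gre": 24, "vxlan": 50}


def max_inner_mtu(transport_mtu: int, tunnel: str) -> int:
    """Largest customer IP packet that fits into the tunnel unfragmented."""
    return transport_mtu - TUNNEL_OVERHEAD[tunnel]


def required_transport_mtu(inner_mtu: int, tunnel: str) -> int:
    """Transport IP MTU needed to carry inner_mtu-sized packets unfragmented."""
    return inner_mtu + TUNNEL_OVERHEAD[tunnel]


def needs_fragmentation(packet_size: int, transport_mtu: int, tunnel: str) -> bool:
    """True if the tunnel endpoint would have to fragment (or drop) the packet."""
    return packet_size > max_inner_mtu(transport_mtu, tunnel)


if __name__ == "__main__":
    for tunnel in TUNNEL_OVERHEAD:
        print(tunnel,
              "inner MTU over 1500:", max_inner_mtu(1500, tunnel),
              "transport MTU for 1500-byte packets:",
              required_transport_mtu(1500, tunnel))
```

With a 1500-byte transport MTU, GRE leaves 1476 bytes for customer packets, and carrying full-size 1500-byte customer packets over VXLAN needs a transport MTU of at least 1550. A 9000-byte transport MTU makes the problem go away for any 1500-byte customer traffic.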

Also, keep in mind that no two TCP stacks are created equal, and there are tons of parameters to tweak (hint: don’t unless you absolutely know what you’re doing). TCP performance depends heavily on the quality of the TCP stack, which might explain why some stacks work significantly better with jumbo frames while others show little difference beyond the obvious ~3% reduction in TCP/IP header overhead (a single-vCPU Linux VM I tested years ago pushed more than 10Gbps at 50% CPU using 1500-byte frames).
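
Where does the ~3% come from, and what changes at 40GE/100GE? The back-of-the-envelope sketch below assumes IPv4 plus TCP with the timestamp option (52 bytes of headers per packet) and untagged Ethernet with preamble, FCS and minimum inter-frame gap (38 bytes per frame on the wire). Real stacks with offloads (TSO/LRO/GRO) change the CPU side of the picture considerably, so treat the packet rates as an upper bound on per-packet work, not a measurement.

```python
# Per-frame overhead on the wire (assumed: untagged Ethernet).
PREAMBLE_SFD = 8   # preamble + start-of-frame delimiter
ETH_HEADER = 14    # destination MAC, source MAC, EtherType
FCS = 4            # frame check sequence
IFG = 12           # minimum inter-frame gap
WIRE_OVERHEAD = PREAMBLE_SFD + ETH_HEADER + FCS + IFG  # 38 bytes

# Assumed: IPv4 header (20) + TCP header (20) + TCP timestamp option (12).
TCPIP_HEADERS = 52


def header_fraction(mtu: int, headers: int = TCPIP_HEADERS) -> float:
    """Share of each IP packet consumed by TCP/IP headers."""
    return headers / mtu


def goodput_efficiency(mtu: int, headers: int = TCPIP_HEADERS) -> float:
    """TCP payload bytes divided by all bytes occupying the wire."""
    return (mtu - headers) / (mtu + WIRE_OVERHEAD)


def frames_per_second(link_bps: float, mtu: int) -> float:
    """Full-size frames per second needed to saturate the link."""
    return link_bps / ((mtu + WIRE_OVERHEAD) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: headers {header_fraction(mtu):.2%}, "
              f"goodput {goodput_efficiency(mtu):.2%}")
    for gbps in (10, 40, 100):
        small = frames_per_second(gbps * 1e9, 1500)
        jumbo = frames_per_second(gbps * 1e9, 9000)
        print(f"{gbps}GE: {small / 1e6:.2f} Mpps vs {jumbo / 1e6:.2f} Mpps")
```

Under these assumptions the header share drops from about 3.5% to about 0.6% of each packet, while the packet rate needed to fill a 100GE link drops from roughly 8.1 Mpps to roughly 1.4 Mpps (almost 6x fewer packets). Whether the packet rate matters depends entirely on how much per-packet work your stack and NIC still do.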

Does your experience with jumbo frames differ from what I’ve seen? Write a comment!

What about slow network performance?

Anyway, coming back to slow network performance, you have to figure out what the problem is before randomly tweaking parameters and hoping for a miracle.

Simplest case: the network is overloaded (which is easy to measure), and the only solution is to decrease the load, decide which traffic is more important than the rest (aka QoS), or buy more bandwidth.
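
"Easy to measure" usually means sampling interface octet counters twice and dividing by the interval. A minimal sketch, assuming 64-bit counters (like the SNMP IF-MIB ifHCInOctets object) and at most one counter wrap between samples; how you collect the samples is up to your monitoring system.

```python
COUNTER_MAX = 2 ** 64  # assumed: 64-bit interface octet counters


def octet_delta(old: int, new: int, counter_max: int = COUNTER_MAX) -> int:
    """Bytes transferred between two samples, tolerating a single counter wrap."""
    if new >= old:
        return new - old
    return new + counter_max - old


def utilization(old: int, new: int, interval_s: float, link_bps: float) -> float:
    """Average link utilization (0.0 to 1.0) over the sampling interval."""
    return octet_delta(old, new) * 8 / (interval_s * link_bps)


if __name__ == "__main__":
    # Hypothetical samples: 187.5 GB transferred in 5 minutes on a 10GE link.
    print(f"{utilization(0, 187_500_000_000, 300, 10e9):.0%}")
```

Keep in mind that a five-minute average is exactly what hides the microbursts discussed below: a link can average 50% and still drop packets every few milliseconds.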

If you’re experiencing continuously saturated links, then you might be able to squeeze a bit more out of the network with jumbo frames… but make sure you’re facing this particular problem before twiddling the knobs.

More interesting: the average utilization is low, but you see plenty of packet drops. You might be dealing with an incast problem (many sources sending to a single destination, and queued packets being dropped at the ToR switch), or microbursts overloading the switch buffers… at which point you’re getting close to the black magic la-la-land with vendors telling you crazily disparate stories based on what their gear can do (and don’t even think about huge buffers unless you love bufferbloat).
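
Why can a lightly loaded network drop packets? A crude fluid model of incast shows the mechanism. It assumes all senders and the receiving port run at the same line rate, all bursts start at the same instant, and the egress port has a fixed amount of buffer (real switches share buffers across ports and TCP reacts to drops, none of which is modeled). The burst sizes and buffer sizes below are hypothetical.

```python
def incast_drops(senders: int, burst_bytes: int, buffer_bytes: int) -> int:
    """Bytes dropped when `senders` hosts burst simultaneously to one port.

    Fluid approximation: while the bursts last, the egress port drains one
    sender's worth of data and has to queue the rest.
    """
    peak_queue = (senders - 1) * burst_bytes  # queue depth with infinite buffer
    return max(0, peak_queue - buffer_bytes)


def max_queue_delay_s(buffer_bytes: int, link_bps: float) -> float:
    """Worst-case queuing delay added by a full buffer (the bufferbloat cost)."""
    return buffer_bytes * 8 / link_bps


if __name__ == "__main__":
    # Hypothetical: 64 KB bursts into a port with 1 MB of buffer.
    for n in (8, 16, 32):
        print(n, "senders:", incast_drops(n, 65_536, 1_048_576), "bytes dropped")
    print("1.25 MB buffer at 10GE adds up to",
          max_queue_delay_s(1_250_000, 10e9) * 1000, "ms")
```

The two functions pull in opposite directions: more buffer absorbs larger bursts, but every byte of buffer that fills up turns into latency for everyone else queued behind it.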

Buffer sizing and the tolerable amount of drops is a topic I can’t even start to address in a short blog post, but some ipSpace guests did a wonderful job in the past:


  1. What about jumbo frames & tunnels (e.g., IPIP, GRE)?
    If you do not use jumbo frames you need to do fragmentation on routers terminating the tunnels or make the source reduce the frame size (PMTUD, etc). Do you see any benefits here?
    1. That's a totally different story (the question was coming from a data center perspective).

      Jumbo frames are probably the least horrible option in transport networks to avoid customer traffic fragmentation (or worse). Will update the blog post.
    2. I'd say:
      - Transport networks: go-go-go! The lowest common denominator I could identify is 9150/9164 (NCS5000)
      - Hosts: IF you have a good reason to do it, don't go beyond 9000.

      As for the reasons to have jumbo on hosts, heavy transfers on networks disjoint from the Internet would be the only decent reason... and maybe still not decent enough...
  2. The title was very general but your content was more specific. That's why I added the comment. I am sure you know this side of the network perfectly (like me, having spent years in between tunnels and asymmetry).
  3. One more interesting example: the vmxnet3 driver and IPv6 (in case anything larger than 1500 bytes is treated as a jumbo frame).
  4. The efforts to solve TCP Incast at a higher layer of the stack always felt somewhat misguided. An Ethernet problem requires an Ethernet solution, right? I have perused a number of research papers over the years and they all seem to overlook switching strategy. I found that cut-through switching is the only thing that has improved the situation.
  5. WAN optimisation techniques using Cisco WAAS and Riverbed would surely be another way to optimise the TCP traffic flows...
    1. Might be hard to do @ 40GE/100GE speeds ;)
  6. Nowadays the relative win with jumbos is effectively zero at 1G, and occasionally useful at 10G. When 10G first hit the scene on the host side (practically speaking '04-'05), jumbos definitely did offer the possibility of allowing the CPUs of the time to avoid interrupt saturation long enough to actually saturate the wire (NB: I recall bigger Sun boxes at the time gaining 30-40% useful throughput by enabling jumbos).

    That said, with modern CPUs (i.e., Nehalem generation and newer on the x86 side) the benefits of jumbos on 10GE tend to be marginal at best. For much the same reasons as above, though, at 40- or 100-gigabit it's absolutely worthwhile as, again, most implementations end up being bottlenecked by interrupt handling on a single core, and a 5X-6X reduction in packets processed can represent a lot more useful throughput on the wire (...especially in bulk-data situations, like storage/backup).

    It is, of course, likely that CPU speeds will eventually catch up - or, perhaps more realistically, we'll see practical PCIe limitations, host economics and app methodologies start to drive 25G and 50G (...thus reducing the aggregate need for jumbos from the point of view of CPU capacity). Either way being very conservative about the use of jumbos is a good idea, as is being incredibly strict about making sure their use is both consistent between host and network and fully manageable/repeatable in operation.
  7. There might not be a lot to be -gained- by enabling jumbo frames, but on the other hand what is there to be lost by enabling them? I have never understood why the default layer 2 MTU on Cisco switches was not to allow the maximum possible frame MTU.

    Now the Layer 3 (IP) MTU is something entirely different, and I agree that that should be 1500 by default.

    But what's the problem with enabling jumbo frames by default on a switch and leave it up to the end hosts/administrators/operators if they want to enable it on their storage or server system NICs or not?

    I can't think of any myself, and I've never heard anyone come up with a reason why having Jumbo Frames on by default wouldn't be the most flexible way of setting a switch up. As far as I know there are no adverse effects from having a smaller MTU on the end hosts than on the switch in the middle. Nor are there any from enabling Jumbo Frames but not using them. But there certainly are ill effects of enabling Jumbo Frames on end hosts but not on the switches and L2 links in between them.
    1. You opened a huge can of worms:

      Defaults: It would make sense to make jumbo frames default on L2 switches in 2017, but having different platforms (or different software releases) with different defaults is a recipe for disaster.

      Enabling jumbo frames by default: it's one more parameter that has to be consistent across all devices, and is easy to miss. No problem if you automated configuration deployment. Hope you already did it. Many others didn't.

      Having jumbo frames in network, but not on end devices: what happens when a non-jumbo host OS receives a jumbo frame? Sometimes it depends on the protocol (v4 versus v6). Do you really want to know?
    2. There is a Reddit thread on this very blog post. It reveals some strong consensus on the topic.
  8. I just had a multicast publisher pushing messages to a Solarflare NIC for packaging, and the message size being pushed was causing IP fragments to be sent on the wire. The fragmentation was causing reordering across some LAGs because the multicast hash used IP src-dst + src-dst-port if available and IP src-dst if not.

    Cranking up the MTU a bit solved this issue but we could have decreased the msg size at the cost of an increase in pps and fixed the reordering as well. The box did NOT support changing the multicast hash.