Bufferbloat Killed my HTTP Session… or not?

Every now and then I get an email from a subscriber having video download problems. Most of the time the problem auto-magically disappears (and there’s no indication of packet loss or ridiculous latency in traceroute printout), but a few days ago Henry Moats managed to consistently reproduce the problem and sent me exactly what I needed: a pcap file.

TL&DR summary: you have to know a lot about application-level protocols, application servers and operating systems to troubleshoot networking problems.

Henry immediately noticed something extremely weird: all of a sudden (in the middle of the transfer), my server sent a destination unreachable ICMP reply and stopped responding to TCP packets.

I was totally stumped – the only module on my web server that could generate an administratively prohibited ICMP reply seemed to be iptables, so it looked like the web server dropped the TCP session without sending a TCP RST or FIN (weird), and the iptables module subsequently rejected all incoming TCP packets of the same session.

The pcap file showed plenty of retransmissions and out-of-order packets (it looks like there really are service providers out there that are clueless enough to reorder packets within a TCP session), but there was no obvious reason for the abrupt session drop, and the web server log files provided no clue: all requests sent by Henry’s web browser executed correctly.

The only weird clue the pcap file provided was the timing: the session dropped approximately 17 seconds after the transfer started, unpleasantly close to a 15-second timeout I vaguely remembered from one of the web server configuration files. A quick search found the only parameter that seemed relevant:

$ ack 15 conf*/*
89:KeepAliveTimeout 15

The KeepAliveTimeout specifies how long a web server keeps an idle HTTP session open, so it might be relevant… but why would it kick in during the data transfer?

I thought the answer could be bufferbloat: excessive buffering performed in various parts of the TCP stack and within the network. It looked like my web server managed to dump the whole video file into some buffers and considered the transfer completed in seconds. When the browser failed to send another command within 15 seconds (because it was still busy receiving the data), the web server decided it was time to close the idle HTTP session.
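The buffering effect is easy to sketch with a local socket pair in Python (an illustration of kernel-side send buffering, not the web server's actual code path): send() returns as soon as the kernel has buffered the data, long before the peer reads anything.

```python
import socket

# A connected pair of stream sockets standing in for server <-> client
server, client = socket.socketpair()

payload = b"x" * 4096           # stand-in for a chunk of the video file
sent = server.send(payload)     # returns as soon as the kernel buffers it

# The "server" now considers this chunk delivered, yet the "client"
# hasn't called recv() at all -- the data just sits in kernel buffers.
assert sent == len(payload)
```

On a real web server the buffers involved are much larger (TCP send buffer, NIC queues, plus any buffering middleboxes along the path), so a multi-megabyte file can be "sent" in seconds while the client is still draining it.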

Based on that assumption it was easy to implement a workaround: increase the KeepAliveTimeout to 60 seconds. Seems like it solved the problem (I also added “send Connection: close header on long downloads” to my bug list).
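In configuration terms the workaround is a one-line change (Apache 2.2 directives; the commented-out block is a hypothetical sketch of the "Connection: close" bug-list item, with a placeholder path, and assumes mod_headers is loaded):

```apache
# Give slow clients more time to drain buffered data before the
# idle-connection timer fires
KeepAlive On
KeepAliveTimeout 60

# Hypothetical sketch of the bug-list item: force "Connection: close"
# on long downloads (placeholder path, requires mod_headers)
#<Directory "/var/www/downloads">
#    Header set Connection close
#</Directory>
```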

It’s probably not that simple

I’m still trying to understand what exactly Henry experienced. After all, there are plenty of people all around the world accessing my web site over low-speed lines (thus downloading individual files for minutes) and none of them experience the same symptoms. Henry might have accessed my web site through a transparent web proxy that buffered too much data, or it might have been something completely different.

Have you experienced something similar? Write a comment!



  1. Agree with that parenthetical (weird). I wonder what would happen if the packet filter were sending a TCP RST instead of an ICMP unreachable (-p tcp -j REJECT --reject-with tcp-reset).
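    For reference, the two REJECT variants would look roughly like this (a sketch; matching on port 80 in the INPUT chain is a placeholder, not the actual ruleset):

    ```
    # Reject with an ICMP unreachable (administratively prohibited)
    iptables -A INPUT -p tcp --dport 80 -j REJECT --reject-with icmp-admin-prohibited
    # Commenter's variant: answer with a TCP RST instead, so the client
    # tears the session down immediately
    iptables -A INPUT -p tcp --dport 80 -j REJECT --reject-with tcp-reset
    ```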

    One point that does make sense: Linux (by default) rate-limits certain ICMP messages (including unreachables) to one per second per target, which is why the screenshot doesn't show more than one ICMP reply for all those incoming packets. http://linux.die.net/man/7/icmp
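    The limit maps to these sysctls (values shown are the Linux defaults, in /etc/sysctl.conf syntax):

    ```
    # Minimum interval between rate-limited ICMP replies to one target
    # (milliseconds; 1000 = at most one per second)
    net.ipv4.icmp_ratelimit = 1000
    # Bitmask of ICMP types subject to the limit; the default 6168 (0x1818)
    # covers destination unreachable, source quench, time exceeded and
    # parameter problem
    net.ipv4.icmp_ratemask = 6168
    ```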

    I didn't understand the red lines at the bottom either. Doesn't the ICMP message quote the TCP/IP headers, get passed to the transport layer, and terminate the connection immediately? Hmm, it's claimed that at least Linux clients treat ICMP errors for established TCP connections as soft errors (retransmission opportunities), to prevent DoS.
    1. Soft error cite: http://www.gont.com.ar/drafts/icmp-attacks/draft-ietf-tcpm-icmp-attacks-01.html#changingharderrors
  2. Why do you say that out-of-order packets are the SP's issue? The SP only provides IP connectivity and does not care about TCP sessions
    1. ... and who could potentially reorder the packets that the server obviously sent in sequence (otherwise there would be many more reorders)? Packet Gnomes?
    2. ...or Data Demons, or Packet Pixies. ...Seriously though, that made me laugh pretty hard.
    3. I also thought that reassembling out-of-order TCP segments is the job of the transport layer, which runs on the client and server; the devices in the middle are not aware of it.
    4. Different packets in the same session are not guaranteed to take the same path through the internet.

      Due to various path and link load-balancing methods, different packets can take a different path, and some might be queued differently along the path, or the path might have a different latency -- resulting in unpredictable ordering.

      It is still TCP's job to produce a reliable stream from this.

      And if Apache sent all the data, the client should get all the data before it processes the close of the TCP stream...
    5. Yes, that's why the SP only guarantees IP connectivity and isn't aware of the TCP session
    6. @Ivan: the end system's TCP stack?

      Unless you have very high-speed TCP connections (significantly more than 100 Mbps per TCP connection), out-of-order packets should have no impact. At very high speeds, you may start seeing some performance decrease.

      ..... and then the security guys came with fancy ideas (badly) implemented in fancy firewalls .....
    7. Well, there is definitely something weird going on; more in a follow-up post. As for the impact of out-of-order packets, where did you get the 100 Mbps+ number?
    8. I have to admit that 100 Mbps is quite arbitrary. On a LAN/MAN (ms-level latency), that limit is definitely higher. It all depends on how badly out-of-order they are.
  3. It looks like an HTTP server bug. Which HTTP server are you running?
    Did you check the HTTP server logs? You could also run strace -p to see system call arguments and return values. Possibly it will show you something interesting.
    1. Apache 2.2 (yeah, I know). No indication of weird behavior in the server logs, and unfortunately I cannot run strace because the problem appears very rarely and intermittently.
  4. If it's bufferbloat on the web server, implementing bql + sch_fq + pacing would help.

    Implementing htb + fq_codel on your home router would help more.
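    Roughly, the two suggestions translate into qdisc configuration like this (a sketch; eth0 and the 20mbit rate are placeholders, and BQL needs driver support):

    ```
    # Web server: fq as root qdisc gives per-flow queuing and TCP pacing
    tc qdisc replace dev eth0 root fq

    # Home router: shape just below the uplink rate with HTB,
    # then let fq_codel manage the queue
    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 20mbit
    tc qdisc add dev eth0 parent 1:10 fq_codel
    ```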

  5. Could be traversing an FWSM in the SP infrastructure...

  6. Are you sure the ICMP unreachable came from your box? What does the TTL look like on the ICMP unreachable versus the other packets from your server?
  7. Wow. Thanks a million! I learn something new every day ;)
  8. Another option to locate the source of the ICMP unreachable (in case it's across a WAN link) is to look at the original packet encapsulated in the ICMP message and calculate that packet's RTT based on the delay from the original packet.
    This way you can understand where in the path the ICMP unreachable came from
  9. Good analysis.
    Agreed; a transparent proxy might have done this: after buffering the response for a slow connection, it might have received a close from the server, which might have led to this ICMP error towards the client.
    I was thinking the keepalive timeout is meant to identify idle connections and is triggered after a request completes, but Apache defines it as the time after the last request was received!! Is this a gap in our understanding or a bug?

  10. More questions than answers

    We are talking about two issues here: TCP connection close and TCP reordering. Is there a relationship between them? As per one of the comments on TCP_Sequence_Number_Randomization_and_SACK, how does the end node behave when the SACK is invalid?

    Focusing on the connection-close issue, I don't understand a few things:

    1) If the problem is due to bufferbloat (at least that's what the fix of increasing the HTTP keepalive timer seems to indicate), the ICMP packet generated by iptables seems to arrive earlier than the actual download packets. Does the ISP throttle based on the amount of upload/download?

    2) The post has a packet capture from one end; is there a packet capture from the other end?
  11. We had two issues lately with similar symptoms:
    1. nginx chunked encoding + content-length bug: files were corrupted between backend and frontend servers, but the frontend happily passed broken files to clients; fixed by disabling some nginx modules.
    2. TCP offload on the hypervisor NIC received large segments and ACKed them on behalf of a VM, but the VM's stack was out of buffers and never received the data; partially fixed with server-side shaping.
  12. TCP offload on the network card might do a lot of weird things that you cannot see in a Wireshark capture taken on the host; you need a SPAN port to capture what's actually on the wire.
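    One way to make a host-side capture match the wire is to temporarily disable the offloads (a sketch; eth0 is a placeholder interface name):

    ```
    # Turn off segmentation/receive offloads so tcpdump/Wireshark on the
    # host sees individual on-the-wire segments instead of giant ones
    ethtool -K eth0 tso off gso off gro off
    ```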