To Drop or To Delay, That’s the Question on Software Gone Wild

A while ago I decided it's time to figure out whether it's better to drop or to delay TCP packets, and quickly discovered you get 12 opinions (usually with no real arguments supporting them) if you ask 10 people. Fortunately, I know someone who deals with TCP performance for a living, and Juho Snellman was kind enough to agree to record another podcast.

Update 2017-03-31: Added More information section

Spoiler alert: many things we "know" about TCP are not exactly true. For example, packet drops are not a big deal (but selective acknowledgments and default retransmit timeouts are).

Interestingly, we started discussing the reasons some people want to reinvent TCP, do it over UDP, and hide what they're doing from the network, but quickly got back to the fundamental question: to drop or to delay.

The answer is surprising: while you might get better results responding to increased delays (as opposed to packet drops), responding only to drops is a better survival strategy, as latency-sensitive algorithms back off sooner than drop-sensitive ones and thus starve in congested networks when competing with drop-sensitive algorithms.
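To make the starvation argument a bit more tangible, here's a toy round-based simulation of two flows sharing a single bottleneck queue. It's a rough sketch, not real TCP and not something taken from the podcast: the function name `simulate`, the link capacity, buffer size, delay threshold, and the window update rules are all assumptions picked for illustration. One flow reacts only to buffer overflow (drops); the other also backs off as soon as the queue grows beyond a small threshold.

```python
def simulate(ticks=2000, capacity=100, buffer_size=100, delay_threshold=10):
    """Toy model: two flows, one bottleneck, one update per round trip.

    All parameters are illustrative assumptions, not measured values.
    Returns (drop_sensitive_share, delay_sensitive_share) of delivered packets.
    """
    w_loss = 1.0    # window of the drop-sensitive flow (packets)
    w_delay = 1.0   # window of the delay-sensitive flow (packets)
    delivered_loss = 0.0
    delivered_delay = 0.0

    for _ in range(ticks):
        in_flight = w_loss + w_delay
        # Packets that do not fit on the link in this round wait in the queue.
        queue = max(0.0, in_flight - capacity)
        overflow = queue > buffer_size

        # The link delivers at most `capacity` packets per round,
        # split in proportion to each flow's window.
        served = min(in_flight, capacity)
        delivered_loss += served * w_loss / in_flight
        delivered_delay += served * w_delay / in_flight

        # Drop-sensitive flow: halve the window on overflow, otherwise grow by one.
        if overflow:
            w_loss = max(1.0, w_loss / 2)
        else:
            w_loss += 1

        # Delay-sensitive flow: also halves on overflow, but additionally
        # shrinks whenever the standing queue exceeds the threshold.
        if overflow:
            w_delay = max(1.0, w_delay / 2)
        elif queue > delay_threshold:
            w_delay = max(1.0, w_delay - 1)
        else:
            w_delay += 1

    total = delivered_loss + delivered_delay
    return delivered_loss / total, delivered_delay / total


if __name__ == "__main__":
    loss_share, delay_share = simulate()
    print(f"drop-sensitive share:  {loss_share:.1%}")
    print(f"delay-sensitive share: {delay_share:.1%}")
```

In this model the drop-sensitive flow keeps filling the buffer, so the delay-sensitive flow sees a standing queue almost all the time and keeps shrinking its window, which is the starvation effect described above. Real congestion control algorithms are far more nuanced, so treat the numbers as qualitative only.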

For more details, listen to Episode 70 of Software Gone Wild.

More information

3 comments:

  1. Thanks so much for shedding light on this, Ivan. So many people are of the opinion that huge buffers make things "better", when this is very often not the case (& tuning them to achieve good results is incredibly complex).

    I've been reading a lot about the CAKE queue discipline, part of the bufferbloat project: https://www.bufferbloat.net/projects/codel/wiki/CakeTechnical/
    The authors have an incredibly good practical understanding of what makes networks stay fast under load.

    I would love to see this qdisc implemented on platforms like Mellanox Spectrum, that have enough buffer to do it right at 100GE & still remain cost-effective.

  2. Great blog, amazing insights. Finally, got the idea of selective ACKs into my thick skull. In fact I didn't even know about SACKs until earlier this month. Now I know about them and how they work, totally grateful. TCP sure is a whack-a-mole beast and many of the ideas I had about it are now out of date. What is really scary is what Juho has seen in the field: people actually removing SACKs without knowing the repercussions. That's like removing the brake from a car or the instrument panel.

  3. Loss probes seem relevant and I don't believe they were mentioned in the discussion. I believe they were added in Linux v3. https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01#section-2

