To Drop or To Delay, That’s the Question on Software Gone Wild
A while ago I decided it was time to figure out whether it's better to drop or to delay TCP packets, and quickly discovered that if you ask 10 people you get 12 opinions (usually with no real arguments supporting them). Fortunately, I know someone who deals with TCP performance for a living, and Juho Snellman was kind enough to agree to record another podcast.
Spoiler alert: many things we "know" about TCP are not exactly true. For example, packet drops are not a big deal (but selective acknowledgments and default retransmit timeouts are).
Interestingly, we started discussing the reasons some people want to reinvent TCP, do it over UDP, and hide what they're doing from the network, but quickly got back to the fundamental question: to drop or to delay.
The answer is surprising: while you might get better results responding to increased delays (as opposed to packet drops), responding only to drops is a better survival strategy. Latency-sensitive algorithms back off sooner than drop-sensitive ones, and thus starve in congested networks when competing with drop-sensitive algorithms.
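The starvation effect is easy to see in a toy model. The sketch below is not a real TCP implementation; it's a minimal simulation under assumed parameters (link capacity, buffer size, delay threshold are all made up) of two flows sharing one bottleneck: one backs off only on drops, the other also backs off as soon as it sees a standing queue.

```python
# Toy model (NOT real TCP): two flows share a bottleneck link with a
# FIFO buffer. The loss-based flow reacts only to packet drops; the
# delay-based flow also backs off when queueing delay builds up.
# All parameter values below are illustrative assumptions.

CAPACITY = 100       # packets the link drains per round (assumed)
QUEUE_LIMIT = 50     # buffer size; beyond this, packets are dropped (assumed)
DELAY_THRESHOLD = 5  # queue depth at which the delay-based flow backs off (assumed)

def simulate(rounds=200):
    loss_cwnd, delay_cwnd = 10.0, 10.0
    queue = 0.0
    for _ in range(rounds):
        offered = loss_cwnd + delay_cwnd
        queue = max(0.0, queue + offered - CAPACITY)
        dropped = queue > QUEUE_LIMIT
        if dropped:
            queue = QUEUE_LIMIT
        # Loss-based flow: additive increase, halve only on drops.
        loss_cwnd = max(1.0, loss_cwnd / 2) if dropped else loss_cwnd + 1
        # Delay-based flow: halves as soon as it sees queueing delay,
        # so it yields capacity long before any drop occurs.
        if dropped or queue > DELAY_THRESHOLD:
            delay_cwnd = max(1.0, delay_cwnd / 2)
        else:
            delay_cwnd += 1
    return loss_cwnd, delay_cwnd

loss, delay = simulate()
print(f"loss-based cwnd: {loss:.1f}, delay-based cwnd: {delay:.1f}")
```

Run it for a few hundred rounds and the delay-based flow ends up with a small fraction of the bandwidth: it retreats at the first sign of queueing, while the loss-based flow keeps pushing until the buffer overflows.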
For more details, listen to Episode 70 of Software Gone Wild.
I've been reading a lot about the CAKE queue discipline, part of the bufferbloat project: https://www.bufferbloat.net/projects/codel/wiki/CakeTechnical/
The authors have an incredibly good practical understanding of what makes networks stay fast under load.
I would love to see this qdisc implemented on platforms like Mellanox Spectrum, which have enough buffer to do it right at 100GE and still remain cost-effective.