Category: TCP
Setting Source IP Address on Traffic Started by a Multihomed Host
In the Path Failure Detection on Multi-Homed Servers blog post, I mentioned running BGP on servers as one of the best ways to detect server-to-network failures. As always, things aren’t as simple as they look, as Cathal Mooney quickly pointed out:
One annoyance is what IP address gets used by default by the system for outbound traffic. It would be nice to have a generic OS-level way to say, “This IP on lo0 should be default for outbound IP traffic unless to the connected link subnet itself.”
That’s definitely a tough nut to crack, and Cathal described a few solutions he used in the past:
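In the meantime, applications can pin the source address themselves with the classic bind-before-connect trick (on Linux, the closest OS-level equivalent is the per-route src attribute in iproute2). Here’s a minimal Python sketch; the loopback VIP and destination below are documentation-prefix placeholders, not addresses from Cathal’s setup:

```python
import socket

# Minimal sketch: pin the source address of an outbound TCP session by
# binding the socket before connecting. Addresses are placeholders from
# the documentation ranges (RFC 5737), not values from the original post.
LOOPBACK_VIP = "192.0.2.1"   # stable /32 configured on lo and advertised via BGP

def connect_from(src_ip, dst):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Binding to (src_ip, 0) forces the kernel to use src_ip as the source
    # address instead of picking one based on the egress interface.
    s.bind((src_ip, 0))
    s.connect(dst)
    return s

if __name__ == "__main__":
    conn = connect_from(LOOPBACK_VIP, ("198.51.100.10", 443))
    print(conn.getsockname())   # source address is now the loopback VIP
    conn.close()
```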
Worth Reading: MP-TCP in Hybrid Access Networks
Wouldn’t it be nice if your home router (CPE) could use DSL (or slow-speed fibre) and an LTE connection at the same time? Even better: what about running a single TCP session over both links? The answer to both questions is YES, of course it could do that, if only your service provider were interested in giving you that option.
We solved similar problems with multilink PPP in networking antiquity; today, you could use a CPE with an MP-TCP proxy combined with a Hybrid Access Gateway in the service provider network. For more details, read the excellent Increasing broadband reach with Hybrid Access Networks article by prof. Olivier Bonaventure and his team.
Multipath TCP (MPTCP) Resources
Brian Carpenter published a list of Multipath TCP resources on one of the IETF mailing lists¹:
- Modern Multipath Transport Protocols – an ebook by prof. Olivier Bonaventure describing QUIC, multipath TCP and multipath QUIC.
- Multipath TCP Wiki
- Multipath TCP for Linux
- Multipath TCP Python extension module
You might also want to listen to the Multipath TCP podcast we recorded with Apple engineers in 2019.
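If you’d rather experiment than read, here’s a minimal sketch of opening a socket with the Linux MPTCP implementation. It assumes a 5.6+ kernel with net.mptcp.enabled=1 and Python 3.10+ (which exposes socket.IPPROTO_MPTCP); older setups fall back to plain TCP:

```python
import socket

# Protocol number for MPTCP sockets on Linux; exposed as
# socket.IPPROTO_MPTCP in Python 3.10+, hardcoded fallback otherwise.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

try:
    # An MPTCP socket behaves like a TCP socket; the kernel manages subflows.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    print("MPTCP socket created")
except OSError:
    # Kernel without MPTCP support (or net.mptcp.enabled=0): use plain TCP.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("MPTCP unavailable, falling back to plain TCP")
s.close()
```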
¹ … along with a nice reminder that “it might be wise to look at actual implementations of MPTCP before jumping to conclusions”. Yeah, that’s never bad advice, but it’s rarely followed.
Worth Reading: Unbounded TCP Memory Usage
Another phenomenal detective story published on the Cloudflare blog: Unbounded memory usage by TCP for receive buffers, and how we fixed it.
TL&DR: Moving the TCP window every time you acknowledge a segment doesn’t work well with scaled window sizes.
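To see where the trouble comes from, here’s a back-of-the-envelope illustration of RFC 7323 window scaling (my sketch, not Cloudflare’s actual fix): the 16-bit window field is shifted by the scale factor, so the advertised window edge can only move in coarse steps:

```python
# RFC 7323 window scaling: the 16-bit window field in the TCP header is
# left-shifted by the advertised scale factor, so the receiver can only
# describe its window in 2**wscale-byte steps.
wscale = 7                    # typical value for multi-megabyte receive buffers
actual_window = 1_000_000     # bytes of free receive buffer space

advertised = actual_window >> wscale        # value carried in the TCP header
reconstructed = advertised << wscale        # window as computed by the sender
print(f"granularity: {1 << wscale} bytes")                       # 128 bytes
print(f"rounding error: {actual_window - reconstructed} bytes")  # 64 bytes
```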
The interesting takeaways:
Is It Time to Replace TCP in Data Centers?
One of my readers asked for my opinion about the provocative “It’s Time to Replace TCP in the Datacenter” article by prof. John Ousterhout. I started reading it, found too many things that didn’t make sense, and decided to ignore it as another attempt by a proverbial physicist to solve hard problems in someone else’s field.
However, pointers to that article kept popping up, and I eventually realized it was a position paper in a long-term process that included conference talks, interviews and keynote speeches, so I decided to take another look at the technical details.
Worth Reading: QUIC Is Not a TCP Replacement
Bruce Davie makes an excellent point in his QUIC Is Not a TCP Replacement article – QUIC is not a next-generation TCP; it’s a reliable RPC transport protocol.
What Bruce forgot to mention is that we’ve had a production-grade RPC transport protocol for years – SCTP (Stream Control Transmission Protocol) – but it had two shortcomings:
- It wasn’t invented by the right people;
- It used a different IP protocol number and thus upset every ossified middlebox in the Internet. QUIC hides on top of UDP (because adding extra headers makes at least as much sense as junk DNA).
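To make the middlebox point concrete, here’s a small sketch (my illustration, not from Bruce’s article): an SCTP association needs its own IP protocol number, while a QUIC endpoint is indistinguishable from any other UDP application:

```python
import socket

# SCTP is its own IP protocol (number 132), so middleboxes that only
# understand TCP and UDP tend to drop it on the floor.
try:
    # TCP-style (one-to-one) SCTP socket; requires kernel SCTP support.
    sctp = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_SCTP)
    print("SCTP socket created (IP protocol 132)")
    sctp.close()
except (OSError, AttributeError):
    print("no SCTP support on this host")

# QUIC rides on UDP (IP protocol 17); the network sees ordinary datagrams.
quic_like = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
quic_like.close()
```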
Repost: Using MP-TCP to Utilize Unequal Links
In the Does Unequal-Cost Multipathing Make Sense blog post I wrote (paraphrased):
The trick to successful utilization of unequal uplinks is to use them wisely […] It’s how multipath TCP (MP-TCP) could be used for latency-critical applications like Siri.
Minh Ha quickly pointed out (some) limitations of MP-TCP, and as is usually the case, his comment was too valuable to be left as small print at the bottom of a blog post.
Saved: TCP Is the Most Expensive Part of Your Data Center
Years ago Dan Hughes wrote a great blog post explaining how expensive TCP is. His web site is long gone, but I managed to grab the blog post before it disappeared and he kindly allowed me to republish it.
If you ask a CIO which part of their infrastructure costs them the most, I’m sure they’ll mention power, cooling, server hardware, support costs, getting the right people, and all the usual answers. I’d argue one of the biggest costs is TCP, or more accurately, badly implemented TCP.
Do Packet Drops Matter for TCP Performance?
Approximately two years ago I tried to figure out whether aggressive marketing of deep buffer data center switches makes sense, recorded a few podcasts on the topic and organized a webinar with JR Rivers.
Not surprisingly, the question keeps popping up, so it seems it’s time for another series of TL&DR articles. Let’s start with the basics:
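For a taste of the basics, here’s the classic Mathis et al. approximation (throughput ≤ MSS/RTT × C/√p) in a few lines of Python; the numbers are illustrative, not measurements:

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation for Reno-style congestion avoidance:
    throughput <= (MSS / RTT) * (C / sqrt(p)), with C ~ sqrt(3/2)."""
    c = sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))

# 1460-byte MSS and a 100-microsecond data center RTT at two loss rates:
for p in (1e-3, 1e-6):
    gbps = mathis_throughput_bps(1460, 100e-6, p) / 1e9
    print(f"loss rate {p:.0e}: ~{gbps:.0f} Gbps ceiling")
```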
Commentary: We’re Stuck with 40-Year-Old Technology
One of my readers sent me this email after reading my Loop Avoidance in VXLAN Networks blog post:
Not much has changed really! It’s still a flood-and-learn bridged network, at least in parts. It’s 2019 and we talk a lot about “fabrics”, but we still have 1980s networks.
The networking fundamentals haven’t changed in the last 40 years. We still use IP (sometimes with larger addresses and augmentations that make it harder to use and more vulnerable), stream-based transport protocol on top of that, leak addresses up and down the protocol stack, and rely on technology that was designed to run on 500 meters of thick yellow cable.
Multipath TCP on Software Gone Wild
I mentioned Multipath TCP (MP-TCP) numerous times in the past, but never managed to get beyond “this is the thing that might solve some TCP multihoming challenges.” We fixed this omission in Episode 100 of Software Gone Wild with Christoph Paasch (software engineer @ Apple) and Mat Martineau from the Open Source Technology Center @ Intel.
Worth Reading: Discovering Issues with HTTP/2
A while ago I found an interesting analysis of HTTP/2 behavior under adverse network conditions. Not surprisingly:
When there is packet loss on the network, congestion controls at the TCP layer will throttle the HTTP/2 streams that are multiplexed within fewer TCP connections. Additionally, because of TCP retry logic, packet loss affecting a single TCP connection will simultaneously impact several HTTP/2 streams while retries occur. In other words, head-of-line blocking has effectively moved from layer 7 of the network stack down to layer 4.
What exactly did anyone expect? We discovered the same problems running TCP/IP over SSH a long while ago, but then too many people insist on ignoring history and learning from their own experience.
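If you want to see the effect without touching a network, here’s a toy simulation (my sketch, not from the quoted analysis): frames from several streams share one ordered TCP byte stream, and a single lost segment stalls all of them:

```python
# Toy model of layer-4 head-of-line blocking: ten TCP segments carry
# frames from three multiplexed streams; segment 4 is lost in transit.
frames = [(seq, f"stream-{seq % 3}") for seq in range(10)]
lost = {4}

delivered, stalled = [], []
blocked = False
for seq, stream in frames:
    if seq in lost:
        blocked = True           # TCP must wait for the retransmission...
    if blocked:
        stalled.append((seq, stream))    # ...so every later frame waits too,
    else:                                # including frames of healthy streams
        delivered.append((seq, stream))

print("delivered:", delivered)
print("stalled behind one lost segment:", stalled)
```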
Video: Tools and Knobs to Use when Tweaking TCP Performance
In the second half of his Networks, Buffers, and Drops webinar, JR Rivers focused on end systems: what tools could you use to measure end-to-end TCP throughput, or to monitor the performance of an individual socket or the whole TCP stack?
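As a small taste: on Linux you can read per-socket statistics with the TCP_INFO socket option (the data behind ss -ti). The sketch below unpacks only two fields; the offsets are an assumption matching the struct tcp_info layout in recent kernels:

```python
import socket
import struct

def tcp_rtt_and_cwnd(sock):
    """Return (smoothed RTT in ms, congestion window in segments) for a
    connected TCP socket, using the Linux TCP_INFO socket option."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    fields = struct.unpack("<8B24I", raw[:104])
    rtt_us = fields[8 + 15]    # tcpi_rtt: smoothed RTT, microseconds
    snd_cwnd = fields[8 + 18]  # tcpi_snd_cwnd: congestion window, segments
    return rtt_us / 1000.0, snd_cwnd

if __name__ == "__main__":
    s = socket.create_connection(("example.com", 80), timeout=5)
    s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    s.recv(1024)               # force at least one round trip
    rtt_ms, cwnd = tcp_rtt_and_cwnd(s)
    print(f"smoothed RTT {rtt_ms:.2f} ms, cwnd {cwnd} segments")
    s.close()
```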
To Jumbo or Not to Jumbo?
Here’s the question I got from one of my readers:
Do you have any data available to show the benefits of jumbo frames in 40GE/100GE networks?
In case you’re wondering why he went down this path, here’s the underlying problem:
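Whatever the underlying problem turns out to be, the back-of-the-envelope part of the answer is easy to sketch (my sketch; it ignores TCP options and the per-packet CPU cost, which is often the stronger argument for jumbo frames):

```python
# How much of the wire capacity carries TCP payload at a given MTU?
ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12  # preamble, SFD, header, FCS, inter-frame gap
TCP_IP_HEADERS = 20 + 20            # plain IPv4 + TCP, no options

for mtu in (1500, 9000):
    payload = mtu - TCP_IP_HEADERS
    wire_bytes = mtu + ETH_OVERHEAD
    print(f"MTU {mtu}: {payload / wire_bytes:.2%} of wire capacity is payload")
```

Running this gives roughly 95% payload efficiency at a 1500-byte MTU versus roughly 99% at 9000 bytes, so the raw bandwidth gain is only a few percent.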
TCP in the Data Center and Beyond on Software Gone Wild
In autumn 2016 I embarked on a quest to figure out how TCP really works and whether big buffers in data center switches make sense. One of the obvious stops on this journey was a chat with Thomas Graf, Linux Core Team member and a founding member of the Cilium project.