TCP Optimization with Juho Snellman on Software Gone Wild
Achieving 40 Gbps of forwarding performance on an Intel server is no longer a big deal - Juniper got to 160 Gbps with a finely tuned architecture - but can you do real-time optimization of a million concurrent TCP sessions on that same box at 20 Gbps?
Juho Snellman from Teclo Networks explained how they got there in Episode 25 of Software Gone Wild… and you’ll learn a ton of things about radio networks on the way.
After the handshake, the behavior of the system would be hard to distinguish from that of a transparent proxy that terminated the connection but just miraculously happened to negotiate the same TCP options and sequence numbers on both sides. For example, it reacts to incoming packets in much the same way, and needs to store very similar data to a terminating proxy.
That data includes all the TCP state variables (e.g. sequence numbers, window scale, congestion control state, SACK blocks), RTT measurements, various kinds of timers, all the payload data that has been sent by one endpoint but not yet acknowledged by the other, and so on. And of course you need a separate copy of all of this state for each half. So from our point of view, each session really corresponds to the two paired connections that a terminating proxy would create.
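To make the list of per-session state more concrete, here is a minimal Python sketch of what such a record might look like. All class, field, and method names are hypothetical illustrations, not Teclo's actual data structures; the congestion-control defaults are placeholders, and sequence-number wraparound is deliberately ignored to keep it short. The point it shows is the one made above: one session holds two full copies of TCP state, one per half, including the payload bytes still awaiting acknowledgement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HalfConnectionState:
    """Hypothetical per-direction state a non-terminating optimizer might keep."""
    snd_una: int = 0              # oldest unacknowledged sequence number
    snd_nxt: int = 0              # next sequence number to send
    rcv_nxt: int = 0              # next sequence number expected from the peer
    window_scale: int = 0         # negotiated window scale shift
    cwnd: int = 10                # congestion window in segments (placeholder)
    ssthresh: int = 2**31         # slow-start threshold (placeholder)
    sack_blocks: List[Tuple[int, int]] = field(default_factory=list)
    srtt_ms: float = 0.0          # smoothed RTT estimate
    rto_deadline_ms: float = 0.0  # retransmission timer expiry
    # Payload sent by one endpoint but not yet acknowledged by the other,
    # keyed by starting sequence number (no wraparound handling here).
    unacked: Dict[int, bytes] = field(default_factory=dict)

    def buffer(self, seq: int, payload: bytes) -> None:
        """Keep a copy of in-flight payload so it could be retransmitted locally."""
        self.unacked[seq] = payload

    def ack(self, ack_no: int) -> None:
        """Drop buffered segments fully covered by a cumulative ACK."""
        covered = [s for s, p in self.unacked.items() if s + len(p) <= ack_no]
        for seq in covered:
            del self.unacked[seq]
        self.snd_una = max(self.snd_una, ack_no)


@dataclass
class Session:
    """One optimized session = two paired half-connections."""
    client_side: HalfConnectionState = field(default_factory=HalfConnectionState)
    server_side: HalfConnectionState = field(default_factory=HalfConnectionState)

    def unacked_bytes(self) -> int:
        """Total payload currently buffered across both halves."""
        return sum(
            len(p)
            for half in (self.client_side, self.server_side)
            for p in half.unacked.values()
        )


s = Session()
s.server_side.buffer(1000, b"a" * 500)
s.server_side.buffer(1500, b"b" * 500)
s.server_side.ack(1500)        # first segment is now fully acknowledged
print(s.unacked_bytes())       # 500 bytes still in flight
```

Multiplying even a modest record like this by millions of sessions, plus the buffered payload, is what makes memory dimensioning the hard part of the problem.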
The main advantages of not terminating are related to robustness, such as being able to drop out of optimizing a connection without confusing the endpoints. There aren't really any scalability benefits.
(1M sessions is actually understating things a bit: we expect about 200K concurrent sessions per 1 Gbps of traffic in a typical mobile network. So for a 10 Gbps deployment you'd be looking at 2M concurrent sessions in the typical case, and would need to dimension for a worst case of at least 5M.)
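The dimensioning arithmetic above can be written out as a small helper. The 200K sessions per Gbps figure comes from the comment; the worst-case factor of 2.5 is an assumption inferred only from the quoted 5M versus 2M at 10 Gbps, and the function names are hypothetical.

```python
SESSIONS_PER_GBPS = 200_000                     # typical mobile network, per the comment
WORST_CASE_FACTOR = 5_000_000 / 2_000_000       # 2.5, inferred from 5M vs 2M at 10 Gbps


def typical_sessions(gbps: float) -> int:
    """Expected concurrent sessions for a given traffic volume."""
    return round(SESSIONS_PER_GBPS * gbps)


def worst_case_sessions(gbps: float) -> int:
    """Sessions to dimension for, assuming the same headroom ratio holds."""
    return round(typical_sessions(gbps) * WORST_CASE_FACTOR)


for gbps in (10, 20):
    print(gbps, typical_sessions(gbps), worst_case_sessions(gbps))
```

Under these assumptions, the 20 Gbps box from the introduction would see about 4M typical sessions and would need headroom for roughly 10M.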
This episode was almost a godsend for me, since I am in the middle of troubleshooting an issue where a new version of a VPN client no longer works over the mobile connections of a certain mobile provider. We tracked the problem down to the interaction between the mobile network sending a SYN/ACK for just about everything (obviously they don't use the Teclo product ;-)) and the captive portal detection feature of the new VPN client version... I really doubted what I was seeing in the sniffer trace. Does the mobile provider mess with TCP?!? Apparently they do! Some in better ways, some in worse ones ;-)