What went wrong: TCP lives in the dial-up world
As expected, my “the socket API is broken” post generated numerous comments, many of them missing the point (for example, someone scolded me for quoting Wikipedia and not the official Linux documentation). I did not want to discuss the intricate technical details of the API's various incarnations; I wanted to point out the generic stupidity of forcing every application to deal with low-level networking details.
Fabio was kind enough to post the recommended way of using the socket API, taken from the getaddrinfo man page, effectively proving my point: why should every application use a convoluted function when all we want to do (in most cases) is connect to the server?
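For readers who haven't seen it, the pattern looks roughly like the sketch below, modeled on the client example in the getaddrinfo(3) man page (the function name `connect_to_host` is hypothetical). Note how much boilerplate sits between the application and "connect to this server", and that every `connect()` call blocks for the operating system's full timeout when an address is unreachable.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve host/service and try each returned
 * address in turn until one connect() succeeds.
 * Returns a connected socket descriptor, or -1 on failure. */
int connect_to_host(const char *host, const char *service)
{
    struct addrinfo hints, *result, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6, whatever DNS returns */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, service, &hints, &result) != 0)
        return -1;

    for (rp = result; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;
        /* Blocking connect: an unreachable address can stall here for the
         * full OS SYN-retransmission timeout before the next one is tried. */
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }

    freeaddrinfo(result);
    return fd;
}
```

A call such as `connect_to_host("www.example.com", "80")` returns a connected descriptor or -1, without telling the caller why it failed; a real application would have to handle that as well.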
Patryk went even further and claimed that the socket API provides “basic functionality” and that libc is not the right place for anything more. Well, that mentality caused most of the IPv4-to-IPv6 application-related issues: obviously the applications developed before IPv6 was a serious consideration had to be rewritten because all the low-level code was embedded in the applications, not isolated in the library. A similar problem has effectively stalled SCTP deployment.
However, these are not the only problems we’re facing today. Even if the application properly implements the “try connecting to multiple addresses returned by DNS” function, the response time becomes unacceptable whenever one of those addresses is unreachable, due to the default TCP timeout values hard-coded in the TCP stacks of various operating systems.
For example, it takes up to three minutes for a TCP connect call to time out on a Fedora 11 Linux distribution (the connect call aborts immediately if an intermediate router sends back an ICMP unreachable reply, and the ARP timeout causes an abort in three seconds). Windows XP is slightly better; its default timeout is around 20 seconds.
You might wonder what prompted the TCP designers to choose these exceedingly large values. TCP was designed more than 20 years ago, when analog dial-up modems were commonly used to connect to the Internet. These modems could take a minute (or longer) to establish a connection, and if you wanted a reliable TCP session setup, you had to wait significantly longer before aborting the session setup system call. The Internet has changed dramatically in the meantime, but nobody ever bothered to change the defaults.
If you want to rush and write a comment about how the default can be changed, you’re missing the point yet again: we cannot implement multihomed IP hosts with more than one IP address because of the crazy default TCP timeout values. As soon as the first address becomes unreachable, the session establishment time (for an average user running out-of-the-box software) becomes unacceptable.
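To illustrate what "changing the default" means in practice, here is a minimal sketch of what each application would have to add to cap the connect time for a single address: switch the socket to non-blocking mode, wait for the handshake with `poll()`, and read the outcome with `SO_ERROR`. The function names are hypothetical, the sketch assumes a POSIX system, and it treats an interrupted `poll()` as a failure for simplicity.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to one address, giving up after timeout_ms
 * milliseconds instead of waiting for the kernel's SYN retry limit.
 * Returns a connected (blocking) socket, or -1 on failure or timeout. */
int connect_with_timeout(const struct sockaddr *addr, socklen_t addrlen,
                         int timeout_ms)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    if (connect(fd, addr, addrlen) == -1) {
        if (errno != EINPROGRESS)
            goto fail;                     /* immediate failure, e.g. refused */

        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            goto fail;                     /* timed out (0) or poll error (-1) */

        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1 || err != 0)
            goto fail;                     /* the handshake itself failed */
    }

    if (fcntl(fd, F_SETFL, flags) == -1)   /* restore blocking mode */
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}

/* Hypothetical convenience wrapper for an IPv4 literal and port number. */
int connect_ipv4_with_timeout(const char *ip, unsigned short port,
                              int timeout_ms)
{
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
        return -1;                         /* not a valid IPv4 literal */
    return connect_with_timeout((struct sockaddr *)&sin, sizeof sin,
                                timeout_ms);
}
```

A multihoming-aware client would call something like `connect_with_timeout()` for each address returned by `getaddrinfo()`, with a timeout of a few seconds, and move on to the next address on failure. On Linux, the system-wide `net.ipv4.tcp_syn_retries` sysctl and the per-socket `TCP_SYNCNT` option are other knobs, but none of them helps out-of-the-box software that never touches them.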
The 75-second value was burned into my memory by a transparent proxy server that couldn't serve its client: 75 seconds after intercepting the server-bound GET request and sending the first SYN packet toward an unreachable server, it abruptly closed the client's connection. Story here: http://tinyurl.com/ylxr6ej
But the three-minute value certainly checks out:
$ uname -a
Linux gslse 2.6.27-7-server #1 SMP Tue Nov 4 20:16:57 UTC 2008 x86_64 GNU/Linux
$ time telnet 1.1.1.1
Trying 1.1.1.1...
telnet: Unable to connect to remote host: Connection timed out
real 3m8.994s
user 0m0.010s
sys 0m0.000s
The retransmissions are sent 3, 6, 12, 24, and 48 seconds apart, and the connect call aborts 96 seconds after the last one: 3+6+12+24+48+96 = 189s = 3m9s
...and...
$ uname -a
SunOS 5.11 snv_101b i86pc i386 i86xpv
$ time telnet 1.1.1.1
Trying 1.1.1.1...
telnet: Unable to connect to remote host: Connection timed out
real 3m44.690s
user 0m0.002s
sys 0m0.008s
On Solaris, the retransmissions are sent 3.375, 6.75, 13.5, 27, 54, and 60 seconds apart, and the connection fails 60 seconds after the last one: 3.375+6.75+13.5+27+54+60+60 = 224.625s = 3m44.625s
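Both measurements fit a simple exponential backoff model: the retransmission interval doubles after every SYN, up to a cap, and the stack gives up one interval after the last retransmission. The sketch below (the function name is hypothetical) reproduces both totals; its parameters are fitted to the measurements above rather than read from either kernel's source: for Linux, a 3-second initial timeout and five retransmissions (matching the then-default `net.ipv4.tcp_syn_retries = 5`); for Solaris, a 3.375-second initial timeout, six retransmissions, and a 60-second cap.

```c
/* Hypothetical model of SYN retransmission timing: the first SYN is sent
 * at t = 0, each retransmission waits twice as long as the previous one
 * (capped at max_rto), and after the last retransmission the stack waits
 * one more interval before connect() fails.
 * Returns the total time in seconds until connect() gives up. */
double syn_timeout_total(double initial_rto, int retries, double max_rto)
{
    double rto = initial_rto;
    double total = 0.0;

    /* retries retransmissions plus the final wait = retries + 1 intervals */
    for (int i = 0; i <= retries; i++) {
        total += rto;
        rto *= 2.0;
        if (rto > max_rto)
            rto = max_rto;
    }
    return total;
}
```

With these assumptions, `syn_timeout_total(3.0, 5, 120.0)` returns 189 seconds (the 120-second cap, Linux's maximum retransmission timeout, is never reached here), and `syn_timeout_total(3.375, 6, 60.0)` returns 224.625 seconds.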
My WinXP desktop retransmits only twice, 3 and 6 seconds apart, and then fails after an additional 8 seconds: 3+6+8 = 17s
Anybody have Stevens' TCP/IP Illustrated, Vol. II handy? What does the section cited by RFC 5461 (pp. 828-829) say?
On the other hand, who should tweak the TCP/IP stack in all these Linux distros? What if it broke some old applications? That's why I'm not surprised by the Solaris results; it might be a bit better in commercial distros like Red Hat or SUSE.