What went wrong: TCP lives in the dial-up world

As expected, my “the socket API is broken” post generated numerous comments, many of them missing the point (for example, someone scolded me for quoting Wikipedia and not the official Linux documentation). I did not want to discuss the intricate technical details of the various incarnations of the API but the generic stupidity of having to deal with low-level networking details in the application.

Fabio was kind enough to provide the recommended method of using the socket API from man getaddrinfo, effectively proving my point: why should every application use a convoluted function when all we want to do (in most cases) is connect to the server?

Patryk went even further and claimed that the socket API provides “basic functionality” and that libc is not the right place for anything more. Well, that mentality caused most of the IPv4-to-IPv6 application-related issues: obviously the applications developed before IPv6 was a serious consideration had to be rewritten because all the low-level code was embedded in the applications, not isolated in the library. A similar problem has effectively stalled SCTP deployment.

However, these are not the only problems we’re facing today. Even if the application properly implements the “try connecting to multiple addresses returned by DNS” function, the response time becomes unacceptable due to the default TCP timeout values coded in various operating systems’ TCP stacks.
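To make that function concrete, here is a minimal sketch in Python, assuming only the standard socket module. The name connect_any and its per_address_timeout parameter are hypothetical, chosen for illustration; this is not the code from man getaddrinfo.

```python
import socket


def connect_any(host, port, per_address_timeout=None):
    """Try every address the resolver returns until one connects.

    connect_any and per_address_timeout are illustrative names.
    With per_address_timeout=None the socket stays in blocking mode,
    so every dead address costs the operating system's full default
    connect timeout before the next address is tried.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(per_address_timeout)
            sock.connect(sockaddr)
            sock.settimeout(None)  # hand back an ordinary blocking socket
            return sock
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    if last_error is None:
        raise OSError("no addresses returned for %s" % host)
    raise last_error
```

Left at its default, the loop is correct but slow: the wait for each unreachable address is whatever the TCP stack decides, which is exactly the problem described above.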

For example, it takes up to three minutes for a TCP connect call to time out on Fedora 11 Linux (the connect call aborts immediately if an intermediate router sends back an ICMP unreachable reply, and an ARP timeout causes an abort in three seconds). Windows XP is slightly better; the default timeout is set at 20 seconds.
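The three-minute figure is consistent with simple exponential backoff of the SYN retransmission timer. The sketch below is a back-of-the-envelope calculation, not a description of any particular TCP stack: the function name connect_timeout is hypothetical, and the inputs (a 3-second initial timer with five retransmissions for Linux, a 3.375-second timer capped at 60 seconds with six retransmissions for Solaris) are assumptions taken from the packet timings a reader reports in the first comment.

```python
def connect_timeout(initial_rto, syn_retries, max_rto=None):
    """Seconds until connect() gives up when no reply ever arrives.

    Assumes the retransmission timer starts at initial_rto, doubles
    after every unanswered SYN, and is optionally capped at max_rto.
    """
    total = 0.0
    rto = initial_rto
    # One wait after the initial SYN plus one after each retransmission.
    for _ in range(syn_retries + 1):
        total += rto
        rto *= 2
        if max_rto is not None:
            rto = min(rto, max_rto)
    return total


# Assumed Linux-like values: 3 + 6 + 12 + 24 + 48 + 96
print(connect_timeout(3, 5))                 # 189.0
# Assumed Solaris-like values: 3.375 + 6.75 + 13.5 + 27 + 54 + 60 + 60
print(connect_timeout(3.375, 6, max_rto=60)) # 224.625
```

Under these assumptions a single dead address costs 189 seconds (3m9s) before the application can even try the next one.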

You might wonder what prompted the TCP designers to choose these exceedingly large values. TCP was designed more than 20 years ago when the analog dialup modems were commonly used to connect to the Internet. These modems could take a minute (or longer) to establish the connection and if you wanted to have a reliable TCP session setup, you had to wait significantly longer before aborting the session setup system call. The Internet has changed dramatically in the meantime, but nobody ever bothered changing the defaults.

If you want to rush and write a comment on how the default can be changed, you’re yet again missing the point: we cannot implement multihomed IP hosts using more than one IP address due to the crazy default TCP timeout values. As soon as the first address becomes unreachable, the session establishment time (for an average user using out-of-box software) becomes unacceptable.

2 comments:

  1. I'd always thought that session establishment failed at 75 seconds. RFC 5461 mentions this value, but it's a reference to Stevens Vol II -- Something I don't have on my desk right now.

    The value was burned into my memory by a transparent proxy server that couldn't serve its client. 75 seconds after intercepting the server-bound GET request and sending the first SYN packets toward an unreachable server, it abruptly closed the client's connection. Story here: http://tinyurl.com/ylxr6ej

    But the 3 minute value certainly checks out:
    $ uname -a
    Linux gslse 2.6.27-7-server #1 SMP Tue Nov 4 20:16:57 UTC 2008 x86_64 GNU/Linux
    $ time telnet 1.1.1.1
    Trying 1.1.1.1...
    telnet: Unable to connect to remote host: Connection timed out

    real 3m8.994s
    user 0m0.010s
    sys 0m0.000s

    The retransmissions come at 3, 6, 12, 24, 48 seconds, then it aborts after an additional 96 seconds. 3+6+12+24+48+96 = 3m9s

    ...and...

    $ uname -a
    SunOS 5.11 snv_101b i86pc i386 i86xpv
    $ time telnet 1.1.1.1
    Trying 1.1.1.1...
    telnet: Unable to connect to remote host: Connection timed out

    real 3m44.690s
    user 0m0.002s
    sys 0m0.008s

    On Solaris, the retransmissions come at 3.375, 6.75, 13.5, 27, 54, 60, then failure after an additional 60 seconds. 3.375+6.75+13.5+27+54+60+60 = 3m44.625s


    My WinXP desktop only retransmits twice, at 3 and 6 seconds, then fails after an additional 8 seconds.
    3+6+8 = 17s

    Anybody have Stevens Vol II handy? What does the section cited by RFC 5461 (pp. 828-829) say?

  2. Hi Ivan, I can see now what was on your mind. I still think that your statement about the socket API is a controversial one. By contrast, using the socket API is so much easier than talking directly to DNS via raw sockets, and it was a major step in terms of new functionality, but where should it stop? Taking your thoughts further, the network should be persistent, secure, and available everywhere with infinite bandwidth for all, but this is not a perfect world ... 8-)
    On the other hand, who should tweak the TCP/IP stack in all these Linux distros? What if it would break some old applications? That's why I'm not surprised by Solaris; it might be a bit better in commercial distros like Red Hat or SUSE.



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.