The FTP Butterfly Effect

Anyone dealing with FTP and firewalls has to ask himself “what were those guys ~~smoking~~ thinking?” As we all know, FTP is seriously ~~broken~~ interestingly-designed:

  • Command and data streams use separate sessions.
  • Layer-3 addresses and layer-4 port numbers are carried in layer-7 messages (see the sketch below the list).
  • FTP server opens a reverse session to a dynamic port assigned by the FTP client.
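
To see how deep the layering violation goes, here's a minimal sketch of the PORT command encoding from RFC 959 (the address and port values are made up): the client's IP address and listening port travel inside the layer-7 command as six decimal numbers, which is exactly what a firewall or NAT device has to dig out of the payload before it can permit the data session.

```python
def encode_port(ip: str, port: int) -> str:
    """Build an FTP PORT argument (RFC 959): four IP octets plus the
    16-bit port split into high and low bytes, all comma-separated."""
    return "PORT " + ",".join(ip.split(".") + [str(port // 256), str(port % 256)])

def decode_port(arg: str) -> tuple:
    """Recover the (ip, port) pair hiding in the application payload."""
    p = arg.split(",")
    return ".".join(p[:4]), int(p[4]) * 256 + int(p[5])

print(encode_port("192.168.1.2", 1930))   # PORT 192,168,1,2,7,138
print(decode_port("192,168,1,2,7,138"))   # ('192.168.1.2', 1930)
```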

Once upon a time, there was a very good reason for this weird behavior. As Marcus Ranum explained in his Internet nails talk @ TEDx (the title is based on the For Want of a Nail rhyme), the original FTP program had to use two sessions because the sessions in the original (pre-TCP) Arpanet network were unidirectional. When TCP was introduced and two sessions were no longer needed (or, at least, they could be opened in the same direction), the programmer responsible for the FTP code was simply too lazy to fix it.

The list of problems created by someone saving a few hours of coding is long. The original sin was the widespread acceptance of the stupid idea that it’s OK to use server-to-client sessions and embed layer-3 addresses in the application data stream. As programmers are usually not too well versed in networking protocols, they looked at past examples whenever coding a new application and decided they could do the same thing; we’ve thus ended up with numerous broken applications (including SIP) that need stateful firewall inspection and application-level gateways (ALGs) to work with NAT.
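
To illustrate what an ALG actually has to do, here's a toy version of the payload rewrite a NAT device performs on FTP's PORT command (the addresses and the port mapping are hypothetical; a real ALG also has to adjust TCP sequence numbers, because the rewritten command usually changes length):

```python
import re

def alg_rewrite_port(line: str, inside_ip: str, outside_ip: str, map_port) -> str:
    """Rewrite the embedded address in an FTP PORT command the way a
    NAT ALG would (toy version; no sequence-number fixup)."""
    m = re.fullmatch(r"PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", line)
    if not m or ".".join(m.groups()[:4]) != inside_ip:
        return line                              # pass anything else through
    inside_port = int(m.group(5)) * 256 + int(m.group(6))
    outside_port = map_port(inside_port)         # allocate a NAT mapping
    return "PORT %s,%d,%d" % (outside_ip.replace(".", ","),
                              outside_port // 256, outside_port % 256)

# The inside client 192.168.1.2 advertised port 1930; the ALG substitutes
# the NAT device's public address and a (made-up) mapped port:
print(alg_rewrite_port("PORT 192,168,1,2,7,138",
                       "192.168.1.2", "203.0.113.5", lambda p: 62000))
# -> PORT 203,0,113,5,242,48
```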

Just imagine how much simpler our lives would be if we only had to deal with client-to-server TCP sessions with no embedded addresses ... or if the TCP/IP protocol stack had a session layer that solved the peer-to-peer issues once and for all in a central piece of code.

15 comments:

  1. I ran into a recently-coded version of this problem just the other day with a proprietary, vertical market VoIP appliance. It wouldn't function through a simple NAT, so I captured a few packets and searched for the hex encoding of the device's IPv4 address in the packet payload... and there it was. You would think that vendors in 2010 would at least try to use SIP or H.323 or something that firewalls can deal with, but in this case they decided to re-invent the square wheel.
  2. IPv6 will arrive when FTP is dead.
    Better the worst standard widely accepted than the best proprietary solution ever created.
    But FTP has been my worst nightmare for a long, long time. My last problem with a load balancer was related to the balancing of this protocol.
    I agree with the whole post. From tip to toe.
  3. I think you don't get the idea of the FTP protocol. It was possible to copy files between two computers using a third "control" computer. Example: there is a fast line between computers A and B, and you are sitting at computer C with a very slow line. Is there a way to copy files from A to B fast? Guess what - FTP can do it! ;-)

    And why can FTP do it? There are 2 reasons - it uses 2 ports (20, 21) AND it has the destination IP address embedded in the protocol! (OK, there is a third reason - it uses server-initiated file transfer)
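
    A rough sketch of that server-to-server choreography (later known as FXP), using Python's ftplib - the hosts, credentials, and file name are made up, both servers have to allow it, and we assume each server sends its preliminary "150" reply without waiting for the data connection:

    ```python
    import re
    from ftplib import FTP

    src = FTP("ftp.example-a.net")    # computer A: holds the file
    dst = FTP("ftp.example-b.net")    # computer B: receives the file
    src.login("user", "secret")
    dst.login("user", "secret")

    # 1. Put the destination into passive mode; it starts listening and
    #    replies "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)".
    hostport = re.search(r"\(([\d,]+)\)", dst.sendcmd("PASV")).group(1)

    # 2. Hand that address to the *source* server, which treats it like
    #    an ordinary active-mode client address.
    src.sendcmd("PORT " + hostport)

    # 3. Kick off both halves; the file flows directly from A to B and
    #    never touches the slow line to C.
    dst.sendcmd("STOR file.bin")      # B waits for the data connection
    src.sendcmd("RETR file.bin")      # A connects to B and sends

    # 4. Collect the final "226 Transfer complete" replies and hang up.
    src.voidresp(); dst.voidresp()
    src.quit(); dst.quit()
    ```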
  4. > Command and data streams use separate sessions.

    There are very good reasons for having separate TCP sessions for data and control messages. It allows cancelling transfers without having to tear down and reestablish the entire connection; doing this with a single stream requires a significantly more complex protocol.

    > Layer-3 addresses and layer-4 port numbers are carried in layer-7 messages.

    It's hard to take anyone seriously who talks about "layer 3" and "layer 7" as if they're something other than academic trivia. It's perfectly normal to pass ports and IPs through control sessions.

    And as davro pointed out, you completely ignored (and were probably ignorant of) the benefit of FXP.

    > FTP server opens a reverse session to a dynamic port assigned by the FTP client.

    All sane, modern FTP clients default to PASV transfers.

    This post is nothing but attacking a mature protocol based on the notion that "it's been around for a long time, therefore it must be horribly broken". Very poor.
  5. Thanks for the feedback. The server-to-server copy is awesome engineering. Would you have any indication that it was done by design (or was someone clever enough to figure out how to use existing functionality)?
  6. #1 - Agreed. However, introducing chunking would not make for a significantly more complex protocol. It's easier, though, to think about opening two sessions if your model is "end-to-end connectivity with no limitations". Once that model breaks down (due to security restrictions or NAT), we're all in deep trouble.

    #2 - You might have a problem with ivory-tower "professionals" (so do I), but sometimes it helps to spend some time thinking about the bigger picture and to have a structured model of what you're trying to do. Most "senior" disciplines do (for example, architects learned their lessons after "a few" broken bridges and/or buildings); in the networking world some people still think just-in-time hacks beat proper engineering. They do in the short term ... and cause everyone needless pain in the long term.

    As for FXP: it's awesome. Thanks for pointing me to this little gem. Do you have any real-life usage examples?

    #3: The problems with FTP (and my grudges against it) are 20+ years old. At the time when we had to implement the first firewalls and the first NAT devices, PASV transfers were not available everywhere. Anyhow, PASV transfer just shifts the burden to the other end.

    Last but not least, I don't care about FTP per se; I've been in this industry long enough to have some perspective (I would hope) ... you did read my CV before writing that last line, didn't you? My point was that the bad practices promoted by FTP were picked up by numerous other protocols and made firewalls and NAT devices way more complex than they would have to be if some people listened to "academic trivia" every now and then.
  7. It is by design, see RFC 959 figure 2 :-)
  8. Well, the original RFC 114 has neither PASV nor FXP functionality. Both appeared (approximately) nine years later in RFC 765, so server-to-server copy was definitely not on the minds of the people designing FTP.

    As for the later chicken-and-egg problem, it could be that someone figured out PASV functionality was needed (to make packet filters work) and then someone else said "wait, with a nice hack we could have server-to-server transfers" ... or someone wanted server-to-server transfers, implemented PASV, and then other people figured out it could be used to make packet filters simpler.

    I have no background information whatsoever (it would be nice to have some), but I suspect FXP was there first, as many FTP clients did not support PASV when I started implementing firewalls (you know, the "traditional" two-routers-and-a-host-in-between type of firewalls).
  9. Sorry, but RFC 114 doesn't describe the same protocol as RFC 765. It's a completely different file transfer protocol.

    As for me, I believe FXP was a feature the authors had in mind when creating the protocol, because fast links were rare, so a server-client-server file copy was a bad idea. Too bad we can't ask Jon Postel, but Jon was one of the fathers of the internet and I do believe he was smart enough to see the need for server-to-server file copy.
  10. Oh my ... if only I had waited a day, I could have claimed it was a joke :-E

    You're absolutely correct. And with FXP being in the first incarnation of FTP, I agree with you that it was there to support server-to-server copies. It was probably also easier to implement in those days than starting a telnet session to the host and running FTP from there.
  11. Damn, my comment disappeared into Null0 :-/

    But you are right that FTP somehow evolved from RFC 114. See RFC 430 and RFC 438 and the discussions on server-to-server copy.

    But we must be careful with history; I think the FTP we are talking about can't be older than TCP/IP :)
  12. From an academic point of view, I would drop the NAT idea as nonsense :) The Internet was a synonym for end-to-end connectivity, and I hope someday it will be again.
  13. Climbing the steps to my ivory tower ... just a moment ... only a few steps left ... oh, this is getting harder every year ... OK, here I am: NAT does not break the end-to-end connectivity, the sessions are still established directly between the client and the server, it just destroys unique end-to-end addressing 8-) PAT interferes with port numbers, but the sessions are still end-to-end.

    Jumping through the ivory-framed window ... falling ... OUCH, reality hurts.

    I absolutely agree with you, NAT was a bad idea. It's yet another proof how quick hacks can proliferate when people refuse to consider the long-term view ... which brings us back to the IP addresses embedded in application data stream (you see, I've learned how to use the proper engineering words). :-P
  14. NAT is only partially end-to-end. Example: a connection between 192.168.1.2 in company A and 192.168.1.2 in company B is a bit of a problem. You have to have a 1:1 mapping between private and public addresses, so that way of using NAT is also academic, because it doesn't solve IP address depletion.

    But embedding an L3 address in L7 causes problems today when you switch from IPv4 to IPv6. Nowadays I would use a TLV representation of the address to easily switch between different kinds of network addressing :) (as I don't see any other way to do server-to-server copy)
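
    A minimal sketch of that TLV idea (the type codes here are invented for illustration, not any standard encoding): prefix the raw address bytes with a type and a length, and the same protocol field can carry IPv4 today and IPv6 tomorrow:

    ```python
    import socket

    T_IPV4, T_IPV6 = 1, 2    # hypothetical type codes

    def encode_addr(addr: str) -> bytes:
        """Pack an address as Type-Length-Value: 1-byte type,
        1-byte length, then the raw address bytes."""
        family = socket.AF_INET6 if ":" in addr else socket.AF_INET
        t = T_IPV6 if family == socket.AF_INET6 else T_IPV4
        value = socket.inet_pton(family, addr)
        return bytes([t, len(value)]) + value

    def decode_addr(blob: bytes) -> str:
        """Read type and length, then parse the value accordingly."""
        t, length = blob[0], blob[1]
        family = socket.AF_INET if t == T_IPV4 else socket.AF_INET6
        return socket.inet_ntop(family, blob[2:2 + length])

    print(decode_addr(encode_addr("192.0.2.7")))     # 192.0.2.7
    print(decode_addr(encode_addr("2001:db8::1")))   # 2001:db8::1
    ```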
  15. I've actually used fast server-to-server FTP transfers while sitting at the end of a slow dial-up line, many years ago in the early 1990s ... before NAT came along and broke it :(