The FTP Butterfly Effect
Anyone dealing with FTP and firewalls has to ask himself “what were those guys ~~smoking~~ thinking?” As we all know, FTP is ~~seriously broken~~ interestingly designed:
- Command and data streams use separate sessions.
- Layer-3 addresses and layer-4 port numbers are carried in layer-7 messages.
- The FTP server opens a reverse session to a dynamic port assigned by the FTP client.
Once upon a time, there was a very good reason for this weird behavior. As Marcus Ranum explained in his Internet nails talk @ TEDx (the title is based on the For Want of a Nail rhyme), the original FTP program had to use two sessions because the sessions in the original (pre-TCP) Arpanet network were unidirectional. When TCP was introduced and two sessions were no longer needed (or, at least, they could be opened in the same direction), the programmer responsible for the FTP code was simply too lazy to fix it.
The list of problems created by someone saving a few hours of coding is long. The original sin was the widespread acceptance of the stupid idea that it’s OK to use server-to-client sessions and embed layer-3 addresses in the application data stream. As programmers are usually not too well versed in networking protocols, they looked at past examples whenever coding a new application and decided they could do the same thing; we’ve thus ended up with numerous broken applications (including SIP) that need stateful firewall inspection and application-level gateways (ALGs) to work with NAT.
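To make the ALG pain concrete, here’s a minimal sketch (in Python, with a made-up sample command) of the parsing every NAT device has to do just to find the address a PORT command smuggles through in the application payload:

```python
# Minimal sketch: the FTP PORT command carries the client's IP address and a
# dynamic port as six comma-separated decimal bytes inside the payload.

def parse_port_command(line: str) -> tuple[str, int]:
    """Extract the embedded address from e.g. 'PORT 192,168,1,10,78,52'."""
    fields = line.split(None, 1)[1].split(",")
    ip = ".".join(fields[:4])                     # first four bytes: IPv4 address
    port = int(fields[4]) * 256 + int(fields[5])  # last two bytes: 16-bit port
    return ip, port

print(parse_port_command("PORT 192,168,1,10,78,52"))
# ('192.168.1.10', 20020) -- a private address the server cannot reach. A NAT
# ALG has to rewrite those bytes in flight (and patch TCP sequence numbers,
# because the payload length may change).
```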
Just imagine how much simpler our life would be if we only had to deal with client-to-server TCP sessions with no embedded addresses ... or if the TCP/IP protocol stack had a session layer that solved the peer-to-peer issues once and for all in a central piece of code.
It's better to have the worst widely accepted standard than the best proprietary solution ever created.
But FTP has been my worst nightmare for a long, long time. My last problem with a load balancer was related to balancing this very protocol.
I agree with the whole post. From tip to toe.
And why can FTP do it? There are two reasons: it uses two ports (20, 21) AND it has the destination IP address embedded in the protocol! (OK, there is a third reason: it uses server-initiated file transfers.)
There are very good reasons for having separate TCP sessions for data and control messages. It allows cancelling transfers without having to tear down and reestablish the entire connection; doing this with a single stream requires a significantly more complex protocol.
> Layer-3 addresses and layer-4 port numbers are carried in layer-7 messages.
It's hard to take anyone seriously who talks about "layer 3" and "layer 7" as if they're something other than academic trivia. It's perfectly normal to pass ports and IPs through control sessions.
And as davro pointed out, you completely ignored (and were probably ignorant of) the benefit of FXP.
> FTP server opens a reverse session to a dynamic port assigned by the FTP client.
All sane, modern FTP clients default to PASV transfers.
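For what it's worth, Python's standard-library client shows the default (the host name below is a placeholder):

```python
# Passive mode is the default in ftplib, as in most modern clients: both the
# control session and the data session are opened from client to server.

from ftplib import FTP

ftp = FTP("ftp.example.com")   # control session, client -> server, port 21
ftp.login()                    # anonymous login
ftp.set_pasv(True)             # the default anyway; shown here for emphasis
ftp.retrlines("LIST")          # data session is also opened client -> server
ftp.quit()
```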
This post is nothing but attacking a mature protocol based on the notion that "it's been around for a long time, therefore it must be horribly broken". Very poor.
#2 - You might have a problem with ivory-tower "professionals" (so do I), but sometimes it helps to spend some time thinking about the bigger picture and to have a structured model of what you're trying to do. Most "senior" disciplines do (for example, architects learned their lessons after "a few" broken bridges and/or buildings); in the networking world, some people still think just-in-time hacks beat proper engineering. They do in the short term ... and cause everyone needless pain in the long term.
As for FXP: it's awesome. Thanks for pointing me to this little gem. Do you have any real-life usage examples?
#3: The problems with FTP (and my grudges against it) are 20+ years old. At the time we had to implement the first firewalls and the first NAT devices, PASV transfers were not available everywhere. Anyhow, a PASV transfer just shifts the burden to the other end.
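To illustrate the burden-shifting: the server's 227 reply to PASV embeds the server's address in exactly the same six-byte notation (the sample reply below is made up), so the rewriting problem just moves to whatever NAT device sits in front of the server:

```python
# Minimal sketch: parsing the 227 reply a server sends in response to PASV.

import re

def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract IP and port from e.g. '227 Entering Passive Mode (10,1,2,3,195,80)'."""
    h1, h2, h3, h4, p1, p2 = map(
        int, re.search(r"\((.*?)\)", reply).group(1).split(","))
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

print(parse_pasv_reply("227 Entering Passive Mode (10,1,2,3,195,80)"))
# ('10.1.2.3', 50000) -- now it's the *server's* private address leaking into
# the payload, and the firewall/NAT in front of the server has to fix it.
```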
Last but not least, I don't care about FTP per se; I have been in this industry long enough to have some perspective (I would hope) ... you did read my CV before writing that last line, didn't you? My point was that the bad practices promoted by FTP were picked up by numerous other protocols and made firewalls and NAT devices way more complex than they would have to be if some people listened to "academic trivia" every now and then.
As for the latter chicken-and-egg problem: it could be that someone figured out the PASV functionality was needed (to make packet filters work) and then someone else said "wait, with a nice hack we could have server-to-server transfers" ... or someone wanted server-to-server transfers, implemented PASV, and then other people figured out it could be used to make packet filters simpler.
I have no background information whatsoever (it would be nice to have it), but I suspect FXP was there first, as many FTP clients did not support PASV when I started implementing firewalls (you know, the "traditional" two-routers-and-a-host-in-between type of firewalls).
As for me, I believe FXP was a feature the authors had in mind when creating the protocol, because fast links were rare and a server-client-server file copy was a bad idea. Too bad we can't ask Jon Postel, but he was one of the fathers of the Internet, and I do believe he was smart enough to see the need for server-to-server file copy.
You're absolutely correct. And with FXP being in the first incarnation of FTP, I agree with you it was there to support server-to-server copies. It was probably also easier to implement in those days than starting a telnet session to the host and doing FTP on the host.
But you are right that FTP has somehow evolved from RFC 114. See RFC 430 and RFC 438 and their discussions of server-to-server copy.
But we must be careful with the history: I think the FTP we are talking about can't be older than TCP/IP :)
Jumping through the ivory-framed window ... falling ... OUCH, reality hurts.
I absolutely agree with you, NAT was a bad idea. It's yet another proof of how quick hacks can proliferate when people refuse to consider the long-term view ... which brings us back to the IP addresses embedded in the application data stream (you see, I've learned how to use the proper engineering words). :-P
But embedding an L3 address in L7 causes problems from today's point of view when you switch from IPv4 to IPv6. Nowadays I would use a TLV representation of the address to easily switch between different kinds of network addressing :) (as I don't see any other way to do server-to-server copy)
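Something along these lines; a minimal sketch with made-up type codes:

```python
# Hypothetical TLV encoding of an embedded address: a Type byte says which
# address family follows, a Length byte says how many value bytes to read.
# The type codes are invented for this illustration.

import ipaddress
import struct

ADDR_IPV4, ADDR_IPV6 = 1, 2   # made-up type codes

def encode_address(text: str) -> bytes:
    """Encode an IP address as Type (1 byte) + Length (1 byte) + Value."""
    addr = ipaddress.ip_address(text)
    kind = ADDR_IPV4 if addr.version == 4 else ADDR_IPV6
    return struct.pack("!BB", kind, len(addr.packed)) + addr.packed

def decode_address(blob: bytes) -> str:
    """Decode a TLV-encoded address; a real protocol would dispatch on the
    type code, but ipaddress infers the family from the value length."""
    kind, length = struct.unpack("!BB", blob[:2])
    return str(ipaddress.ip_address(blob[2:2 + length]))

print(decode_address(encode_address("192.0.2.7")))     # 192.0.2.7
print(decode_address(encode_address("2001:db8::7")))   # 2001:db8::7
```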