Are stateless ACLs good enough?
In one of his Open Networking Summit blog posts, Jason Edelman summarized the presentation in which Goldman Sachs described its plans to replace stateful firewalls with packet filters (see also a similar post by Nick Buraglio).
These ideas are obviously not new – as Merike Kaeo succinctly said in her NANOG presentation over three years ago “stateful firewalls make absolutely no sense in front of servers, given that by definition every packet coming into the server is unsolicited.” Real life is usually a bit more complex than that.
The ideal world
Assuming an ideal world where servers are just that – software listening on TCP or UDP ports – it makes no sense to use stateful firewalls in front of them. All you need is a simple packet filter permitting traffic to the TCP and UDP ports you want to have reachable from the outside world … assuming you’re using a recent, regularly patched operating system that will not fall over when receiving overlapping or out-of-bounds IP fragments.
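A minimal sketch of what such a packet filter decides, assuming a hypothetical server that exposes HTTP, HTTPS and DNS (the port list and the Packet type are made up for illustration). The point is that the verdict depends only on the header of the packet being inspected, never on anything seen before:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    proto: str      # "tcp" or "udp"
    dst_port: int   # destination port on the server


# Hypothetical set of services this server exposes to the outside world.
PERMITTED = {("tcp", 80), ("tcp", 443), ("udp", 53)}


def permit_inbound(pkt: Packet) -> bool:
    """Stateless check: look at this packet's header and nothing else."""
    return (pkt.proto, pkt.dst_port) in PERMITTED


print(permit_inbound(Packet("tcp", 80)))   # HTTP is on the list
print(permit_inbound(Packet("tcp", 22)))   # SSH is not
```

No session table is involved, which is why this kind of filter fits into switch hardware so easily.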
In these environments, a stateful firewall is nothing more than a crutch. It must be those ancient apps nobody is allowed to touch ;) … or maybe the stateful firewalls are just CYA devices.
Getting closer to real life
In real life, each server is also a client. For example, in most scale-out application deployments web servers need information from the database servers, authentication servers, cache servers, name servers … and in all those cases they act as clients using unpredictable source TCP/UDP port numbers.
You can still use ACLs in a mixed client/server environment, but you can never control the traffic as tightly as a stateful firewall would.
ACLs with automatic reverse rules: Some products, like XenServer vSwitch Controller, automatically add the mirror image of every inbound ACL rule to the outbound ACL, and vice versa.
For example, you might want to permit inbound (toward the server) traffic to TCP port 80 and outbound (from the server) traffic to TCP port 3306 (MySQL) on the database server. In reality, what you get is an outbound ACL that permits traffic to TCP port 3306 on the database server and all traffic coming from TCP port 80.
I would love to pwn a system “protected” with such an ACL. You can open TCP sessions to anywhere you want – just kill httpd and open your own TCP session from port 80 (you do have to be root to do that).
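The hole is easy to see in a small model. This is only a sketch under simplifying assumptions: rules match on protocol and ports alone (IP addresses are left out for brevity), and the Rule, Packet and reverse names are invented for the example, not taken from any product:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    proto: str
    src_port: Optional[int] = None   # None matches any port
    dst_port: Optional[int] = None


@dataclass(frozen=True)
class Packet:
    proto: str
    src_port: int
    dst_port: int


def reverse(rule: Rule) -> Rule:
    """Mirror a rule for the opposite direction by swapping the ports."""
    return Rule(rule.proto, src_port=rule.dst_port, dst_port=rule.src_port)


def matches(rule: Rule, pkt: Packet) -> bool:
    return (rule.proto == pkt.proto
            and rule.src_port in (None, pkt.src_port)
            and rule.dst_port in (None, pkt.dst_port))


def permitted(acl: List[Rule], pkt: Packet) -> bool:
    return any(matches(rule, pkt) for rule in acl)


# What the administrator asked for.
inbound_intent = [Rule("tcp", dst_port=80)]      # users -> web server
outbound_intent = [Rule("tcp", dst_port=3306)]   # web server -> MySQL

# What the administrator gets once the reverse rules are added.
outbound_acl = outbound_intent + [reverse(r) for r in inbound_intent]

# A new outbound session to SSH (port 22) somewhere else:
print(permitted(outbound_acl, Packet("tcp", 40000, 22)))  # ephemeral source port
print(permitted(outbound_acl, Packet("tcp", 80, 22)))     # sourced from port 80
```

The second packet is permitted purely because of its source port, which is exactly what an attacker with root access on the web server controls.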
ACLs matching established sessions: If your ACLs support matching on TCP flags (the famous established keyword of Cisco IOS), you can tighten the security quite a bit. Using the previous example, the inbound ACL would permit TCP traffic to port 80 and established sessions, while the outbound ACL would permit TCP traffic to port 3306 on the database server and established sessions.
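A rough sketch of that pair of ACLs, assuming the Cisco IOS meaning of established (a TCP segment with the ACK or RST bit set). Addresses are again omitted, so "port 3306 on the database server" is reduced to a port check; the function names are made up for the example:

```python
# TCP flag bits as defined in the TCP header.
SYN, RST, ACK = 0x02, 0x04, 0x10


def is_established(flags: int) -> bool:
    """Stateless approximation of an existing session: ACK or RST is set."""
    return bool(flags & (ACK | RST))


def permit_inbound(dst_port: int, flags: int) -> bool:
    # New sessions only to the web port; everything else must look established.
    return dst_port == 80 or is_established(flags)


def permit_outbound(dst_port: int, flags: int) -> bool:
    # New sessions only to MySQL; everything else must look established.
    return dst_port == 3306 or is_established(flags)


# Initial SYN of a rogue outbound session to port 22, whatever its source port:
print(permit_outbound(22, SYN))
# Server's SYN+ACK reply to a legitimate web client:
print(permit_outbound(51515, SYN | ACK))
```

The source-port trick from the previous example no longer works, because the first segment of any new outbound session carries SYN without ACK. The filter still has no idea whether a session actually exists; it only trusts the flags in each packet.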
Two subnets: In more complex environments you could use servers with two interfaces. For example, you could permit inbound TCP traffic to port 80 and outbound established sessions from port 80 on the user-facing interface, and have no restrictions on the inside-facing interface.
Obviously you’d have to protect the inside servers, so you might want to use private VLANs on the inside subnet (to prevent hopping between web servers in case one gets pwned) and protect the servers residing in the common part (primary VLAN) of the private VLAN with rigorous ACLs.
Is any one of the above solutions good enough for your environment? Would it pass your security audit? Are you using it? Please share your thoughts – write a comment.
Reflexive ACLs?
No, I haven’t missed them. They are no better than stateful firewalls from a scalability perspective, and they are (almost) impossible to implement in hardware:
- First packet of each session has to be punted to the CPU to create the reflexive per-flow ACL entry;
- The per-flow ACL entries will quickly overflow the switch TCAM.
Just two data points: Arista’s 7150 has 20,000 ACL entries (~300 per port); Nexus 5500 has 1664 ingress and 2048 egress ACL entries (in total, not per port). Not much for a high-end server environment, is it?
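A back-of-the-envelope check of those numbers. The ACL table sizes come from the text above; the port counts (64 and 48) and the 1,000 concurrent sessions per server are assumptions picked only to make the arithmetic concrete:

```python
def entries_per_port(total_entries: int, ports: int) -> int:
    """ACL entries available per port if the table is shared evenly."""
    return total_entries // ports


def fits(total_entries: int, ports: int, flows_per_server: int) -> bool:
    """Reflexive ACLs need one entry per active flow on every port."""
    return flows_per_server <= entries_per_port(total_entries, ports)


# Table sizes quoted in the article; port counts are assumptions.
print(entries_per_port(20_000, 64))   # Arista 7150, assuming 64 ports -> 312
print(entries_per_port(1_664, 48))    # Nexus 5500 ingress, assuming 48 ports -> 34

# Hypothetical busy server with 1,000 concurrent sessions.
print(fits(20_000, 64, 1_000))        # False
```

Even with generous assumptions, a per-flow entry for every session on every port runs out of space long before the servers run out of sessions.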
-mike
We're a webserver in this scenario, right? We'd just need to feed tcpdump into a web-accessible file, and have our partner system scrape the log for our SYN packets that would have been dropped.
SYN/ACK reply traffic (initiated by the partner spoofing our address) passes the established test, allowing the 3-way handshake to complete. TCP setup timeout is usually 75 seconds, I think. Plenty of time to pull off this attack.
And I still don't get your idea. If the attacker has access to tcpdump on a server, allowing him to gain knowledge of the server's ISN, why bother with spoofing? He's already rooted the server, he can do anything.
Even after establishing the TCP session, he won't be able to see return packets by means other than continuous tcpdump updates, and to get them, he has to connect to the server from a real address. So why the heck would you need spoofing in the first place?
The spoofing comment is not a whole-hog attack approach, just an interesting tidbit about circumventing ACLs with 'established' keyword. Because 'established' passes traffic with ACK bit set, the only thing it drops is the first SYN in the handshake. As long as we have a way to lob that single segment over the ACL (like with a telephone and a friend running scapy, perhaps?), we can communicate through an ACL where 'established' would otherwise have blocked us. No need for tcpdump nonsense after the first segment.
The problem is getting the first SYN through. I've honestly no idea how to do that. Spoof from behind the stateless firewall? Once again, if you can do that, you probably don't need to penetrate the firewall.
Traditional stateful firewalls are going to be susceptible to DoS attack by exhausting their state tables.
Batshit-crazy TCP-proxy firewalls are resistant to this particular DoS attack, but have other problems relating to the MiTM attack they perform on every incoming connection.
There's a great NANOG thread from January 2010: "I don't need no stinking firewall!"
It explains well why stateful firewalls shouldn't frontend a server farm.
Yes, the first SYN is the problem. But it doesn't need to go *through*. That's what makes it interesting. Whether or not an attacker has this need is not for us to say. Maybe it's for data exfiltration? Maybe it's post-compromise lateral movement? I just mention it to illustrate that "established" might not be as robust as folks assume.
However, people tend to forget about spoofed datagrams.
We should put simple anti-spoofing ACLs on every gateway.