Stateful Firewalls: When You Get to a Fork in the Road, Take It

If you’ve been in networking long enough you’ve probably noticed an interesting pattern:

  • Some topic is hotly debated;
  • No agreement is ever reached even though the issue is an important one;
  • The debate dies after participants diverge enough to stop caring about the other group.

I was reminded of this pattern when I was explaining the traffic filtering measures available in private and public clouds during the Designing Infrastructure for Private Clouds workshop.

Almost a decade ago I was observing a heated “I don’t need no stinking firewalls” debate. Fast forward to 2019 and you’ll find a huge gap: enterprise environments insist on using stateful firewalls in front of application stacks, sometimes deploying complex deep packet inspection, while all you can get from public cloud providers like AWS, Azure or GCP is packet filters with reflexive ACL functionality.

It’s obvious which approach scales. It’s also obvious that the scalable approach (reflexive ACLs) makes perfect sense and works well assuming you have a vague idea what you’re doing.
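For readers who haven't worked with reflexive ACLs, here is a minimal Python sketch of the idea: outbound packets are always permitted and create a temporary entry for the mirrored flow, and inbound packets are permitted only if they match such an entry. The `Packet` and `ReflexiveFilter` names, the inside prefix, and the idle timeout are all hypothetical and chosen for illustration; this is not how any particular cloud provider or vendor implements the feature.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    proto: str = "tcp"


class ReflexiveFilter:
    """Toy reflexive ACL: outbound traffic opens a temporary return path."""

    def __init__(self, inside_prefix: str, idle_timeout: float = 30.0):
        # inside_prefix and idle_timeout are illustrative assumptions.
        self.inside = ipaddress.ip_network(inside_prefix)
        self.idle_timeout = idle_timeout
        self.sessions = {}  # expected return flow -> time it was last seen

    def permit(self, pkt: Packet, now: float) -> bool:
        # Drop entries that have been idle for longer than the timeout.
        self.sessions = {
            flow: seen
            for flow, seen in self.sessions.items()
            if now - seen <= self.idle_timeout
        }

        if ipaddress.ip_address(pkt.src) in self.inside:
            # Outbound packet: permit it and remember the mirrored flow.
            mirrored = (pkt.proto, pkt.dst, pkt.dport, pkt.src, pkt.sport)
            self.sessions[mirrored] = now
            return True

        # Inbound packet: permit only if an inside host opened this flow.
        flow = (pkt.proto, pkt.src, pkt.sport, pkt.dst, pkt.dport)
        if flow in self.sessions:
            self.sessions[flow] = now
            return True
        return False
```

Note that the filter keeps one table entry per flow and looks at nothing beyond addresses, ports and protocol. There is no TCP state machine, sequence-number tracking, or payload inspection, which is roughly where the gap between a reflexive packet filter and a full stateful firewall lies.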

The memcrashed crowd proved how clueless some people deploying workloads in public cloud could be.

The laws of physics haven’t changed in the meantime: any volume-based DoS attack will crash a stateful firewall before affecting the servers the firewall is protecting. Considering all this, why do we still see stateful firewalls deployed as the weakest link in the application stack?

The only reason I can see is the CYA mentality: the security team in an enterprise environment is strictly separate from the application team, which is not responsible for the security aspects of whatever it deploys. The security team, in turn, doesn’t trust the application developers and deploys the most complex firewall possible in front of the application stack, hoping that Santa Claus and his $vendor elves will save the day.

The real fun starts when the CIO of the dysfunctional organization I just described decides it’s time to move into the cloud. You’ll either see developers who never had to care about security deploying their workloads directly into the public cloud, or the security team trying to replicate the enterprise environment in a public cloud and desperately trying to figure out how to push all traffic through a firewall running in a virtual machine. Fun times either way ;)

While it’s hard to fix siloed teams (and I stopped trying or caring a long while ago), there’s something we can try to do: educate people who might care. Matthias Luft already covered the big picture in the DevOps and Security for Enterprise Environments webinar, will start the journey into cloud security with the Cloud Security Basics webinar on April 16th, and will continue with deep dives into individual topics throughout 2019.

You can access both webinars and attend the live session on April 16th with a Standard Subscription.


  1. Doesn't a device/system with a reflexive ACL capability maintain some sort of state? How do the $cloud_providers handle that state?
    1. Try for example

      Here's one of the first results:

      For details of a specific implementation see

      You might also want to investigate iptables.
    2. Thanks for the hints. Wasn't the debate about stateless or stateful firewalls? Because with reflexive ACLs you have to maintain a state. I thought that stateless firewalls (router ACLs) would scale.
    3. Hey, you're right - missed that bit. Apologies for that.

      Microsegmentation done right or security groups available in AWS, Azure, GCP, OpenStack are stateful. The real reason they scale is because they're implemented at the edge (in compute infrastructure - hypervisors). Have to write another blog post on the topic ;)
    4. And even their scale has limitations. Take AWS for example: look under Network ACLs and Security Groups. I don't think every firewall guy would be happy about those limits :D
  2. It gets even funnier when an enterprise implements microsegmentation, assuming the "contracts" aren't of the allow any any to everywhere type, and still insists on pushing _all_ traffic through physical firewall clusters for inspection, having to break open all encrypted traffic in the process. And let's not forget the fact that your typical enterprise cluster technology sucks, especially if it's a two-member cluster (split brain anyone?) and/or $deity forbid a stretched variation.
    If there's only 1 thing you're going to pick from cloudy architectures, at least think about loosely coupled individual components: small failure domains/blast zones (data/control _and_ config domains). It might involve a bit more thinking, but deploying (at least) two individual L3 connected components beats any cluster technology in my view. Granted, depending on your L3 design, for firewalls you might need some state/session syncing, but that's fundamentally a far less complex issue.
  3. The thing we've been struggling with, as you hinted in the post, is the Security team wanting every feature in the world to be in front of every application. The whole "no-trust" thing has really taken hold and they want scrubbing, sandboxing, A/V, etc. in front of and in between every layer of the application. And the firewall vendors are really drumming up the FUD, and Infosec departments are really drinking that Kool-Aid.

    It is so hard to work through and it just makes things so much more complicated than they need to be.
  4. I don't see any reason to put an NGFW or a classic stateful L3/4 firewall in front of a 3-tier application, as those devices just add unnecessary latency and can't provide the multi-terabit bandwidth needed for east-west traffic in the DC. I once checked the Firepower Oracle IPS rules (they are basically Snort rules), and 90% of those rules are about buffer overflows in older Oracle versions, which can simply be ignored if you regularly patch your Oracle software.

    I think the best place for NGFW/stateful firewalls is north-south traffic; for east-west traffic it is better to add visibility via IDS sensors and flow-based monitoring tools to detect a breach. I like the idea of Arista DirectFlow Assist (offload some traffic from the firewall and let the switch handle it).

    Someone might argue that fragmented traffic can bypass stateless ACLs, but denying all fragmented packets via ACLs and filtering on TCP flags via extended ACLs can mitigate the risk. The only problem with stateless ACLs is that the number of rules can be higher than on a stateful firewall, as you need to handle the return traffic (this can also be mitigated with the ESTABLISHED keyword in the ACL). I don't like reflexive ACLs, as they put too much of a tax on TCAM on some older devices like the C6500 and can easily overwhelm TCAM with just a few ACLs.

    Some vendors claim that their firewalls use some sort of NP/ASIC to handle multi-gigabit traffic without any performance penalty, but when some L3/4 traffic needs further inspection, they handle that type of traffic in software. From what I've seen, a firewall with an NP/ASIC can handle only 30% of traffic in hardware, and the rest is punted to the CPU for inspection.

    Thank you, Ivan, for shedding some light on this topic.