Stateful Firewall Cluster High Availability Theater

Dmitry Perets wrote an excellent description of how typical firewall cluster solutions implement control-plane high availability, in particular, the routing protocol Graceful Restart feature (slightly edited):

Most of the HA clustering solutions for stateful firewalls that I know implement a single-brain model, where the entire cluster is seen by the outside network as a single node. The node that is currently primary runs the control plane (hence, I call it single-brain). Sessions and the forwarding plane are synchronized between the nodes.

Therefore, in the event of HA failover, all the existing sessions are preserved, and user traffic can just keep flowing. You can get a subsecond failover, delayed only by failure detection (based on HA keepalives sent back-to-back between the nodes and link failure detection).

Since it is a single-brain solution, the BGP daemon runs only on the primary node. Upon HA failover, it starts from scratch on the ex-secondary (new primary). This is where Graceful Restart comes into play. It allows your peers to keep their forwarding state, believing that your HA clustering solution successfully did the same on your side. Hence, you get your Non-Stop Forwarding and don’t bother the rest of your network with BGP convergence while the new HA primary re-establishes its BGP control plane.

Let’s start with a diagram to illustrate our discussion; handwaving should be reserved for academic discussions and podcasts.

┌────────┐   ┌────────┐       
│   X1   │   │   X2   │  ▲    
└────┬───┘   └────┬───┘  │    
     │            │      │    
─────┼────────────┼────  │ BGP
     │            │      │    
┌────┴───┐   ┌────┴───┐  ▼    
│Firewall│   │Firewall│       
└────┬───┘   └────┬───┘  ▲    
     │            │      │    
─────┼────────────┼────  │ BGP
     │            │      │    
┌────┴───┐   ┌────┴───┐  │    
│   C1   │   │   C2   │  ▼    
└────────┘   └────────┘       

The two firewalls act as a single control plane. Both inside switches and outside routers have a BGP session with that single control-plane instance.

The firewalls also share an inside- and an outside IP address, forcing us to build a VLAN linking the two firewall boxes and a pair of switches (or routers). Am I allowed to mention that a single VLAN is also a single failure domain?

The “single firewall control plane that is restarted on the other instance” idea results in a single point of failure with non-negligible downtime, which makes Graceful Restart the only viable option for non-stop forwarding.
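To make the mechanism concrete, here’s a minimal sketch of what enabling BGP Graceful Restart might look like in FRR-style syntax (the AS numbers and neighbor address are made up; a firewall’s BGP daemon will have its own equivalent knobs):

```
router bgp 65000
 ! advertise the GR capability: ask peers to keep our routes
 ! in their forwarding tables while our BGP daemon restarts
 bgp graceful-restart
 ! how long peers should wait for the session to come back
 bgp graceful-restart restart-time 240
 neighbor 192.0.2.1 remote-as 65001
```

The peers must act as GR helpers (most implementations do so by default) for the restarting node’s routes to stay in their forwarding tables.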

Now that we know what we’re talking about, let’s analyze the various failure scenarios:

  • Firewall power supply failure. The firewall cluster can deal with that.
  • Firewall forwarding table corruption. The only people who might be able to comment on the impact of this one are the engineers writing the code. From an end-user perspective, we might be left with an expensive Schrödinger packet destroyer.
  • Session table corruption. See the previous bullet.
  • Firewall software failure. See the previous bullet. However, I have seen redundant clusters that failed to switch over the control-plane functionality when the primary node refused to die completely.
  • LAN failure. Dig deep into the firewall documentation to see how it reacts to keepalive failure on one of the interfaces, and hope the code works as described.

In any case, the only failure scenario this design protects us against is a hardware failure. That approach might have been the right choice in the 1980s (I’ve seen my share of failed power supplies), but I’m pretty sure we’re seeing more software crashes and weird, hard-to-explain bugs in the 2020s than power supply failures.

As always, software developers believe in the quality of their code, create solutions that cope with what they could imagine other people’s problems to be¹, and keep solving yesterday’s problems.

Is there an alternative? Of course. Don’t believe in the magic of firewall clusters. Instead, use two independent firewalls and run BGP with them. Even better, run BGP across them to determine which one has a working data plane.
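As a rough sketch of the “BGP across the firewalls” idea, assuming FRR-style syntax with made-up ASNs and addresses: each inside switch runs a direct eBGP session to each firewall, plus a multihop eBGP session to the outside router that traverses the firewall, so the session only stays up while that firewall’s data plane actually forwards packets:

```
! On inside switch C1 (all ASNs and addresses are hypothetical)
router bgp 65101
 ! direct eBGP sessions to each firewall, with BFD for fast failure detection
 neighbor 10.0.1.1 remote-as 65001
 neighbor 10.0.1.1 bfd
 neighbor 10.0.2.1 remote-as 65002
 neighbor 10.0.2.1 bfd
 ! multihop eBGP to outside router X1, going *through* firewall A;
 ! if firewall A stops forwarding, this session drops and traffic
 ! shifts to the path through firewall B
 neighbor 192.0.2.1 remote-as 65201
 neighbor 192.0.2.1 ebgp-multihop 2
```

Obviously the firewall has to permit the transit BGP session; that’s the point — the session doubles as an end-to-end data-plane liveness check.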

Want to know more? Read the BGP as a High-Availability Protocol article by Nicola Modena.

  1. That’s why crappy software developers use hardcoded IP addresses. They have no problem imagining DNS failing more often than their code.

  1. The world is by far not that simple, especially if sessions can be asymmetric ;-) HA in such a scenario is extremely hard to build, especially since it's not a two-controllers-through-a-backplane scenario but "arbitrary distance, arbitrary delay, may be just a piece of wet string" ;-) For two boxes, working solutions from reputable vendors have been available for a long time; more than two quickly leads into weird stuff like Paxos. There are other, clever ways to deal with HA and asymmetry that you may want to dig into as well ;-)

    1. You run BGP. You can configure your routing policies so that half your traffic goes through one firewall and the rest through the other. Hence you can get your symmetric traffic with a bit of work in BGP and still don't need to rely on the single point of failure in an HA cluster. But yes, you need to re-establish the sessions if something fails.

      But I get you, and I think this approach is only viable when you need really high availability. Not your standard enterprise, where they whine a bit when something doesn't work. Stuff that really needs to stay up all the time depends on this kind of design. They don't stack, they don't cluster stuff together. They design around that and make it always work. Which costs quite a bit more, but they know why they pay the premium.

      For most enterprises it's just a cluster and everything is fine.

  2. Dear Ivan,

    you are of course allowed to mention in your blog post that a single VLAN is also a single failure domain. See, how very gracious we, your readers, are. :-)

    And because you are correct, the best approach is to avoid shared L2 even for single-control-plane HA clusters. I know it can be done with Juniper firewalls, where the trick is to have local bridging on each upstream or downstream router, but not between the routers. In that case, node0:if0+node1:if0=reth0 is connected to the bridge at router R1, and node0:if1+node1:if1=reth1 is connected to the bridge at router R2. So no matter which node is active, reth0 and reth1 point to both R1 and R2. Each node needs to be connected to both routers, true, but if in your picture X1+C1 and X2+C2 are collapsed into R1 and R2, you can have the equivalent setup without using any more ports.

    Similarly, on the management side, OoB interfaces (fxp0 for Juniper) don't have to share the same L2. The so-called master-only IP address can be made available with BFD, because the standby node does not run BFD.

    So you truly don't need to tie your routers with any shared L2 just because of the firewall clusters. And then there is also Multinode HA, which I think Dmitry also mentioned in his post, where nodes are running their own control plane protocols.

    Cheers, Martin

  3. Pick your poison. Firewall vendors suck at implementing BGP even more than HA. And if we're having an honest discussion, network layer firewalls are in general nothing but theater, providing an illusion of "security".

  4. Integrating stateful devices in a network design is always painful. Ideally these devices are avoided by integrating network security in the OS/application, something that is likely feasible with a hyperscaler but unlikely to happen in a typical enterprise, due to the variety of workloads in the enterprise.

    Using dynamic routing with individual firewalls works well for north-south flows, or in a VRF sandwich design. I'm most familiar with PANW firewalls, which have a feature where you can synchronize the stateful sessions between independent firewalls, ensuring the second firewall is ready to take over. Another challenge is keeping the configuration in sync; this should be trivial nowadays with firewall management platforms or various automation capabilities.

    Something that is harder to address is deeper data center segmentation. Hypervisor-integrated firewalls or host-based solutions are the ideal answer, but often cannot address 100% of the workloads. When trying to address this with traditional stateful firewalls in an independent (non-HA-cluster) design, things can get complicated quite fast. Techniques such as PBR or ePBR can be leveraged, but they require a thoughtful design around probing and keeping redirect ACLs sane. Opting for an HA cluster where the stateful firewall is your default gateway is less complex from an operational perspective.
