Stretched Firewalls across Layer-3 DCI? Will the Madness Ever Stop?
I got this question from one of my readers (and based on these comments he’s not the only one facing this challenge):
I was wondering if you can do a blog post on Cisco's new ASA 5585-X clustering. My company recently purchased a few of these with the intent to run active/active firewalls across data centers, but found out we cannot do this without OTV or a Layer 2 DCI.
A while ago I expressed my opinion about these ideas, but it seems some people still don’t get it. However, a picture is worth a thousand words, so maybe this will work:
On a more serious note…
Whenever someone proposes a stupidity like "let's turn our L3 DCI into L2 DCI so we can run a stretched firewall cluster on top of it", politely ask "and what happens when (not if) the DCI link fails?" because asking "what were you smoking" might sound offensive.
Fortunately for everyone who has to work with real-life networks, Cisco engineers (even those working in marketing) tend to be pretty honest about how things really work, so it was easy to answer that question by reading the documentation, the design guides, and the ASA Clustering Deep Dive Cisco Live session:
- A failure in communication between different members of the cluster will result in ejection of that firewall from the cluster.
- CCL (Cluster Communication Link) loss forces the member out of the cluster.
- CCL link loss causes the unit to shut down all data interfaces and disable clustering. Clustering must be re-enabled manually after such an event.
For those who still don’t get it: if you lose the communication between cluster members (which would happen after DCI link failure), the firewalls in one data center shut down and cut that data center off the net.
Do keep in mind that if you have two data centers with L3 DCI between them, they could work independently after DCI link failure (apart from the potential need to synchronize data between them). Building a firewall cluster on top of L3 DCI is thus a huge step back in terms of failure resiliency.
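To make the failure mode concrete, here's a toy Python model (a hypothetical sketch, not ASA code) that applies the documented CCL-loss behavior to two designs: one cluster stretched across the DCI, and an independent cluster in each data center. It assumes the control unit of the stretched cluster sits in DC-A and that a member loses CCL connectivity exactly when its CCL path to the control unit crosses the failed DCI; the names `Member`, `simulate_dci_failure`, and `dc_connectivity` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One firewall in a cluster (hypothetical toy model, not ASA code)."""
    name: str
    dc: str
    data_interfaces_up: bool = True
    clustering_enabled: bool = True


def simulate_dci_failure(clusters):
    """Apply the documented CCL-loss behavior after a DCI link failure.

    clusters maps a cluster name to (control_dc, members). Assumption: a
    member loses CCL connectivity exactly when it sits in a different data
    center than the control unit, because its CCL path crossed the DCI.
    """
    for control_dc, members in clusters.values():
        for m in members:
            if m.dc != control_dc:
                # CCL loss: shut down all data interfaces and disable
                # clustering; re-enabling it is a manual task.
                m.data_interfaces_up = False
                m.clustering_enabled = False


def dc_connectivity(clusters):
    """Report whether each data center still has a forwarding firewall."""
    status = {}
    for _, members in clusters.values():
        for m in members:
            status[m.dc] = status.get(m.dc, False) or m.data_interfaces_up
    return status


# Design 1: one cluster stretched across a DCI turned into L2.
stretched = {
    "fw": ("DC-A", [Member("fw1", "DC-A"), Member("fw2", "DC-A"),
                    Member("fw3", "DC-B"), Member("fw4", "DC-B")]),
}

# Design 2: an independent cluster in each data center, L3 DCI in between.
local = {
    "fw-a": ("DC-A", [Member("fw1", "DC-A"), Member("fw2", "DC-A")]),
    "fw-b": ("DC-B", [Member("fw3", "DC-B"), Member("fw4", "DC-B")]),
}

for design in (stretched, local):
    simulate_dci_failure(design)

print(dc_connectivity(stretched))  # {'DC-A': True, 'DC-B': False}
print(dc_connectivity(local))      # {'DC-A': True, 'DC-B': True}
```

Which side survives in a real deployment depends on where the control unit happens to be; the point of the model is only that the stretched design always loses one data center, while the per-DC design loses none.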
Finally, here’s my message to the vendor sales engineers promoting such stupidities:
Keep it up. Cheers!
MattN: It's what we were aiming for, but we'll settle for a two-firewall cluster local to each DC, without the need to cluster across the DCI.
I get where you're coming from. You're an old-school, disciplined networking leader who architects networks based on rock-solid, time-tested designs. But it seems that the prevailing fashion in network design and availability goes against your traditional design principles: inter-site firewall clustering, inter-site vMotion, DCI, etc.
This isn't the first time that readers have asked you about these technologies, and it won't be the last. Vendors will continue to market them despite their shortcomings, and customers will continue to eat them up. I'd like to think that vendors will also continue to work out the kinks and over time the technology will become rock solid and time-tested.
I sincerely ask this out of curiosity and respect, though I will word it very directly: Are you too stuck on past, traditional designs and not open to new ways of building IT? I get that IT is very cyclical, and these new trends may die in the future...or thrive, and the customers may either fail...or succeed.
For the record, I see the same mindset reflected in the blog posts at netcraftsmen.net, another site made up of old-school, disciplined professionals. Maybe there's a common theme here...but that won't stop me from asking :-)
In other words: it's not about what you think, it's about how the technology was (and is) designed to work and how much risk you are willing to take. If you use a tool for a task it's not supposed to do, it might work for some time, but it will fail at some point.
Basically, to do it you will need rock-solid, highly available, high-capacity, low-latency Layer 2 connectivity between the sites. Not just for the sync between the two firewalls, but also for pretty much all of the VLANs that the firewalls connect to, and for everything else around the firewalls as well: think perimeter routers, load balancers, etc. Unless you've got your own tunnel between the buildings, it's not worth the cost, or the pain of what happens when there's a break.
You can also make your DCI link redundant.
At last, there is an updated version of the ASA Clustering Deep Dive session PDF: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=83709&backBtn=true
As for "making the link redundant" - it's just a question of how much money you're willing to throw at the problem to reduce the risk of failure, but you can never eliminate that risk.
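A quick back-of-the-envelope calculation shows why redundancy reduces the risk but never eliminates it. This is a hypothetical Python sketch, not a measured availability model: it assumes each DCI link fails independently with probability `p_link`, plus a common-mode event (shared conduit, same provider, same configuration mistake) with probability `p_common` that takes all links down at once. All numbers are made up for illustration.

```python
def p_dci_down(p_link, n_links, p_common=0.0):
    """Probability that every DCI link is down at a given moment.

    Hypothetical model: n_links links fail independently with probability
    p_link each, and a common-mode event takes them all down together with
    probability p_common.
    """
    independent = p_link ** n_links  # all links fail on their own
    return p_common + (1 - p_common) * independent


for n in (1, 2, 3, 4):
    p = p_dci_down(p_link=0.001, n_links=n, p_common=1e-5)
    print(f"{n} link(s): P(all down) = {p:.3e}")
```

Each extra link shrinks the independent term by another factor of `p_link`, but the result can never drop below `p_common`, which is exactly the residual risk that no amount of money spent on parallel links removes.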
As I wrote several times (note that the first blog post is from 2011), in the end you have to decide how badly you want to fail.