The March 2019 Packet Pushers Virtual Design Clinic had to deal with an interesting question:
Our server team is nervous about full-scale DR testing. So they have asked us to stretch L2 between sites. Is this a good idea?
The design clinic participants were a bit more diplomatic (watch the video) than my TL&DR answer which would be: **** NO!
Let’s step back and try to understand what’s really going on:
An anonymous (for reasons that will be obvious pretty soon) commenter left a gem on my Disaster Recovery Test Faking blog post that is way too valuable to be left hidden and unannotated.
Here’s what he did:
Once I was tasked to do a DR test before handing over the solution to the customer. To simulate the loss of a data center I suggested to physically shutdown all core switches in the active data center.
That’s the right first step: simulate the simplest scenario (total DC failure) in a way that is easy to recover (power on the switches). It worked…
After publishing the Disaster Recovery Faking, Take Two blog post (you might want to read that one before proceeding) I was severely reprimanded by several people with ties to virtualization vendors for blaming virtualization consultants when it was obvious the firewall clusters stretched across two data centers caused the total data center meltdown.
Let’s chase that elephant out of the room first. When you drive too fast on an icy road and crash into a tree who do you blame?
- The person who told you it’s perfectly OK to do so;
- The tire manufacturer who advertised how safe their tires were?
- The tires for failing to ignore the laws of physics;
- Yourself for listening to bad advice
For whatever reason some people love to blame the tires ;)
Several engineers formerly working for a large virtualization vendor were pretty upset with me when I claimed that the virtualization consultants promote “disaster recovery using stretched VLANs” designs instead of alternatives that would implement proper separation of failure domains.
Guess what… it’s even worse than I thought.
Here’s a sequence of comments I received after reposting one of my “disaster recovery doesn’t need stretched VLANs” blog posts on LinkedIn sometime in late 2019: