Designing Active-Active and Disaster Recovery Data Centers
A year ago I was a firm believer in the unlimited powers of Software-Defined Data Centers and their ability to simplify workload migrations. After all, if you can use an API to create any data center object, what’s stopping you from moving the workload running in a data center to another location?
As always, there’s a huge difference between theory and reality.
Reality Distortion Field Has Failed
Being a slightly skeptical eternal optimist, I created a workshop description for Interop Las Vegas 2015 which still sounded pretty positive and mentioned SDDC as a potential solution.
In December 2014, the reality hit… hard. I was running a workshop for a global organization that was sold on a simple idea: using SDDC (from the vendor that created the acronym) it’s easy to pick up your toys (= application workload), pack them in a large bag, walk away to a different sandbox (= public cloud), drop them out of the bag and continue playing.
During the workshop we identified numerous obstacles and missing orchestration components, and concluded that it’s totally impossible to achieve what they planned to do. The best they could do at that time was to manually recreate network infrastructure (= subnets) and services (= firewalls and load balancers) in a second virtualized environment (disaster recovery data center or public cloud), and afterwards restart VMs from the failed data center in that cloud.
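The manual recovery sequence described above (network first, then services, then VMs) can be sketched as an ordered checklist. This is only a minimal illustration, not the tooling discussed in the workshop: the `RecoveryPlan` class, the phase names, and every object name below are hypothetical, and a real environment would call its own virtualization or cloud APIs at each step.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Recovery phases in dependency order: subnets, then firewalls and
# load balancers, then the VMs that rely on all of them.
PHASES = ("subnet", "firewall", "load_balancer", "vm")


@dataclass
class RecoveryPlan:
    # Each item is (kind, name); all names are made up for illustration.
    items: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, kind: str, name: str) -> None:
        if kind not in PHASES:
            raise ValueError(f"unknown object kind: {kind}")
        self.items.append((kind, name))

    def ordered_steps(self) -> List[str]:
        # Sort by phase so no VM is restarted before its subnet, firewall,
        # and load balancer exist in the recovery site. sorted() is stable,
        # so objects within a phase keep their insertion order.
        ranked = sorted(self.items, key=lambda item: PHASES.index(item[0]))
        return [f"{kind}: {name}" for kind, name in ranked]


plan = RecoveryPlan()
plan.add("vm", "web-01")
plan.add("load_balancer", "web-vip")
plan.add("subnet", "10.1.10.0/24")
plan.add("firewall", "web-dmz-policy")

for step in plan.ordered_steps():
    print(step)
```

Even in this toy form, the ordering has to be written down by someone who knows the dependencies. That knowledge is exactly what is missing when server creation and application deployment are manual and non-repeatable.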
The only approach that would do what my customer wanted at that time was automated application deployment using tools like Cloudify, but that solution was further away from their grasp than Alpha Centauri – they were a traditional enterprise IT shop with manual non-repeatable server creation and application deployment processes.
After three days we had to conclude that there’s nothing SDDC could do for them to solve their immediate workload migration problems, and that they should focus on automating application development and deployment processes (yeah, I know I sound like Captain Obvious).
Adjusting to Reality
Based on that traumatic experience, I decided to refocus my Interop presentation on what works now in real life, and not surprisingly, the best answer is “proper application architecture”.
Anyway, the Interop workshop documented numerous challenges you might encounter on your journey (including finite bandwidth, non-zero latency, unpredictable failures, bad application architectures, vendors promoting obviously-stupid things), and resulted in a fantastic experience for the attendees even though the workshop was just before the evening party and I ran way overtime.
An updated version of that workshop is now available as a webinar with a more appropriate title: Designing Active-Active and Disaster Recovery Data Centers.