
Designing Active-Active and Disaster Recovery Data Centers

A year ago I was a firm believer in the unlimited powers of Software-Defined Data Centers and their ability to simplify workload migrations. After all, if you can use an API to create any data center object, what’s stopping you from moving the workload running in a data center to another location?

As always, there’s a huge difference between theory and reality.

Reality Distortion Field Has Failed

Being a slightly skeptical eternal optimist, I created a workshop description for Interop Las Vegas 2015 which still sounded pretty positive and mentioned SDDC as a potential solution.

In December 2014, the reality hit… hard. I was running a workshop for a global organization that was sold on a simple idea: using SDDC (from the vendor that created the acronym) it’s easy to pick up your toys (= application workload), pack them in a large bag, walk away to a different sandbox (= public cloud), drop them out of the bag and continue playing.

During the workshop we identified numerous obstacles and missing orchestration components, and concluded that it’s totally impossible to achieve what they planned to do. The best they could do at that time was to manually recreate network infrastructure (= subnets) and services (= firewalls and load balancers) in a second virtualized environment (disaster recovery data center or public cloud), and afterwards restart VMs from the failed data center in that cloud.

The only approach that would do what my customer wanted at that time was automated application deployment using tools like Cloudify, but that solution was further away from their grasp than Alpha Centauri – they were a traditional enterprise IT shop with manual non-repeatable server creation and application deployment processes.

After three days we had to conclude that there’s nothing SDDC could do for them to solve their immediate workload migration problems, and that they should focus on automating application development and deployment processes (yeah, I know I sound like Captain Obvious).

It seems that NSX 6.2 and SRM 6.1 might be a step in the right direction, but I have to read the documentation to find out the potential “minor” details.

Adjusting to Reality

Based on that traumatic experience, I decided to refocus my Interop presentation on what works now in real life, and not surprisingly, the best answer is “proper application architecture”.

Anyway, the Interop workshop documented numerous challenges you might encounter on your journey (including finite bandwidth, non-zero latency, unpredictable failures, bad application architectures, vendors promoting obviously-stupid things), and resulted in a fantastic experience for the attendees even though the workshop was just before the evening party and I ran way overtime.
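To put a number on the "non-zero latency" challenge, here's a back-of-the-envelope sketch. It assumes that light travels through optical fiber at roughly 200,000 km/s, and it ignores queuing, serialization, and processing delays, so real-life numbers are worse. The distances and the count of 100 sequential round trips are illustrative assumptions, not measurements from any customer environment.

```python
# Rough lower bound on the latency added by stretching an application
# across two data centers. All input values are illustrative assumptions.

# Light in optical fiber covers roughly 200,000 km/s, or 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay over a fiber path of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def added_response_ms(distance_km: float, sequential_round_trips: int) -> float:
    """Extra response time when every round trip has to cross the inter-DC link."""
    return sequential_round_trips * round_trip_ms(distance_km)


if __name__ == "__main__":
    # A chatty application: 100 sequential database queries per user request.
    for distance_km in (10, 100, 1000):
        rtt = round_trip_ms(distance_km)
        extra = added_response_ms(distance_km, 100)
        print(f"{distance_km:>5} km: RTT >= {rtt:.1f} ms, "
              f"100 sequential queries add >= {extra:.0f} ms")
```

Under these assumptions, a web server sitting 1000 km away from its database adds at least a full second to every request that needs 100 sequential queries, and no amount of bandwidth will fix that.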

An updated version of that workshop is now becoming a webinar with a more appropriate title: Designing Active-Active and Disaster Recovery Data Centers. I split the webinar into two live sessions due to its length: the first one on October 14th, the second one on November 11th.

To register for the live webinar session, go to the webinar description page (subscribers can obviously register free of charge).

Keeping Past Promises

I promised the people who bought the Designing Private Cloud Infrastructure webinar in the past (before October 1st 2015) access to the contents of this webinar. If you’re one of them, you’ll get a notification that you got access to the new webinar and will be able to register for the live session.

12 comments:

  1. "In December 2015, the reality hit…hard"? Welcome back to the present, what is the future like? not so bright for SDDC I take it.

    1. Well, obviously I don't own a DeLorean yet, but unfortunately I don't expect the results to be much different in December 2015.

      Fixed, thank you!

  2. non-zero latency? theory vs reality or just stating obvious?

    1. You wouldn't believe how non-obvious that is to most people.

    2. Unfortunately I do. I've seen and worked with many of those people.

    3. Ivan, do you think there's a way to talk or write about glaring but nonetheless common technical misconceptions (like non-zero latency) in a way that's both not condescending, and will reach the right people?

      I've noticed that a lot of blog posts I've done targeted at beginners don't get a lot of views (like https://jayswan.github.io/2013/10/16/java-is-to-javascript-as-car-is-to/).

      Do you think this is because blog audiences aren't the ones who need those topics? I've noticed that your posts have gotten steadily more advanced over the years, and wondered if that was part of the reason.

    4. I just wanted to add that I don't think anything in Ivan's post is condescending -- I'm asking a more generalized question about how to help educate the IT population as a whole about common technical misunderstandings.

  3. Will these webinar sessions be recorded? I will be able to attend only one session live.

    1. All live webinars are recorded, and the recordings (in the form of downloadable MP4 videos) are made available within 48 hours of the live session.

  4. Here's a short video demo on a related topic: Overlay network design and route optimization for multi-DC applications in the context of vMotion. https://www.youtube.com/watch?v=CFBm3EFFdCY. Ivan: Any thoughts on the principle of associating subnets with particular DC sites, and only announcing /32 host routes for VMs that are "away from home"?

    1. Obviously it works... as long as someone is willing to accept /32 from the data center. The fundamental question remains, though: WHY do we need to move VMs between data centers, and WHY does someone claim that broken applications deserve this level of complexity and perpetual technical debt?

    2. I agree - and in particular WHY web front-end VMs, which are supposedly stateless. Still, I believe it is worthwhile to have a solution in place which can support vMotion between DCs when needed, with optimized routing even for individual VMs. But just because you can doesn't mean you should.

