Follow-the-sun workload mobility? Get lost!

A week ago I was writing about the latency and bandwidth challenges of long-distance vMotion and why it rarely makes sense to use it in disaster avoidance scenarios (for a real disaster avoidance story, read this post ... and note no vMotion was used to get out of harm’s way). The article I wrote for SearchNetworking tackles an idea that is an order of magnitude more ridiculous: using vMotion to migrate virtual machines around the world to bring them close to the users (follow-the-sun workload mobility). I wonder which islands you’d have to use to cross the Pacific in 10ms RTT hops supported by vMotion?
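For perspective, here's a quick back-of-envelope calculation (a minimal sketch assuming the usual ~200,000 km/s propagation speed of light in fiber and a rough 10,000 km trans-Pacific distance, ignoring queuing and serialization delays):

```python
# How far can a single vMotion "hop" reach under the 10 ms RTT limit?
SPEED_IN_FIBER_KM_S = 200_000   # ~2/3 of c, the usual rule of thumb
RTT_BUDGET_S = 0.010            # vMotion's 10 ms round-trip limit

one_way_s = RTT_BUDGET_S / 2
max_hop_km = SPEED_IN_FIBER_KM_S * one_way_s  # propagation delay only
print(f"Max hop length: {max_hop_km:.0f} km")

PACIFIC_KM = 10_000  # rough great-circle distance across the Pacific
print(f"Hops needed: {PACIFIC_KM / max_hop_km:.0f}")
```

Roughly a thousand kilometers per hop, so you'd need at least ten conveniently spaced islands with vSphere clusters on them. Good luck with that.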

As described in my Data Center Interconnects webinar (and the Scalability Rules book), the only solution that really works is a scale-out application architecture combined with load balancers.
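To give you a feel for what that means, here's a purely illustrative sketch (the region names and RTT figures are made up, and a real deployment would use DNS-based global server load balancing or anycast rather than a Python function): the workload runs scaled-out in every region all the time, and the load balancer steers each user to the closest healthy instance.

```python
# Follow-the-sun done right: the application is scaled out across regions
# and a global load balancer picks the closest healthy one per user --
# no VM ever crosses an ocean.
REGIONS = {"us-west": 120, "eu-central": 30, "ap-east": 210}  # user RTT in ms

def pick_region(rtt_ms, healthy):
    """Return the lowest-RTT region that is currently up."""
    candidates = {region: rtt for region, rtt in rtt_ms.items() if region in healthy}
    return min(candidates, key=candidates.get)

print(pick_region(REGIONS, healthy={"us-west", "ap-east"}))  # -> us-west
```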

Read and enjoy the article ...

6 comments:

  1. You say no VMotion was used, but the article states:

    "Thursday night I completely failed the core datacenter operations over to the recovery servers using a combination of Veeam Replication and VMware migrations that in the end, really didn’t need to happen."

    Of course you can't tell at the time that it didn't need to happen, but VMotion was part of the DR plan and COULD have been necessary if conditions were slightly worse.

    Also, I have a question. I don't know exactly how VMotion operates. Is it possible that the 10 ms RTT restriction might be relaxed in, say, three or four years?

  2. 'VMware migration' doesn't necessarily mean VMotion, and I expect in the context of the article it means a cold migration of a powered-off VM to a new host and new datastore.

  3. He also says he has a 100 Mbps circuit between the sites. vMotion requires at least 622 Mbps (1 Gbps is recommended). It might work over lower-speed links, but that's unlikely (the page change rate is usually too high).

    As for RTT, it's actually a bandwidth-delay-product problem: you have to copy memory pages to the other vSphere host faster than the VM changes them, and that's hard to do over a low-bandwidth or high-latency link. Can it be done? Sure. Will they do it? I hope not ;)
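    Here's a back-of-envelope sketch of that race (the VM size and page change rate are illustrative numbers, not taken from the article):

    ```python
    # vMotion pre-copy: every pass copies the pages the VM dirtied during the
    # previous pass, so it only converges if the copy rate beats the dirty rate.
    link_bps  = 100e6             # the 100 Mbps circuit from the article
    dirty_bps = 4096 * 5000 * 8   # 5000 dirtied 4-KB pages/s ~ 164 Mbps
    mem_bytes = 8 * 2**30         # an 8 GB VM

    if dirty_bps >= link_bps:
        print("Never converges: pages change faster than they can be copied")
    else:
        # geometric series: total data ~ memory / (1 - dirty/copy ratio)
        total = mem_bytes / (1 - dirty_bps / link_bps)
        print(f"Converges after transferring ~{total / 2**30:.1f} GB")
    ```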

  4. vMotion is so cool to see that it has created some serious expectation-management issues.

    While I do have customers deploying 10G networks (<100 km) who plan to do live vMotion, the truth is that for most customers a pause/stop -> sync -> resume in the new location is MORE than enough to meet the business need. By accepting a 30-, 60-, or even 300-second window, the complexity level goes WAY down and the supported distances go WAY up. And no unicorns are harmed ;-)
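    Rough numbers for that trade-off (the delta size is a hypothetical example):

    ```python
    # Pause/stop -> sync -> resume: the outage window is simply the remaining
    # delta divided by the link speed, so RTT and page change rate stop mattering.
    final_delta_bytes = 2 * 2**30          # say 2 GB of not-yet-replicated data
    for link_bps in (100e6, 1e9):
        window_s = final_delta_bytes * 8 / link_bps
        print(f"{link_bps / 1e6:>5.0f} Mbps link -> ~{window_s:.0f} s outage")
    ```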

  5. I think they're playing with it. I have a nice (Cisco internal) slide talking about vMotion over a 250 ms (!) link. Sure, with 10 Gbps and so on.

  6. The root cause of the problem is the bandwidth-delay product: how fast can you push the memory image across the WAN link while the VM keeps changing it.

    The actual results depend on the bandwidth, the delay, and the VM page change rate. You could probably get reasonable results with WAN acceleration (optimizing TCP and/or compressing vMotion traffic, as F5 does) if the VM is not doing much (in which case, why would you want to move it at all ;) ).
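    For the TCP part, here's a quick sketch of why latency alone caps the throughput of a single unoptimized session (assuming a classic 64 KB window with no window scaling):

    ```python
    # A TCP sender can have at most one window of data in flight, so the
    # per-session throughput ceiling is window / RTT -- the limit WAN
    # accelerators raise with window scaling and compression.
    window_bytes = 64 * 1024                 # classic 64 KB TCP window
    for rtt_s in (0.010, 0.100, 0.250):
        mbps = window_bytes * 8 / rtt_s / 1e6
        print(f"RTT {rtt_s * 1000:>3.0f} ms -> max ~{mbps:5.1f} Mbps per session")
    ```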



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.