Follow-the-sun workload mobility? Get lost!
A week ago I was writing about the latency and bandwidth challenges of long-distance vMotion and why it rarely makes sense to use it in disaster avoidance scenarios (for a real disaster avoidance story, read this post ... and note no vMotion was used to get out of harm’s way). The article I wrote for SearchNetworking tackles an idea that is an order of magnitude more ridiculous: using vMotion to migrate virtual machines around the world to bring them close to the users (follow-the-sun workload mobility). I wonder which islands you’d have to use to cross the Pacific in 10ms RTT hops supported by vMotion?
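To put a rough number on those hops, here is a back-of-the-envelope sketch. It assumes light travels through fiber at about 200 km per millisecond and ignores equipment delay and non-straight cable paths, so real hops would be shorter. The function names and the roughly 8,300 km Tokyo-to-San Francisco distance are illustrative assumptions, not figures from the article.

```python
import math

# Assumption: propagation speed in fiber is about 200 km per millisecond
# (roughly two thirds of the speed of light in a vacuum).
FIBER_KM_PER_MS = 200.0


def max_hop_km(rtt_ms: float) -> float:
    """Upper bound on one-way fiber distance that fits in a round-trip budget."""
    one_way_ms = rtt_ms / 2.0
    return FIBER_KM_PER_MS * one_way_ms


def min_hops(distance_km: float, rtt_ms: float) -> int:
    """Minimum number of hops needed to cover distance_km within rtt_ms per hop."""
    return math.ceil(distance_km / max_hop_km(rtt_ms))


if __name__ == "__main__":
    # Hypothetical example: a Pacific crossing of about 8,300 km.
    print(max_hop_km(10.0))        # best-case km per 10 ms RTT hop
    print(min_hops(8300.0, 10.0))  # hops needed, each ending on dry land
```

Even in this best case a 10 ms RTT budget buys at most about 1,000 km per hop, so a crossing of that length would need at least nine conveniently placed data centers.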
As described in my Data Center Interconnects webinar (and the Scalability Rules book), the only solution that really works is a scale-out application architecture combined with load balancers.
"Thursday night I completely failed the core datacenter operations over to the recovery servers using a combination of Veeam Replication and VMware migrations that in the end, really didn’t need to happen."
Of course you can't tell at the time that it didn't need to happen, but vMotion was part of the DR plan and COULD have been necessary if conditions were slightly worse.
Also, I have a question. I don't know exactly how vMotion operates. Is it possible that the 10 ms RTT restriction might be relaxed in, say, three or four years?
As for RTT, it's actually the bandwidth-delay-product problem. You have to copy memory pages to the other vSphere host faster than the VM changes them, and that's hard to do over a low-bandwidth or high-latency link. Can it be done? Sure. Will they do it? I hope not ;)
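A minimal sketch of that race between copying and dirtying, assuming a simplified iterative pre-copy model (each round re-sends whatever was dirtied during the previous round) and a single TCP flow limited by window/RTT. This is not VMware's actual algorithm; the function names, the 50 MB switchover threshold, and the sample numbers are all hypothetical.

```python
from typing import Optional


def tcp_limit_mbit_s(link_mbit_s: float, window_bytes: int, rtt_ms: float) -> float:
    """A single TCP flow cannot exceed window / RTT, whatever the link speed."""
    window_limit = window_bytes * 8 / (rtt_ms / 1000.0) / 1e6
    return min(link_mbit_s, window_limit)


def precopy_seconds(memory_mb: float, dirty_mb_s: float, throughput_mbit_s: float,
                    switchover_mb: float = 50.0, max_rounds: int = 50) -> Optional[float]:
    """Time until the remaining dirty memory is small enough to switch over.

    Returns None when the VM dirties pages at least as fast as they are copied.
    """
    copy_mb_s = throughput_mbit_s / 8.0
    if dirty_mb_s >= copy_mb_s:
        return None  # pre-copy never converges
    remaining = memory_mb
    elapsed = 0.0
    for _ in range(max_rounds):
        if remaining <= switchover_mb:
            return elapsed
        round_time = remaining / copy_mb_s
        elapsed += round_time
        remaining = dirty_mb_s * round_time  # pages dirtied while we were copying
    return None


if __name__ == "__main__":
    # Hypothetical 8 GB VM dirtying 50 MB/s, 10G link, 256 KB TCP window.
    for rtt_ms in (1.0, 100.0):
        rate = tcp_limit_mbit_s(10_000, 256 * 1024, rtt_ms)
        print(rtt_ms, round(rate, 2), precopy_seconds(8192, 50, rate))
```

With these made-up numbers the 1 ms case converges after a few rounds, while at 100 ms the window-limited flow drops to about 21 Mbit/s and the copy never catches up with the VM, which is why bigger windows, parallel streams, or WAN acceleration come into the picture.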
While I do have customers deploying 10G networks over distances <100 km who plan to do live vMotion, the truth is that for most customers pause/stop -> sync -> resume in the new location is MORE than enough to meet the business need. And by accepting a 30, 60, or even 300 s window, the complexity level goes WAY down and the distances supported go WAY up. And no unicorns are harmed ;-)
The actual results depend on the BW, delay and VM page change rate. You could probably get reasonable results with WAN acceleration (optimizing TCP and/or compressing vMotion data like F5 is doing) if the VM is not doing anything (in which case, why would you want to move it at all ;) ).