Long-Distance Workload Mobility in Perspective

In a recent blog post, Chuck Hollis described how some of EMC’s customers use long-distance workload mobility. Not surprisingly, he focused on the VPLEX Metro part of the solution and didn’t even mention the earth-flattening requirements this idea imposes on the network. I guess you already know my views on that topic, but regardless of my personal opinions, he got me curious.

Here are a few interesting facts I gathered from his blog post:

We were able to move a running virtual machine in about 15-20 seconds.

Sounds about right. The data center interconnect link is the major bottleneck in the whole story, and based on the numbers it appears they have either a 10GE link or really small virtual machines.
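To put rough numbers behind that guess, here is a back-of-the-envelope sketch of how much data a link can carry in 15 to 20 seconds. It assumes the link runs at full line rate with no protocol overhead and ignores the re-copying of memory pages that change during a live migration, so the results are optimistic upper bounds. The function name and the `efficiency` parameter are made up for this illustration.

```python
def max_transfer_gb(link_gbps: float, seconds: float, efficiency: float = 1.0) -> float:
    """Upper bound on the data (in gigabytes) a link can move in the given time.

    efficiency is a hypothetical fudge factor (1.0 = perfect line rate).
    """
    bits = link_gbps * 1e9 * efficiency * seconds
    return bits / 8 / 1e9  # bits -> bytes -> gigabytes


if __name__ == "__main__":
    for link_gbps in (1, 10):
        low = max_transfer_gb(link_gbps, 15)
        high = max_transfer_gb(link_gbps, 20)
        print(f"{link_gbps}GE: between {low} and {high} GB in 15-20 seconds")
```

Under these assumptions a 1GE link moves roughly 2 GB in that window, while a 10GE link moves roughly 19 to 25 GB, which is why the quoted migration time points to either a fast link or small VMs.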

Four engineers moved 30 to 40 virtual machines the first weekend ...

Please read this sentence again. And again. And again. Now please tell me why I should be excited. At a 20:1 virtualization ratio, that’s two full vSphere servers. Over a weekend.

Also, since they worked on weekends, I don’t understand the need for live VM mobility. They could have shut down the VMs, moved them over, and started them in the other data center, maybe even combining that with the regular monthly patching.

... and then gradually moved over 250 systems during the next three weeks.

Here the press release is getting imprecise. Is it 250 application stacks, 250 servers or 250 virtual machines? Assuming they moved 250 virtual machines, that’s fewer than 15 physical servers at the same 20:1 ratio, or two UCS chassis (= half a rack).
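The server counts are simple ceiling divisions, sketched below. The 20:1 ratio comes from the text; the figure of eight blades per chassis is an assumption (half-width blades in a UCS 5108 chassis), and the function names are invented for this example.

```python
import math

VMS_PER_SERVER = 20      # the 20:1 virtualization ratio used in the text
BLADES_PER_CHASSIS = 8   # assumption: half-width blades in one UCS chassis


def servers_needed(vms: int, ratio: int = VMS_PER_SERVER) -> int:
    """Physical servers required to host the given number of VMs."""
    return math.ceil(vms / ratio)


def chassis_needed(servers: int, blades: int = BLADES_PER_CHASSIS) -> int:
    """Blade chassis required to hold the given number of servers."""
    return math.ceil(servers / blades)


if __name__ == "__main__":
    for vms in (40, 250):
        servers = servers_needed(vms)
        print(f"{vms} VMs -> {servers} servers -> {chassis_needed(servers)} chassis")
```

With these assumptions, 40 VMs fit on two servers and 250 VMs on 13 servers in two chassis.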

Anyhow, I hope they cleaned the network configuration afterwards – having two data centers in a single failure domain is usually a bad idea.

Finally, one must admire EMC’s marketing. They managed to produce a revolutionary headline out of the above-mentioned facts: “EMC VPLEX Enables Major Law Firm to Migrate 250 Live Systems and Data in 15 to 20 Seconds with Zero Downtime”. Congratulations, you just won the exaggeration-of-the-month award.

4 comments:

  1. So, 25-mile-long virtual Ethernet cables to move "processing and storage" to the second datacenter. OTV is used to extend the subnets to the second datacenter, meaning that traffic still gateways through the first datacenter. The VMs are thus tethered to the first datacenter, and a network failure in either datacenter will now cause application disruption. This also causes traffic tromboning: traffic has to flow through the first datacenter to pass between the Presentation, Application, and Database tiers of the application, which loads up the Metro Ethernet link and introduces latency between the tiers that would likely hurt application performance.

    This is only a tool. It is not a solution. Until we re-home the VMs to the second datacenter, using LISP or by automating the re-IPing, firewall policies and such, so that the traffic gateways through the second datacenter, this long-distance vMotion is of little use.

    Replies
    1. I would only presume that the first hop/default gateway was using HSRP localization. This basically allows you to have your gateway IP address exist at both sites even though they're in the same stretched broadcast domain. I can't speak for load balancers, NAS devices or other application dependencies that usually can't just be vMotioned.

  2. Hang on just a second there, Ivan... There's an exaggeration-of-the-month award?

    Replies
    1. There definitely should be one, and the competition would be fierce ;) Anyhow, these guys deserved it.

Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.