The Myth of Lossless vMotion

As a response to my Live vMotion into VMware-on-AWS Cloud blog post, Nico Vilbert pointed me to his own blog post explaining the details of a cross-Atlantic vMotion into AWS.

Today I will not go into yet another rant pointing out all the things that can go wrong, but focus on a minor detail: “no ping was dropped in the process.”

The “vMotion is instantaneous and lossless” myth has been propagated since the early days of vMotion, when sysadmins proudly demonstrated what seemed to be pure magic to amazed audiences… including the now-traditional terminal window running ping and not losing a single packet.

The reality has always been a bit murkier. While vMotion (and other live VM migration technologies) does most of its work in the background (a process copiously explained in various VMware knowledge base articles and in my vSphere webinar), there comes a moment when you have to make the final switchover and move the running VM to another hypervisor. The high-level view of that process goes along these lines:

  • Freeze the VM;
  • Collect the remaining data that has to be transferred;
  • Copy the data to the target hypervisor over a TCP session;
  • Thaw the VM on the target hypervisor;
  • Do a few more magic tricks like sending RARP broadcasts (because it would be too hard to find out the VM’s IP address and send a gratuitous ARP like everyone else), and let dynamic MAC learning do its job (see the sketch after this list).
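For illustration only, here is a minimal Python/scapy sketch of the kind of RARP broadcast a target hypervisor sends on behalf of the moved VM, so the physical switches re-learn its MAC address on the new port. The MAC address and interface name are made up, and this is obviously not VMware’s actual code:

```python
# Minimal sketch (NOT VMware's code): the kind of RARP announcement a
# target hypervisor broadcasts after a move so that the upstream switches
# re-learn the VM's MAC address on the new port.
# Requires scapy and root privileges; MAC and interface are placeholders.
from scapy.all import ARP, Ether, sendp

VM_MAC = "00:50:56:aa:bb:cc"    # hypothetical MAC address of the moved VM

frame = (
    Ether(src=VM_MAC, dst="ff:ff:ff:ff:ff:ff", type=0x8035)  # RARP EtherType
    / ARP(op=3, hwsrc=VM_MAC, hwdst=VM_MAC)                  # reverse request
)
sendp(frame, iface="eth0")      # interface name is an assumption
```

Note that the frame carries no IP information at all, which is exactly why a hypervisor that has no idea what the guest’s IP address might be can still send it.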

It was never clear to me how you could make this process lossless… and it turns out you can’t. Spirent did a great demonstration during a Networking Field Day event a long while ago: they used a continuous stream of UDP packets to measure the VM responsiveness, and were able to show exactly how long the outage usually lasts. I don’t remember the exact results, but 50-100 msec sounds about right… and unfortunately they didn’t allow that demonstration to be recorded. Sometimes it’s better not to poke a 400-pound gorilla with a sharp stick (or take selfies with a sleeping elephant).
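If you want to reproduce the gist of that measurement yourself, here is a back-of-the-envelope Python sketch (definitely not Spirent’s test gear). It assumes a trivial UDP echo service running in the VM, and the address and port are placeholders:

```python
# Rough sketch of the measurement idea: blast sequence-numbered UDP
# probes at the VM every millisecond, expect it to echo them back, and
# report the longest silence observed. Not a calibrated tool.
import socket
import time

TARGET = ("192.0.2.10", 9999)   # hypothetical VM running a UDP echo service
INTERVAL = 0.001                # one probe per millisecond
DURATION = 60                   # measure for one minute

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(INTERVAL)       # the receive timeout doubles as crude pacing

last_reply = time.monotonic()
worst_gap = 0.0
deadline = time.monotonic() + DURATION
seq = 0

while time.monotonic() < deadline:
    sock.sendto(seq.to_bytes(8, "big"), TARGET)
    seq += 1
    try:
        sock.recvfrom(64)
        now = time.monotonic()
        worst_gap = max(worst_gap, now - last_reply)
        last_reply = now
    except socket.timeout:
        pass                    # no echo this interval; the gap keeps growing

print(f"longest silence from the VM: {worst_gap * 1000:.1f} msec")
```

A 50-100 msec freeze shows up as dozens of missing replies in this stream, something a once-per-second ping can easily sleep through.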

Considering all this, how do the “look Ma, no lost ping” demonstrations work so beautifully? A bit of statistics and a bit of luck ;) Typical ping implementations started with default parameters send a packet every second, so you have to be somewhat unlucky to hit the short window when the VM is frozen… and even if you do, the default ping timeout is a few seconds, which means that by the time the timeout for the lost packet expires, the moved VM has had enough time to get ready for more business, and it looks like you lost at most a single packet.
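The arithmetic is easy enough to sanity-check. Assuming the frozen window is somewhere between 50 msec and half a second (the outage lengths below are guesses), a once-per-second probe has a correspondingly small chance of landing inside it:

```python
# Back-of-the-envelope arithmetic: with one ping per second, the chance
# that any single probe lands inside the frozen window equals the outage
# length divided by the probe interval. Outage lengths are assumptions.
for outage_ms in (50, 100, 500):
    probability = outage_ms / 1000.0    # probe interval = 1000 msec
    print(f"{outage_ms:3d} msec outage -> {probability:.0%} chance per probe")
```

If you want to make the loss visible, shorten the probe interval: on Linux, ping -i 0.01 sends a hundred probes per second (intervals below 0.2 seconds typically require root privileges).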

How about doing the same process across the Atlantic (at 100 msec latency)? It probably takes a few RTTs to complete the VM handover between the source and target hypervisors, so the probability of losing a ping should be much higher. Either Nico was lucky, or “he only dropped a couple of packets” (at one ping per second) could easily mean “the VM was not operational for a couple of seconds”.
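To put some admittedly made-up numbers on that: if the final handover needs a handful of round trips, transatlantic latency stretches the frozen window from tens into hundreds of milliseconds or more:

```python
# Rough arithmetic, not a measurement: the round-trip counts are pure
# guesses; the 100 msec RTT comes from the transatlantic scenario above.
RTT_MS = 100
for round_trips in (3, 5, 10):
    print(f"{round_trips:2d} RTTs -> VM frozen for at least "
          f"{round_trips * RTT_MS} msec")
```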

Finally, how is this angels-dancing-on-a-pin discussion relevant to your environment? As long as you’re moving VMs within a single data center (like everyone else has been doing for the last decade or two), and those VMs run traditional TCP workloads, you’ll do just fine… but if you bought into the idea of running VoIP gateways or packet forwarding devices (aka Network Function Virtualization) in VMs, and you start moving those VMs around the data center, you might make a few customers a bit unhappy. Moving those VMs across continents or oceans would make quite a few people extremely unhappy (even more so as the latency increases by orders of magnitude).

As always, understanding the fundamentals and the limitations of the technologies you’re using is probably still a good idea.
