Hot and Cold VM Mobility

Another day, another interesting Expert Express engagement, another stretched layer-2 design solving the usual requirement: “We need inter-DC VM mobility.”

The usual question: “And why would you want to vMotion a VM between data centers?” with a refreshing answer: “Oh, no, that would not work for us.”

The Confusion

There are two different mechanisms we can use to move VMs around a virtualized environment: hot VM mobility, where a running VM is moved from one hypervisor host to another, and cold VM mobility, where a VM is shut down, its configuration is moved to another hypervisor, and the VM is restarted there.

Some virtualization vendors might offer a third option: warm VM mobility, where you pause a VM (saving its memory to a disk file) and resume its operation on another hypervisor.

Why do we care?

You might not care about the mechanisms hypervisors use to move VMs around the data center, but you probably do care about the totally different networking requirements of hot and cold VM moves. Before going there, let’s look at the typical use cases.

Where would you need one or the other?

Hot VM mobility is used by automatic resource schedulers (ex: DRS) that move running VMs between hypervisors in a cluster to optimize their resource (CPU, RAM) utilization. It is also heavily used for maintenance purposes: for example, you have to evacuate a rack of servers before shutting it down for maintenance or upgrade.

You’ll find cold VM mobility in almost every high-availability solution (ex: VMware HA restarts a VM after a server failure) and disaster recovery solution (ex: VMware’s SRM). It’s also the only viable technology for VM migration into the brave new cloudy world (aka cloudbursting).

Hot VM move

VMware’s vMotion is probably the best-known example of hot VM mobility technology. vMotion copies the memory pages of a running VM to another hypervisor, repeating the process for pages that were modified while the memory was being transferred. After most of the VM memory has been transferred, vMotion freezes the VM on the source hypervisor, moves its remaining state to the target hypervisor, and resumes it there.
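A toy model of the iterative pre-copy process described above might help. The parameters are made up for illustration and are not vMotion settings: `total_pages` is the VM memory size, `dirty_rate` is the assumed fraction of just-sent pages the running VM modifies during each round, and `threshold` is the dirty-page count small enough to copy while the VM is frozen. The sketch shows why convergence depends on the VM dirtying memory more slowly than it can be copied.

```python
"""Toy model of iterative pre-copy live migration (hypothetical parameters)."""
from dataclasses import dataclass


@dataclass
class PrecopyResult:
    rounds: int           # copy rounds performed while the VM was running
    pages_sent: int       # total pages sent, including re-sent dirty pages
    downtime_pages: int   # pages copied while the VM is frozen


def precopy_migrate(total_pages: int, dirty_rate: float,
                    threshold: int, max_rounds: int = 30) -> PrecopyResult:
    """Copy memory while the VM runs, then re-copy pages dirtied meanwhile.

    dirty_rate is an assumed fraction of the pages sent in a round that the
    running VM modifies during that round; it must be < 1 to converge, which
    is why max_rounds caps the number of rounds.
    """
    to_send = total_pages  # first round: all of the VM memory
    sent = 0
    rounds = 0
    while to_send > threshold and rounds < max_rounds:
        sent += to_send
        rounds += 1
        # Pages modified while this round was in flight must be sent again.
        to_send = int(to_send * dirty_rate)
    # Final step: freeze the VM and copy the remaining dirty pages
    # (CPU and device state transfer is not modeled here).
    sent += to_send
    return PrecopyResult(rounds=rounds, pages_sent=sent, downtime_pages=to_send)


print(precopy_migrate(total_pages=1024, dirty_rate=0.5, threshold=64))
# PrecopyResult(rounds=4, pages_sent=1984, downtime_pages=64)
```

With `dirty_rate=1.0` the loop never converges and stops at `max_rounds`, leaving the whole working set to be copied during the freeze. That is the model's version of a VM that writes to memory faster than the network can move it.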

A hot VM move must not disrupt the existing network connections (why else would you insist on moving a running VM?). Several properties have to be retained to reach that goal:

  • VM must have the same IP address (obvious);
  • VM should have the same MAC address (otherwise we have to rely on hypervisor-generated gratuitous ARP to update ARP caches on other nodes in the same subnet);
  • After the move, the VM must be able to reach first-hop router and all other nodes in the same subnet using their existing MAC addresses (hot VM move is invisible to the VM, so the VM doesn’t know it should purge its ARP cache).
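The three requirements above can be expressed as a simple check. This is a minimal sketch with a hypothetical data model: `VmNetState`, `hot_move_violations` and `reachable_macs` (the set of MAC addresses the VM can reach on its segment at the new location) are illustrative names, not part of any hypervisor API. The key point is that the VM's ARP cache survives a hot move unchanged.

```python
"""Check the hot-move requirements (hypothetical data model)."""
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VmNetState:
    ip: str
    mac: str
    # IP -> MAC; a hot move does not flush this cache inside the VM.
    arp_cache: Dict[str, str] = field(default_factory=dict)


def hot_move_violations(before: VmNetState, after: VmNetState,
                        reachable_macs: Set[str]) -> List[str]:
    """Return the list of broken requirements (an empty list means OK)."""
    problems = []
    if before.ip != after.ip:
        problems.append("IP address changed: existing sessions break")
    if before.mac != after.mac:
        problems.append("MAC address changed: peers rely on gratuitous ARP")
    # Every cached neighbor (first-hop router included) must still be
    # reachable under the MAC address the VM already knows.
    for ip, mac in after.arp_cache.items():
        if mac not in reachable_macs:
            problems.append(f"stale ARP entry {ip} -> {mac}")
    return problems


cache = {"10.1.1.1": "00:00:5e:00:01:01"}  # hypothetical first-hop router
vm_before = VmNetState("10.1.1.10", "00:50:56:00:00:0a", dict(cache))
vm_after = VmNetState("10.1.1.10", "00:50:56:00:00:0a", dict(cache))

# Stretched subnet: the router MAC is still reachable after the move.
print(hot_move_violations(vm_before, vm_after, {"00:00:5e:00:01:01"}))  # []
# Different layer-2 segment: the cached router MAC is gone.
print(hot_move_violations(vm_before, vm_after, set()))
# ['stale ARP entry 10.1.1.1 -> 00:00:5e:00:01:01']
```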

The only mechanisms we can use today to meet all these requirements are:

  • Stretched layer-2 subnets, whether in a physical (VLAN) or virtual (VXLAN) form;
  • Hypervisor switches with layer-3 capabilities. Hyper-V 3.0 Network Virtualization is pretty good, and the virtual switch used by Amazon’s VPC would be perfect.

You might also want to keep in mind that:

Corollary: Keep the hot VM mobility domain small.

Cold VM move

Cold VM move is a totally different beast – a VM is shut down and restarted on another hypervisor. It could easily survive a change in its IP and MAC address were it not for the enterprise craplications written by programmers who have never heard of DNS. Let’s thus assume we have to deal with a broken application that relies on hard-coded IP addresses.

The IP address of the first-hop router is usually configured manually in the VM (yeah, I’m yearning for the ideal world where people use DHCP to get network-related parameters) and thus cannot be changed, but nothing stops us from configuring the same IP address on multiple routers (a trick used by first-hop localization kludges).
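Why does the same-IP trick work for a cold move? A restarted VM starts with an empty ARP cache, so it simply ARPs for its hard-coded gateway IP on its new segment and gets the local router’s MAC address. The sketch below assumes a hypothetical gateway IP, two sites (`dc-a`, `dc-b`) and made-up router MAC addresses.

```python
"""Cold move with one hard-coded gateway IP configured on routers at both sites."""
from typing import Dict, Optional

GATEWAY_IP = "10.1.1.1"  # hypothetical default gateway hard-coded in the VM

# Each site has its own router answering ARP for the same gateway IP
# (hypothetical MAC addresses).
SITE_ROUTERS: Dict[str, Dict[str, str]] = {
    "dc-a": {GATEWAY_IP: "00:00:0c:aa:aa:aa"},
    "dc-b": {GATEWAY_IP: "00:00:0c:bb:bb:bb"},
}


def resolve_gateway(site: str, arp_cache: Dict[str, str]) -> Optional[str]:
    """Return the gateway MAC the VM uses, ARPing on the local segment on a cache miss."""
    if GATEWAY_IP not in arp_cache:
        answer = SITE_ROUTERS[site].get(GATEWAY_IP)
        if answer is None:
            return None  # nobody answered the ARP request
        arp_cache[GATEWAY_IP] = answer
    return arp_cache[GATEWAY_IP]


# Running in dc-a: the VM learns the local router's MAC.
cache: Dict[str, str] = {}
print(resolve_gateway("dc-a", cache))  # 00:00:0c:aa:aa:aa

# Cold move: the VM reboots in dc-b with an empty ARP cache and finds the
# dc-b router behind the same gateway IP.
cache = {}
print(resolve_gateway("dc-b", cache))  # 00:00:0c:bb:bb:bb
```

Had the VM been moved hot, it would have kept the old cache entry and kept sending traffic to the dc-a router’s MAC address, which is exactly why this trick alone doesn’t help hot VM mobility.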

We can also use routing tricks (ex: host routes generated by load balancers) or overlay networks (ex: LISP) to make the moved VM reachable by the outside world – a major use case promoted by LISP enthusiasts.
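The host-route approach boils down to longest-prefix matching: a /32 route for the moved VM is more specific than the original subnet prefix, so it wins for that one address. This minimal sketch uses Python’s standard `ipaddress` module with a hypothetical routing table; the next-hop names are made up, and LISP’s mapping system is not modeled.

```python
"""Longest-prefix match: a /32 host route overrides the VM's original subnet."""
import ipaddress
from typing import List, Tuple

# Hypothetical routing table: (prefix, next hop).
routes: List[Tuple[ipaddress.IPv4Network, str]] = [
    (ipaddress.ip_network("10.1.1.0/24"), "dc-a"),  # original subnet lives in DC A
]


def lookup(dst: str) -> str:
    """Return the next hop of the most specific prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(lookup("10.1.1.10"))  # dc-a

# After a cold move to DC B, something (a load balancer, for example)
# injects a host route for the moved VM.
routes.append((ipaddress.ip_network("10.1.1.10/32"), "dc-b"))
print(lookup("10.1.1.10"))  # dc-b
print(lookup("10.1.1.20"))  # dc-a (other VMs stay where they were)
```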

The last time I was explaining how cold VM mobility works with LISP in an Expert Express WebEx session, I got a nice question from the engineer on the other end: “And how exactly is that different from host routes?” The best summary I’ve ever heard.

However, there’s a gotcha: even though the VM has moved to a different location, it left residual traces of its presence in the original subnet: entries in the ARP caches of adjacent hosts and routers. Routers are usually updated with new forwarding information (be it through a routing protocol or a LISP update); adjacent hosts aren’t. These hosts would try to reach the moved VM using its old MAC address … and fail unless there’s a layer-2 subnet stretched between the old and the new location.
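Here is the gotcha from the adjacent host’s point of view, as a minimal sketch. It assumes the VM kept its MAC address across the cold move and that the neighbor’s ARP entry hasn’t expired yet; `deliver` and `macs_on_segment` (the MAC addresses reachable on the neighbor’s layer-2 segment) are hypothetical names. The neighbor doesn’t look at the routing table at all: the destination is on its own subnet, so it sends the frame straight to the cached MAC.

```python
"""Adjacent host with a stale ARP entry after a cold move (hypothetical model)."""
from typing import Dict, Set

VM_IP = "10.1.1.10"
OLD_MAC = "00:50:56:00:00:0a"  # hypothetical MAC of the VM, retained across the move


def deliver(arp_cache: Dict[str, str], dst_ip: str, macs_on_segment: Set[str]) -> bool:
    """On-subnet delivery: use the cached MAC (no re-ARP while the entry is valid)."""
    mac = arp_cache.get(dst_ip)
    return mac is not None and mac in macs_on_segment


neighbor_cache = {VM_IP: OLD_MAC}  # learned before the move

# The VM now lives in another data center and the subnet is not stretched:
# nothing on the local segment owns the old MAC, so the frame goes nowhere.
print(deliver(neighbor_cache, VM_IP, macs_on_segment=set()))      # False

# Stretched layer-2 subnet: the VM's MAC is still reachable on the segment.
print(deliver(neighbor_cache, VM_IP, macs_on_segment={OLD_MAC}))  # True
```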

Does all this sound like complex spaghetti mess with loads of interdependencies and layers of kludges? You’re not far away from the truth. But wait, there’s more … eventually LISP will be integrated with VXLAN for a seamless globe-spanning overlay network. It just might be easier to fix the applications, don’t you think so?

More information

If you need to ...

9 comments:

  1. "the virtual switch used by Amazon’s VPC would be perfect."

    I hear you mention Amazon's VPC as a solution for a problem from time to time.

    You're referring to the fact that the consumer would need to outsource to AWS, right? They are not providing their VPC solution as a software package I can install and manage myself in my own DC, right?

    1. Of course they're not providing their secret sauce as a software package ;)

  2. Excellent blog post. As a data center network guy, I am still implementing stretched L2 solutions. Not only VMs need failover; other components such as load balancers, MPLS solutions, redundant routers, ... are also designed redundantly across data centers. VXLAN seems promising for hot VM moves. I guess there is no industry-standard solution for this problem because a stretched layer-2 solution is so easy to set up. It might also be the most economical solution ...

  3. Good post!

    It might be good to point out that cold migration has two categories:
    1) high availability
    2) disaster recovery

    In the case of high availability there is no need for a new IP as restart happens in the same domain / same cluster.

    With disaster recovery a restart could be anywhere, this is indeed where a new IP could be required and the nightmare of a lot of app owners.

    1. Whether the VM can retain the same IP address or gets a new one does not depend on HA versus SRM, but on what the requirements are (and consequently how you set up the network).

      There are Apps people (not too many, but they do exist) that understand how to use DNS and don't care that the VM IP address changes after an inter-DC HA restart event. Amazing, isn't it ;)

  4. What's the difference between cold and warm VM mobility? Only the boot time?

    1. Primarily boot time, although do keep in mind that it might confuse some badly broken applications if they wake up with a different IP address.

    2. Uhm... I don't see the difference between the different types of migration (replication) with regard to whether the IP address changes or not...
      However, changing the IP address is always a problem; I think a server can never change its IP address. I don't know the deeper DNS tricks, but I don't think there is a valid trick that solves, on the fly, the problem of migrating a VM to a different DC, considering the caches on the PCs, the record replication time, etc.

  5. I guess this is part of the promise of SDN. You have a pool of resources that are dynamically defined and recreated anywhere in the domain with the correct network, storage, ADC and FW policies applied for that compute need. Great dream, but there are a lot of moving parts to get there.

Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.