Another day, another interesting Expert Express engagement, another stretched layer-2 design solving the usual requirement: “We need inter-DC VM mobility.”
The usual question: “And why would you want to vMotion a VM between data centers?” with a refreshing answer: “Oh, no, that would not work for us.”
There are two different mechanisms we can use to move VMs around a virtualized environment: hot VM mobility, where a running VM is moved from one hypervisor host to another, and cold VM mobility, where a VM is shut down and its configuration is moved to another hypervisor, where the VM is restarted.
Some virtualization vendors might offer a third option: warm VM mobility where you pause a VM (saving its memory to a disk file), and resume its operation on another hypervisor.
Why Do We Care?
You might not care about the mechanisms hypervisors use to move VMs around the data center, but you probably do care about the totally different networking requirements of hot and cold VM moves. Before going there, let’s look at the typical use cases.
Where Would You Need One or the Other?
Hot VM mobility is used by automatic resource schedulers (ex: DRS) that move running VMs between hypervisors in a cluster to optimize their resource (CPU, RAM) utilization. It is also heavily used for maintenance purposes: for example, you have to evacuate a rack of servers before shutting it down for maintenance or upgrade.
You’ll find cold VM mobility in almost every high-availability (ex: VMware HA restarts a VM after a server failure) and disaster recovery solution (ex: VMware’s SRM). It’s also the only viable technology for VM migration into the brave new cloudy world (aka cloudbursting).
Hot VM Move
VMware’s vMotion is probably the best-known example of hot VM mobility technology. vMotion copies memory pages of a running VM to another hypervisor, repeating the process for pages that have been modified while the memory was being transferred. After most of the VM memory has been successfully transferred, vMotion freezes the VM on the source hypervisor, moves its remaining state to the target hypervisor, and resumes it there.
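The iterative copy loop described above is easier to reason about with a toy model. The sketch below is not VMware's implementation; it only illustrates the general pre-copy idea. The function name and all parameters (`dirty_percent`, `freeze_threshold`, `max_rounds`) are assumptions made up for this example: each round copies every pending page, and a fixed percentage of the pages just copied is assumed to get dirtied again during the transfer.

```python
def simulate_precopy(total_pages, dirty_percent, freeze_threshold, max_rounds=20):
    """Toy model of iterative pre-copy (hypothetical, not vMotion internals).

    total_pages      -- number of memory pages in the VM
    dirty_percent    -- pages re-dirtied during a round, as a percentage
                        of the pages copied in that round (integer)
    freeze_threshold -- freeze the VM once this few pages remain
    max_rounds       -- give up iterating after this many rounds

    Returns (pages_copied_per_round, pages_left_for_the_freeze_phase).
    """
    pending = total_pages
    rounds = []
    while pending > freeze_threshold and len(rounds) < max_rounds:
        rounds.append(pending)                      # copy everything pending
        pending = pending * dirty_percent // 100    # pages dirtied meanwhile
    return rounds, pending


# A VM that dirties pages ten times slower than we can copy them converges
# quickly; the final freeze only has to move the small remainder.
rounds, final = simulate_precopy(1_000_000, 10, 1_000)
print(rounds, final)
```

If the VM dirties memory as fast as the network can move it (`dirty_percent` close to 100), the loop never converges and only stops at `max_rounds`, which is why the bandwidth and latency between the two hypervisors matter so much for hot moves.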
A hot VM move must not disrupt the existing network connections (why else would you insist on moving a running VM?). There are a number of elements that have to be retained to reach that goal:
- VM must have the same IP address (obvious);
- VM should have the same MAC address (otherwise we have to rely on hypervisor-generated gratuitous ARP to update ARP caches on other nodes in the same subnet);
- After the move, the VM must be able to reach the first-hop router and all other nodes in the same subnet using their existing MAC addresses (hot VM move is invisible to the VM, so the VM doesn’t know it should purge its ARP cache).
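For readers who have never looked at one, here is what the gratuitous ARP mentioned in the list looks like on the wire. This is a minimal sketch using only the Python standard library; the helper name `build_gratuitous_arp` and the sample addresses are made up for illustration, and the code only builds the frame bytes, it doesn't send anything. It uses the ARP request form (opcode 1) with the target IP equal to the sender IP; some implementations send the reply form instead.

```python
import socket
import struct


def build_gratuitous_arp(mac, ip):
    """Build a broadcast Ethernet frame carrying a gratuitous ARP request.

    mac -- sender MAC address, e.g. "00:50:56:aa:bb:cc"
    ip  -- sender IPv4 address, e.g. "10.0.0.10"
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, sender source, EtherType ARP
    ethernet = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: hardware type Ethernet (1), protocol IPv4 (0x0800),
    # address lengths 6 and 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes + ip_bytes      # sender MAC and IP
    arp += b"\x00" * 6 + ip_bytes    # target MAC unset, target IP == sender IP

    return ethernet + arp


frame = build_gratuitous_arp("00:50:56:aa:bb:cc", "10.0.0.10")
print(len(frame), frame.hex())
```

Every node that receives the frame can refresh its ARP entry for the sender's IP address, but that only helps the nodes that are in the same broadcast domain as the VM's new location.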
The only mechanisms we can use today to meet all these requirements are:
- Stretched layer-2 subnets, whether in a physical (VLAN) or virtual (VXLAN) form;
- Hypervisor switches with layer-3 capabilities. Hyper-V 3.0 Network Virtualization is pretty good, and the virtual switch used by Amazon’s VPC would be perfect.
You might also want to keep in mind that:
- Stretched layer-2 domains are not the best idea ever invented (server/OS engineers that understand networking usually agree with that).
- A layer-2 subnet with BUM (broadcast, unknown unicast, multicast) flooding represents a single failure domain and a scalability roadblock.
Corollary: Keep the hot VM mobility domain small.
Cold VM Move
Cold VM move is a totally different beast – a VM is shut down and restarted on another hypervisor. It could easily survive a change in its IP and MAC address were it not for the enterprise craplications written by programmers that have never heard of DNS. Let’s thus assume we have to deal with a broken application that relies on hard-coded IP addresses.
The IP address of the first-hop router is usually manually configured in the VM (yeah, I’m yearning for the ideal world where people use DHCP to get network-related parameters) and thus cannot be changed, but nothing stops us from configuring the same IP address on multiple routers (a trick used by first-hop localization kludges).
We can also use routing tricks (ex: host routes generated by load balancers) or overlay networks (ex: LISP) to make the moved VM reachable by the outside world – a major use case promoted by LISP enthusiasts.
However, there’s a gotcha: even though the VM has moved to a different location, it left residual traces of its presence in the original subnet: entries in ARP caches of adjacent hosts and routers. Routers are usually updated with new forwarding information (be it a routing protocol or LISP update), adjacent hosts aren’t. These hosts would try to reach the moved VM using its old MAC address … and fail unless there’s a stretched L2 subnet between the old and the new location.
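The stale-ARP gotcha can be shown with a deliberately simplistic model. Everything in the sketch below (the `Segment` class, the `can_deliver` helper, the addresses) is hypothetical and exists only to illustrate the reasoning: a host with a cached ARP entry sends frames straight to the cached MAC address, and the frame arrives only if that MAC address is reachable at layer 2.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A layer-2 segment, modeled as the set of MAC addresses attached to it."""
    name: str
    macs: set = field(default_factory=set)


def can_deliver(arp_cache, dst_ip, local_segment, stretched_to=()):
    """Return True if a host with this ARP cache can reach dst_ip at layer 2.

    arp_cache     -- dict mapping IP address to cached MAC address
    local_segment -- the segment the sending host is attached to
    stretched_to  -- other segments bridged to the local one (stretched L2)
    """
    mac = arp_cache.get(dst_ip)
    if mac is None:
        return False  # no cached entry; ARP resolution is not modeled here
    reachable = set(local_segment.macs)
    for segment in stretched_to:
        reachable |= segment.macs
    return mac in reachable


# The VM used to live in dc1; after the cold move its MAC shows up in dc2.
dc1 = Segment("dc1", {"00:50:56:00:00:01"})                  # adjacent host only
dc2 = Segment("dc2", {"00:50:56:aa:bb:cc"})                  # moved VM
stale_cache = {"10.0.0.10": "00:50:56:aa:bb:cc"}             # old ARP entry

print(can_deliver(stale_cache, "10.0.0.10", dc1))                       # routed-only design
print(can_deliver(stale_cache, "10.0.0.10", dc1, stretched_to=(dc2,)))  # stretched L2
```

In a real network the stale entry eventually times out, but until then the adjacent hosts keep sending frames to a MAC address that is no longer there.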
Does all this sound like a complex spaghetti mess with loads of interdependencies and layers of kludges? You’re not far away from the truth. But wait, there’s more … eventually LISP will be integrated with VXLAN for a seamless globe-spanning overlay network. It just might be easier to fix the applications, don’t you think so?
If you need to: