How many VM moves do you see in a medium and how many in a large data center environment per second and per minute? What would be a reasonable maximum?
Obviously the answer to the first part is it depends (please share your experience in the comments), so we’ll focus on the second one. It’s time for another Fermi estimate.
- A hypervisor host has two 2 10GE uplinks and uses one of them for vMotion;
- vMotion has reasonably small overhead;
- Typical VM size is 4GB = 32 Gbit.
Based on those assumptions, moving that VM across the 10GE uplink would take approximately 4 seconds.
The expected VM move frequency of a hypervisor doing its best to get all VMs off it (meaning you placed it into maintenance mode) is thus ~ 0.25 Hz.
Now for the worst-case scenario: all hypervisors on a ToR switch panic and start shuffling VMs around (within a ToR switch). A typical ToR switch would have ~40 hosts connected to it, resulting in ~10 VM moves per second.
Evacuating a ToR switch
Now for another scenario: a ToR switch fails and even though all the servers connected to it have a redundant 10GE uplink going to another ToR switch you decide to shut them down (just in case the other switch fails as well).
A typical ToR switch has 160 Gbps of uplink capacity (4 x 40GE), and you don’t want to use it all for vMotion. Let’s assume vMotion can consume 100 Gbps of that capacity, resulting in 2.5 VM moves per second (or 4-5 VM moves per second if you’re willing to consume all the uplink bandwidth).
Conclusion: I would be highly surprised if anyone is seeing more than a dozen or so moves per second in a typical enterprise environment with a few thousand VMs. Have you seen more?