VMware announced several vMotion enhancements in vSphere 6, ranging from “finally” to “interesting”.
vMotion across virtual switches. Finally. The tricks you had to use previous were absolutely bizarre.
A reader sent me this question:
My company will have 10GE dark fiber across our DCs with possibly OTV as the DCI. The VM team has also expressed interest in DC-to-DC vMotion (<4ms). Based on your blogs it looks like overall you don't recommend long-distance vMotion across DCI. Will the "Data Center trilogy" package be the right fit to help me better understand why?
Unfortunately, long-distance vMotion seems to be a persistent craze that peaks with a predicable period of approximately 12 months, and while it seems nothing can inoculate your peers against it, having technical arguments might help.
A while ago I wrote “vMotion over VXLAN is stupid and unnecessary” in a comment to a blog post by Duncan Epping, assuming everyone knew the necessary background details. I was wrong (again).
Moving a running VM into a foreign subnet is Mission Impossible due to stale ARP entries (anyone telling you otherwise is handwaving over a detail or two - maybe their VM doesn't communicate with other VMs in the same subnet), but it's entirely feasible to migrate a cold VM into a foreign subnet if you can fix IP routing. Here's how you can do the trick with Enterasys switches.
Every time someone mentions how awesome new technologies solve live VM mobility across WAN networks, I start muttering unmentionables. Live VM mobility across disjoint layer-2 subnets works great in demos, but usually fails in real life due to stale ARP caches. The only way to solve this problem for good is to implement EC2-like layer-3 forwarding in hypervisor soft switches.
Update: LISP Host Mobility seems to be a potential exception; see the comment from Nico.
For more details, watch the VM Mobility Requirements video (part of Enterasys-sponsored DCI webinar), read the Hot and Cold VM Mobility blog post or watch the recording of NFD4 session with Cisco’s Victor Moreno.
Another day, another interesting Expert Express engagement, another stretched layer-2 design solving the usual requirement: “We need inter-DC VM mobility.”
The usual question: “And why would you want to vMotion a VM between data centers?” with a refreshing answer: “Oh, no, that would not work for us.”
During a recent vMotion-over-VXLAN discussion Chris Saunders made a very good point: “Folks should be asking a better question, like: Can I use VXLAN and vMotion together to meet my business requirements.”
Yeah, it’s always worth exploring the actual business needs.
Based on a true story ...
A while ago I was sitting in a roomful of extremely intelligent engineers working for a large data center company. Unfortunately they had been listening to a wrong group of virtualization consultants and ended up with the picture-perfect disaster-in-waiting: two data centers bridged together to support a stretched VMware HA cluster.
As far as I understand, VXLAN, NVGRE and any tunneling protocol that use global ID in the data plane cannot support PVLAN functionality.
He’s absolutely right, but you shouldn’t try to shoehorn VXLAN into existing deployment models. To understand why that doesn’t make sense, we have to focus on the typical cloud application architectures.
Straight from RFC 6670 (section 3.4):
[...] as is usually the case with communications technologies, simplification in one element of the system introduces an increase (possibly a non-linear one) in complexity elsewhere. This creates the "squashed sausage" effect, where reduction in complexity at one place leads to significant increase in complexity at a remote location.
This is probably the most concise description of the great idea of using long-distance vMotion for “mission-critical” craplications, and applies equally well to the kludges used to compensate the simplicity of virtual switches.
In a recent blog post, Chuck Hollis described how some of EMC customers use long-distance workload mobility. Not surprisingly, he focused on the VPLEX Metro part of the solution and didn’t even mention the earth-flattening requirements this idea imposes on the network. I guess you already know my views on that topic, but regardless of my personal opinions, he got me curious.
You’re probably sick and tired of me writing and talking about networking requirements for VM mobility (large VLAN segments that some people want to extend across the globe), but just in case you have to show someone a brief summary, here’s a video taken from the Data Center Fabric Architectures webinar.
A week ago I was writing about the latency and bandwidth challenges of long-distance vMotion and why it rarely makes sense to use it in disaster avoidance scenarios (for a real disaster avoidance story, read this post ... and note no vMotion was used to get out of harm’s way). The article I wrote for SearchNetworking tackles an idea that is an order of magnitude more ridiculous: using vMotion to migrate virtual machines around the world to bring them close to the users (follow-the-sun workload mobility). I wonder which islands you’d have to use to cross the Pacific in 10ms RTT hops supported by vMotion?
The proponents of inter-DC layer-2 connectivity (required by long-distance vMotion) inevitably cite disaster avoidance (along with buzzword-bingo-winning business agility) as one of the primary requirements after they figure out stretched clusters might not be such a good idea (and there’s no way to explain the dangers of split subnets to some people). When faced with the disaster avoidance “requirement”, ask them to do some basic math first.
If you’re not working for a data center fabric vendor (in which case please read the other today’s post), you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:
Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.
... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)
I’ve already written about the stupidities of risking the stability of two data centers to enable live migration of “mission critical” VMs between them. Now let’s take the discussion a step further – after hearing how critical the VM the server or application team wants to migrate is, you might be tempted to ask “and how do you ensure its high availability the rest of the time?” The response will likely be along the lines of “We’re using VMware High Availability” or even prouder “We’re using VMware Fault Tolerance to ensure even a hardware failure can’t bring it down.”