Live vMotion into VMware-on-AWS Cloud

Considering VMware’s infatuation with vMotion, the following news (reported by Salman Naqvi in a comment to my blog post) was clearly inevitable:

I was surprised to learn that LIVE vMotion is supported between on-premises and VMware on AWS Cloud

What’s more interesting is how they managed to do it.

A few basics first:

  • VMware-on-AWS has little to do with AWS or public cloud. It’s a vSphere/NSX-T/VSAN cluster managed by VMware and running on AWS bare-metal servers. The only difference between VMware-on-AWS and a vSphere cluster running in any colocation facility is that they had to cope with the fact that AWS doesn’t care about layer-2 tricks.
  • VMware-on-AWS runs NSX-T, so the question really becomes “how do you do a vMotion between two NSX-T instances or between a traditional vSphere cluster and an NSX-T instance?”

Since VMware introduced cross-vCenter vMotion, it’s easy to migrate a running virtual machine across multiple vCenter deployments… all you need is a matching port group name on both ends, and end-to-end layer-2 connectivity if you expect the virtual machine to keep communicating with its environment.
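For illustration, here’s roughly what such a cross-vCenter relocation looks like when driven through the vSphere API with pyVmomi. This is a minimal sketch under stated assumptions, not VMware’s migration workflow: all hostnames, credentials, object names and the SSL thumbprint are hypothetical placeholders, error handling is omitted, and the destination NIC backing assumes a standard port group (a distributed port group would need a DistributedVirtualPortBackingInfo instead).

    # Minimal cross-vCenter vMotion sketch (vSphere 6.0+ assumed on both ends).
    # All names below are hypothetical placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    def find_obj(content, vimtype, name):
        """Return the first inventory object of the given type with a matching name."""
        view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
        try:
            return next(o for o in view.view if o.name == name)
        finally:
            view.DestroyView()

    ctx = ssl._create_unverified_context()            # lab only - skips certificate checks
    src = SmartConnect(host="vcenter-onprem.example.com",
                       user="administrator@vsphere.local", pwd="***", sslContext=ctx)
    dst = SmartConnect(host="vcenter-vmc.example.com",
                       user="cloudadmin@vmc.local", pwd="***", sslContext=ctx)

    vm       = find_obj(src.content, vim.VirtualMachine, "web01")
    dst_host = find_obj(dst.content, vim.HostSystem, "esx-1.vmc.example.com")
    dst_ds   = find_obj(dst.content, vim.Datastore, "WorkloadDatastore")
    dst_pg   = find_obj(dst.content, vim.Network, "web-tier")   # same port group name on both ends
    dst_dc   = find_obj(dst.content, vim.Datacenter, "SDDC-Datacenter")

    spec = vim.vm.RelocateSpec(
        host=dst_host,
        pool=dst_host.parent.resourcePool,            # target cluster's root resource pool
        datastore=dst_ds,
        folder=dst_dc.vmFolder,
        # The ServiceLocator is what turns this into a cross-vCenter relocation.
        service=vim.ServiceLocator(
            instanceUuid=dst.content.about.instanceUuid,
            url="https://vcenter-vmc.example.com",
            sslThumbprint="AA:BB:CC:...",             # destination vCenter's certificate thumbprint
            credential=vim.ServiceLocatorNamePassword(
                username="cloudadmin@vmc.local", password="***")))

    # Re-point the VM's NIC at the identically-named port group on the other side -
    # that matching name (plus stretched L2, if you need it) keeps the VM reachable.
    nic = next(d for d in vm.config.hardware.device
               if isinstance(d, vim.vm.device.VirtualEthernetCard))
    nic.backing = vim.vm.device.VirtualEthernetCard.NetworkBackingInfo(
        network=dst_pg, deviceName="web-tier")
    spec.deviceChange = [vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=nic)]

    WaitForTask(vm.RelocateVM_Task(spec=spec,
                                   priority=vim.VirtualMachine.MovePriority.highPriority))
    Disconnect(src); Disconnect(dst)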

The “magic” of live vMotion to VMware-on-AWS thus boils down to “how do I get layer-2 connectivity into AWS?”. Well, you don’t - AWS is not stupid enough to allow you to bring your flooding challenges into their environment… but there’s always the duct-tape-of-networking.

NSX-T includes L2VPN - bridging across an IPsec-protected GRE tunnel. Hooray, problem solved. I was told you might experience MTU issues (and that’s probably why they want you to have a Direct Connect link into AWS), but who cares about such minor inconveniences when you can do what everyone always wanted: move a running server into AWS.
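To get a feel for where those MTU issues come from, here’s a back-of-the-envelope calculation. The per-header byte counts are my assumptions (GRE options, ESP cipher, mode and padding all change the numbers), so treat the result as a rough estimate rather than NSX-T’s exact encapsulation tax.

    # Rough estimate of how much payload survives bridging over GRE-in-IPsec
    # when the underlay MTU is a vanilla 1500 bytes. All byte counts are
    # assumptions for a typical ESP transport-mode setup; tunnel mode adds
    # another outer IP header on top.
    UNDERLAY_MTU   = 1500   # Internet-facing MTU, no jumbo frames
    OUTER_IP       = 20     # outer IPv4 header
    ESP_OVERHEAD   = 56     # ESP header + IV + padding + trailer + ICV (cipher-dependent)
    GRE_HEADER     = 4      # base GRE header; key/sequence options add more
    INNER_ETHERNET = 14     # the bridged frame's Ethernet header rides as payload

    inner_mtu = UNDERLAY_MTU - OUTER_IP - ESP_OVERHEAD - GRE_HEADER - INNER_ETHERNET
    print(f"Largest inner IP packet without fragmentation: ~{inner_mtu} bytes")
    # ~1406 bytes - which is why 1500-byte guests hit fragmentation or silent
    # drops unless you clamp TCP MSS or get a bigger underlay MTU.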

You probably know my opinion on the feasibility of long-distance vMotion, long-distance bridging, and L2 DCI, and VMware’s latest trick didn’t change it a bit. If anything, the L2 DCI features they’re offering in NSX-T release 2.5 are worse than what Cisco had in OTV a decade ago.

I’m not saying that you should do L2 DCI or endorsing OTV, but if you absolutely want to do crazy things, at least do them right.

Long story short: the only advice I can give you regarding this marketing gimmick is what James Mickens told people enchanted with the promise of ML/AI:

  • In three words: Think before deploying.
  • In two words: Think first.
  • In one word: Don’t.

On the other hand, if you want to know how stuff works behind the scenes, check out:

Latest blog posts in High Availability in Private and Public Clouds series

15 comments:

  1. Ivan,

    Just a few comments:
    1. "VMware-on-AWS has little to do with AWS or public cloud". What about the native AWS dedicated host/instance or EC2/IaaS offering? VMC on AWS can be treated like these services. You can say that native services do not involve typical L2 mechanisms like in NSX. Yes, but this is not the primary public cloud factor. There is an AWS orchestrator integration with VMC. In the case of a failure, you can have a spare physical host in 10 minutes. You can connect from VMC to a growing list of AWS services natively without L2 or the Internet access.

    2. “how do you do a vMotion between two NSX-T instances or between a traditional vSphere cluster and an NSX-T instance?” You can use HCX not only between NSX-v and NSX-T, but also from vSphere, KVM, Hyper-V sites. More info:
    https://cloud.vmware.com/vmware-hcx/faq#technical-information

    3. "If anything, the L2 DCI features they’re offering in NSX-T release 2.5 are worse than what Cisco had in OTV a decade ago". I deployed both NSX and OTV, and let me disagree. OTV was a real step forward in Data Center Architectures, but the control-plane and data-plane capabilities of NSX-T regarding L2 features even L2 DCI are prevailing. First of all, NSX can better protect against L2 loops as VXLAN tunnels are hosted on servers. This minimizes the risk or unexpected bridging of backend VLAN which can cause a total failure in OTV. BUM is better contained in NSX. With NSX your underlay xSTP instance can be limited more than in OTV. Tunnels are better distributed/load balanced in NSX because in OTV tunnels are aggregated on a pair of edge platforms. It is a way easier to do multihoming in NSX. In A/A scenario, no need to manually filter HSRP VMAC addresses.

    Don't treat my comments as coming from a person who has to defend VMware. I can also see gaps and threats related to NSX-T. There is space for improvement, as everywhere, but don't you think that what you said is an exaggeration? ;)

    All the best!
    Replies
    1. Thanks for the comments. Much appreciated (as always). The one thing I'd love to hear your take on is "active-active NSX multihoming".

      I probably missed something, but I can't see how to make local egress work in NSX-T 2.5
    2. There is no local egress as of now in NSX-T 2.5. You can do an active-active (A/A) scenario (per DC, not per VLAN) with two sets of VLANs in an active-standby + standby-active arrangement. You can then say that in OTV we have local egress. That's true, but I don't perceive this feature as a critical one. Why? Because local egress also requires local ingress, which is much more difficult to do. If we don't have local ingress (using NAT/DNS/LISP/host-route injection), we end up requiring a stretched FW cluster at the edge, which is usually part of a DC architecture. Then we have a complex fate-sharing solution.

      So in terms of A/A I would say that NSX-T and OTV are equal. In OTV the traffic doesn't have to travel across the DCI (which may be an advantage), but local egress requires local ingress (which may be a disadvantage). In any case, in OTV L2 must be stretched anyway, even in the A/A scenario, and fate sharing still exists in the context of BUM traffic.

      In terms of NSX-T, I would advise to have workloads active per VLAN in one site as much as possible. In the case of a disaster of maintenance move all related VLAN workloads to the other site.
    3. *disaster or maintenance
    4. Nice to see we agree on the technical details. Now imagine a VM moved into VMware-Cloud-on-AWS. How will the traffic from that VM reach the clients?
    5. Every solution has its purpose. For example, VMC may be useful as a cold/warm DR site, or, in reverse, VMC can be the primary site and the on-prem DC the DR site. VMC can also be used as on-demand, scalable resources for dev/stage environments.

      If there is a layered app where some components must be on-prem and some in VMC, then workloads from a specific subnet or layer should be contained either on-prem or in VMC. For example, a front-end in VMC, backend on-prem, DB master on-prem with replication to VMC or to a native AWS service. Stretching workloads belonging to the same traffic group and subnet should be avoided as much as possible. That's architectural advice which applies regardless of NSX-T, in any hybrid-cloud scenario with Istio, Google Anthos, etc.

      Coming back to your question: how do clients reach this VM in VMC?
      If this is a front-end VM, then through DNS and the Internet, like the other front-end workloads.
      If this is a backend VM, then through routing between VMC and the DC, like the other backend workloads.
      If this is a front-end VM which has been moved away from the rest of the front-end workloads, then "Houston, we have a problem". We can resolve it in a more or less complex way. IMHO, this is not an issue with NSX-T but with the architecture and some admin habits. :)
    6. Yet again we're in agreement, but the "Houston, we have a problem" scenario is exactly what happens when you stretch a VLAN and vMotion a VM from an on-premises vSphere cluster into VMC... and, as you wrote, it's an architectural problem. QED ;)
    7. The question was about egress, not ingress, so the same applies here. With contained workloads, the traffic goes out to the clients via the local exit, like the hygiene option we discussed for A/A data centers. If one workload is moved out of the group to the other site, then again we have an L2 DCI dependency.
    8. ... and if we don't "move one workload out of the group" then why do we need live vMotion in the first place? Wouldn't it be simpler to shut down VMs and restart them on the other end?
    9. Yes, it's the architectural problem of managing the whole stack, not the NSX-T architecture. Agreed: the fact that a car can float in the sea for a while doesn't mean we should use it as a boat.

      I perceive this L2 stretch option as useful for temporary scenarios: migration, maintenance, fast scale-out. Maybe for some tests, maybe staging; it's risky for production.
    10. "I perceive this L2 stretch option useful for temporary, migration, maintenance" << I could agree to that. Scale-out into the cloud is a myth.

      "Maybe for some tests, maybe staging, risky for production." << and how do you think it will be marketed, sold and used?
    11. "why do we need live vMotion in the first place? Wouldn't it be simpler to shut down VMs and restart them on the other end?"

      If it takes time to move all workloads, and the client doesn't know what the layers and dependencies are, then - WITH CARE - they can use live vMotion. Then, in a maintenance window, they only need to reconfigure routing/DNS instead of spending time copying data. Eventually, what I would do is copy disks or take incremental snapshots and then cold-start the VM instances on the other side, as you suggest. But there is also a risk that my disk data is not in sync with the latest on-prem data; we need more time to stop workloads and copy the latest version. So the focus may need to be on this part -> storage replication. With live vMotion, all data from disk and memory is transferred.
  2. "Scale-out into the cloud is a myth."

    What about a development/staging use case where the company wants to test a new app on 10 servers, has 2 on-prem, and doesn't want to wait for new hardware? They can run the other 8 servers or more in the cloud. For a production use case, if workloads are contained, then scaling out a particular app layer is a viable option. Do you think a VPN/interconnect/DCI kills the benefits of scale-out?

    "and how do you think it will be marketed, sold and used?"

    It is the duty of pre-sales engineers, consultants, and vendor representatives to inform the customer about the risk. So this can be sold as added customer value. :)
  3. Hi guys, has this issue been solved in NSX 3.0? I mean, is it possible to do a true active/active setup in version 3.0, and if so, can I have a link or URL which explains this setup?

  4. @chandrasekaran: It's always possible to have a true active/active setup with good application architecture and decent network design. Will you get there with silver bullets? Probably not... but studying this https://my.ipspace.net/bin/list?id=AADesign (including Additional Resources) might bring you a bit closer to that goal.

    On a more serious note, NSX-T 3.0 adds NSX Federation, but that's currently not supported on vSphere-on-AWS.
