Circular Dependencies, VMware NSX-T Edition

Thursday, November 25, 2021 07:41 UTC


A friend of mine sent me a link to a lengthy, convoluted document describing the 17-step procedure (with the last step having 10 micro-steps) to follow if you want to run NSX Manager on top of N-VDS, or as they call it: Deploy a Fully Collapsed vSphere Cluster NSX-T on Hosts Running N-VDS Switches¹.

You might not be familiar with vSphere networking and the way NSX-T uses that (in which case I can highly recommend vSphere and NSX webinars), so here’s a CliffsNotes version of it: you want to put the management component of NSX-T on top of the virtual switch it’s managing, and make it accessible only through that virtual switch. What could possibly go wrong?

Well, even VMware technical marketing didn’t risk NOT describing the biggest caveat:

In a single cluster configuration, management components are hosted on an N-VDS switch as VMs. The N-VDS port to which the management component connects to by default is initialized as a blocked port due to security considerations. If there is a power failure requiring all the four hosts to reboot, the management VM port will be initialized in a blocked state².

To “solve” that, you have to follow the ten substeps of step 17. Those steps involve writing JSON documents in a text editor and executing curl. Yeah, that’s definitely the best way to configure any product. The last time I saw something like that was the Wellfleet Technician Interface, where you had the privilege of configuring the box by writing values into SNMP OIDs.

For those of you who still don’t appreciate how ridiculous the whole idea is: imagine migrating from EIGRP to OSPF, but with the following minor limitations:

There is no out-of-band interface (pretty normal so far).
You cannot run the routing protocols in parallel (WTx?).
You have to type in individual commands (OK).
You cannot do cut-and-paste, and you definitely cannot cheat by copying commands into a file and executing them locally (OMG).
Every command is executed immediately (as usual) and immediately stored in the permanent configuration file (WTx???).
Reloading or power-cycling the box cannot bring the box to a previous state (I’m out of here).
While there is some rollback capability, you have to be able to reach the box to execute it (fun times).

Now don’t get me wrong. If someone has a desperate urge to participate in Red Bull Flying Contest, I’m all for it, but describing the resulting Hero’s Journey as part of official product documentation is a bit over the top. I can only hope there was a ginormous purchase order behind this requirement; bravado for bravado’s sake never made much sense to me.

Careful readers might point out that Nutanix uses a similar trick: a VM exposing an iSCSI or NFS target is running on top of a hypervisor using that same target³. There’s just a slight difference between the two: Nutanix comes as a prepackaged solution, not as a loose collection of hard-to-fit parts and IKEA-like instructions.
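For those who prefer to see the difference as a dependency graph, here is a minimal sketch. The node names and the edges are my simplification (the NSX-T edges follow the blocked-port caveat quoted above, the Nutanix edges follow Erik Auerswald's comment); `find_cycle` is a hypothetical helper, not anything shipped by either vendor.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    state = {node: WHITE for node in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for dep in deps.get(node, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path: we walked in a circle
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in deps:
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Assumed model: the manager VM is reachable only through an N-VDS port,
# and that port needs the manager to get unblocked.
nsx_collapsed = {
    "nsx-manager": ["n-vds-port"],
    "n-vds-port": ["nsx-manager"],
}

# Assumed model: the CVM boots from local storage, everything else uses
# the virtual datastore the CVM offers.
nutanix = {
    "cvm": ["local-disk"],
    "virtual-datastore": ["cvm"],
    "other-vms": ["virtual-datastore"],
}

print(find_cycle(nsx_collapsed))
print(find_cycle(nutanix))
```

The first graph contains a loop, the second one does not, which is the whole point of the footnote on Nutanix.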

Finally, let’s assume VMware does care enough about customers who want to deploy NSX-T on a 4-node cluster. The only sane way to meet that requirement would be to create a prepackaged one-click solution (aka “automate it”, but with proper rollbacks when an error is encountered), not a Rube Goldberg machine. Come on, VMware, we know you CAN do better than that.
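To illustrate what "automate it with proper rollbacks" could mean, here is a minimal sketch of the pattern: every step comes with an undo action, and a failure undoes the completed steps in reverse order. The step names are hypothetical placeholders and have nothing to do with the actual NSX-T API; a real tool would also have to cope with an undo action that fails.

```python
from typing import Callable, List, Tuple

# (name, apply, undo); all three are illustrative placeholders
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_rollback(steps: List[Step]) -> bool:
    """Apply steps in order; on failure undo the completed ones in reverse."""
    done: List[Step] = []
    for step in steps:
        name, apply, _undo = step
        try:
            apply()
        except Exception as err:
            print(f"step '{name}' failed: {err}; rolling back")
            for _name, _apply, undo in reversed(done):
                undo()
            return False
        done.append(step)
    return True


log: List[str] = []


def fail() -> None:
    raise RuntimeError("port blocked")


steps: List[Step] = [
    ("create port group", lambda: log.append("+pg"), lambda: log.append("-pg")),
    ("migrate vmkernel", lambda: log.append("+vmk"), lambda: log.append("-vmk")),
    ("unblock manager port", fail, lambda: log.append("-port")),
]

ok = run_with_rollback(steps)
print(ok, log)
```

In this run the third step fails, so the two completed steps are undone in reverse order and the log ends up as `+pg, +vmk, -vmk, -pg`. The failed step itself is not undone because it never completed.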

Release History

2021-11-25

Footnote: Nutanix does not create circular dependencies (based on the comment by Erik Auerswald).

¹ I saved the documentation in PDF format just in case that masterpiece gets removed ;)
² Let me translate that into less complex language: you’ll be dealing with a bricked cluster. Congratulations. Oh, and a power failure would never cause all hosts to reboot, would it?
³ Although, as Erik Auerswald explained in his comment, that does not create a circular dependency as the Nutanix VM runs off local storage in the ESXi host, and only then offers NFS/iSCSI target as an additional data store for other VMs.

2 comments:

Erik Auerswald 25 November 2021 10:08

There is no circular dependency on storage services in Nutanix, because the CVM providing storage services to the hypervisor is not using those virtual storage services.

The CVM runs from additional dedicated local storage on each hypervisor host, and then provides access to the other storage resources (SSDs and/or HDDs available on each host) to use for other VMs. Each hypervisor host has both the local and the virtual storage mounted.

David Lehner 27 November 2021 01:04

I think with vSphere 7.0 and NSX-T 3.1 it changed quite a bit. Now you can run NSX-T on a VDS in parallel with legacy DVS port groups and make use of the same uplinks.

Ivan Pepelnjak 28 November 2021 10:05

That is absolutely true, which makes it even more ridiculous to include such a convoluted and now-unnecessary procedure in the official NSX-T 3.1 installation documentation instead of saying "For this setup you need NSX-T 3.1 running on vDS which is available in vSphere release 7. Contact VMware Professional Services if you want to run this on N-VDS."

Stuart Charlton 01 December 2021 11:03

It's a legacy doc for sure, I remember having to do this procedure with NSX-T 2.x and it was painful.

The docs now have a "Note" at the top that says "Alternatively, you can deploy the configuration described in this topic by using vSphere Distributed Switches. With vSphere Distributed Switches configured on hosts, the procedure is simple...".

I suppose the reason this whole thing still needs documenting is the sheer amount of ESXi 6.7 still out there.
