Brief History of VMware NSX
I spent a lot of time this summer figuring out the details of NSX-T, resulting in significantly updated and expanded VMware NSX Technical Deep Dive material… but before going into those details, let's take a brief walk down memory lane ;)
You might remember a startup called Nicira that was acquired by VMware in mid-2012… supposedly resulting in the ever-continuing spat between Cisco and VMware (and maybe even triggering the creation of Cisco ACI).
Nicira's Network Virtualization Platform ran on KVM and Xen and used their own OpenFlow-based Open vSwitch (OVS). Not exactly what mainstream VMware customers were looking for, so someone at VMware decided to go for another "doing more with less" exercise and "leveraged the investments" they had made in the past, resulting in NSX for vSphere (NSX-V), launched in 2013. NSX-V was really a conglomerate of:
- Nicira’s controller;
- VMware’s existing ESXi virtual switch (vDS) and VXLAN kernel module;
- Modified vShield Manager GUI/API (now called NSX Manager);
- Open-source software with the configuration CLI disabled, running in virtual machines with a glaze of GUI/API on top (can't tell you how much fun it is to configure HAProxy URL rewrite rules through a GUI);
- A few new components like proper in-kernel distributed firewall (instead of the sidecar VM approach they used in vShield that resulted in an ESXi server being limited to 1 Gbps throughput).
VMware also decided to keep supporting the existing customers using NVP (rebranded into NSX for Multiple Hypervisors).
After almost six years, NSX-V is a stable platform used by numerous customers to implement either scalable virtual networks or microsegmentation, and it would be easy to recommend it to an enterprise customer looking for the networking component of a private cloud solution… but of course life couldn’t be as simple as that.
VMware decided to go for Great Unifying Theory and merged NSX-V and NSX-MH, resulting in NSX Transformers (now NSX-T). They did the right thing and rewrote tons of NSX-V components (including a new ESXi virtual switch), offloaded all network services into multi-tenant NSX Edge nodes (you don’t have to run several per-tenant VMs to implement network services any more), and ported most of NSX-V functionality into the new product.
Having a stable shipping product and a long-term strategy sounds like a great idea, but it costs money to support two parallel products, and eventually most vendors decide to neglect or outright kill the working product in favor of a pie-in-the-sky future (while at the same time telling you that would never happen). NSX-V seems to be experiencing the same fate - everyone is talking about NSX-T, there hasn't been a major release in over 18 months, and while the maintenance releases do add new functionality, it's mostly polishing and GUI enhancements (with the exception of IP Multicast support added in 6.4.2).
I wouldn’t mind that, and would happily recommend NSX-T for new deployments, but unfortunately there are still a few things missing in NSX-T:
- There’s no real federation capability (you cannot extend a unified control/management plane across two or more NSX-T deployments);
- Active-active multi-site deployment is a joke and works almost as well as a stretched data center fabric control plane - when you lose the inter-site link in an active-active setup, one of the sites shuts down.
- Security vendors are telling me that there's no sidecar service insertion architecture (where you'd run firewalls on the same hypervisor as the virtual machines they're protecting), although at least Check Point is already certified for E-W service insertion functionality (so it might be a vendor-specific challenge). Whether we really need that, or whether the security vendors just lament the lost opportunity to sell more licenses, is obviously a different story ;)
- NSX-T uses Geneve encapsulation and as of today there are no hardware gateways.
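To make the last bullet more concrete, here's a minimal Python sketch of why Geneve is harder on hardware than VXLAN: the VXLAN header (RFC 7348) is a fixed 8 bytes, while Geneve (RFC 8926) adds variable-length TLV options that a fixed-pipeline NIC or switch ASIC parser has to walk. The header layouts below follow the RFCs; the sample option TLV is made up purely for illustration.

```python
import struct

VXLAN_PORT = 4789       # IANA-assigned UDP port for VXLAN
GENEVE_PORT = 6081      # IANA-assigned UDP port for Geneve
ETHERTYPE_TEB = 0x6558  # Transparent Ethernet Bridging payload

def vxlan_header(vni: int) -> bytes:
    """VXLAN header (RFC 7348): always 8 bytes - flags, reserved
    fields, and a 24-bit VNI. Trivial for fixed-pipeline hardware."""
    return struct.pack("!B3xI", 0x08, vni << 8)

def geneve_header(vni: int, options: bytes = b"") -> bytes:
    """Geneve base header (RFC 8926): 8 bytes plus variable-length
    TLV options - the extensibility that VXLAN lacks."""
    assert len(options) % 4 == 0, "options come in 4-byte multiples"
    opt_len = len(options) // 4           # Opt Len field, 4-byte units
    first = (0 << 6) | opt_len            # Ver=0 (2 bits) | Opt Len (6 bits)
    second = 0                            # O and C flags clear, reserved bits
    return (struct.pack("!BBH", first, second, ETHERTYPE_TEB)
            + struct.pack("!I", vni << 8)  # 24-bit VNI + reserved byte
            + options)

# Without options the two headers are the same size...
print(len(vxlan_header(5001)), len(geneve_header(5001)))  # 8 8

# ...but Geneve can grow: a hypothetical 8-byte option TLV
# (class/type/length header + 4 bytes of data) makes the header 16
# bytes, which is exactly what trips up NIC offloads and hardware
# gateways that only know how to parse fixed-length headers.
opt = struct.pack("!HBB", 0x0104, 0x01, 1) + b"\x00\x00\x00\x00"
print(len(geneve_header(5001, opt)))  # 16
```

Variable-length headers are why "does the NIC support Geneve offload?" is a different (and harder) question than it was for VXLAN.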
I've probably missed something, in which case please write a comment.
As expected, we’re supposed to hear great news during VMworld 2019, and if they result in a shipping product, I’ll describe the new functionality in the November NSX-T update webinar. Till then, please don’t ask me which version of NSX to use in a new deployment ;)
Now, using Geneve for the data plane is an interesting choice, and while the approach has its own pros and cons, I would stick with VXLAN if I were recommending something to someone, for a few good reasons.
I'm not sure how many server NICs can handle Geneve in hardware, or what the state of SR-IOV and DPDK support is with regard to Geneve. I also need to get my head around how multicast will be handled in the control and data planes.
You need underlay networks anyway, so from a business perspective, unless someone has very specific use cases that NSX in particular delivers (and doesn't just want an "SDN-ready DC"), it will be hard to convince the business to spend extra money. And of course I'd personally like to see an operational model that gets rid of the finger-pointing that will go on between the underlay provider (Cisco, Arista, Juniper) and the overlay provider (NSX) once something breaks.
How would you correlate underlay vs. overlay stats for visibility, performance management and troubleshooting?
Ever tried to benchmark a DCN solution from an operator's perspective, through a CX lens? The planning team only has to take the pain once; Ops has to operate it for the next five years or so.
But then vendors and the industry seem to be solving the wrong problem, IMHO. But that's just my opinion :) ... so who cares.
- OSPF (no, really... NSX-T doesn't support OSPF... how many customers run OSPF in their DC? :) )
- Hardware VTEP integration (I'd recommend this only for migration purposes... life isn't all about rainbows and unicorns)
- Advanced microsegmentation? (Maybe I'm wrong and would have to check the latest versions)
Hardware VTEP integration sounds great... until you figure out that most data centers don't have more than a few gigabits of E-W traffic between the virtual and physical worlds (unless they have a huge bare-metal SAP HANA database or something similar), so a VM implementation is more than good enough. I'm not THAT upset about this one.
Advanced microsegmentation? I compared NSX-V and NSX-T microsegmentation and they seemed very similar. What am I missing?
That would be even more complicated for multi-site, wouldn't it?
Thanks for this interesting discussion.
Also, if you faithfully drink the Kool-Aid every morning, you already deployed EVPN instead of Fabric Path, VCS Fabric or whatever other now-obsolete technology, and so you already have BGP in your data center ;)
And finally, PLEASE, PLEASE, PLEASE, don't redistribute BGP into OSPF ;))
- should probably have mentioned vCDNI for completeness ;)
- not sure about mention of nginx, since NSX uses (used?) HAProxy
- NSX-T uses at least two different vSwitches, depending on hypervisor
Re: service insertion, from what I saw in Release Notes they've gone the way of NSH with remote appliances, so you don't have to have a copy on each hypervisor.
Regarding use cases for hardware VTEPs, I wrote a couple posts a few years back: https://telecomoccasionally.wordpress.com/2016/05/04/serving-bandwidth-hungry-vms-with-dc-fabrics-and-nsx-for-vsphere/ and https://telecomoccasionally.wordpress.com/2016/04/14/do-i-need-a-hardware-vtep-for-my-nsx-for-vsphere/
As for service insertion, as I wrote above, it's one of the most underdocumented features I've seen in a long while :((