Brief History of VMware NSX

I spent a lot of time this summer figuring out the details of NSX-T, resulting in significantly updated and expanded VMware NSX Technical Deep Dive material… but before going into those details let’s take a brief walk down memory lane ;)

We’re running an NSX Deep Dive workshop in Zurich in early September, followed by an NSX-T update webinar in mid-November.

You might remember a startup called Nicira that was acquired by VMware in mid-2012… supposedly resulting in the ever-continuing spat between Cisco and VMware (and maybe even triggering the creation of Cisco ACI).

Nicira’s Network Virtualization Platform ran on KVM and Xen and used their own OpenFlow-based virtual switch (OVS). Not exactly what the mainstream VMware customers were looking for, so someone at VMware decided to go for another “doing more with less” exercise and “leveraged the investments” they made in the past, resulting in NSX for vSphere (NSX-V) launched in 2013. NSX-V was really a conglomerate of:

  • Nicira’s controller;
  • VMware’s existing ESXi virtual switch (vDS) and VXLAN kernel module;
  • Modified vShield Manager GUI/API (now called NSX Manager);
  • Open-source software with disabled configuration CLI running in virtual machines and a glaze of GUI/API on top (can’t tell you how much fun it is to configure HAProxy URL rewrite rules through a GUI);
  • A few new components like proper in-kernel distributed firewall (instead of the sidecar VM approach they used in vShield that resulted in an ESXi server being limited to 1 Gbps throughput).

VMware also decided to keep supporting the existing customers using NVP (rebranded into NSX for Multiple Hypervisors).

If you want to know more about what VMware NSX did in 2013, watch the VMware NSX Architecture Overview webinar.

After almost six years, NSX-V is a stable platform used by numerous customers to implement either scalable virtual networks or microsegmentation, and it would be easy to recommend it to an enterprise customer looking for the networking component of a private cloud solution… but of course life couldn’t be as simple as that.

VMware decided to go for a Great Unifying Theory and merged NSX-V and NSX-MH, resulting in NSX Transformers (now NSX-T). They did the right thing and rewrote tons of NSX-V components (including a new ESXi virtual switch), offloaded all network services into multi-tenant NSX Edge nodes (you don’t have to run several per-tenant VMs to implement network services any more), and ported most of NSX-V functionality into the new product.

Having a stable shipping product and a long-term strategy sounds like a great idea, but it costs money to support two parallel products, and eventually most vendors decide to neglect or outright kill the working product in favor of a pie-in-the-sky future (while at the same time telling you that would never happen). NSX-V seems to be experiencing the same fate: everyone is talking about NSX-T, there hasn’t been a major release in over 18 months, and while the maintenance releases do add new functionality, it’s mostly polishing and GUI enhancements (with the exception of IP Multicast support added in 6.4.2).

I wouldn’t mind that, and would happily recommend NSX-T for new deployments, but unfortunately there are still a few things missing in NSX-T:

  • There’s no real federation capability (you cannot extend a unified control/management plane across two or more NSX-T deployments);
  • Active-active multi-site deployment is a joke and works almost as well as a stretched data center fabric control plane: when you lose the inter-site link in an active-active setup, one of the sites shuts down;
  • Security vendors are telling me that there’s no sidecar service insertion architecture (where you’d run firewalls on the same hypervisor as the virtual machines they’re protecting), although at least Checkpoint is already certified for E-W service insertion functionality (so it might be a vendor-specific challenge). Whether we really need that or whether the security vendors merely lament losing the ability to sell more licenses is obviously a different story ;)
  • NSX-T uses Geneve encapsulation and as of today there are no hardware gateways.

I've probably missed something, in which case please write a comment.

As expected, we’re supposed to hear great news during VMworld 2019, and if it results in a shipping product, I’ll describe the new functionality in the November NSX-T update webinar. Till then, please don’t ask me which version of NSX to use in a new deployment ;)

11 comments:

  1. A close friend of mine (VCIX) suggested that things are not necessarily too bad as long as everything is functional, but once something breaks there is no other way to fix it but to get VMware TAC on a call. The troubleshooting commands they run to find the issue are beyond even VCIX skills. For example, how would you troubleshoot a BGP peering issue in NSX? Getting the compute sizing right is another challenge with NSX that I commonly hear about from a few friends who deal with NSX in real life.

    Now for the data plane: using Geneve is an interesting choice, and while the approach has its own pros and cons, I would stick to VXLAN if I were to recommend something to someone, for a few good reasons.

    Not sure how many server NICs can handle Geneve in hardware, or about the state of SR-IOV and DPDK with respect to Geneve. I also need to get my head around how multicast will be handled in the control and data planes.

    You need underlay networks anyway, so from a business perspective, unless someone has very specific use cases that NSX in particular delivers (and doesn't just want an SDN-ready DC), it would be hard to convince the business to throw in extra money. And of course I personally would like to see an operational model that gets rid of the finger-pointing that will go on between the underlay provider (Cisco, Arista, Juniper) and the overlay provider (NSX) once something breaks.

    How would you correlate underlay vs. overlay stats for visibility, performance management, and troubleshooting?

    Ever tried to benchmark a DCN solution from the operator's perspective, through a CX lens? The planning team only has to take the pain once; ops has to operate it for the next 5 years or so.

    But then vendors and the industry seem to be solving the wrong problem IMHO. But that's just my opinion :) ... so who cares.

    HTH...
    Evil CCIE
    Replies
    1. I know people running (and troubleshooting) NSX deployments, so I don't think it's THAT hard... at least on the NSX-V side, no idea how convoluted NSX-T is. It is true, however, that those people have decades of networking experience ;)) More about other excellent points you raised in a separate blog post.
    2. Why don't you run routing protocols over an NSX Controller and the edge device? It is easy to deploy and design. You figure out the sizing of the network and take a call based on that.
  2. You have been able to use service insertion since NSX-T 2.3. In my opinion, it is the vendors that have been slow to deploy images that can utilize it and not the fact that the product does not support it.
    Replies
    1. I have been pointed to VMware HCL ;) where it's clear that Checkpoint is certified for E-W service insertion. However, looking at Checkpoint blog posts, it seems the firewall(s) run in a service cluster, not on the hypervisor, so I obviously need to spend some more time trying to figure the details out... Thank you!
  3. Here is what is also missing from my experience :
    - OSPF (no, really... NSX-T doesn't support OSPF... How many customers run OSPF in their DC? :) )
    - Hardware VTEP integration (I'd recommend this only for migration purposes... Life isn't all about rainbows and unicorns)
    - Advanced Microsegmentation ? (Maybe I am wrong and would have to check on latest versions)
    Replies
    1. As I wrote a long time ago, I would never run OSPF between a VM with unknown software on it and my ToR switch. OSPF is a single failure domain, and a single bug in the VM software could impact your whole data center. At least I can filter the stuff NSX is sending me with BGP (and looks like I'm not the only one thinking along these lines based on what they implemented in NSX-T).

      Hardware VTEP integration sounds great... until you figure out that most data centers don't have more than a few gigabits of E-W traffic between virtual and physical world (unless they have a huge baremetal SAP HANA database or something similar), so a VM implementation is more than good enough. I'm not THAT upset about this one.

      Advanced microsegmentation? I compared NSX-V and NSX-T microsegmentation and they seemed very similar. What am I missing?
    2. Totally agree with you that with BGP you can treat NSX-T as a different entity and control what you receive (and relay) from it. But that would introduce some complexity for the network team if they had to run OSPF internally in the DC then BGP just for the NSX-T Fabric and then implement redistribution.

      That would be even more complicated for multi-site, wouldn't it?

      Thanks for this interesting discussion.

      Nic
    3. Well, you'd actually get to the proper network design where OSPF takes care of transport fabric reachability (and fast convergence) and BGP takes care of endpoint reachability, security of routing information exchange, and routing policies.

      Also, if you faithfully drink the Kool-Aid every morning, you already deployed EVPN instead of FabricPath, VCS Fabric or whatever other now-obsolete technology, and so you already have BGP in your data center ;)

      And finally, PLEASE, PLEASE, PLEASE, don't redistribute BGP into OSPF ;))
  4. A few minor nits:

    - should probably have mentioned vCDNI for completeness ;)
    - not sure about mention of nginx, since NSX uses (used?) HAProxy
    - NSX-T uses at least two different vSwitches, depending on hypervisor

    Re: service insertion, from what I saw in Release Notes they've gone the way of NSH with remote appliances, so you don't have to have a copy on each hypervisor.

    Regarding use cases for hardware VTEPs, I wrote a couple posts a few years back: https://telecomoccasionally.wordpress.com/2016/05/04/serving-bandwidth-hungry-vms-with-dc-fabrics-and-nsx-for-vsphere/ and https://telecomoccasionally.wordpress.com/2016/04/14/do-i-need-a-hardware-vtep-for-my-nsx-for-vsphere/
    Replies
    1. Fixed the HAProxy reference. Thank you!

      As for service insertion, as I wrote above, it's one of the most underdocumented features I've seen in a long while :((