Let’s Pretend We Run Distributed Storage over a Thick Yellow Cable

One of my friends wanted to design a nice-and-easy layer-3 leaf-and-spine fabric for a new data center, and got blindsided by a hyperconverged vendor. Here’s what he wrote:

We wanted to have a spine/leaf L3 topology for an NSX deployment but can’t do that because the Nutanix servers require L2 between their nodes so they can be in the same cluster.

I wanted to check his claims, but Nutanix doesn’t publish their documentation (I would consider that a red flag), so I’m assuming he’s right until someone proves otherwise (note: a whitepaper is not proof of anything ;).

Update 2017-11-22: VSAN release 6.6 no longer needs IP multicast.

Anyway, VMware VSAN had the same limitations, then relaxed that to IP multicast within the cluster and finally got it right in VSAN release 6.6. Not everyone can upgrade the moment new software releases come out; I happen to know someone who's running NSX (with VXLAN) on top of another layer of VXLAN (on Nexus 9000) just to meet the stupid physical L2 requirements.

Interestingly, at least some comparable open-source solutions work happily without layer-2 connectivity or IP multicast (or you wouldn’t be able to deploy them in AWS).
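As an illustration (Ceph being just one example of such a system): Ceph monitors are found through an explicitly configured list of routable IP addresses reached over unicast TCP, so cluster nodes can happily sit in different subnets. A minimal ceph.conf sketch, with made-up addresses, one monitor per rack subnet in an L3 leaf-and-spine fabric:

```ini
[global]
# Monitors are listed explicitly -- no L2 adjacency and no IP multicast
# needed for cluster membership. Addresses below are hypothetical,
# one monitor per rack subnet.
mon_host = 10.0.1.10, 10.0.2.10, 10.0.3.10
```

No autodiscovery magic, just proper configuration.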

Speaking of leaf-and-spine fabrics and VXLAN: hundreds of networking engineers watched webinars describing them in detail, and you’ll find tons of background information, designs, and even hands-on exercises in the new Designing and Building Data Center Fabrics online course. If you want to know whether hyperconverged infrastructure and distributed storage make sense, there’s no better source than Howard Marks’ presentation from the Building Next-Generation Data Center online course.

Back to thick yellow cable devotees. My friend couldn’t help but wonder:

The overall question would be: why would hyperconverged manufacturers have to rely on L2 to build clusters…?

Because they don't understand networking (or don’t care) and don’t trust DNS? Because they think autodiscovery with IP multicast or proprietary broadcast-like protocols is better than properly configuring the storage cluster?

Their main selling point is that they are “ahead of the game” with their solution, but I only see drawbacks from a networking standpoint…

Keep in mind that they don't talk to networking people when selling their solution. Once the solution is sold and the networking engineer asks "what were they smoking when they were designing this stuff" and “why didn’t you involve the networking team before making the purchase” (after taming the MacGyver reflex), he's the bad guy hindering progress.


  1. Nutanix has "published" that it can be used over NSX. The option to not need L2 in the underlay is called "Scenario 2 - NSX for the Nutanix CVM and User VMs" in http://next.nutanix.com/t5/Nutanix-Connect-Blog/VMware-NSX-on-Nutanix-Build-a-Software-Defined-Datacenter/ba-p/7590

    The two scenarios above are published as "validated" by Nutanix at http://next.nutanix.com/t5/Nutanix-Connect-Blog/Nutanix-Validates-Two-Crucial-Deployment-Scenarios-with-VMware/ba-p/7580

    1. I would consider that "just because you could doesn't mean that you should" scenario. In any case, it's just moving the problem by creating another layer of indirection. RFC 1925 has plenty to say about that as does RFC 6670.
    2. "Scenario 2" is not supported by VMware because it places a storage vmkernel adapter on an NSX Logical Switch (VXLAN).
    3. So, I am the guy who talked about this with Ivan; we had that option, but VMware would heavily recommend against it… :)
    4. Nutanix doesn't need storage VMK adapters. There is lots of misinformation/ignorance here about what Nutanix needs vs. what NSX needs.
  2. Nutanix also uses IPv6 for cluster discovery on that shared L2 segment.
  3. As of the 6.6 release, vSAN no longer requires multicast https://pubs.vmware.com/Release_Notes/en/vsan/66/vmware-virtual-san-66-release-notes.html
    1. That's what you get for relying on VMware Technical Marketing documents :( Thank you - fixed.
  4. When I built an L3 leaf-spine pod with one subnet per rack, L2 still worked within each rack. Since hyperconverged clusters are unlikely to be larger than one rack, most deployments might not have to worry about these L2/L3 issues.
  5. >The overall question would be: why would hyperconverged manufacturers have to rely on L2 to build clusters…?
    Two weeks ago I was part of a so-called "design" session with a senior VMware and storage guy from the biggest system house in our country. They told me we need layer-2 connectivity for VMware vMotion. So now I’ve bought two Trident 2+ stackable switches for each rack in those two data centers to get VXLAN up and running.
  6. Btw, vMotion has not required L2 since vSphere 6, as long as you provide the proper routing configuration for the vMotion VMkernel interfaces on the hosts that need to move VMs around.
  7. Curious what the use case is for creating a cluster spanning an L3 domain. How big are the clusters you’re planning to build?
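To illustrate the vMotion-over-L3 point from the comments above: since vSphere 6 you can place vMotion VMkernel interfaces on the dedicated vMotion TCP/IP stack and give that stack its own default gateway, so vMotion traffic gets routed across an L3 fabric. A rough esxcli sketch (the interface name, port group name, and addresses are made up; verify the exact flags against your ESXi release):

```shell
# Create a VMkernel interface on the dedicated vMotion TCP/IP stack
# (vmk2 and the "vMotion" port group are hypothetical names)
esxcli network ip interface add --interface-name=vmk2 \
  --portgroup-name=vMotion --netstack=vmotion

# Assign it an address from this rack's vMotion subnet (made-up addressing)
esxcli network ip interface ipv4 set --interface-name=vmk2 \
  --ipv4=10.1.1.10 --netmask=255.255.255.0 --type=static

# Give the vMotion netstack its own default gateway, so vMotion traffic
# to hosts in other racks/subnets gets routed instead of needing L2 adjacency
esxcli network ip route ipv4 add --network=default \
  --gateway=10.1.1.1 --netstack=vmotion
```

The key point is the separate netstack with its own routing table: the management network’s default gateway is no longer forced onto vMotion traffic.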