Which virtual networking technology should I use?

After I published the Decouple virtual networking from the physical world article, @paulgear1 sent me a very valid tweet: “You seemed a little short on suggestions about the path forward. What should customers do right now?” Apart from the obvious “it depends”, these are the typical use cases (as I understand them today – please feel free to correct me).

Small(er) data centers: If you have a few hundred physical servers and a few thousand virtual machines, an every-VLAN-on-every-port design seems to work well ... assuming a solid MLAG-based dual snowflake L2 design and no multicast applications. As long as the amount of background broadcast noise (primarily ARP) stays low, you’ll do just fine.
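As a back-of-the-envelope sanity check (a minimal sketch with assumed numbers, not measurements), you can estimate the background ARP noise every hypervisor in such a domain has to cope with:

```python
# Rough estimate of background ARP broadcast load in one bridged domain.
# All figures are illustrative assumptions, not measured values.
num_vms = 3000              # virtual machines in the broadcast domain
arp_per_vm_per_sec = 0.5    # assumed average ARP request rate per VM
arp_frame_bits = 64 * 8     # minimum-size Ethernet frame carrying an ARP request

broadcasts_per_sec = num_vms * arp_per_vm_per_sec
load_bps = broadcasts_per_sec * arp_frame_bits

print(f"{broadcasts_per_sec:.0f} broadcast frames/s, "
      f"~{load_bps / 1e6:.2f} Mbps of ARP noise reaching every host")
```

The bandwidth itself is negligible; the real cost is that every hypervisor has to receive and inspect every one of those frames.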

The moment you deviate from a pure (dual) snowflake design, you might get asymmetrical traffic flows, unknown unicast flooding (more so if you have a mismatch between ARP timers and MAC address aging timers), and an overloaded network.
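The ARP-versus-MAC-aging interaction is easy to quantify. Assuming common defaults (a four-hour ARP cache timeout versus five-minute MAC address aging), traffic sent toward a host that has gone quiet gets flooded as unknown unicast for hours:

```python
# Toy calculation: unknown unicast flooding occurs while a sender still has a
# valid ARP entry for a destination whose MAC address the switches have
# already aged out. Timer values are common defaults, used as assumptions.
arp_timeout_sec = 4 * 3600   # typical router/host ARP cache timeout
mac_aging_sec = 5 * 60       # typical bridge MAC address aging time

flooding_window_sec = max(0, arp_timeout_sec - mac_aging_sec)
print(f"Potential flooding window: {flooding_window_sec / 60:.0f} minutes "
      "after the destination goes quiet")
```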

Warning: Implementing a whole data center as a single bridged domain is never a good idea. You should split your infrastructure into several failure domains (aka availability zones) so you don’t lose everything if you have a bad hair day.

Large(r) data centers with more than a few hundred servers in a single broadcast domain will probably need VM-aware networking to reduce the amount of flooded traffic sent to each server. Obviously, it would be better to split the data center into numerous smaller broadcast domains, but that’s a discussion for another blog post.
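The gist of VM-aware networking is that the switches learn (from the hypervisors or the orchestration system) which VLANs are actually active behind each server-facing port and flood only to those ports. A minimal sketch of that pruning decision, using hypothetical data structures rather than any vendor’s actual implementation:

```python
# Hypothetical view of flooding with and without VM-aware pruning:
# each server-facing port maps to the VLANs active on the VMs behind it.
active_vlans = {
    "port1": {10, 20},
    "port2": {20},
    "port3": {30},
    "port4": set(),          # host with no VMs in the VLANs of interest
}

def flood_targets(vlan: int, vm_aware: bool = True) -> set:
    """Return the ports a broadcast/unknown-unicast frame in `vlan` reaches."""
    if not vm_aware:
        return set(active_vlans)   # every-VLAN-on-every-port: flood to all servers
    return {p for p, vlans in active_vlans.items() if vlan in vlans}

print(flood_targets(20, vm_aware=False))  # all four ports receive the frame
print(flood_targets(20))                  # only port1 and port2
```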

Reasonably small compute pools with numerous tenants might be well-served by vCDNI. If you have a few hundred physical servers (a reasonable number for a single broadcast domain like vCDNI – which is limited to 350 vSphere hosts due to vDS configuration maximums anyway), but need thousands of virtual networks, vCDNI seems like a reasonable solution ... more so if your physical switches support only a few hundred VLANs.
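The underlying arithmetic is simple: the 12-bit 802.1Q tag gives you at most 4094 usable segments across the whole physical network (and many ToR switches support far fewer), whereas vCDNI carries its own network ID inside the MAC-in-MAC encapsulation, so the physical switches see only the transport VLAN. A quick sanity check with assumed tenant counts:

```python
# Segment-count arithmetic: 802.1Q VLANs versus an encapsulation-based network ID.
# Tenant and per-tenant network counts are assumptions for illustration.
usable_vlans = 4094            # 12-bit VLAN ID minus reserved values
switch_vlan_limit = 250        # what some ToR switches realistically support

tenants = 1500
networks_per_tenant = 4
needed_segments = tenants * networks_per_tenant

print(f"Segments needed: {needed_segments}")
print(f"Fit into the 802.1Q VLAN space: {needed_segments <= usable_vlans}")
print(f"Fit into the ToR VLAN table:    {needed_segments <= switch_vlan_limit}")
# With vCDNI the network ID travels in the MAC-in-MAC header, so the physical
# network only has to carry the transport VLAN(s).
```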

Compute pools distributed across a data center for redundancy/maintenance reasons are a perfect use case for VXLAN. Instead of implementing large-scale bridged domains with FabricPath, TRILL or 802.1aq, you could build virtual segments across L3 infrastructure.
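VXLAN gets away with a routed core because it wraps the tenant’s Ethernet frame into a UDP/IP envelope; the 24-bit VNI in the VXLAN header identifies the virtual segment and gives you roughly 16 million of them. A minimal sketch of the 8-byte VXLAN header layout (just the bit twiddling, no sockets or real frames):

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte with the I bit set, then a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit into 24 bits")
    flags = 0x08 << 24                   # I flag set, reserved bits zero
    return struct.pack("!II", flags, vni << 8)

print(vxlan_header(5000).hex())          # 0800000000138800
```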

GRE-based solutions with OpenFlow control plane are a good fit for large-scale operations (public IaaS clouds). They might eventually trickle down to enterprise data centers, but might not be worth the added complexity if you only have a few hundred physical servers.
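In these designs the tenant/segment identifier typically rides in the GRE key field (that is how OVS GRE tunnels demultiplex traffic), and the controller distributes the mapping between tenants and tunnel endpoints. A minimal sketch of the resulting GRE header, assuming Ethernet-over-GRE (Transparent Ethernet Bridging) payloads:

```python
import struct

GRE_KEY_PRESENT = 0x2000     # K bit in the GRE flags/version field
PROTO_TEB = 0x6558           # Transparent Ethernet Bridging (Ethernet over GRE)

def gre_header(tenant_key: int) -> bytes:
    """Build an 8-byte GRE header carrying a tenant/segment ID in the key field."""
    return struct.pack("!HHI", GRE_KEY_PRESENT, PROTO_TEB, tenant_key & 0xFFFFFFFF)

print(gre_header(42).hex())  # 200065580000002a
```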

Everything I write about Open vSwitch (which uses GRE tunnels) and Nicira’s semi-stealth solutions (which add the control plane to OVS) is pure speculation. If you want to know more, go talk to them.

Anything else?

All the other solutions I mentioned are not production-ready. The only shipping EVB implementation seems to be programmed in PowerPoint, and the Q-in-Q/PBB/VPLS solutions, while interesting, require a significant amount of integration/orchestration development effort.

More information

If you’re new to virtualized networking, consider my Introduction to Virtualized Networking webinar. Check out the VMware Networking Deep Dive webinar for in-depth information on networking in VMware’s ecosystem.

You’ll get more details on scalability issues, VXLAN, NVGRE and OpenFlow-based virtual networking solutions in my Cloud Computing Networking – Under the Hood webinar.

5 comments:

  1. What exactly is a dual snowflake L2 design? I understand the MLAG, but I'm a bit thrown by the snowflake part.
  2. I assume "dual snowflake" is a design such as the attached image, which looks like a snowflake when rendered with the "neato" tool (part of the graphviz package). Ivan, can you confirm?
  3. Exactly. If you'd add aggregation switches between ToR and core switches (for example, using FEX, NX5K, NX7K) and draw six "child" switches per "parent" switch, you'd get a perfect snowflake.
  4. I assumed by placement in the "Small(er) data centers" section that you were likely talking about a two-layer design. Depends on your definition of small, I suppose.
  5. ... and on the port density of the switches you use and the desired oversubscription factor.