In one of my vCloud Director Networking Infrastructure rants I wrote “if they had decided to use IP encapsulation, I would have applauded.” It’s time to applaud: Cisco has just demonstrated Nexus 1000V supporting MAC-over-IP encapsulation for vCloud Director isolated networks at VMworld, solving at least some of the scalability problems MAC-in-MAC encapsulation has.
Once the new release becomes available, the Nexus 1000V VEM will be able to encapsulate MAC frames generated by virtual machines residing in isolated segments into UDP packets exchanged between VEMs.
The MAC-in-IP encapsulation seems to be based on the VXLAN draft (released just a few days ago). The VXLAN packet header includes a 24-bit segment ID, allowing you to create more than 16 million virtual segments. Using pseudo-random source UDP ports (probably hash-generated from the original MAC frame), you can get very good load balancing between the Nexus 1000V VEM and the physical switch with 5-tuple-based hashing while still preserving inter-VM packet order (frames exchanged between the same pair of VMs would always get the same source port).
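To make the header and source-port ideas more concrete, here’s a minimal Python sketch. It assumes the 8-byte header layout from the VXLAN specification (a flags byte with the "segment ID valid" bit, followed by the 24-bit segment ID between reserved fields); the CRC32 hash over the inner Ethernet header and the 49152-65535 port range are my assumptions for illustration, not something Cisco has confirmed for the Nexus 1000V.

```python
import struct
import zlib

VXLAN_FLAG_VNI_VALID = 0x08      # "I" flag: the segment ID field is valid
MAX_SEGMENT_ID = (1 << 24) - 1   # 24 bits -> 16,777,216 possible segments


def vxlan_header(segment_id: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit segment ID."""
    if not 0 <= segment_id <= MAX_SEGMENT_ID:
        raise ValueError("segment ID must fit into 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: segment ID (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, segment_id << 8)


def parse_segment_id(header: bytes) -> int:
    """Extract the segment ID from the first 8 bytes of a VXLAN header."""
    flags_word, id_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("segment ID flag not set")
    return id_word >> 8


def source_port(inner_frame: bytes, low: int = 49152, high: int = 65535) -> int:
    """Derive the outer UDP source port from the inner Ethernet header.

    Assumption: CRC32 over destination MAC, source MAC and EtherType
    (the first 14 bytes). Any stable hash would do; the point is that
    one MAC pair always maps to one port, so its frames stay in order.
    """
    digest = zlib.crc32(inner_frame[:14])
    return low + digest % (high - low + 1)
```

For example, `vxlan_header(5000).hex()` gives `0800000000138800`, and calling `source_port()` twice on the same frame returns the same port, which is what keeps a flow on a single path through the 5-tuple hash.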
IP multicast is used to handle layer-2 flooding (broadcasts, multicasts and unknown unicasts). Support for layer-2 flooding allows everyone involved to pretend they’re still dealing with a traditional L2 broadcast domain (and use dynamic MAC learning); not an ideal solution (I would like to see Amazon-like prohibition of flooding with ARP caching) but still much better than what vCDNI offers today. If a VM running in a MAC-over-IP virtual segment goes bonkers, the damage will be limited to the ESX servers hosting VMs in the same virtual segment and the multicast path between them; with MAC-in-MAC encapsulation, the whole data center is affected.
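The flood-and-learn behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Nexus 1000V implementation: the `Vtep` class, the static segment-to-multicast-group mapping and the example addresses are all hypothetical.

```python
from dataclasses import dataclass, field


def is_group_address(mac: str) -> bool:
    """True for broadcast and multicast MACs (group bit set in the first octet)."""
    return int(mac.split(":")[0], 16) & 1 == 1


@dataclass
class Vtep:
    # Hypothetical static mapping: segment ID -> IP multicast group
    groups: dict
    # Learned entries: (segment ID, inner MAC) -> IP address of the remote host
    mac_table: dict = field(default_factory=dict)

    def learn(self, segment: int, inner_src_mac: str, outer_src_ip: str) -> None:
        """Dynamic MAC learning on decapsulation: remember where a MAC lives."""
        self.mac_table[(segment, inner_src_mac)] = outer_src_ip

    def outer_destination(self, segment: int, inner_dst_mac: str) -> str:
        """Pick the outer destination IP for a frame sent by a local VM."""
        if not is_group_address(inner_dst_mac):
            remote = self.mac_table.get((segment, inner_dst_mac))
            if remote is not None:
                return remote            # known unicast: send to one host
        # Broadcast, multicast or unknown unicast: flood via the segment's group
        return self.groups[segment]
```

Because the table is keyed by segment ID, a flood in one segment reaches only the hosts that joined that segment’s multicast group, which is the damage-limiting property mentioned above.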
As one would expect from a Nexus-based product, the new Nexus 1000V has a decent range of QoS features, allowing you to define per-tenant SLAs. With full support for 802.1p and DSCP markings, you can extend the per-tenant QoS into the physical network, giving the cloud providers the ability to offer differentiated IaaS services.
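As a rough illustration of how per-tenant markings could line up across layer 2 and layer 3, here’s a small Python sketch. The service tiers and their 802.1p values are hypothetical; the only standard parts are the class-selector mapping (DSCP = priority shifted left by three bits) and the position of DSCP in the upper six bits of the IP ToS byte. It says nothing about how the Nexus 1000V is actually configured.

```python
# Hypothetical service tiers; names and 802.1p priorities are illustrative only.
TIER_TO_COS = {"gold": 5, "silver": 3, "bronze": 1}


def cos_to_dscp(cos: int) -> int:
    """Map an 802.1p priority (0-7) to the matching class-selector DSCP value."""
    if not 0 <= cos <= 7:
        raise ValueError("802.1p priority must be between 0 and 7")
    return cos << 3


def tos_byte(dscp: int) -> int:
    """Place a 6-bit DSCP value into the IP ToS byte (ECN bits left at zero)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit into 6 bits")
    return dscp << 2


def markings(tier: str) -> dict:
    """Return the layer-2 and layer-3 markings for a tenant's service tier."""
    cos = TIER_TO_COS[tier]
    dscp = cos_to_dscp(cos)
    return {"cos": cos, "dscp": dscp, "tos": tos_byte(dscp)}
```

With these assumed tiers, `markings("gold")` yields 802.1p priority 5, DSCP 40 (CS5) and a ToS byte of 160, so a physical switch that trusts either marking would put the traffic into the same class.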
More good news: the new encapsulation is fully integrated with vCloud Director. Finally we’ll be able to roll out scalable vCloud Director-based networks.
Even more good news: goodbye, large-scale bridging and EVB, we don’t need you for VM mobility anymore; we can go back to the time-tested large-scale IP+multicast designs that have kept the Internet running for the last few decades.
However, all is not rosy in the vCloud land. Cisco has implemented scalable virtual layer 2 segments, but the communication between segments still requires multi-NIC VMs (like vShield Edge) and traverses the userland, the traffic trombones still wind their way around the data center, and you cannot terminate the virtual segments on physical switches or tie them to physical VLANs.
Even with the remaining drawbacks, the MAC-in-IP encapsulation is way better than the VLANs or MAC-in-MAC encapsulation we’ve had so far, and I’m positive Cisco will eventually take the next logical steps.
You’ll find an in-depth description of VMware networking and the (currently shipping) Nexus 1000V in my VMware Networking Deep Dive (recording or live session) webinar. Data center architectures and the basics of virtual networking are also described in Data Center 3.0 for Networking Engineers (recording).
All three webinars are available as part of the yearly subscription.