Control Plane Protocols in Overlay Virtual Networks
Multiple overlay network encapsulations are little more than a major inconvenience (and a source of religious wars over almost-meaningless individual bit fields) for anyone trying to support more than one overlay virtual networking technology (just ask F5 or Arista).
The key differentiator between scalable and not-so-very-scalable architectures and technologies is the control plane – the mechanism that maps (at the very minimum) remote VM MAC addresses into the transport network IP addresses of the target hypervisors (see A Day in a Life of an Overlaid Virtual Packet for more details).
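In (heavily simplified) Python terms, the whole problem boils down to populating and querying a table like this one (the segment IDs, MAC and IP addresses are made up for illustration):

# Hypothetical per-hypervisor forwarding table: (segment ID, VM MAC) -> VTEP IP
vm_mac_to_vtep = {
    (5001, "00:50:56:aa:bb:01"): "192.0.2.11",
    (5001, "00:50:56:aa:bb:02"): "192.0.2.12",
}

def encapsulate(segment_id, dst_mac, frame):
    """Return (outer destination IP, frame) or None when the mapping is unknown."""
    vtep_ip = vm_mac_to_vtep.get((segment_id, dst_mac))
    if vtep_ip is None:
        return None          # no mapping: flood, ask a controller, or drop
    return (vtep_ip, frame)  # outer IP header is addressed to the target hypervisor

print(encapsulate(5001, "00:50:56:aa:bb:02", b"ethernet frame"))

Everything a control plane has to do, one way or another, is keep that table populated.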
Overlay virtual networking vendors use a plethora of approaches to solve this problem, ranging from Ethernet-like dynamic MAC address learning to complex protocols like MP-BGP. Here’s an overview of what they’re doing:
The original VXLAN as implemented by Cisco’s Nexus 1000V, VMware’s vCNS release 5.1, Arista EOS, and F5 BIG-IP TMOS release 11.4 has no control plane. It relies on transport network IP multicast to flood BUM (broadcast, unknown unicast, multicast) traffic and uses Ethernet-like MAC address learning to build mappings between virtual network MAC addresses and transport network IP addresses.
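Here’s a rough sketch of that flood-and-learn behavior (a simplified Python illustration with made-up addresses; the transmit functions are stubbed out with prints):

# Illustrative flood-and-learn behavior of a VTEP with no control plane
def deliver_locally(vni, dst_mac, frame): print("deliver locally", vni, dst_mac)
def send_to_group(group, vni, frame):     print("flood to multicast group", group)
def send_unicast(vtep_ip, vni, frame):    print("unicast to VTEP", vtep_ip)

mac_table = {}                          # (VNI, VM MAC) -> remote VTEP IP, learned
flood_groups = {5001: "239.1.1.1"}      # VNI -> transport IP multicast group

def receive_encapsulated(vni, outer_src_ip, inner_src_mac, inner_dst_mac, frame):
    # learn the source mapping exactly like an Ethernet switch learns MAC addresses
    mac_table[(vni, inner_src_mac)] = outer_src_ip
    deliver_locally(vni, inner_dst_mac, frame)

def send(vni, dst_mac, frame):
    vtep_ip = mac_table.get((vni, dst_mac))
    if vtep_ip is None:
        # broadcast, unknown unicast and multicast all go to the multicast group
        send_to_group(flood_groups[vni], vni, frame)
    else:
        send_unicast(vtep_ip, vni, frame)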
Unicast VXLAN as implemented in Cisco’s Nexus 1000V release 4.2(1)SV2(2.1) has something that resembles a control plane: the VSM (Virtual Supervisor Module) distributes segment-to-VTEP mappings to the VEMs (Virtual Ethernet Modules) to replace IP multicast with head-end unicast replication, but the VEMs still use dynamic MAC learning.
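Head-end replication in a nutshell (again a simplified sketch, not Cisco’s actual implementation):

# The controller (VSM) pushes, for each segment, the list of VTEPs that have VMs
# in that segment; BUM traffic is then sent as N unicast copies instead of one
# multicast packet.
segment_vteps = {5001: ["192.0.2.11", "192.0.2.12", "192.0.2.13"]}   # pushed by the VSM

def flood(vni, frame, local_vtep_ip, send_unicast):
    # replicate BUM traffic as unicast copies, one per remote VTEP in the segment
    for vtep_ip in segment_vteps[vni]:
        if vtep_ip != local_vtep_ip:          # don't send the frame back to ourselves
            send_unicast(vtep_ip, vni, frame)

flood(5001, b"broadcast frame", "192.0.2.11",
      lambda vtep_ip, vni, frame: print("copy to", vtep_ip))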
VXLAN MAC distribution mode in Nexus 1000V is a proper control plane implementation in which the VSM distributes VM-MAC-to-VTEP-IP information to VEMs. Unfortunately it seems to be based on a proprietary protocol, so it won’t work with hardware gateways from Arista or F5.
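The general idea of a MAC distribution control plane looks approximately like this (a conceptual sketch; the actual VSM-to-VEM protocol is proprietary and undocumented):

class Controller:
    """Collects VM MAC/VTEP bindings and pushes them to all edge switches."""
    def __init__(self):
        self.bindings = {}                    # (VNI, VM MAC) -> VTEP IP
        self.edges = []                       # registered edge switches (VEM-like)

    def register_vm(self, vni, mac, vtep_ip):
        self.bindings[(vni, mac)] = vtep_ip
        for edge in self.edges:               # push the binding; no data-plane learning
            edge.install(vni, mac, vtep_ip)

class EdgeSwitch:
    def __init__(self):
        self.mac_table = {}                   # populated by the controller only

    def install(self, vni, mac, vtep_ip):
        self.mac_table[(vni, mac)] = vtep_ip  # unknown unicast can simply be dropped

vsm, vem = Controller(), EdgeSwitch()
vsm.edges.append(vem)
vsm.register_vm(5001, "00:50:56:aa:bb:03", "192.0.2.13")
print(vem.mac_table)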
Hyper-V Network Virtualization uses PowerShell cmdlets to configure VM-MAC-to-transport-IP mappings, virtual network ARP tables and virtual network IP routing tables. The same cmdlets can be implemented by hardware vendors to configure NVGRE gateways.
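Conceptually, the cmdlets just populate three per-virtual-network tables, something along these lines (a Python illustration with made-up values, obviously not the real PowerShell syntax):

lookup_records = [   # VM MAC + customer (virtual network) IP -> provider (transport) IP
    {"vsid": 5001, "vm_mac": "00:15:5d:aa:00:01",
     "customer_ip": "10.0.1.10", "provider_ip": "192.0.2.21"},
]
arp_table = [        # answered locally, so ARP never has to be flooded
    {"vsid": 5001, "customer_ip": "10.0.1.10", "vm_mac": "00:15:5d:aa:00:01"},
]
customer_routes = [  # virtual network IP routing table
    {"vsid": 5001, "prefix": "10.0.2.0/24", "next_hop": "10.0.1.1"},
]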
Nicira NVP (part of VMware NSX) uses OpenFlow to install forwarding entries in the hypervisor switches and the Open vSwitch Database Management Protocol (OVSDB) to configure them. NVP uses OpenFlow to implement L2 forwarding and VM NIC reflexive ACLs (L3 forwarding uses another agent in every hypervisor host).
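The forwarding entries pushed through OpenFlow would look approximately like this (an illustrative structure in Python, not NVP’s actual table layout):

flow_entries = [
    # known unicast: match logical network + destination MAC, send through a tunnel
    {"match":   {"tunnel_id": 5001, "dl_dst": "00:50:56:aa:bb:02"},
     "actions": [("set_tunnel_dst", "192.0.2.12"), ("output", "tun0")]},
    # broadcast: replicate across tunnels to every hypervisor in the logical network
    {"match":   {"tunnel_id": 5001, "dl_dst": "ff:ff:ff:ff:ff:ff"},
     "actions": [("set_tunnel_dst", "192.0.2.12"), ("output", "tun0"),
                 ("set_tunnel_dst", "192.0.2.13"), ("output", "tun0")]},
]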
Midokura MidoNet doesn’t have a central controller or control-plane protocols. MidoNet agents residing in individual hypervisors use a shared database to store control- and data-plane state.
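In other words, the “control plane” is a shared data store; something along these lines (with a Python dictionary standing in for the distributed database):

shared_db = {}   # a plain dict standing in for the shared/distributed data store

def agent_publish(hypervisor_ip, vni, vm_mac):
    shared_db[(vni, vm_mac)] = hypervisor_ip      # written by the local agent

def agent_lookup(vni, dst_mac):
    return shared_db.get((vni, dst_mac))          # read by any other agent

agent_publish("192.0.2.31", 5001, "00:50:56:aa:bb:07")
print(agent_lookup(5001, "00:50:56:aa:bb:07"))    # -> 192.0.2.31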
Contrail (now Juniper JunosV Contrail) seems to be using MP-BGP to pass MPLS/VPN information between controllers and XMPP to connect hypervisor switches to the controllers.
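The information such a control plane has to carry for every VM is essentially an MPLS/VPN route; roughly this (field names and values are illustrative, not Contrail’s actual schema):

vpn_route = {
    "route_distinguisher": "192.0.2.1:5001",
    "prefix":              "10.0.1.10/32",        # VM address in the virtual network
    "next_hop":            "192.0.2.41",          # transport address of the vRouter
    "label":               20017,                 # label identifying the VRF/interface
    "route_targets":       ["target:65000:5001"],
}
# Controllers exchange such routes over MP-BGP; hypervisor vRouters get the
# same information over XMPP.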
IBM SDN-VE (SDN for Virtual Environments) uses a hierarchy of controllers and appliances to implement an NVP-like control plane for L2 and L3 forwarding using VXLAN encapsulation. I wasn’t able to figure out what protocols they use from their whitepapers and user guides.
Nuage Networks is using $Something and PLUMgrid is using $SomethingElse. I will tell you what the values of these two variables are when I manage to get product documentation from these vendors. PowerPoint and whitepapers clearly get way more attention in the startup world than something an actual user might find useful when deploying the product.
(1) - "MAC distribution works only for static MAC addresses. If dynamic MAC addresses are found on ports that use VXLANs that operate in MAC distribution mode, syslogs are generated to indicate that MAC distribution does not work with dynamic MAC addresses."
(2) - MAC distribution mode is a feature under Unicast-only mode.
(3) - ASA 1KV, vShield, and VXLAN GW are the only three supported gateways. This is oddly interesting since the first two (or any other L3 VM) should be able to be used as an L3 VXLAN gateway. What are your thoughts on this?
(4) For those who only preach VXLAN for scalability above 4k VLANs: "The Cisco Nexus 1000V supports a total of 4096 VLANs or VXLANs (or a maximum of 2048 VLANs or 2048 VXLANs in any combination that totals 4096)."
-Jason (@jedelman8)
(3) The keyword is "supported". You can do anything you like as long as you don't expect TAC to fix your stupidity.
(4) Yeah, I'm well aware of that ... but even I tend to be semi-diplomatic sometimes ;) Anyhow, as NX1KV supports up to 128 hosts, that's a few thousand VMs, so 2000 VXLANs should be enough.
Thanks!
Ivan
I think you very clearly made the point that overlay networks are in their infancy when it comes to actually deploying them at true production scale. There is a huge amount of growing up that needs to happen, which as always will happen through small incremental deployments that will ultimately shape which control-plane mechanism (or set of mechanisms) will become the industry or de facto standard for overlay networks. It is a shame, however, that as an industry we once again promised the world when it comes to what overlay networks can solve, but kind of forgot that they need a very robust and scalable control plane to go along with them. Transport is always much easier than control.
@mbushong has written about vendor lock-in on the Plexxi blog this week, and everyone reading the above should carefully consider the current 'open-ness' of overlay solutions.
Also, some commercially available solutions have impressive scalability: http://networkheresy.com/2013/05/30/scale-sdn-and-network-virtualization/
As for the shameful practices of marketing departments ... well, I guess you know my opinion on those, starting with switching versus bridging.
Overlay networks are not new, and certainly some large providers have constructed large-scale server-based overlay solutions to provide VPC-like services, but every enterprise customer I talk to (many of them very large and very progressive) does not believe there exists a truly viable solution for them at this time. Part of that is the lack of hardware-based gateways (very few vendors have those today) and no clear control plane that can manage the multitude of devices, operating systems, and hypervisors they have in place...
Another one bites the dust..
http://networkheresy.com/2013/08/15/network-virtualization-gets-physical/