In the VMware vSwitch – the baseline of simplicity post I described simple layer-2 switches offered by most hypervisor vendors and the scalability challenges you face when trying to build large-scale solutions with them. You can solve at least one of the scalability issues pretty easily: VM-aware networking solutions available from most data center networking vendors dynamically adjust the list of VLANs on server-to-switch links.
What’s the problem
Let’s briefly revisit the problem: vSwitches have almost no control plane. A vSwitch thus cannot tell the adjacent physical switches what VLANs it needs to support VMs connected to the vSwitch. Lacking that information, you have to configure a wide range of VLANs on the server-facing ports on the physical switch to allow free movement of VMs between the servers (hypervisor hosts).
The following diagram illustrates the problem:
- Two tenants are running in a vSphere cluster (ESX-A and ESX-B);
- Red tenant is using VLAN 100, Blue tenant is using VLAN 200;
- With the VM distribution shown in the diagram, ESX-A needs access to VLAN 100; ESX-B needs access to VLAN 200;
- As the networking gear cannot predict how the VMs will move between vSphere hosts, you have to configure both VLANs on all server-facing ports (Ge1/0 on SW-1 and Ge2/3 on SW-2) as well as on all inter-switch links.
The list of VLANs configured on the server-facing ports of the access-layer switches thus commonly includes completely unnecessary VLANs.
The wide range of VLANs configured on all server-facing ports causes indiscriminate flooding of broadcasts, multicasts and unknown unicasts to all servers, even when a server doesn’t need those packets (because the VLAN on which they’re flooded is not active on that server). The flooded packets increase the utilization of the server uplinks; processing (and dropping) them also increases the server CPU load.
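To make the waste concrete, here's a minimal Python sketch of the two-host example above. The VM names and the per-VLAN flooding rates are made up for illustration; only the hosts and the VLAN numbers come from the example.

```python
# Hypothetical model of the two-tenant example: which VLANs does each host
# really need, and how much flooded traffic does it receive only to drop it?

# VM placement: host -> list of (VM name, VLAN). VM names are invented.
vm_placement = {
    "ESX-A": [("red-vm-1", 100), ("red-vm-2", 100)],
    "ESX-B": [("blue-vm-1", 200)],
}

# Without VM-aware networking every tenant VLAN is on every server-facing port.
ALL_TENANT_VLANS = {100, 200}

# Made-up flooding rates (packets per second) per VLAN.
flood_pps = {100: 150, 200: 400}


def required_vlans(placement):
    """VLANs each host needs, based on the VMs it currently runs."""
    return {host: {vlan for _, vlan in vms} for host, vms in placement.items()}


def unnecessary_vlans(placement, configured):
    """VLANs configured on a server-facing port but unused by local VMs."""
    needed = required_vlans(placement)
    return {host: configured - vlans for host, vlans in needed.items()}


def wasted_flood_pps(placement, configured, rates):
    """Flooded packets per second a host receives on VLANs it does not use."""
    extra = unnecessary_vlans(placement, configured)
    return {host: sum(rates[v] for v in vlans) for host, vlans in extra.items()}


if __name__ == "__main__":
    print(required_vlans(vm_placement))
    print(unnecessary_vlans(vm_placement, ALL_TENANT_VLANS))
    print(wasted_flood_pps(vm_placement, ALL_TENANT_VLANS, flood_pps))
```

With only two VLANs the difference is trivial; the gap between the required and the configured VLAN sets grows with the number of tenants.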
The standard solution
VSI Discovery Protocol (VDP), part of Edge Virtual Bridging (EVB, 802.1Qbg), would solve that problem, but it’s not implemented in any virtual switch. Consequently, there’s no support in the physical switches, although HP and Force10 keep promising EVB support (HP has been doing so for more than a year).
The closest we’ve ever got to a shipping EVB-like product is Cisco’s VM-FEX. The Virtual Ethernet Module (VEM) running within the vSphere kernel uses a protocol similar to VDP to communicate its VLAN/interface needs to the UCS manager.
The real-world solutions
Faced with the lack of EVB support (or any other similar control-plane protocol) in the vSwitches, the networking vendors implemented a variety of kludges. Some of them are implemented in the access-layer switches (Arista’s VM Tracer, Force10’s HyperLink, Brocade’s vCenter Integration), others in network management software (Juniper’s Junos Space Virtual Control and ALU’s OmniVista Virtual Machine Monitor).
Have I missed a VM-aware networking solution? Please write a comment ... and note that I haven’t forgotten VM-FEX; it uses a completely different architecture.
In all cases, a VM-aware solution has to discover the network topology first. Almost all solutions send CDP packets from access-layer switches and use CDP listeners in the vSphere hosts to discover host-to-switch connectivity. The CDP information gathered by vSphere hosts is usually extracted from vCenter using VMware’s API (yes, you usually have to talk to the vCenter if you want to communicate with the VMware environment).
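The discovery step boils down to turning per-uplink CDP neighbor records into a map of server-facing switch ports. The sketch below assumes the records have already been pulled out of vCenter; the `CdpNeighbor` type, the `vmnic0` uplink names and the host-to-port pairing are hypothetical and don't represent any vendor's data model or the VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdpNeighbor:
    """One CDP record seen on a host uplink (hypothetical shape)."""
    host: str         # hypervisor host name
    uplink: str       # physical NIC on the host
    switch: str       # CDP device ID of the adjacent switch
    switch_port: str  # CDP port ID on that switch


def build_topology(neighbors):
    """Map (switch, port) -> (host, uplink) from the CDP records."""
    topology = {}
    for n in neighbors:
        key = (n.switch, n.switch_port)
        value = (n.host, n.uplink)
        # Two different uplinks claiming the same switch port means stale data.
        if key in topology and topology[key] != value:
            raise ValueError(f"conflicting CDP data for {key}")
        topology[key] = value
    return topology


def ports_for_host(topology, host):
    """All server-facing switch ports that lead to the given host."""
    return sorted(port for port, (h, _) in topology.items() if h == host)


# Assumed cabling: ESX-A on SW-1 Ge1/0, ESX-B on SW-2 Ge2/3.
neighbors = [
    CdpNeighbor("ESX-A", "vmnic0", "SW-1", "Ge1/0"),
    CdpNeighbor("ESX-B", "vmnic0", "SW-2", "Ge2/3"),
]
topology = build_topology(neighbors)

if __name__ == "__main__":
    print(ports_for_host(topology, "ESX-A"))
```

Keying the map on the switch port (not the host) makes the conflict check cheap: a port can only ever lead to one uplink.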
Have you noticed I mentioned VMware API in the previous paragraph? Good. Because no hypervisor vendor bothered to implement a standard protocol, the networking vendors have to implement a different solution for each hypervisor. Almost all of the VM-aware solutions support vSphere/vCenter, a few vendors claim they also support Xen, KVM or Hyper-V, and I haven’t seen anyone supporting anything beyond the big four.
After the access-layer topology has been discovered, the VM-aware solutions track VM movements between hypervisor hosts and dynamically adjust the VLAN range on access-layer switch ports. Ideally you’d combine that with MVRP in the network core to further trim the VLANs, but only a few vendors implemented MVRP (and supposedly only a few customers are using it). QFabric is a shining (proprietary) exception: because its architecture mandates single ingress lookup which should result in a list of egress ports, it also performs optimum VLAN flooding.
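The tracking logic can be sketched as a per-host reference count of VMs per VLAN: a VLAN is added to a port when the first VM using it lands on the host and removed when the last one leaves. This is a simplified illustration, not any vendor's implementation; the `VlanTracker` class, the VM names and the host-to-port mapping are assumptions.

```python
from collections import Counter, defaultdict


class VlanTracker:
    """Track VM placement and report the switch-port VLAN changes it implies."""

    def __init__(self, host_ports):
        self.host_ports = host_ports        # host -> list of (switch, port)
        self.vm_location = {}               # VM -> (host, VLAN)
        self.usage = defaultdict(Counter)   # host -> VLAN -> number of VMs

    def place(self, vm, host, vlan):
        """Record a VM start or move; return a list of port changes."""
        old = self.vm_location.get(vm)
        if old == (host, vlan):
            return []
        adds, removes = [], []
        # Add the VLAN on the new host first so the VM never lands on a
        # port that is missing its VLAN.
        self.vm_location[vm] = (host, vlan)
        self.usage[host][vlan] += 1
        if self.usage[host][vlan] == 1:
            adds = [("add", sw, port, vlan) for sw, port in self.host_ports[host]]
        # Remove the VLAN from the old host only when its last VM is gone.
        if old is not None:
            old_host, old_vlan = old
            self.usage[old_host][old_vlan] -= 1
            if self.usage[old_host][old_vlan] == 0:
                del self.usage[old_host][old_vlan]
                removes = [("remove", sw, port, old_vlan)
                           for sw, port in self.host_ports[old_host]]
        return adds + removes


# Assumed cabling and invented VM names.
tracker = VlanTracker({
    "ESX-A": [("SW-1", "Ge1/0")],
    "ESX-B": [("SW-2", "Ge2/3")],
})
step1 = tracker.place("red-vm-1", "ESX-A", 100)   # first red VM on ESX-A
step2 = tracker.place("red-vm-2", "ESX-A", 100)   # second red VM, no change
step3 = tracker.place("red-vm-1", "ESX-B", 100)   # move: ESX-B now needs VLAN 100
step4 = tracker.place("red-vm-2", "ESX-B", 100)   # last red VM leaves ESX-A

if __name__ == "__main__":
    for step in (step1, step2, step3, step4):
        print(step)
```

A real product would also have to handle VM shutdowns, hosts with multiple uplinks in a port channel, and events lost between vCenter and the switch; the sketch ignores all of that.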
Have I missed another MVRP-like solution? Please write a comment!
Does it matter?
Without VM-aware networking you have to configure every VM-supporting VLAN on every server-facing port, reducing the whole data center network to a single broadcast domain (effectively a single VLAN from the scalability perspective). If your data center has just a few large VLANs, you probably don’t care (most hypervisor hosts have to see most of the flooded VLAN traffic anyway); if you have a large number of small VLANs, VM-aware networking makes perfect sense.
Using the rough estimates from RFC 5556 (section 2.6), implementing VM-aware networking moves us from around 1,000 end-hosts in a single bridged LAN to 100,000 end-hosts inside 1,000 VLANs. While I wouldn’t run 100,000 VMs in a purely bridged environment, the scalability improvements you can gain with VM-aware networking are definitely worth the investment.
You’ll find a lot more information about virtualized networking in my webinars:
- Start with Introduction to Virtualized Networking;
- Learn everything there is to know about VMware’s vSwitch and other VMware-related networking solutions in VMware Networking Deep Dive.
- Cloud networking-specific topics are the focus of the Cloud Computing Networking – Under the Hood webinar.
- Generic data center technologies and designs are described in Data Center 3.0 for Networking Engineers.
- You’ll find large-scale bridged network designs (including leaf & spine and Clos network architectures) in the Data Center Fabric Architectures webinar.
And don’t forget: you get access to all these webinars (and numerous others) if you buy the yearly subscription.