Getting More Bang for Your VXLAN Bucks

A little while ago I explained why you can’t use more than 4K VXLAN segments on a ToR switch (at least with most ASICs out there). Does that mean that you’re limited to a total of 4K virtual ethernet segments?

Of course not.

You could implement overlay virtual networks in software (on hypervisors or container hosts), although even there the enterprise products rarely give you more than a few thousand logical switches (to use NSX terminology)… but that’s a product, not technology limitation. Large public cloud providers use the same (or similar) technology to run gazillions of tenant segments.

Want to know more? Watch our NSX, AWS and Azure networking webinars.

Alternatively, using the fact that VLAN-to-VNI mappings have local significance you could build larger deployments as long as you never use more than 4K segments per edge switch. You could go a step further and use VLAN mapping on individual edge ports, giving you the ability to (A) make VLAN address space tenant-significant (example: each tenant can use VLAN 100) and (B) quickly lose your sanity.

The locally significant VLAN ID idea works great if you totally automate the deployment and leave the details to a good orchestration system (until you have to troubleshoot the morass you created). Listen to the PacketFabric podcast for more details (and write a comment if you’re aware of other similar implementations).

You could use the switch-significant VLANs trick on every single data center switch out there, and many of them support port-significant VLANs with VLAN mappings. This is how Nolan Leake explained that idea in context of Cumulus Linux:

Most switch vendors essentially force you to think of VLANs as global across an entire network, but Cumulus’s (Linux’s really) traditional bridges don’t.

If you think instead of VLANs as just a 12 bit header that specifies a specific L2 network on a single physical L2 link, you can be more flexible. If you route to the ToR switches, and use them as the VTEP, for each ToR you only need to deal with VXLAN IDs (and the MACs that exist on that network) that actually terminate on that ToR. If a VXLAN ID is not present on that ToR, you ignore it and any MACs associated with it.

Furthermore, VLAN 100 on ToR1’s swp1 doesn’t have to map to the same VXLAN ID as VLAN 100 on ToR2’s swp1. It can get confusing sometimes, but VLAN 100 on swp1 and swp2 on the same ToR aren’t required to bridge to each other, and can each bridge to different VXLAN IDs.

So in practice, a given VLAN is local to a specific switch port on a specific switch, though it is probably easier to think of it as local to specific switch. If you’re using Cumulus ToRs and a Linux based hypervisor/container solution (KVM, Xen, docker, k8s, etc) then you can configure this with the same agent running on the ToR, bridging VXLANs to VLANs on a set of switch ports, and on the hypervisors, bridging VLANs on NICs to VM’s vNICs. If you’re using VMware or HyperV, you need 2 agents, one that knows how to configure Linux, and one that knows how to configure your proprietary hypervisor vswitch.

The upshot of all of this is that any given ToR’s ASIC only needs to know about the MACs for any VXLAN that it currently terminates, which given that there are a limited number of hypervisors you can plug into a single ToR, and a limited number of VMs you can run on each hypervisor, in practice you could only fill up your MAC table if you have only one or a few VXLANs with truly massive numbers of VMs bridged into them on many, many ToRs.

Add comment