Can We Really Use Millions of VXLAN Segments?

One of my readers sent me a question along these lines…

The VXLAN Network Identifier is 24 bits long, giving us 16 million separate segments. However, we have to map VNIs into VLANs on most switches. How can we scale up to 16 million segments when we run out of VLAN IDs? Can we create a separate VTEP on the same switch?

VXLAN is just an encapsulation format and does not imply any particular switch architecture. What really matters in this particular case is the implementation of the MAC forwarding table in the switching ASIC.

Ignoring for the moment that we have to map VXLAN segments into physical segments (= VLANs) at some point, each VXLAN segment is a separate bridging (L2 forwarding) domain, which means that it needs a separate L2 forwarding table.

Multiple forwarding tables could be implemented with multiple data structures - a perfect solution for software-based forwarding. There might be a scalability snag hiding deep within the Linux network stack implementation (like realistic limits on the number of devices), but it SHOULD be possible to create a gazillion Linux bridges and place each one of them in a different VXLAN segment.
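As a rough illustration of the one-bridge-per-segment approach, here's a configuration sketch using iproute2 (interface names, VNIs and the multicast group are made up; you'd need root privileges and a kernel with VXLAN support):

```shell
# Sketch: one Linux bridge per VXLAN segment.
# Each bridge gets its own independent MAC forwarding table.
for VNI in 10000 10001 10002; do
  # VXLAN interface for this segment (BUM traffic flooded via multicast)
  ip link add vxlan$VNI type vxlan id $VNI dstport 4789 \
     group 239.1.1.1 dev eth0
  # A separate bridge = a separate L2 forwarding domain
  ip link add br$VNI type bridge
  ip link set vxlan$VNI master br$VNI
  ip link set vxlan$VNI up
  ip link set br$VNI up
done
```

Nothing in this configuration maps a VNI to a 12-bit VLAN ID - the kernel simply keeps one forwarding data structure per bridge, which is why the software data plane doesn't hit the 4096-domain wall.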

For more details on packet forwarding in the Linux kernel listen to the Packet Forwarding on Linux and Linux Interfaces episodes of the Software Gone Wild podcast.

It’s hard to use the same trick in hardware. We could create separate data structures, but they would still sit in the same TCAM, so we need an extra field to differentiate them - instead of looking up a MAC address, the hardware lookup operation would search for a combination of (Table-ID, MAC). The number of bridging domains supported by a switching ASIC is thus limited by the size of the Table-ID part of the lookup key… and for historic reasons (the 12-bit 802.1Q VLAN ID) that happens to be 12 bits.
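The shared-table limitation is easy to model in a few lines of Python. This is a toy sketch, not a real ASIC API - all names are illustrative - but it shows why a 24-bit VNI still collapses into at most 2^12 = 4096 bridging domains when the lookup key carries a 12-bit table ID:

```python
# Toy model of a shared ASIC MAC lookup table keyed by (Table-ID, MAC).
# The Table-ID field is assumed to be 12 bits wide, mirroring the
# historic 802.1Q VLAN ID size mentioned in the text.

TABLE_ID_BITS = 12
MAX_DOMAINS = 2 ** TABLE_ID_BITS    # 4096 bridging domains, period

class MacTable:
    def __init__(self):
        self.entries = {}        # (table_id, mac) -> egress port
        self.vni_to_table = {}   # 24-bit VNI -> 12-bit internal Table-ID

    def map_vni(self, vni):
        """Allocate an internal bridging-domain ID for a 24-bit VNI."""
        if vni in self.vni_to_table:
            return self.vni_to_table[vni]
        if len(self.vni_to_table) >= MAX_DOMAINS:
            raise RuntimeError("out of bridging domains: 12-bit Table-ID exhausted")
        self.vni_to_table[vni] = len(self.vni_to_table)
        return self.vni_to_table[vni]

    def learn(self, vni, mac, port):
        # One shared table; segments are kept apart only by the Table-ID
        self.entries[(self.map_vni(vni), mac)] = port

    def lookup(self, vni, mac):
        return self.entries.get((self.vni_to_table[vni], mac))

fib = MacTable()
fib.learn(10_000, "00:aa:bb:cc:dd:01", "Eth1/1")
fib.learn(20_000, "00:aa:bb:cc:dd:01", "Eth1/2")  # same MAC, different segment
print(fib.lookup(10_000, "00:aa:bb:cc:dd:01"))    # Eth1/1
print(fib.lookup(20_000, "00:aa:bb:cc:dd:01"))    # Eth1/2
```

The same MAC address can live in two segments because the Table-ID is part of the key - but try to allocate the 4097th VNI and the hypothetical `map_vni` fails, no matter how many of the 16 million VNI values remain unused on the wire.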

If you have information on recent ASICs that you can share you’re most welcome to write a comment.

Long story short: No matter how many bits we assign to segment ID in the packet header, we’re still limited by how many bits hardware manufacturers allocate to bridging table ID in their ASIC lookup tables. Creating another VTEP won’t help - in the end all bridged packets use the same MAC lookup table.

We covered VXLAN in detail in the VXLAN and EVPN webinars. I also described how it’s used in VMware NSX.

4 comments:

  1. Dear Joe, Ivan Pepelnjak, VXLANs are good for expanding from the 12-bit IEEE 1-4094 VLAN space to a greater space. VXLANs using MP iBGP and the appropriate new VPLS address family are the best tool to perform multiple DCI interconnects, going beyond the use of VPLS, even VPLS with iBGP autodiscovery and LDP (the Cisco variant I have used at the University of Padova). I also tried to make VPLS with LDP and BGP autodiscovery coexist with VPLS with MP BGP and MP BGP autodiscovery (the Juniper version, seen in Alicante in 2011). I am happy that Nicola Modena is working with you at IPSpace.

    Best Regards
    Giuseppe Larosa CCIE SP # 14802
  2. So, to cut it short Giuseppe, are you suggesting not to use commodity hardware to fully exploit the available indexing space of the vxlan vni ? Which is also I guess Ivan's message/indication at the end of the day I reckon.
    Cheers
    Andrea
  3. What is the use case of millions of vxlans?
    Replies
    1. Large multi-tenant environments offering Ethernet-like service. Large private clouds, public clouds, service providers who prefer IP over MPLS (see https://blog.ipspace.net/2017/06/packet-fabric-on-software-gone-wild.html)