Response: The Usability of VXLAN

Wes made an interesting comment on the Migrating a Data Center Fabric to VXLAN blog post:

The benefit of VXLAN is mostly scalability, so if your enterprise network is not scaling… just don’t. The migration path from VLANs is to just keep using VLANs. The (vendor-driven) networking industry has a huge blind spot about this.

Paraphrasing Dinesh Dutt’s famous Autocon1 remark: I couldn’t disagree with you more.

While it’s true that VXLAN allows you to build scalable layer-2 networks (while remaining limited to 4K VLANs per edge device) and that the network industry is on a VXLAN/EVPN lemming run, I forcefully disagree that the path from VLANs is more VLANs.

VLAN-only data center fabrics have two major problems:

  • Unless you want to deploy something like TRILL or SPB (in which case VXLAN is a better option anyway), you have to rely on STP to prevent forwarding loops. We learned the hard way how brittle STP is, and I wouldn’t want to use that landmine in a new fabric.
  • You’re usually forced to use MLAG between leaf and spine switches to work around the STP limitations (and a peer link between the switches in an MLAG cluster).

While MLAG might not have been a big deal more than a decade ago when everyone recommended using LAG everywhere, it’s a huge drawback in a world where most devices attached to the fabric don’t need LAG uplinks (with some vendors actively discouraging LAG). It’s also another ticking bomb waiting to explode; I’ve seen more than one data center meltdown caused by an MLAG bug.

Both limitations disappear if you use VXLAN as the fabric transport mechanism:

  • While you still have to run STP on the fabric edges, you can configure BPDU guard on the edge ports and use it as a canary (see the configuration sketch below).
  • Fabric transport uses IP routing, which means that ECMP load balancing works as soon as equal-cost paths exist.

If you still have devices that require LAG uplinks, concentrate them on a single pair of leaf switches to contain the MLAG abomination.
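
For illustration, here’s what the edge-port canary could look like in Arista-EOS-style syntax (the interface name and VLAN number are made up):

    ! Server-facing edge port on a leaf switch
    interface Ethernet1
       switchport access vlan 10
       spanning-tree portfast
       spanning-tree bpduguard enable

If a BPDU ever arrives on that port, the switch errdisables it, telling you that someone attached a switch (or created a loop) where only a host should be.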

Please note that I’m not telling you to use EVPN. Small VXLAN-based fabrics do not need a VPN control plane, and I wouldn’t use EVPN-based MLAG in small deployments.

Not surprisingly, I wrote about the same topics in the past.

2 comments:

  1. "forcefully disagree that the path from VLANs is more VLANs" link is broken

  2. Well, while the statement from Wes is not correct (obviously...), the answer to the question "To VXLAN or not to VXLAN..." is "It depends..."; IMHO, RFC 1925 also applies here (in many ways).

    In general, from my experience with many enterprise customers, the biggest issue with a VXLAN (or any other) fabric today is still, sadly, the knowledge of the staff that would have to run it.

    With that in mind, if I have to provide maybe 100-200 server ports in a DC network for a medium-sized enterprise, taking all dependencies into account, VLAN-only could be the better solution if it's done right. For anything bigger than that, I'd clearly vote for a VXLAN-based solution (and educate the staff, if necessary), not necessarily because of scale, but because of resiliency.

    Replies
    1. While I completely agree with your "knowledge of the staff" caveat, do keep in mind that you don't need EVPN. All you need to deploy a VXLAN fabric is:

      • Core IP routing (they should know OSPF anyway)
      • A VXLAN interface on every edge switch
      • A mapping from VLAN to VNI for every fabric-wide VLAN
      • An ingress replication list, which can be statically preconfigured (see the sketch below).
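
      A minimal sketch of those four ingredients in Arista-EOS-style syntax (all names, addresses, and VLAN/VNI numbers are made up):

          ! Core IP routing: OSPF in the underlay
          router ospf 1
             router-id 10.0.0.1
             network 10.0.0.0/16 area 0.0.0.0
          !
          interface Loopback0
             ip address 10.0.0.1/32
          !
          ! VXLAN interface with a VLAN-to-VNI mapping and a preset
          ! ingress replication (flood) list of the other VTEPs
          interface Vxlan1
             vxlan source-interface Loopback0
             vxlan udp-port 4789
             vxlan vlan 10 vni 10010
             vxlan flood vtep 10.0.0.2 10.0.0.3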

      I would say that the configuration burden is comparable to what you have to do to get an MLAG-based fabric running (particularly if you're not running LAG with servers).

      Even the VXLAN-related show commands are almost identical (at least on Arista EOS) to the VLAN-related ones.
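
      For example, these Arista EOS commands mirror their VLAN counterparts (outputs omitted):

          show mac address-table      ! MACs learned on local ports
          show vxlan address-table    ! MACs learned over VXLAN
          show vxlan vtep             ! remote VTEPs
          show vxlan flood vtep       ! ingress replication flood lists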
