Are VXLAN-Based Large Layer-2 Domains Safer?

One of my readers was wondering about the stability and scalability of large layer-2 domains implemented with VXLAN. He wrote:

If common BUM traffic (e.g. ARP) is being handled/localized by the network (e.g. NSX or ACI), and if we are managing what traffic hosts can send with micro-segmentation style filtering blocking broadcast/multicast, are large layer-2 domains still a recipe for disaster?

There are three major (fundamental) problems with large L2 domains:

The first two are caused by the Ethernet forwarding semantics and could be solved with enough effort and clever tricks, but then you'd replace bridging with routing based on MAC addresses (effectively reinventing CLNS and using MAC addresses as globally unique network-layer addresses). That would also break the IEEE 802.1/802.3 forwarding semantics, so why bother?

This is exactly what the original TRILL proposal tried to do before it got bogged down with “we have to support flooding and VLANs and…” garbage.

It's better to stop pretending that we live in the age of long coax cables and start doing proper longest-prefix IPv4/IPv6 routing, with the L2 domain limited to where it belongs: adjacent L3 nodes.

The third fundamental problem is a killer. We're designing our networks based on a long tradition of subnet-based forwarding: one VLAN = one subnet = one IP summarization entity. If a summarization entity becomes partitioned (fancy language for “falls apart”), all bets are off: you’re dealing with interesting forwarding blackholes, and quite often you lose all sites involved in the debacle, not just one of them.
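To make the partitioning failure concrete, here is a minimal Python sketch. Everything in it is hypothetical: the 10.1.1.0/24 subnet, the site names, the host addresses, and the assumption that the rest of the network sees only the subnet prefix and may hand traffic to either site.

```python
import ipaddress

# Hypothetical stretched subnet: one VLAN = one subnet, spanning two sites.
SUBNET = ipaddress.ip_network("10.1.1.0/24")
hosts_at_site = {
    "site-A": {ipaddress.ip_address("10.1.1.10")},
    "site-B": {ipaddress.ip_address("10.1.1.20")},
}

def deliver(dst, entry_site, inter_site_link_up):
    """Return True if a packet that enters the subnet at entry_site reaches dst."""
    if dst not in SUBNET:
        return False
    if dst in hosts_at_site[entry_site]:
        return True  # the host is local to the site where the packet arrived
    # The host lives at the other site, reachable only over the stretched L2 link.
    return inter_site_link_up

def blackholed(entry_site, inter_site_link_up):
    """List the hosts that traffic entering at entry_site can no longer reach."""
    all_hosts = set().union(*hosts_at_site.values())
    return sorted(
        h for h in all_hosts if not deliver(h, entry_site, inter_site_link_up)
    )

print(blackholed("site-A", inter_site_link_up=True))
print(blackholed("site-A", inter_site_link_up=False))
```

While the inter-site link is up, traffic entering at either site reaches every host. Once the subnet is partitioned, anything that enters at site-A for the host at site-B is dropped, even though the outside world still sees a perfectly healthy /24.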

The claims that you can solve the problem with redundant connectivity just prove that some people don’t understand statistics, and that RFC 1925 Rule 4 is still relevant. As anyone who has enough operational experience knows, it's not a matter of IF but WHEN the brown substance hits the rotating blades.
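One way to read the statistics argument is as a back-of-the-envelope calculation. The sketch below assumes independent link failures and uses made-up unavailability figures; real failures are often correlated (shared ducts, shared power, shared software bugs), which only makes things worse.

```python
def partition_probability(link_unavailability, links, intervals):
    """Chance of at least one full partition over a number of intervals.

    link_unavailability: probability that a single link is down in one interval
    links: number of redundant links between the sites
    intervals: number of independent observation intervals
    Assumes independent failures, which is an optimistic simplification.
    """
    p_all_down = link_unavailability ** links       # every redundant link down at once
    return 1 - (1 - p_all_down) ** intervals        # at least one such interval

# Hypothetical numbers: two links, each down 1% of the time,
# observed over one interval and over 10,000 intervals.
print(partition_probability(0.01, 2, 1))
print(partition_probability(0.01, 2, 10_000))
```

Redundancy shrinks the per-interval probability but never makes it zero, so the chance of having seen at least one partition keeps growing the longer the network runs.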

Having stretched layer-2 domains is very similar to deploying OSPF inter-area summarization with ABRs that are far apart. Yet again, I'm guessing there are people out there with hands-on experience of how wonderful that idea is.

One could solve the summarization problem by inserting host routes into the wider network, and it might eventually work (see also: Mobile ARP, EVPN, LISP…), but definitely not across stateful network services (see also: stretched firewall clusters) or toward the Internet. Oh, and by now you've probably realized we’re about to reinvent CLNS using IP addresses instead of MAC addresses.
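The host-route idea relies on plain longest-prefix matching: a /32 for a host that sits at the "wrong" site overrides the subnet prefix. A minimal Python sketch, with a hypothetical routing table and site names used purely for illustration:

```python
import ipaddress

def lookup(routes, dst):
    """Longest-prefix match: return the next hop of the most specific matching route."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, next_hop) for net, next_hop in routes.items() if dst in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical table: the subnet is advertised from site-A,
# and one host that lives at site-B gets its own host route.
routes = {
    ipaddress.ip_network("10.1.1.0/24"): "site-A",
    ipaddress.ip_network("10.1.1.20/32"): "site-B",
}

print(lookup(routes, "10.1.1.10"))
print(lookup(routes, "10.1.1.20"))
```

The catch is visible in the data structure itself: every host that is not where the summary says it should be needs its own entry, so the table grows with the number of hosts rather than the number of subnets.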

Alternatively, one could realize that it doesn't make sense to try to find the solution to world peace in the networking layer (while keeping in mind that that's exactly what the engineers working for networking vendors are paid to do and promote) and spend the time working with business owners to figure out how to get out of the mess we allowed the networking and virtualization vendors to entice us into.

  1. Ivan, in some industries we suffer from a chronic abundance of massive legacy applications (think healthcare). There are a significant number of complex business constraints which prevent or delay addressing critical issues like BC/DR and HA at the application layer. These powerful forces push us to “find the solution to world peace in the networking layer”. I appreciate your point that there are problems with stretched overlay “solutions”/hacks, but in the real world do organizations really have any better choices at this point in time?

    1. Iain,

      Stretching L2 domains within a DC is bad practice but manageable; doing so between DCs is asking for trouble...
      Keep your vendors under pressure: they should fix their stuff rather than you working around it.
      Have boundaries and interworking points rather than stretched flat L2 networks. Think L2 or L2oL3 (à la VXLAN) -> EVPN between DCs, which gives you BUM control and potential localization of failure domains.
    2. Not an easy sell when you're dealing with major healthcare application vendors who are still using protocols like SMBv1. "If the mountain will not come to Mohammed, Mohammed will go to the mountain"
  2. With a control plane in the picture, VXLAN is not much different from L2VPN, and I think people were never afraid of stretching L2VPN across the WAN.

    The failure domain shows up when a device hits its bottleneck. Vendors usually don't disclose their limits, so people have to learn them the hard way and unfortunately apply that so-called experience in the wrong place.

    Nowadays we have network automation, rich telemetry, and even data analytics, so maybe VXLAN everywhere might just work. We just need to think outside the box and might need to learn a few automation tools :)