Comparing EVPN with Flood-and-Learn Fabrics
One of the ipSpace.net subscribers sent me this question after watching the EVPN Technical Deep Dive webinar:
Do you have a writeup that compares and contrasts the hardware resource utilization when one uses flood-and-learn or BGP EVPN in a leaf-and-spine network?
I don’t… so let’s fix that omission. In this blog post we’ll focus on pure layer-2 forwarding (aka bridging); a follow-up blog post will describe the implications of adding EVPN IP functionality.
We’ll compare the traditional bridging (using MLAG) with VXLAN-based bridging using IP multicast in the underlay, head-end replication, or EVPN control plane. We’ll stay at a pretty high level because it’s impossible to get the hardware documentation for switching ASICs, and because the details always depend on the specific ASIC, software vendor’s use of ASIC resources, quality of ASIC vendor SDK… and I’m definitely not touching those cans of worms.
Pete Lumbis and Dinesh Dutt helped me figure out or confirmed some of the details. Thanks a million!
Every bridging implementation needs two data structures:
- Interfaces (links)
- MAC table (or tables)
An implementation might use a single MAC table or different MAC tables for unicast and multicast forwarding[1]. In any case, MAC tables on edge switches are not changed regardless of what forwarding mechanism you use – the edge switches still have to know the MAC addresses of all nodes attached to configured VLANs.
There’s a major difference in MAC table utilization on core switches:
- In traditional bridging implementation, the core switches must know all MAC addresses of all nodes in the network;
- When using VXLAN, SPB, or TRILL, the core switches know the IP or MAC addresses of edge switches, but are not aware of edge devices.
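To put rough numbers on that difference, here is a minimal Python sketch. The function name, its parameters, and the fabric dimensions are made up for illustration; it only counts endpoint-identifying entries and ignores everything ASIC-specific.

```python
def core_forwarding_entries(edge_switches: int, hosts_per_edge: int, overlay: bool) -> int:
    """Rough count of endpoint-related entries a core switch has to hold.

    Hypothetical helper: assumes every host has a single MAC address and that
    all VLANs span the whole fabric (the worst case for traditional bridging).
    """
    if overlay:
        # VXLAN, SPB, TRILL: the core sees only the edge switch addresses.
        return edge_switches
    # Traditional bridging: the core learns every host MAC address.
    return edge_switches * hosts_per_edge


# Assumed fabric size, for illustration only.
traditional = core_forwarding_entries(32, 500, overlay=False)
encapsulated = core_forwarding_entries(32, 500, overlay=True)
print(traditional, encapsulated)
```

The edge switches are deliberately left out of the sketch: their MAC tables look the same in both cases.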
Traditional bridging, VXLAN (without EVPN), SPB, or TRILL use dynamic MAC learning, so there’s no control-plane difference between them. EVPN uses BGP to propagate MAC addresses, but only across the network. Local MAC addresses are still gathered with the flood-and-learn mechanism. In any case, learning MAC addresses is a control-plane functionality and does not affect the hardware resources (apart from the CPU load incurred).
Finally, there’s flooding. Every bridging implementation must support flooding. An EVPN implementation could optimize flooding behavior with functionality like proxy ARP or no unknown unicast flooding, but as long as you want to retain transparent bridging semantics to support broken crapware, you have to implement flooding.
Hardware-based multicast is often implemented with a bitmask of outgoing interfaces attached to a multicast MAC address. In traditional bridging and other methods that use underlay packet replication (SPB, TRILL, VXLAN with IP multicast), there’s a single uplink interface per VLAN, and there’s absolutely no difference between the uplink interface being a physical uplink (STP), port channel (STP+MLAG) or VXLAN tunnel with a multicast destination address.
Head-end (ingress node) replication uses more hardware resources. VTEP tunnels are usually implemented with virtual interfaces, and every multicast entry (remember: broadcast is just a particular kind of multicast) contains numerous outgoing interfaces (VTEP tunnels)… but still: all you need for ingress replication (as opposed to traditional bridging) is a longer bitmask. Every decent data center switching ASIC supports at least 128 remote VTEPs per VNI – problem solved unless you’re building a huge fabric.
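The "longer bitmask" idea can be sketched in a few lines of Python. This is a toy model, not how any particular ASIC works: the interface numbering, the helper names, and the count of five remote VTEPs are assumptions chosen for illustration.

```python
def build_flood_mask(interfaces):
    """Return an integer bitmask with one bit set per outgoing interface index."""
    mask = 0
    for index in interfaces:
        mask |= 1 << index
    return mask


def flood_targets(mask, ingress):
    """List the interfaces a flooded frame is copied to (never back to ingress)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1 and i != ingress]


LOCAL_PORTS = [0, 1, 2, 3]       # hypothetical access ports in the VLAN
UPLINK = 4                       # one uplink: physical, port channel, or multicast tunnel
REMOTE_VTEPS = [4, 5, 6, 7, 8]   # head-end replication: one virtual interface per remote VTEP

underlay_mask = build_flood_mask(LOCAL_PORTS + [UPLINK])
head_end_mask = build_flood_mask(LOCAL_PORTS + REMOTE_VTEPS)

print(flood_targets(underlay_mask, ingress=0))
print(flood_targets(head_end_mask, ingress=0))
```

In both cases the data structure is the same; head-end replication simply sets more bits in it.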
Long story short: there’s no significant difference in hardware resource utilization on edge switches. Bridging is bridging, no matter how it’s implemented.
For more bridging-versus-routing details, watch the Switching, Routing and Bridging part of How Networks Really Work webinar.
[1] From the packet forwarding perspective, broadcast is just a very specific case of multicast. Unknown unicast flooding is a specific case of broadcast.
As a small addition, traditional proxy ARP does not stop propagation of ARP broadcast in the layer 2 domain. Thus many vendors use the term ARP suppression when ARP broadcasts are not only answered by an edge switch (the first hop router, usually the ToR), but the request is suppressed instead of flooded.
[Yes, this does interfere with devices expecting to glean information from ARP packets which usually go everywhere in the layer 2 domain.]
Traditional proxy ARP replies to the ARP request (without propagating it any further), admittedly with the MAC address of the router, not the remote endpoint's MAC address, so I'm guessing ARP suppression truly is a better term to use.
Yes, providing the correct end-system MAC address instead of the gateway MAC address is a difference from traditional proxy ARP.
Because the device implementing ARP suppression knows the answer to the question, the question does not need to be sent to anyone else. The integrated routing and bridging (IRB) in VXLAN/EVPN fabrics makes it possible to suppress the ARP request flooding of transparent bridging, so this is often done.
Traditionally, proxy ARP was used on a router connected to a yellow cable (or an emulation thereof). The ARP request reached every system connected to the (emulated) cable. The router sent an answer, using its MAC address. Of course the router did not propagate the ARP request any further, but the transparent bridging did.
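The difference between the two behaviors discussed in this thread can be sketched as follows. This is a simplified Python model with hypothetical function names, a made-up router MAC address, and documentation-range addresses; real implementations have many more corner cases.

```python
ROUTER_MAC = "00:00:5e:00:53:fe"  # hypothetical MAC address of the proxying router


def traditional_proxy_arp(target_ip, routed_prefixes):
    """Router answers with its OWN MAC address for targets it can route to.

    The bridged segment still delivers the original broadcast to everyone.
    """
    if any(target_ip.startswith(prefix) for prefix in routed_prefixes):
        return ("reply", ROUTER_MAC)
    return ("ignore", None)


def arp_suppression(target_ip, known_hosts):
    """Edge switch answers with the END-SYSTEM MAC address and drops the request.

    known_hosts stands in for IP-to-MAC bindings the switch already has
    (for example, learned through EVPN). Unknown targets are still flooded.
    """
    mac = known_hosts.get(target_ip)
    if mac is not None:
        return ("reply-and-suppress", mac)
    return ("flood", None)


known = {"192.0.2.10": "00:00:5e:00:53:0a"}
print(traditional_proxy_arp("192.0.2.10", ["192.0.2."]))
print(arp_suppression("192.0.2.10", known))
print(arp_suppression("192.0.2.99", known))
```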
One thing worth noting would be EVPN multihoming vs. MLAG with virtual VTEP implementations. I can see how these would use the ASIC differently.
EVPN multihoming is overlay multipathing. Since a MAC address with two or more "uplinks" is basically a layer-2 loop, you need to use multiple tables and additional ASIC resources.
In the case of multihoming, you have:
MAC address --> ESI --> multiple VTEPs --> multiple underlay next hops/paths
In the case of MLAG with virtual VTEP, you have:
MAC address --> single VTEP --> underlay ECMP
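The two lookup chains above can be modeled with plain dictionaries. This is only a Python sketch of the indirection levels, not a description of how any ASIC lays out its tables; all names and addresses are hypothetical.

```python
MAC = "00:00:5e:00:53:01"

# EVPN multihoming: MAC -> ESI -> several VTEPs -> underlay next hops
mac_to_esi = {MAC: "esi-1"}
esi_to_vteps = {"esi-1": ["192.0.2.1", "192.0.2.2"]}

# MLAG with virtual VTEP: MAC -> one shared VTEP address -> underlay ECMP
mac_to_vtep = {MAC: "192.0.2.100"}

# Underlay next hops per VTEP address (assumed two-spine fabric)
underlay_next_hops = {
    "192.0.2.1": ["spine1", "spine2"],
    "192.0.2.2": ["spine1", "spine2"],
    "192.0.2.100": ["spine1", "spine2"],
}


def resolve_multihoming(mac):
    """Three lookups: MAC table, ESI table, then underlay next hops per VTEP."""
    paths = []
    for vtep in esi_to_vteps[mac_to_esi[mac]]:
        for next_hop in underlay_next_hops[vtep]:
            paths.append((vtep, next_hop))
    return paths


def resolve_mlag(mac):
    """Two lookups: MAC table, then underlay next hops for the single VTEP."""
    vtep = mac_to_vtep[mac]
    return [(vtep, next_hop) for next_hop in underlay_next_hops[vtep]]


print(resolve_multihoming(MAC))
print(resolve_mlag(MAC))
```

The extra ESI level is where the additional table (and the overlay-level load balancing) comes from.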