The Dynamic MAC Learning versus EVPN blog post triggered tons of interesting responses describing edge cases, vendor bugs, and implementation details, including an age-old case of silent hosts described by Nitzan:
A few years ago, in an EVPN network, I saw drops on the multicast queue (ingress replication goes to that queue). After analyzing it, we found that the root cause was vMotion (the hosts in that VLAN are silent), which starts at a very high rate before the source leaf learns the destination MAC.
It turns out that the behavior they experienced was caused by a particularly slow EVPN implementation, so it’s not exactly the case of silent hosts, but let’s dig deeper into what could happen when you do have silent hosts attached to an EVPN fabric.
Let’s define silent hosts first. They are nodes that never send any traffic, so the switches cannot learn their MAC addresses and are forced to flood the traffic sent to those MAC addresses. Typical examples would be Syslog servers or traffic monitoring/inspection appliances; we’ll ignore monstrosities like Microsoft NLB for the moment.
Then there are what Someone called shy hosts in his comment – hosts that stay completely quiet for so long that everyone’s ARP and MAC address caches time out before those hosts start chatting. However, if the communication with those hosts involves the usual initial exchange of ARP and TCP SYN packets, everything should be fine… unless the EVPN control plane takes “forever” to propagate the newly-rediscovered MAC address, in which case all communication with those hosts is flooded until the control plane gets its job done¹. That’s obviously a pathological scenario that should result in yelling at the vendor until they get their **** together, and never buying from them again, but we all know that’s not exactly how enterprise IT works.
Back to silent hosts. It’s worth noting that with a decent EVPN control plane, vMotion should fix the problem, not make it worse. ESXi servers send RARP packets on behalf of the moved virtual machines after completing vMotion to inform the switches that the VM MAC address has moved. Unfortunately, you could turn off the notify switches option, making the vMotion events invisible, but that would result in traffic black holes² – the switches would think the VM MAC address is still present on the origin ESXi server – not flooding.
Now for the elephant in the room: whoever is sending the traffic to a silent host must know its MAC address. While one could use static ARP entries, we usually don’t, so the senders must send ARP queries now and then, and the silent hosts must respond to them, enabling the switches to learn all the MAC addresses in the VLAN.
Time to go back to first principles: the only way to solve the silent host challenge is to ensure MAC address entries time out later than ARP entries. That’s easy to do if the traffic is entering the VLAN through a router and a bit more cumbersome if you have to adjust the ARP timeouts on all hosts in the VLAN.
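If the traffic enters the VLAN through a router running Cisco IOS, the fix could look something like this hedged sketch – the interface name and timer values are purely illustrative; adjust them to your environment:

```
! Illustrative Cisco IOS-style configuration: make ARP entries on
! the gateway expire before the MAC address table entries do.
interface Vlan10
 ! age out ARP entries after 5 minutes (default: 14400 seconds)
 arp timeout 300
!
! keep dynamic MAC entries for 10 minutes (default: 300 seconds)
mac address-table aging-time 600
```

Either knob alone would do; the only invariant that matters is that the MAC aging time exceeds the ARP timeout.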
Fortunately, modern TCP/IP stacks use short ARP timeouts – default value on Linux is 30 seconds (randomized into 15-45 seconds), and the kernel removes stale entries (mappings without incoming traffic) every 60 seconds. ARP entries for silent hosts should become stale almost immediately and be refreshed in approximately two minutes.
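You can inspect (and, as root, tune) those Linux neighbor-cache timers through sysctl; a minimal sketch, assuming a stock Linux kernel with the default per-namespace settings:

```shell
# base_reachable_time_ms: a neighbor entry stays "reachable" for a
# random interval between 1/2 and 3/2 of this value (default 30000 ms)
cat /proc/sys/net/ipv4/neigh/default/base_reachable_time_ms

# gc_stale_time: how long (in seconds) a stale entry may linger
# before the periodic garbage collector considers removing it (default 60)
cat /proc/sys/net/ipv4/neigh/default/gc_stale_time

# To keep ARP entries around longer, you could (as root) do e.g.:
# sysctl -w net.ipv4.neigh.default.base_reachable_time_ms=300000
```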
Switches and routers have a different perspective. Cisco IOS and Arista EOS still age out ARP entries in four hours; Cisco Nexus OS does it in 1500 seconds (25 minutes). No wonder we get flooding in VLANs with silent hosts.
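A back-of-the-envelope calculation shows how bad the mismatch gets. The sketch below uses the four-hour ARP timeout mentioned above and the commonly-cited 300-second default dynamic MAC aging time (an assumption – check your platform’s actual default):

```shell
# How long could a sender keep a valid ARP entry for a silent host
# after the switches have forgotten the corresponding MAC address?
ARP_TIMEOUT=14400   # seconds (4 hours, e.g. Cisco IOS default)
MAC_AGING=300       # seconds (5 minutes, typical default -- assumption)

if [ "$ARP_TIMEOUT" -gt "$MAC_AGING" ]; then
  # the sender still has an ARP entry, but the switches have long
  # forgotten the MAC address -- all traffic to it gets flooded
  echo "potential flooding window: $((ARP_TIMEOUT - MAC_AGING)) seconds"
fi
```

With those defaults you’d flood traffic toward a silent host for almost four hours out of every ARP refresh cycle.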
Back to the comment:
The quick and ugly solution was to scan the vMotion VLAN with NMAP every few minutes so the leafs would have all of the MAC addresses in their EVPN database.
And now we know why that works (assuming the Linux host running NMAP is attached to the same VLAN): ARP entries in the Linux kernel would become stale between NMAP runs, triggering ARP requests and responses from silent hosts regardless of whether the silent hosts would answer NMAP probes.
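The workaround from the comment could be sketched as a crontab entry on that Linux host – the subnet is hypothetical, and nmap must obviously be installed:

```shell
# Hypothetical crontab entry: ARP-scan the whole vMotion subnet every
# two minutes, keeping the Linux neighbor cache (and, through the
# resulting ARP replies, the leaf MAC tables) populated.
# -sn = host discovery only, -PR = ARP ping, which elicits a reply
# even from hosts that drop ICMP probes.
*/2 * * * *  nmap -sn -PR 192.168.10.0/24 >/dev/null 2>&1
```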
Long story short: the ancient challenge often used in vendor certification written exams did not disappear just because we replaced STP with EVPN. You might get flooded traffic whenever the ARP timeouts in your network are larger than the MAC address table timeouts.
Want to know more about EVPN? Check out the EVPN Technical Deep Dive webinar.
- Rewrote the vMotion-related part of the blog post based on the comment describing the impact of a particularly slow EVPN control plane implementation.
1. vMotion could easily generate a 10 Gbps TCP stream. Now imagine flooding that across the whole vSphere cluster, and sending numerous copies of every packet over leaf-to-spine uplinks due to ingress replication. Fun times. ↩︎
2. The same thing would happen if the EVPN control plane takes too long to advertise the MAC move, but that black hole would disappear as soon as EVPN gets its act together, whereas without the notify switches option, traffic would be blackholed until the moved VM sends its first packet (plus whatever time it takes for everyone to learn the new location of the VM MAC address). ↩︎