On ARP and MAC Aging Timers
Naveen Kumar Devaraj mentioned an interesting fact in his EVPN-related comment:
The EOS default ARP timeout is 4 hours, and MAC aging is 5 minutes.
Arista is not the only platform using these default values; did you ever wonder where they came from?
ARP was created in the days when we could support 30 interactive users on a shared computer with 2 MB of memory and a CPU clocked in the low MHz range. Every bit counted, and it would have been a cardinal sin to needlessly send broadcasts that burned CPU cycles on every computer attached to the same thick coax cable. The communication patterns were also vastly different from what we see today; most workstations communicated with only a few servers.
Setting the ARP timeout to a high value thus made perfect sense: ARP creates a mapping between layers, not a packet-forwarding infrastructure. As IP and MAC addresses are usually somewhat stable¹, you can safely use an ARP entry for a long time, and gratuitous ARP solves the challenge of occasional changes in MAC or IP addresses. ARP timeouts are thus more of a garbage-collection mechanism.
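To make the "garbage collection" point concrete, here is a minimal Python sketch of an ARP cache, assuming entries are refreshed only when an ARP reply or a gratuitous ARP arrives. The class and method names (`ArpCache`, `on_reply`, `on_gratuitous_arp`) are hypothetical; the gratuitous ARP handling follows the RFC 826 "merge" logic, which updates an existing entry but does not add a new one for a host that isn't talking to us. The long timeout never affects forwarding correctness; it only decides when an unused mapping is thrown away.

```python
from typing import Dict, Optional, Tuple

ARP_TIMEOUT = 4 * 60 * 60  # 4 hours in seconds (the EOS default quoted above)


class ArpCache:
    """Minimal IPv4-to-MAC cache in which the timeout only garbage-collects entries."""

    def __init__(self, timeout: float = ARP_TIMEOUT) -> None:
        self.timeout = timeout
        # IP address -> (MAC address, time the entry was last refreshed)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def on_reply(self, ip: str, mac: str, now: float) -> None:
        """Install or refresh a mapping learned from an ARP reply."""
        self._entries[ip] = (mac, now)

    def on_gratuitous_arp(self, ip: str, mac: str, now: float) -> None:
        """Update an existing mapping when a host announces a (new) MAC address.

        Like the RFC 826 merge step, only entries already in the cache are
        updated; unknown senders are not added.
        """
        if ip in self._entries:
            self._entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        """Return the cached MAC address, or None if the host must be ARPed again."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, refreshed = entry
        if now - refreshed > self.timeout:
            del self._entries[ip]  # stale entry: garbage-collect it
            return None
        return mac
```

In this sketch, a NIC swap is handled by the gratuitous ARP long before the four-hour timer would matter; the timer only cleans up entries for hosts nobody talks to anymore.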
Interestingly, the Host Requirements RFC (RFC 1122) mentions four mechanisms for ARP cache validation, ranging from the familiar timeout to unicast polling. The RFC was written in 1989, but I don't remember seeing a unicast ARP until well into the 2000s (or maybe I just wasn't looking hard enough; in that case, I'd appreciate a comment or two).
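The unicast poll mechanism from RFC 1122 periodically sends a point-to-point ARP request to the cached MAC address and deletes the entry after N successive polls go unanswered. The Python sketch below models only that bookkeeping; `PolledEntry`, `record_poll`, and `polls_until_deleted` are hypothetical names, and N = 3 is an arbitrary choice for illustration, not a value taken from the RFC.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical N for this sketch: delete after this many successive unanswered polls.
MAX_FAILED_POLLS = 3


@dataclass
class PolledEntry:
    """ARP cache entry validated by unicast polling (hypothetical structure)."""
    mac: str
    failed_polls: int = 0

    def record_poll(self, got_reply: bool, max_failed: int = MAX_FAILED_POLLS) -> bool:
        """Record the result of one unicast ARP request sent to self.mac.

        Returns True if the entry should be kept, False if it should be deleted.
        """
        if got_reply:
            self.failed_polls = 0  # any reply resets the count of successive failures
            return True
        self.failed_polls += 1
        return self.failed_polls < max_failed


def polls_until_deleted(entry: PolledEntry, replies: Iterable[bool]) -> Optional[int]:
    """Return the 1-based poll number at which the entry is deleted, or None if it survives."""
    for poll_number, got_reply in enumerate(replies, start=1):
        if not entry.record_poll(got_reply):
            return poll_number
    return None
```

Unlike a plain timeout, polling removes an entry only when the host stops answering, so a busy but stable mapping never has to fall back to a broadcast.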
MAC aging solves a completely different problem: it keeps the packet-forwarding infrastructure (the MAC address table) reasonably clean. Until EVPN, transparent bridging never had a control plane that could authoritatively tell the bridges where a MAC address is. It's also crucial not to have stale entries in the MAC address table: the table is built on guesswork (dynamic MAC learning), and it's better to flood a unicast frame than to send it in the wrong direction. On the other hand, a low MAC aging timer increases the amount of flooded unicast traffic, which is clearly undesirable if your bridge connects two 10 Mbps segments².
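Here is a minimal Python sketch of the guesswork a transparent bridge performs: learn the source MAC on the ingress port, forward known unicast frames to a single port, flood unknown ones, and forget entries that weren't refreshed within the aging time. The `LearningBridge` class is hypothetical and simplified: it ages entries lazily at lookup time instead of running a periodic sweep, and it ignores broadcast/multicast destinations and spanning tree.

```python
from typing import Dict, List, Tuple

MAC_AGING = 5 * 60  # 5 minutes in seconds, the common default


class LearningBridge:
    """Transparent bridge: learn source MACs, flood unknown destinations, age out entries."""

    def __init__(self, ports: List[int], aging: float = MAC_AGING) -> None:
        self.ports = ports
        self.aging = aging
        # MAC address -> (port, time the MAC was last seen as a source address)
        self.table: Dict[str, Tuple[int, float]] = {}

    def receive(self, in_port: int, src: str, dst: str, now: float) -> List[int]:
        """Return the ports a unicast frame is sent out of."""
        # Dynamic learning: guess that the source MAC lives behind the ingress port
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] > self.aging:
            del self.table[dst]  # aged out: better to flood than to guess wrong
            entry = None
        if entry is None:
            # Unknown unicast: flood out of every port except the ingress port
            return [p for p in self.ports if p != in_port]
        out_port = entry[0]
        # Destination on the ingress segment: filter the frame instead of forwarding it
        return [] if out_port == in_port else [out_port]
```

With a three-port bridge, the first frame from A to B is flooded to ports 2 and 3; once B has sent a frame, traffic to B goes only to its port, until B stays silent longer than the aging timer and flooding resumes.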
The 5-minute MAC aging timer is probably a compromise between these two concerns, and based on when the first bridges appeared³, one could reasonably guess that they chose a value lower than the time it takes to disconnect a workstation, move it to another room, and reconnect it 😜
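The tongue-in-cheek reasoning can be expressed as a tiny calculation. After a silent host is moved, the bridge keeps sending its traffic out of the old port until the stale entry either ages out or the host transmits from its new port and gets relearned. The function below is a hypothetical sketch that assumes the host stays silent from its last frame on the old port until its first frame on the new one, and that relearning happens immediately on that first frame.

```python
def misdelivery_window(last_tx_old: float, reconnected_at: float,
                       first_tx_new: float, aging: float) -> float:
    """Seconds after reconnection during which frames for the moved host still
    go out of its old port (all arguments in seconds, hypothetical model).

    The stale entry survives until it ages out (last_tx_old + aging) or until
    the host transmits from its new port and is relearned (first_tx_new).
    """
    stale_until = min(last_tx_old + aging, first_tx_new)
    return max(0.0, stale_until - reconnected_at)
```

With a 10-minute move (`reconnected_at=600`) and the host's first frame at 900 seconds, a 300-second aging timer yields no misdelivery at all, while an ARP-like 4-hour timer would keep sending the host's traffic to the old room for 300 seconds.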
¹ At least until people get overly creative.
² The first bridges had two 10 Mbps Ethernet connections.
³ According to this article, we got the transparent bridge idea in "one evening in 1983", with the product (LANBridge 100) shipping in 1986. I also found an IEEE article from 1988. Sadly, the DEBET product code for the LANBridge looks familiar, as do DEMPR and DESPR. I must be getting a bit old. Finally, if you're into ancient history, I found the MicroVAX 2000 Networking Guide from 1988.
Arista EOS had the advantage of being born later, so they typically picked better defaults, but in this particular case, IMHO, they botched it. Cisco had the same opportunity with NX-OS, and there they picked much more reasonable defaults for the MAC/ARP/ND timeouts:
Enabling the IPv4 unicast AFI/SAFI by default (bgp default ipv4-unicast) on every BGP peer is another case where Arista unfortunately fumbled and copied outdated IOS behavior.