Sturgeon's Law, VRRPv3 Edition
I just wasted several days trying to figure out how to make the dozen (or so) platforms for which we implemented VRRPv3 in netlab work together. This is the first in a series of blog posts describing the ridiculous stuff we discovered during that journey.
The idea was pretty simple:
- Create a lab with the tested device and a well-known probe connected to the same subnet.
- Disable VRRP (or interface) on the probe and check IPv4 and IPv6 connectivity through the tested device (verifying it takes over ownership of VRRP MAC and IP addresses).
- Reenable VRRP on the probe and change its VRRP priority several times to check the state transitions through INIT/BACKUP(lower priority)/MASTER(change in priority)/BACKUP(preempting after a change in priority).
When using an Arista EOS VM as the well-known probe, I discovered that it refuses to yield to a preemption attempt from numerous other devices (for example, Cisco IOS or Cisco Nexus OS) for the IPv4 address family [1]. The same devices preempted EOS for the IPv6 address family.
Having two (or more) VRRP masters on the same segment cannot be good. If nothing else, you might get duplicate MAC address or flapping MAC address messages on adjacent L2 switches, so it was time to figure out what was happening. I used tcpdump to see the VRRP packets and noticed the "bad vrrp cksum" diagnosis for the VRRP packets sent by Cisco IOS:
07:56:11.666890 00:00:5e:00:01:d9 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 46: (tos 0xc0, ttl 255, id 1, offset 0, flags [none], proto VRRP (112), length 32)
172.16.0.1 > vrrp.mcast.net: VRRPv3, Advertisement, vrid 217, prio 30, intvl 100cs, length 12, addrs: 172.16.0.42
07:56:11.743555 00:00:5e:00:01:d9 (oui IANA) > 01:00:5e:00:00:12 (oui Unknown), ethertype IPv4 (0x0800), length 60: (tos 0xc0, ttl 255, id 0, offset 0, flags [none], proto VRRP (112), length 32)
172.16.0.2 > vrrp.mcast.net: VRRPv3, Advertisement, vrid 217, prio 20, intvl 100cs, length 12, (bad vrrp cksum d87), addrs: 172.16.0.42
Likewise, Cisco IOS complained that the packets generated by Arista EOS contain an invalid checksum:
r2#debug vrrp packet
vrrp packet debugging enabled
r2#
*Jan 22 08:57:27.239: VRRPv4 Ethernet0/1 [217] vrrpv3 chksum D87
*Jan 22 08:57:27.239: VRRPv4 Ethernet0/1 [217] Send V3 Advertisement, Type: 1, Group Id: 217, Priority: 20, Advert interval: 100 csec, Count: 1
*Jan 22 08:57:27.806: VRRP Ethernet0/1 Processing Packet:Invalid checksum Calculated chksum is nonzero (8CA0), Packet chksum is (76E6)
At that moment, I decided it was high time for another journey into the RFC land. VRRPv3 is defined in RFC 5798, which was obsoleted by RFC 9568 in April 2024. The latter RFC contains an interesting change from RFC 5798:
The checksum calculation in Section 5.2.8 has been clarified to specify precisely what is included and that it does not include the pseudo-header for IPv4.
That section is even more interesting:
- For IPv4 messages, the checksum includes only the VRRP message.
- For IPv6, the checksum includes the pseudo-header.
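The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not code taken from any implementation: the helper names (`internet_checksum`, `vrrp_v3_message`, `checksum_input`) and the link-local addresses are made up for the example, and the IPv6 pseudo-header layout (source, destination, 32-bit length, three zero bytes, next header) is the generic upper-layer one from the IPv6 specification.

```python
import ipaddress
import struct

VRRP_PROTO = 112  # IP protocol number assigned to VRRP


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                      # pad to whole 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def vrrp_v3_message(vrid, priority, interval_cs, addresses, checksum=0):
    """Build a VRRPv3 advertisement (version 3, type 1)."""
    header = struct.pack("!BBBBHH", 0x31, vrid, priority, len(addresses),
                         interval_cs & 0x0FFF, checksum)
    return header + b"".join(ipaddress.ip_address(a).packed for a in addresses)


def checksum_input(message: bytes, src: str, dst: str) -> bytes:
    """Bytes covered by the checksum: message only (IPv4) or pseudo-header + message (IPv6)."""
    if ipaddress.ip_address(src).version == 4:
        return message
    pseudo = (ipaddress.IPv6Address(src).packed
              + ipaddress.IPv6Address(dst).packed
              + struct.pack("!I3xB", len(message), VRRP_PROTO))
    return pseudo + message


# Hypothetical IPv6 advertisement; the addresses are illustrative only
src, dst, vip = "fe80::1", "ff02::12", "fe80::200:5eff:fe00:2d9"
unsigned = vrrp_v3_message(217, 20, 100, [vip])
cksum = internet_checksum(checksum_input(unsigned, src, dst))
on_wire = vrrp_v3_message(217, 20, 100, [vip], cksum)

# A receiver repeats the calculation over the received packet and expects zero
receiver_result = internet_checksum(checksum_input(on_wire, src, dst))
print(f"checksum 0x{cksum:04x}, receiver result 0x{receiver_result:04x}")
```

The only difference between the two address families is what gets prepended to the message before the same 16-bit sum is calculated.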
To recap:
- The wording in RFC 5798 was vague, referring to pseudo-header being included in the checksum calculation.
- There is no pseudo-header in IPv4.
- Faced with that wording, VRRPv3 implementers either (A) ignored it because there is no pseudo-header in IPv4 or (B) created a pseudo-header for IPv4 out of thin air (OK, using the rules from the IPv6 RFC).
- Arista EOS and FRR seem to be the implementations using the "let’s fake the pseudo-header" approach, while most others adopted the "there is no pseudo-header in IPv4" mentality.
- Most VRRP implementations ignore packets with an incorrect checksum, potentially resulting in two VRRP masters on the same segment in multi-platform deployments.
- RFC 5798 was published in 2010, and the first implementations appeared in the early 2010s, yet some vendors or open-source projects seem to have skipped the interoperability tests with other platforms.
- The early implementations [2] used the same "there is no pseudo-header in IPv4" approach, leaving one to wonder how we arrived at the two interpretations of the RFC (and how the minority interpretation made it into tcpdump).
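The two interpretations can be checked against the packets shown earlier. The sketch below rebuilds both advertisements from the fields in the tcpdump output and calculates the checksum both ways. It rests on two assumptions: the IPv4 pseudo-header uses the TCP/UDP-style layout (source, destination, zero byte, protocol, length), and the packet from 172.16.0.1 with priority 30 is the Arista EOS one. The function names are invented for the example.

```python
import ipaddress
import struct

VRRP_PROTO = 112
VRRP_GROUP_V4 = "224.0.0.18"   # vrrp.mcast.net


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def vrrp_v3_message(vrid, priority, interval_cs, vip, checksum=0):
    """VRRPv3 advertisement carrying a single IPv4 address."""
    return struct.pack("!BBBBHH4s", 0x31, vrid, priority, 1, interval_cs,
                       checksum, ipaddress.IPv4Address(vip).packed)


def ipv4_pseudo_header(src: str, length: int) -> bytes:
    # Assumption: same layout as the TCP/UDP pseudo-header
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(VRRP_GROUP_V4).packed
            + struct.pack("!BBH", 0, VRRP_PROTO, length))


# Interpretation A (Cisco IOS, 172.16.0.2, priority 20): VRRP message only
ios = vrrp_v3_message(217, 20, 100, "172.16.0.42")
ios_cksum = internet_checksum(ios)

# Interpretation B (assumed Arista EOS, 172.16.0.1, priority 30): pseudo-header + message
eos = vrrp_v3_message(217, 30, 100, "172.16.0.42")
eos_cksum = internet_checksum(ipv4_pseudo_header("172.16.0.1", len(eos)) + eos)

# What an interpretation-A receiver calculates for the interpretation-B packet
eos_on_wire = vrrp_v3_message(217, 30, 100, "172.16.0.42", eos_cksum)
ios_view = internet_checksum(eos_on_wire)

print(f"IOS checksum:      0x{ios_cksum:04x}")   # 0x0d87
print(f"EOS checksum:      0x{eos_cksum:04x}")   # 0x76e6
print(f"IOS view of EOS:   0x{ios_view:04x}")    # 0x8ca0
```

The three values match the D87 in the tcpdump and debug outputs, and the 76E6 and 8CA0 values in the Cisco IOS "Invalid checksum" message, which suggests the TCP/UDP-style pseudo-header assumption is the right one for this capture.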
Finally:
- Dell OS10 takes a particularly creative approach: it switches to the checksum calculation used by the other VRRP device.
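One way such an adaptive behavior could work is sketched below. This is purely hypothetical and says nothing about how Dell OS10 actually implements it: the receiver validates each advertisement with both methods and remembers whichever one matched, then uses it for its own packets. The class name `AdaptiveChecksum` and the TCP/UDP-style IPv4 pseudo-header layout are assumptions made for the example.

```python
import ipaddress
import struct

VRRP_PROTO = 112
VRRP_GROUP_V4 = "224.0.0.18"


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_pseudo_header(src: str, length: int) -> bytes:
    # Assumption: same layout as the TCP/UDP pseudo-header
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(VRRP_GROUP_V4).packed
            + struct.pack("!BBH", 0, VRRP_PROTO, length))


class AdaptiveChecksum:
    """Hypothetical receiver that mirrors the checksum method of its peer."""

    def __init__(self):
        self.use_pseudo_header = False   # start with the message-only method

    def accept(self, message: bytes, src: str) -> bool:
        """Validate a received IPv4 advertisement and remember which method matched."""
        if internet_checksum(message) == 0:
            self.use_pseudo_header = False
            return True
        if internet_checksum(ipv4_pseudo_header(src, len(message)) + message) == 0:
            self.use_pseudo_header = True
            return True
        return False                     # neither method matches: drop the packet

    def checksum(self, message: bytes, src: str) -> int:
        """Checksum for an outgoing message whose checksum field is zeroed."""
        data = message
        if self.use_pseudo_header:
            data = ipv4_pseudo_header(src, len(message)) + message
        return internet_checksum(data)


# Advertisements rebuilt from the capture (checksums 0x76e6 and 0x0d87)
eos_packet = bytes.fromhex("31d91e01006476e6ac10002a")
ios_packet = bytes.fromhex("31d9140100640d87ac10002a")

peer = AdaptiveChecksum()
print(peer.accept(eos_packet, "172.16.0.1"), peer.use_pseudo_header)
print(peer.accept(ios_packet, "172.16.0.2"), peer.use_pseudo_header)
```

A device built like this interoperates with either camp, but would keep changing its mind on a segment where both interpretations are present.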
Enrique Vallejo and Erik Auerswald added some interesting details in their comments:
- TCP and UDP checksums include pseudo-headers. A naive reading of RFC 5798 would thus result in “VRRP is another transport protocol; let’s do the same thing we did for TCP and UDP.”
- Previous VRRP RFCs were clear: the checksum is calculated only on the VRRP message. An ossified reading of RFC 5798 would be “gee, there’s no pseudo-header in IPv4, that must apply to IPv6 only.” or even “we already have the code for IPv4, let’s reuse it” (we know how well that ends).
- There’s still no excuse for the lack of interoperability testing and two sets of incompatible implementations, considering all early adopters chose the same way to interpret the vague RFC wording.
Fortunately, you can make Arista EOS RFC 9568-compliant with the "vrrp ipv4 checksum pseudo-header exclude" configuration command, and FRR has had the "no vrrp checksum-with-ipv4-pseudoheader" command since late 2022.
However, as is often the case, downstream distros can take a long time to pick up the changes. Cumulus Linux release 5.10 still uses an older FRR version, and the current (as of January 2025) VyOS Vagrant box (v20240817.00.20) has no nerd knob to configure the underlying FRR VRRP process.
Revision History
- 2025-01-23
- Added more nuanced reasons for the bifurcated reading of the RFC 5798
[1] I also failed to detect that Arista EOS quickly switched to MASTER state even if the other VRRP router had higher priority because it showed the initial VRRP state as BACKUP. I should have waited a few seconds for the VRRP dust to settle.
[2] The Junos release 12.2 in May 2012, and the Cisco IOS release 15.3 in early 2013.
Very interesting!
It's surprising that the initial VRRPv3 RFC 5798 refers to the IPv6 RFC to define the pseudo-header calculation, leaving the behavior with IPv4 undefined.
If I understand correctly, the original IP specification does not define a pseudo-header, but it is defined in transport protocols (TCP and UDP both include the same exact definition for the pseudo-header fields) in order to calculate a valid checksum that does not depend on varying IPv4 fields such as TTL. Since IPv6 is defined after TCP and UDP, they had to include the pseudo-header calculation mechanism in the RFC for TCP and UDP use.
However, to me it's more surprising that when they identified the VRRPv3 ambiguity and clarified it in RFC 9568, they didn't refer to the pseudo-header definition in RFCs 793/768 (TCP/UDP) but instead preferred to avoid using any field from IPv4. I mean: if the pseudo-header is useful (since it was employed in the first case), why not use it in IPv4? I suppose that compatibility with existing deployments was more relevant than functionality.
> The original IP specification does not define a pseudo-header, but it is defined in transport protocols
Wow. Thanks a million. From that perspective, the alternate interpretation of the RFC makes sense. Have to reword that paragraph.
> I suppose that compatibility with existing deployments was more relevant than functionality.
All early implementations used a single (interoperable) approach, so one would hope that everyone else would lean the same way, but of course, that never happens. Some people write code in perfect isolation 🤷‍♂️
From the "what's out there" perspective, it makes sense to define the majority view as the "correct" one, or we could look at the authors of the new RFC and draw whatever conclusions 😜
IPv6 does not have a header checksum, thus only the VRRP checksum can be used to detect errors in the IPv6 header, and only if it includes the "pseudo-header". Since IPv4 has a header checksum, it seems OK to not use a "pseudo-header" for the IPv4 VRRP checksum.
RFC 5798 introduced the ambiguity in the checksum definition; the previous VRRP RFCs 2338 and 3768 clearly stated that the checksum starts with the VRRP message and do not mention any "pseudo-header". Implementation experience with older VRRP versions, combined with only IPv6 defining the "pseudo-header" because it lacks a header checksum, could be an explanation for early VRRP implementations using the "pseudo-header" only for IPv6.