Optimal L3 Forwarding with VARP and Active/Active VRRP

Wednesday, May 29, 2013 07:14 +0200


I blogged about the need for optimal L3 forwarding across the whole data center in 2012, when I introduced it as one of the interesting requirements in the Data Center Fabrics webinar. Years later, the concept became one of the cornerstones of modern EVPN fabrics, but only a few companies can deliver this functionality in a more traditional environment.

Fabric solutions that appear as a single system to the outside world usually offer optimal L3 forwarding. These solutions include:

Stacked ToR switches and similar solutions (including HP IRF and Juniper’s Virtual Chassis) definitely fall into this category (note: using stacked switches or virtual chassis architectures with ring-based interconnect in environments with heavy east-west traffic is NOT a good idea);
Other architectures that present the whole fabric as a single layer-3 entity like Juniper’s QFabric (now mostly obsolete).

While optimal L3 forwarding with anycast first-hop gateways became table stakes for EVPN implementations, and most vendors offer active/active first-hop gateways in MLAG clusters, I’m aware of only a few companies that can implement anycast gateways across a traditional layer-2 fabric: Arista with Virtual ARP, Cumulus Linux with Virtual Router Redundancy, and Enterasys (now Extreme) with Fabric Routing.

Arista’s Virtual ARP is extremely simple¹ – it’s like VRRP without VRRP. You configure the same IP address (the first-hop gateway) on a VLAN interface of all ToR switches with the ip virtual-router address interface configuration command, and associate a MAC address with the shared IP address with the ip virtual-router mac-address global configuration command.

The first switch that is hit with an ARP request for the shared virtual IP address will reply with the shared MAC address (I’m not sure about the details – it might well be that the ARP broadcast gets flooded to all switches, in which case the sender gets numerous replies). When a host sends an IP packet to that same shared MAC address, the first ToR switch that the packet hits intercepts the packet (because it’s listening to the shared MAC address), and performs L3 routing.
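The behavior described above can be sketched as a toy Python model. This is an illustration, not Arista's implementation: the TorSwitch class, the varp_enabled flag, and the address values are all hypothetical, and the model assumes that the first switch receiving the ARP request answers it locally.

```python
# Toy model of anycast first-hop gateway behavior (hypothetical names and values).
VIRTUAL_IP = "10.0.10.1"            # shared first-hop gateway address (example value)
VIRTUAL_MAC = "00:1c:73:00:00:99"   # shared MAC address (example value)


class TorSwitch:
    def __init__(self, name, varp_enabled=True):
        self.name = name
        # False models a switch where the shared gateway is not configured
        self.varp_enabled = varp_enabled

    def handle_arp_request(self, target_ip):
        """Answer ARP requests for the shared IP with the shared MAC."""
        if self.varp_enabled and target_ip == VIRTUAL_IP:
            return VIRTUAL_MAC
        return None  # a real switch would flood the request in the VLAN

    def handle_frame(self, dst_mac):
        """Route frames sent to the shared MAC, bridge everything else."""
        if self.varp_enabled and dst_mac == VIRTUAL_MAC:
            return f"routed by {self.name}"
        return f"bridged by {self.name}"


leaf1 = TorSwitch("leaf1")
gateway_mac = leaf1.handle_arp_request(VIRTUAL_IP)
print(leaf1.handle_frame(gateway_mac))
```

A switch created with varp_enabled=False neither answers the ARP request nor intercepts frames sent to the shared MAC address, so that traffic is bridged toward some other switch that does own the address.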

Things might get nasty if you have configuration mismatches – for example, missing ip virtual-router address configuration on one of the ToR switches. Make sure you use some sort of automation or orchestration system to configure the ToR switches.
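One way to keep the switches consistent is to render every configuration from a single definition of the shared gateways. The Python sketch below is only an illustration: the Gateway class, both function names, and all addresses are hypothetical, and the placement of each command (switch-wide versus under the VLAN interface) is an assumption you should verify against the EOS documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    vlan: int
    virtual_ip: str  # shared first-hop gateway address for this VLAN


def render_switch(virtual_mac, gateways, switch_ips):
    """Render the gateway-related configuration lines for one ToR switch.

    switch_ips maps VLAN ID to this switch's own (unique) interface address.
    """
    lines = [f"ip virtual-router mac-address {virtual_mac}"]
    for gw in gateways:
        lines.append(f"interface Vlan{gw.vlan}")
        lines.append(f"   ip address {switch_ips[gw.vlan]}")
        lines.append(f"   ip virtual-router address {gw.virtual_ip}")
    return "\n".join(lines)


def missing_gateways(config, gateways):
    """Return the VLANs whose shared gateway address is absent from a config.

    This only checks that the line exists somewhere in the config; it does not
    verify that the line sits under the right interface.
    """
    present = {line.strip() for line in config.splitlines()}
    return [gw.vlan for gw in gateways
            if f"ip virtual-router address {gw.virtual_ip}" not in present]


gateways = [Gateway(10, "10.0.10.1"), Gateway(20, "10.0.20.1")]
config = render_switch("00:1c:73:00:00:99", gateways,
                       {10: "10.0.10.2/24", 20: "10.0.20.2/24"})
print(config)
print(missing_gateways(config, gateways))
```

Running missing_gateways against the configuration retrieved from each switch would flag the kind of mismatch mentioned above before it affects traffic.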

Revision History

2023-02-01

Removed mentions of obsolete products/startups.
Added a mention of Cumulus Linux VRR.
Added a link to VARP deep dive blog post.

¹ Cumulus Linux Virtual Router Redundancy is functionally equivalent to Arista’s Virtual ARP.


14 comments:

Kaj J. Niemi 29 May 2013 08:24

Does Arista have something similar for IPv6, or is the trend for 2013 still to ignore the elephant in the room?

Replies

Anonymous 29 May 2013 10:41

Arista's VARP works with both IPv4 and IPv6.
(support was added in EOS 4.11.3 if you wanted to look back to where it appeared).

IPv6 is most definitely not a 2nd class citizen on Arista.

Kaj J. Niemi 31 May 2013 14:23

Hi Lincoln,

Thanks for the clarification and apologies for the implied snark ;-)

Ant 29 May 2013 14:10

Arista's VARP sounds like GLBP. Am I wrong with this assumption?

Replies

Anonymous 30 May 2013 01:31

Cisco GLBP uses multiple mac-addresses for multiple gateways to 'share' traffic. It requires a heartbeat protocol and messages between active/standby to handle failures.

Arista VARP uses a single common mac address across all devices (more than 2 are supported) and in fact you can run it at different places in your network (e.g. both leaf and spine). Since every device is 'active' there is no need for any protocol and thus there is also no failover time period.

Ant 30 May 2013 14:55

Thanks for the explanation!

Nico TLS 29 May 2013 21:32

Cisco has something similar in NX-OS. The behavior of HSRP when used together with vPC is changed from the typical HSRP implementation that we are familiar with in Cisco IOS.

In NX-OS vPC and HSRP implementation, both the active and standby HSRP gateways actively forward packets (HSRP virtual MAC of vPC switches are programmed with the G flag on both systems).

This is still limited to a vPC pair of N7K or N5K, but Anycast FHRP on FabricPath should pop up in the next months.

Replies

Anonymous 30 May 2013 01:36

Nico,
Cisco's alternative behavior of running the standby as active for HSRP, VRRP and GLBP in vPC isn't really 'similar'. You still have a protocol, you still have a maximum of 2-way active, and you still have scale limitations imposed by the protocol scaling (e.g. see Cisco's published "maximum system scale" numbers for # of FHRP instances).

FabricPath doesn't solve this problem (nor does FabricPath address the inherent scale issues with MAC table size on F1/F2 modules on N7K).

Anycast FHRP would be a good thing, but then again I think I was talking about that 4 years ago. It's still not there?

Anonymous 30 May 2013 01:56

Ivan,
A small comment: you mention "Things might get nasty if you have configuration mismatches – for example, missing ip virtual-router address configuration on one of the ToR switches":

Actually, nothing 'bad' will happen if you did have a configuration mismatch like that. All that would happen is that you'd have more traffic flowing towards wherever the actual virtual MAC address is that the host last heard a gratuitous ARP from. And that may oscillate.
I think (but haven't checked) that VARP even knows about that oscillation and will point it out. It's a neat aspect of gratuitous ARPs being broadcast: every switch will 'see' them.

Anonymous 30 May 2013 05:29

We use Arista with VARP. It works amazingly well and we love it. As a note though, it is absolutely critical that the MAC address you put in is unique. In a bit of a novice move we put the same MAC address on multiple pairs of switches. That works fine until you need to bridge a VLAN through one pair to the other pair where the SVI is. The first switch that sees the frame recognizes the destination as its own virtual MAC address and tries to route it, quite unsuccessfully.

Makes 100% sense and was a silly thing to do, but an easy mistake for a VARP rookie...

Replies

Anonymous 05 June 2013 22:24

I find the best way to deal with this is to take the MAC address from one of the router interfaces in the VARP group and turn it into a locally administered MAC address. Take the second-least-significant bit in the most significant byte and change it from 0 to 1. Or just do it in your head: I think Arista only has one OUI, 00-1c-73, so the local version would be 02-1c-73. Then, using the same last 3 bytes from one of the vendor-assigned MAC addresses for a router interface in that VLAN, you would end up with a MAC that should be unique in your LAN.
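The bit manipulation described above can be sketched in a few lines of Python. The function name and the sample address are hypothetical; the only operation involved is setting the 0x02 (locally administered) bit in the first byte.

```python
def to_locally_administered(mac):
    """Return the MAC address with the locally administered bit set.

    Accepts six hex octets separated by '-' or ':' and returns them
    in lowercase, separated by '-'.
    """
    octets = [int(part, 16) for part in mac.replace(":", "-").split("-")]
    if len(octets) != 6 or any(not 0 <= octet <= 0xFF for octet in octets):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    octets[0] |= 0x02  # second-least-significant bit of the first byte
    return "-".join(f"{octet:02x}" for octet in octets)


# Sample vendor-assigned address (made up for illustration)
print(to_locally_administered("00-1c-73-aa-bb-cc"))
```

The last three bytes are left untouched, so the result stays tied to the vendor-assigned address it was derived from.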

Unknown 30 May 2013 17:05

The Plexxi solution is pretty much the same (virtual IP and MAC address shared by all switches), but configuration simplicity is achieved by not having the user configure this on each switch. The virtual IP address is configured per VLAN only, our controller will ensure all appropriate switches will be configured properly. And the shared MAC address is the same for all VLANs.

Anonymous 08 January 2016 15:01

Sorry for the years late thread reply here; (Ivan/Dale) is there any way to get comment on this remark?

"The first switch that is hit with an ARP request for the shared virtual IP address will reply with the shared MAC address (I’m not sure about the details – it might well be that the ARP broadcast gets flooded to all switches, in which case the sender gets numerous replies). "

Just wanting to have an understanding of what should be expected data path wise. Can't be as simple as multiple replies to the ARPing host, can it?

Anonymous 12 January 2016 23:19

Hi nsxtech.net,
The answer will obviously be implementation-dependent, but the short answer is that it could be any number of things and still work just fine.

What is required for ARP to work is that a device answers the ARP request. That 'reply' could either be a broadcast response (sort of like what GARP does) or unicast. If it's unicast, only the destination receives it.

The "implementation dependent" piece depends on what the initial 'hop' switch does with the broadcast ARP request: does it eat that broadcast and respond on its own, or does it forward the broadcast and potentially get multiple answers back from many distributed [independent] gateways?
Either is possible, via configuration.
There may be merits in localizing ARP response but nothing bad happens if there are duplicate responses.
