Most BGP implementations I’ve worked with split the neighbor BGP configuration into two parts:
- Global configuration that creates the transport session
- Address family configuration that activates the address family across a configured transport session, and changes the parameters that affect BGP updates
AS numbers, source interfaces, peer IPv4/IPv6 addresses, and passwords clearly belong to the global neighbor configuration.
BGP policies like route maps and prefix lists clearly belong to the address family configuration, but what about things like route reflector clients and next hop processing?
One might argue that these parameters belong to the address family configuration. After all, they affect BGP updates within an address family. One might also argue that it doesn’t make sense to have different route reflector topologies for individual address families. That might have been the argument that caused Arista to implement neighbor route-reflector-client and neighbor next-hop-self commands on the global BGP configuration level. I would have no problem with that if only they were implemented consistently.
As I described in April 2022, Arista EOS takes next-hop-self a bit too literally – that option also changes the next hops on reflected routes. No problem, one can also use bgp route-reflector preserve-attributes command to fix it. The “only” problem: that command does not work on all address families, and there’s no way to fix that.
Here are the relevant parts of the global BGP configuration netlab created when I started testing the leaf-and-spine EVPN topology with Arista EOS:
router bgp 65000 bgp advertise-inactive bgp log-neighbor-changes no bgp default ipv4-unicast no bgp default ipv6-unicast router-id 10.0.0.7 bgp cluster-id 10.0.0.7 bgp route-reflector preserve-attributes ! neighbor 10.0.0.5 remote-as 65000 neighbor 10.0.0.5 description l1 neighbor 10.0.0.5 update-source Loopback0 neighbor 10.0.0.5 next-hop-self neighbor 10.0.0.5 route-reflector-client neighbor 10.0.0.5 send-community standard extended ! neighbor 10.0.0.6 remote-as 65000 neighbor 10.0.0.6 description l2 neighbor 10.0.0.6 update-source Loopback0 neighbor 10.0.0.6 next-hop-self neighbor 10.0.0.6 route-reflector-client neighbor 10.0.0.6 send-community standard extended
So far so good. IPv4 works, next hops are correct. Now for the EVPN part:
router bgp 65000 address-family evpn ! neighbor 10.0.0.5 activate neighbor 10.0.0.6 activate neighbor 10.0.0.8 activate
Looks simple, right? The only problem is it doesn’t work – no routes are reflected between L1 and L2. I tried all sorts of things, and the only way to get EVPN route reflector to work was to remove the neighbor next-hop-self from the IBGP neighbors.
It looks like EVPN BGP AF processing ignores bgp route-reflector preserve-attributes setting, and as the change in the next hop would bring the traffic to the spine switch (which did not have the VXLAN interfaces to handle it), the spine switch decided not to send the updates.
Interestingly, you can configure neighbor next-hop-unchanged within the EVPN address family, but it only applies to EBGP neighbors (you need that when you believe in building EBGP-only data centers). It does not affect the neighbor next-hop-self global setting.
OK, so I gave up and removed all the neighbor next-hop-self commands from the global configuration. All of a sudden, EVPN worked like a charm, but of course the test IBGP+EBGP topology wouldn’t work because the next hops of some routes (EBGP neighbors) wouldn’t be reachable.
In the end, I was forced to use the BGP equivalent of the Swiss Army knife: a route map that sets the next hop (netlab commit), resulting in the following configuration:
route-map next-hop-self-ipv4 permit 10 match route-type external set ip next-hop peer-address ! route-map next-hop-self-ipv4 permit 20 ! address-family ipv4 ! network 10.0.0.7/32 ! neighbor 10.0.0.5 activate neighbor 10.0.0.5 route-map next-hop-self-ipv4 out neighbor 10.0.0.6 activate neighbor 10.0.0.6 route-map next-hop-self-ipv4 out
- Want to reproduce my tests? Install netlab and use Arista cEOS containers.
- Want to learn more about EVPN? There’s probably no better source than EVPN Deep Dive webinar with Dinesh Dutt (the author of BGP in the Data Center), Lukas Krattiger (the author of Building Data Centers with VXLAN BGP EVPN), and Krzysztof Grzegorz Szarkowicz (the author of MPLS in the SDN Era)
- Arista EOS supports per-AF next-hop-self in release 4.29.0F