Next Hops of BGP Routes Reflected by Arista EOS

Imagine a suboptimal design in which:

  • A BGP route reflector also serves as an AS edge (PE) router¹;
  • You want to use next-hop-self on AS edge routers.

Being exposed to Cisco IOS for decades, I considered that to be a no-brainer. After all, section 10 of RFC 4456 is pretty specific:

In addition, when a RR reflects a route, it SHOULD NOT modify the following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED.

Arista EOS is different – a route reflector happily modifies NEXT_HOP on reflected routes when next-hop-self is configured on the iBGP session (but then, did you notice the “SHOULD NOT” wording?²).

Arista EOS has two routing protocol implementations, selected by configuring either the ribd or the multi-agent routing protocol model. This blog post describes the behavior of the multi-agent model.
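For reference, the routing protocol model is selected with a global configuration command. A minimal sketch, assuming a recent EOS release; the change usually takes effect only after a reload, so check the documentation for your release:

```
! Select the multi-agent routing protocol model (the behavior described in this post)
service routing protocols model multi-agent
```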

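The behavior is easy to reproduce in a 4-router lab with the following BGP topology: RR (10.0.0.1) has iBGP sessions with its route reflector clients E1 (10.0.0.2) and E2 (10.0.0.3), and an eBGP session with X1 (10.1.0.10) in AS 65100.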

I configured BGP on RR the way I would have done it on Cisco IOS:

BGP configuration on route reflector
router bgp 65000
   router-id 10.0.0.1
   bgp cluster-id 10.0.0.1
   bgp advertise-inactive
   neighbor 10.0.0.2 remote-as 65000
   neighbor 10.0.0.2 next-hop-self
   neighbor 10.0.0.2 update-source Loopback0
   neighbor 10.0.0.2 description e1
   neighbor 10.0.0.2 route-reflector-client
   neighbor 10.0.0.2 send-community standard extended
   neighbor 10.0.0.3 remote-as 65000
   neighbor 10.0.0.3 next-hop-self
   neighbor 10.0.0.3 update-source Loopback0
   neighbor 10.0.0.3 description e2
   neighbor 10.0.0.3 route-reflector-client
   neighbor 10.0.0.3 send-community standard extended
   neighbor 10.1.0.10 remote-as 65100
   neighbor 10.1.0.10 description x1
   neighbor 10.1.0.10 send-community standard
   !
   address-family ipv4
      neighbor 10.0.0.2 activate
      neighbor 10.0.0.3 activate
      neighbor 10.1.0.10 activate

The only difference I noticed when comparing the Arista EOS configuration with the Cisco IOS one was the need to specify route-reflector-client and next-hop-self per neighbor and not within an address family. That might be a good choice: it makes little sense to have some neighbors as RR clients in one address family but not in another, and specifying these attributes per neighbor rather than per address family per neighbor helps you avoid stupid mistakes.

The BGP table on E1 was a shocker: prefix 10.0.0.3/32 (reflected route from E2) has RR as the next hop. The originator ID (Or-ID) is set to 10.0.0.3, proving the route was originated by E2, but the next hop is set to 10.0.0.1 (the RR address, which is also its router ID and cluster ID), proving the next hop was changed by RR when reflecting the route.

BGP table on E1
e1#sh ip bgp | begin Network
          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      10.0.0.1/32            10.0.0.1              0       -          100     0       i
 * >      10.0.0.2/32            -                     -       -          -       0       i
 * >      10.0.0.3/32            10.0.0.1              0       -          100     0       i Or-ID: 10.0.0.3 C-LST: 10.0.0.1
 * >      10.0.0.4/32            10.0.0.1              0       -          100     0       65100 i

I almost made a perfect mess creating a route map to change next hops on external BGP routes (but not on internal ones) when I noticed the nerd knob I needed to get Arista EOS behavior more in line with the recommendation of RFC 4456: bgp route-reflector preserve-attributes. All of a sudden, the BGP table changed to what I expected to see:

BGP table on E1 after RR reconfiguration
e1#sh ip bgp | begin Network
          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      10.0.0.1/32            10.0.0.1              0       -          100     0       i
 * >      10.0.0.2/32            -                     -       -          -       0       i
 * >      10.0.0.3/32            10.0.0.3              0       -          100     0       i Or-ID: 10.0.0.3 C-LST: 10.0.0.1
 * >      10.0.0.4/32            10.0.0.1              0       -          100     0       65100 i
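
The RR reconfiguration behind that table is a one-line change. A minimal sketch, assuming everything else in the RR BGP configuration shown earlier stays as it was:

```
router bgp 65000
   ! Keep NEXT_HOP and the other RFC 4456 attributes unchanged on reflected routes
   bgp route-reflector preserve-attributes
```

Judging by the table above, next-hop-self on the iBGP sessions still rewrites the next hop of routes learned over eBGP (10.0.0.4/32 keeps 10.0.0.1 as its next hop), while reflected routes such as 10.0.0.3/32 keep the next hop set by the originating router.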

Reproducibility Is the Key

You’ll find the lab topology and configuration files on GitHub. The tar archives contain device configurations (initial and fixed) and the containerlab configuration needed to set up the lab³.

Alternatively, you can use netlab to set up the lab:

  • Install netlab and your preferred lab environment
  • Copy the topology.yml file into an empty directory
  • Execute netlab up

You can specify the virtualization provider or the default device type with netlab up, making it easy to test the route reflector behavior on a dozen devices supported by netlab.


  1. Because you ran out of budget, or because you forgot you needed a route reflector in your BGP network, and then randomly chose one of the routers to do that. ↩︎

  2. Maybe that should be upgraded to REALLY SHOULD NOT. ↩︎

  3. Some Assembly Required: you’ll have to install Docker, containerlab and Arista EOS container on a Linux host. ↩︎

2 comments:

  1. Different vendor defaults can be surprising, indeed.

    Many vendors use a default different from the Arista EOS default described above. Some let you configure similar behavior:

    • On Cisco IOS-XR there is the command ibgp policy out enforce-modifications to get the behavior you described for Arista EOS above.

    • On Cisco IOS the neighbor <IP> internal-vpn-client command enables this for iBGP PE-CE connections.

    • Huawei VRP has the configuration command reflect change-path-attribute to enable changing path attribute of reflected routes via policy.

    Replies
    1. To be fair, next-hop-self isn't the default behavior when advertising routes to an RR client: note the neighbor 10.0.0.2 next-hop-self line configured for RR client 10.0.0.2. Without it, the next hop would remain unchanged, which would be compliant with the 'SHOULD NOT' wording.

      So in a way, the real difference in default behavior is whether the configuration is taken at face value, or whether a stricter higher-level rule overrides it by default.

    2. Yeah, you could say that I asked for it ;)

      I definitely found the behavior unexpected, more so as other platforms with very similar syntax behave in a different way. Will reword it a bit (give me a few days).

  2. I've also found that when doing an eBGP Route Server setup across a shared subnet (Third Party Next Hop), Arista changes the next hop to self while Cisco doesn't. To make it worse, adding next-hop-unchanged didn't work (although the command was accepted); the next hop had to be set via a route-map. Even worse, routes not learned over the shared interconnect were swept up in this, and dropped because:

    RFC 4271, section 5.1.3, clause 2:

    2) When sending a message to an external peer, X, and the peer is one IP hop away from the speaker:

         - Otherwise, if the route being announced was learned from an
           external peer, the speaker can use an IP address of any
           adjacent router (known from the received NEXT_HOP attribute)
           that the speaker itself uses for local route calculation in
           the NEXT_HOP attribute, provided that peer X shares a common
           subnet with this address.  This is a second form of "third
           party" NEXT_HOP attribute.

         - Otherwise, if the external peer to which the route is being
           advertised shares a common subnet with one of the interfaces
           of the announcing BGP speaker, the speaker MAY use the IP
           address associated with such an interface in the NEXT_HOP
           attribute.  This is known as a "first party" NEXT_HOP
           attribute.

         - By default (if none of the above conditions apply), the BGP
           speaker SHOULD use the IP address of the interface that the
           speaker uses to establish the BGP connection to peer X in
           the NEXT_HOP attribute.

    sh ip bgp nei x.x.x.x showed: Nexthop invalid for single hop eBGP: 1

    Making the peering eBGP multihop, even though it's 1 hop away, allowed the route in, per another part of the same RFC: 3) When sending a message to an external peer X, and the peer is multiple IP hops away from the speaker (aka "multihop EBGP"):

         - The speaker MAY be configured to propagate the NEXT_HOP
           attribute.  In this case, when advertising a route that the
           speaker learned from one of its peers, the NEXT_HOP attribute
           of the advertised route is exactly the same as the NEXT_HOP
           attribute of the learned route (the speaker does not modify
           the NEXT_HOP attribute).
    

    Or, change the matching criteria in the NH unchanged prefix list.
