Interface EBGP Sessions on Arista EOS
Arista EOS and Cisco Nexus OS got interface EBGP sessions years after Cumulus Linux. While they’re trivially easy to configure on FRRouting (the routing daemon used by Cumulus Linux), getting them to work on Arista EOS is a bit tricky.
To make matters worse, my Google-Fu failed me when I tried to find a decent step-by-step configuration guide; all I got was a 12-minute video full of YouTube ads. Let’s fix that.
Terminology Matters
A bit of terminology first (you know that I’m a bit obsessed with that). The functionality described in this blog post is sometimes called unnumbered BGP sessions, which makes no sense. BGP is running over TCP, and there’s nothing unnumbered about TCP.
“Running BGP over unnumbered IPv4 interfaces” is a bit better but still misleading, as BGP runs over IPv6 link-local addresses (LLAs). What we’re doing is:
- Running a BGP TCP session between IPv6 LLAs;
- Enabling the IPv4 address family on that session;
- Using IPv6 next hops for IPv4 prefixes (more details).
That’s a mouthful, isn’t it? However, as most implementations allow you to configure an interface EBGP neighbor (the router figures out the remote IPv6 LLA from ICMPv6 messages, typically router advertisements), let’s call this thingy interface EBGP sessions.
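The third item in that list relies on the Extended Next Hop Encoding capability (RFC 8950) that the peers exchange in their BGP OPEN messages. To make that a bit less abstract, here's a minimal Python sketch that builds the capability bytes for "IPv4 unicast prefixes with IPv6 next hops". It's an illustration, not anybody's actual implementation: the function and constant names are made up for this example, and the field layout is my reading of RFC 8950.

```python
import struct

# Values from the IANA AFI/SAFI and BGP capability code registries
AFI_IPV4 = 1
AFI_IPV6 = 2
SAFI_UNICAST = 1
CAP_EXTENDED_NEXT_HOP = 5   # Extended Next Hop Encoding capability (RFC 8950)


def extended_next_hop_capability(tuples):
    """Encode (NLRI AFI, NLRI SAFI, next-hop AFI) tuples as one capability.

    Hypothetical helper: each tuple takes three 2-octet fields, preceded by
    a 1-octet capability code and a 1-octet capability length.
    """
    value = b"".join(
        struct.pack("!HHH", nlri_afi, nlri_safi, next_hop_afi)
        for nlri_afi, nlri_safi, next_hop_afi in tuples
    )
    return struct.pack("!BB", CAP_EXTENDED_NEXT_HOP, len(value)) + value


# "I can accept IPv4 unicast prefixes with IPv6 next hops"
capability = extended_next_hop_capability([(AFI_IPV4, SAFI_UNICAST, AFI_IPV6)])
print(capability.hex())
```

Both peers have to advertise the capability for the same tuple before IPv4 prefixes with IPv6 next hops can be exchanged over the session.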
Back to Arista EOS
To get interface EBGP sessions working on the Arista cEOS release 4.31.2F, you have to:
- Enable IPv6 on relevant interfaces (no surprise there)
interface Ethernet1
   ipv6 enable
!
interface Ethernet2
   ipv6 enable
- Enable IPv6 routing. Even though you’re not routing IPv6 and have no IPv6 non-LLA addresses on the device, you still have to do it, or the box refuses to look for potential IPv6 LLA EBGP neighbors. [1]
ipv6 unicast-routing
- You can find several related examples using the ip routing ipv6 interfaces global configuration. I have no idea what it does. The latest Arista EOS documentation fares no better, but I got feedback along the lines of “Use that when you don’t have an IPv4 address on the interface.” Throw it in for good measure if things fail to work 🤷‍♂️
- Create a BGP peer group. You cannot configure an interface peer without specifying a peer group.
router bgp 65000
   router-id 10.0.0.1
   no bgp default ipv4-unicast
   neighbor ebgp peer group
- Specify interface neighbors, their parent peer group, and their BGP AS:
router bgp 65000
   neighbor interface Et1 peer-group ebgp remote-as 65100
   neighbor interface Et2 peer-group ebgp remote-as 65101
- Activate the interface neighbor peer group for the IPv4 address family. If you’re not routing IPv6, you don’t have to activate the IPv6 address family on these neighbors.
router bgp 65000
   address-family ipv4
      neighbor ebgp activate
- Specify that you want to use IPv6 LLA as the next hop for the IPv4 prefixes. Without this command, Arista EOS does not negotiate RFC 8950 next hops for IPv4. The IPv4 BGP prefixes sent over the IPv6 LLA BGP session either get (useless) IPv4 next hops or aren’t advertised at all.
router bgp 65000
   address-family ipv4
      neighbor ebgp next-hop address-family ipv6 originate
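If you have to repeat this recipe on more than a couple of switches, it's easy to template. The following Python sketch stitches the above commands into a single configuration. The function name, its parameters, and the sample values are hypothetical (they only mirror the snippets above), and the ip routing ipv6 interfaces line is added only on request because you might not need it.

```python
def render_interface_ebgp(local_as, router_id, peers,
                          peer_group="ebgp", addressless_forwarding=False):
    """Render the interface EBGP recipe as Arista EOS configuration text.

    peers maps interface names (e.g. "Ethernet1") to remote AS numbers.
    Hypothetical helper: it only concatenates the commands from this post.
    """
    lines = []
    for interface in peers:
        lines += [f"interface {interface}", "   ipv6 enable", "!"]

    lines += ["ipv6 unicast-routing"]
    if addressless_forwarding:
        # Optional knob; see the discussion of this command in the post
        lines += ["ip routing ipv6 interfaces"]
    lines += ["!"]

    lines += [
        f"router bgp {local_as}",
        f"   router-id {router_id}",
        "   no bgp default ipv4-unicast",
        f"   neighbor {peer_group} peer group",
    ]
    for interface, remote_as in peers.items():
        # Use the abbreviated interface name, as in the snippets above
        short_name = interface.replace("Ethernet", "Et")
        lines.append(
            f"   neighbor interface {short_name} "
            f"peer-group {peer_group} remote-as {remote_as}"
        )
    lines += [
        "   !",
        "   address-family ipv4",
        f"      neighbor {peer_group} activate",
        f"      neighbor {peer_group} next-hop address-family ipv6 originate",
    ]
    return "\n".join(lines)


config = render_interface_ebgp(
    local_as=65000,
    router_id="10.0.0.1",
    peers={"Ethernet1": 65100, "Ethernet2": 65101},
)
print(config)
```

Treat the output as a starting point: check it against the EOS release you're running before pushing it to a device.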
If you did everything right, you should see IPv4 prefixes with IPv6 next hops in the BGP table:
rtr#show ip bgp
BGP routing table information for VRF default
Router identifier 10.0.0.1, local AS number 65000
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
% - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
         Network             Next Hop                       Metric  AIGP  LocPref  Weight  Path
 * >     10.0.0.1/32         -                              -       -     -        0       i
 * >     192.168.100.0/24    fe80::a8c1:abff:feb4:ab82%Et1  0       -     100      0       65100 i
 * >     192.168.101.0/24    fe80::a8c1:abff:fe18:8133%Et2  0       -     100      0       65101 i
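If you'd rather check the result in a script than by eyeballing the printout, here's a rough Python sketch that pulls the next hops out of the route lines and tests whether they're IPv6 link-local addresses. The parsing is deliberately naive and rests on a few assumptions: route lines start with the * status code, columns are separated by whitespace, and the next hop immediately follows the prefix. The function names are made up for this example.

```python
import ipaddress

# Route lines copied from the printout above
SHOW_IP_BGP = """
 * > 10.0.0.1/32 - - - - 0 i
 * > 192.168.100.0/24 fe80::a8c1:abff:feb4:ab82%Et1 0 - 100 0 65100 i
 * > 192.168.101.0/24 fe80::a8c1:abff:fe18:8133%Et2 0 - 100 0 65101 i
"""


def next_hops(printout):
    """Map each prefix in the route lines to the field that follows it."""
    result = {}
    for line in printout.splitlines():
        fields = line.split()
        # Route lines start with the '*' (valid) status code; skip the rest
        if not fields or fields[0] != "*":
            continue
        for position, field in enumerate(fields[:-1]):
            if "/" in field:          # first field that looks like a prefix
                result[field] = fields[position + 1]
                break
    return result


def is_ipv6_lla(next_hop):
    """Return True if the next hop is an IPv6 link-local address."""
    address, _, _interface = next_hop.partition("%")   # drop the %Et1 suffix
    try:
        parsed = ipaddress.ip_address(address)
    except ValueError:                # e.g. "-" for locally originated routes
        return False
    return parsed.version == 6 and parsed.is_link_local


for prefix, next_hop in next_hops(SHOW_IP_BGP).items():
    verdict = "IPv6 LLA" if is_ipv6_lla(next_hop) else "not an IPv6 LLA"
    print(f"{prefix}: {next_hop} ({verdict})")
```

An IPv4 prefix learned from an interface EBGP neighbor that does not show an IPv6 LLA next hop is a hint that the next-hop address-family command is missing.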
You can practice the above recipe in the EBGP Sessions over IPv6 LLA Interfaces BGP lab.
Revision History
- 2024-03-21
- Added a hint that you might need ip routing ipv6 interfaces configuration command for addressless forwarding over IPv6 LLA interfaces.
[1] It could be that enabling IPv6 routing starts IPv6 router advertisements.
Hi Ivan, we did a few projects using this method and it works absolutely great. You can use mostly the same configlets in CVP for all switches, which makes it really handy. As for the command "ip routing ipv6 interfaces": I learned about it during testing for the first project where we used this, and it's basically a switch that allows routing IPv4 through your box over IPv6 interfaces. Without that command, we could send data from a leaf to the spine, but the spine did just nothing; it was a blackhole. Only with that command would the forwarding work, and then everything was fine.
Thanks for the feedback. I think I got it to work without the "ip routing ipv6 interfaces", but of course, that was in a virtual lab.
Maybe I didn't make the network big enough (so I had no transit switches), or the Linux TCP/IP stack (cEOS container) accepts IPv4 packets, whereas the forwarding ASIC is not programmed to accept them without that magic nerd knob.
I happened to trip across the Arista TOI that details why this knob exists. Can confirm it's a forwarding ASIC knob that is only relevant for some hardware platforms. Just in case you need a solid reference:
https://www.arista.com/en/support/toi/eos-4-17-0f/13784-ip-addressless-forwarding-changes-for-bgp-v6-nexthop-for-v4-routes
Thanks a million for confirming that!
This is some neat config. Not needing to configure explicit IP addresses on links is a simplification. KISS.
But it strikes me that the entire industry lost out when we didn't do SPB or TRILL. Specifically, I like how Avaya did SPB.
IS-IS as an interior routing protocol can handle 1,000s of routers. We don't need anything more scalable like BGP unless you're AWS/Microsoft/Google/Facebook.
IS-IS doesn’t need addressing because it’s an ISO protocol. As long as the interface can run Ethernet, an adjacency can form. No IPv4 or IPv6 addresses needed, link-local or otherwise.
Keep in mind, all this “Interface EBGP Session” stuff is needed to bootstrap all the other stuff we will need: multi-protocol BGP, adjusting the NLRI in BGP, VXLAN-GPO, loopbacks for the VTEPs, routing protocols to coordinate with the devices in the overlay (e.g., firewalls), etc.
Agghh! Why are we making this so complex? I’m probably preaching to the choir on this forum.
I'm all up for 100% (okay, more like 99.99%) layer-3 eBGP driven networks. SP, DC, Enterprise, Home Lab? Everything 100% (meaning 99%) eBGP Driven networks. Easy to configure, easy to manipulate/traffic engineer with granular route filters with various attributes such as BGP communities, no full-mesh nonsense (or route reflectors) as was/is the case with iBGP. eBGP all the way to the host on DC networks, BGP multipathing included for your network-level load balancing for K8s (or possibly Docker Swarm) clusters, etc., augmented with BGP communities to influence paths from the host up to the DFZ even. With BGP, you can build unconventional topologies in any shape or form as you see fit. IGPs make the network flat and hence have some limitations.
Here's an example of why IGPs simply don't scale for TE properly: https://anuragbhatia.com/2022/04/networking/isp-column/inefficient-igp-can-make-ebgp-go-wild/
When I say eBGP driven layer 3-only networks, it does not imply that MPLS isn't in use, it doesn't imply that VXLAN/EVPN isn't in use (for DC networking), these “transport” protocols are very much in use, but they are BGP driven, such as BGP signalled VPLS. It also does not imply IGPs aren't in use - they are, but they are limited in functionality to the purpose of only establishing loopback learning/adjacency for adjacent peers in a network segment (like say an MPLS cloud) or path.
BGP, at most basic operational use, is very easy to work with and scales if you need it to.
However, currently, there's not much documentation or blog posts or tutorials on how to design eBGP driven SP networks (which is something I do in production). There is some documentation for DCs, but even that largely assumes a typical Spine/Leaf/Clos topology. I've worked in a DC environment where we took some inspiration from the hypercube network topology concept (and therefore it really wasn't a Clos topology) and everything was 100% eBGP, up to the host. Almost everything was interconnected on layer 1 for adjacent devices (Spine<>Spine, Leaf<>Leaf, etc.). It was more like a mix of SP and DC networking.
The basic visual representation of this eBGP approach: Vertical paths = eBGP up/down with private ASN numbering and default routes for egress back up. remove-private-as on the edge routers that talk to the DFZ. Horizontal paths = IGP + iBGP or IGP/LDPv6 etc. as and when required for loopback learning.
So coming to “numbering”, I would probably be okay with “unnumbered” (link-local IPv6) interfaces for establishing adjacency for the horizontal paths. However, for the vertical paths, I'd still use routable IPv6 GUA addressing to help make my life easier when running a traceroute, troubleshooting, etc.
But at the same time, life's easy for numbered IPv6 GUA interfaces if you use something similar to my geographical denomination model for IP(v6) addressing architecture: https://www.daryllswer.com/ipv6-architecture-and-subnetting-guide-for-network-engineers-and-operators/
I'm absolutely behind you on this. For the folks working with BGP all day, it's nice to have it everywhere. For everybody else (and that's 90% of networking guys and girls), BGP is a hellhole they don't want to touch. Those people are really left behind with all the fabrics. Even after multiple trainings and workshops, they still forget what is in the underlay or overlay and don't know why you would build stuff like that. And everybody who worked with SPB or FabricPath will totally agree. I spent a few years working with FabricPath. After understanding how everything worked together, it was so easy. You couldn't do anything wrong. Absolutely perfect for a DC. I would have loved to deploy it in campus networks if the boxes had supported it. And we know why Cisco dropped it. After many years, they officially told us that it was scrapped because they wanted to push ACI. They absolutely knew that FabricPath was good enough for 90% of enterprises and that ACI would never take off.
Thank you for this, it helped a lot. Is there a way to avoid specifying a remote-as? Is the external keyword missing in Arista EOS?
You could use the same AS on all switches of the same tier (might require allowas-in on the remote end) and specify the remote-as on the peer group. I haven't found the remote-as external feature in Arista EOS yet.