Category: BGP

BGP Route Reflectors in the Forwarding Path

Bela Varkonyi left two intriguing comments on my Leave BGP Next Hops Unchanged on Reflected Routes blog post. Let’s start with:

The original RR design has a lot of limitations. For usual enterprise networks I always suggested to follow the topology with RRs (every interim node is an RR), since this would become the most robust configuration where a link failure would have the less impact.

He’s talking about the extreme case of hierarchical route reflectors, a concept I first encountered when designing a large service provider network. Here’s a simplified conceptual diagram (lines between boxes are physical links as well as IBGP sessions between loopback interfaces):

read more see 1 comments

Leave BGP Next Hops Unchanged on Reflected Routes

Here’s the last question I’ll answer from that long list Daniel Dib posted weeks ago (answer to Q1, answer to Q2).

I am trying to understand what made the BGP designers decide that RR should not change the BGP Next Hop for IBGP-learned routes.

If anyone wants to have the answer to the very last question in Daniel’s list, they’re free to search for “BGP Next Hops” on my blog and start exploring. Studying OSPF Forwarding Address might provide additional clues.
read more see 2 comments

New Webinar: Internet Routing Security

I’m always in a bit of a bind when I get an invitation to speak at a security conference (after all, I know just enough about security to make a fool of myself), but when the organizers of the DEEP Conference invited me to talk about Internet routing security I simply couldn’t resist – the topic is dear and near to my heart, and I planned to do a related webinar for a very long time.

Even better, that conference would have been my first on-site presentation since the COVID-19 craze started, and I love going to Dalmatia (where the conference is taking place). Alas, it was not meant to be – I came down with high fever just days before the conference and had to cancel the talk.

read more add comment

Why Do We Need IBGP Full Mesh?

Here’s another question from the excellent list posted by Daniel Dib on Twitter:

BGP Split Horizon rule says “Don’t advertise IBGP-learned routes to another IBGP peer.” The purpose is to avoid loops because it’s assumed that all of IBGP peers will be on full mesh connectivity. What is the reason the BGP protocol designers made this assumption?

Time for another history lesson. BGP was designed in late 1980s (RFC 1105 was published in 1989) as a replacement for the original Exterior Gateway Protocol (EGP). In those days, the original hub-and-spoke Internet topology with NSFNET core was gradually replaced with a mesh of interconnections, and EGP couldn’t cope with that.

read more see 2 comments

More Arista EOS BGP Route Reflector Woes

Most BGP implementations I’ve worked with split the neighbor BGP configuration into two parts:

  • Global configuration that creates the transport session
  • Address family configuration that activates the address family across a configured transport session and changes the parameters that affect BGP updates

AS numbers, source interfaces, peer IPv4/IPv6 addresses, and passwords clearly belong to the global neighbor configuration.

Starting with EOS release 4.29.0F, you can configure the neighbor next-hop-self option within IPv4 and IPv6 address families. Great job! Hopefully, we can consider this blog post a historical curiosity.
read more see 3 comments

Next Hops of BGP Routes Reflected by Arista EOS

Imagine a suboptimal design in which:

  • A BGP route reflector also servers as an AS edge (PE) router1;
  • You want to use next-hop-self on AS edge routers.

Being exposed to Cisco IOS for decades, I considered that to be a no-brainer. After all, section 10 of RFC 4456 is pretty specific:

In addition, when a RR reflects a route, it SHOULD NOT modify the following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED.

Arista EOS is different – a route reflector happily modifies NEXT_HOP on reflected routes (but then, did you notice the “SHOULD NOT” wording?2)

read more see 2 comments

BGP Labeled Unicast Interoperability Challenges

Jeff Tantsura left me tantalizing hint after reading the BGP Labeled Unicast on Cisco IOS blog post:

Read carefully “Relationship between SAFI-4 and SAFI-1 Routes” section in RFC 8277

The start of that section doesn’t look promising (and it gets worse):

It is possible that a BGP speaker will receive both a SAFI-11 route for prefix P and a SAFI-42 route for prefix P. Different implementations treat this situation in different ways.

Now for the details:

read more add comment

Combining BGP and IGP in an Enterprise Network

Syed Khalid Ali left the following question on an old blog post describing the use of IBGP and EBGP in an enterprise network:

From an enterprise customer perspective, should I run iBGP, iBGP+IGP (OSPF/ISIS/EIGRP), or IGP with mutual redistribution on the edge routers? I was hoping you could share some thoughtful insight on when to select one over the other.

We covered many relevant details in the January 2022 Design Clinic; here’s the CliffNotes version. Remember that the road to hell (and broken designs) is paved with great recipes and best practices and that I’m presenting a black-and-white picture because I don’t feel like transcribing our discussion into an oversized blog post. People wrote books on this topic; search for “Russ White books” to find a few.

Finally, there’s no good substitute for understanding how things work (which brings me to another webinar ;).

read more see 3 comments

BGP Labeled Unicast on Cisco IOS

While researching the BGP RFCs for the Three Dimensions of BGP Address Family Nerd Knobs, I figured out that the BGP Labeled Unicast (BGP-LU, advertising MPLS labels together with BGP prefixes) uses a different address family. So far so good.

Now for the intricate bit: a BGP router might negotiate IPv4 and IPv4-LU address families with a neighbor. Does that mean that it’s advertising every IPv4 prefix twice, once without a label, and once with a label? Should that be the case, how are those prefixes originated and how are they stored in the BGP table?

As always, the correct answer is “it depends”, this time on the network operating system implementation. This blog post describes Cisco IOS behavior, a follow-up one will focus on Arista EOS.

read more see 3 comments

Should We Use LISP?

LISP started as yet-another ocean-boiling project focused initially on solving the “we use locators as identifiers” mess (not quite), and providing scalable IPv6 connectivity over IPv4-only transport networks by adding another layer of indirection and thus yet again proving RFC 1925 rule 6a. At least those are the diagrams I remember from the early “look at this wonderful tool” presentations explaining for example how Facebook is using LISP to deploy IPv6 (more details in this presentation).

Somehow that use case failed to gain traction and so the pivots1 started explaining how one can use LISP to solve IP mobility or IP multihoming or live VM migration, or to implement IP version of conversational learning in Cisco SD-Access. After a few years of those pivots, I started dismissing LISP with a short “cache-based forwarding never worked well” counterargument.

read more see 2 comments

Worth Reading: Performance Testing of Commercial BGP Stacks

For whatever reason, most IT vendors attach “you cannot use this for performance testing and/or publish any results” caveat to their licensing agreements, so it’s really hard to get any independent test results that are not vendor-sponsored and thus suitably biased.

Justin Pietsch managed to get a permission to publish test results of Junos container implementation (cRPD) – no surprise there, Junos outperformed all open-source implementations Justin tested in the past.

What about other commercial BGP stacks? Justin did the best he could: he published Testing Commercial BGP Stacks instructions, so you can do the measurements on your own.

add comment

Running BGP between Virtual Machines and Data Center Fabric

Got this question from one of my readers:

When adopting the BGP on the VM model (say, a Kubernetes worker node on top of vSphere or KVM or Openstack), how do you deal with VM migration to another host (same data center, of course) for maintenance purposes? Do you keep peering with the old ToR even after the migration, or do you use some BGP trickery to allow the VM to peer with whatever ToR it’s closest to?

Short answer: you don’t.

Kubernetes was designed in a way that made worker nodes expendable. The Kubernetes cluster (and all properly designed applications) should recover automatically after a worker node restart. From the purely academic perspective, there’s no reason to migrate VMs running Kubernetes.

read more see 2 comments
Sidebar