Almost a decade ago I described a scenario in which a perfectly valid IBGP topology could result in a permanent routing loop. While one wouldn’t expect to see such a scenario in a well-designed network, it’s been known for ages[1] that using BGP route reflectors could result in suboptimal forwarding.
Here’s a simple description of how that could happen:
- Multiple edge routers advertise the same prefix (IPv4, IPv6, or VPNv4).
- BGP route reflector (RR) receives all alternate BGP paths to that prefix, and selects one of them as the best one. When the BGP paths tie on all earlier steps of the best-path selection, it uses IGP cost to the BGP next hop[2] as the tie breaker.
- The best path selected by the BGP RR is advertised to its clients.
The challenge: RR clients might be better served using a different path to the same prefix (due to a different position in IGP topology), or could use multiple paths with identical IGP cost for IBGP multipathing.
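To make the problem concrete, here is a minimal Python sketch. The router names (`RR`, `PE1`, `PE2`, `C1`, `C2`) and the IGP cost table are invented for illustration, and BGP best-path selection is reduced to the IGP-cost tie breaker, assuming every earlier step ended in a tie.

```python
# Hypothetical IGP costs from each router to the BGP next hops (edge routers)
# advertising the same prefix. All names and numbers are made up.
IGP_COST = {
    "RR": {"PE1": 10, "PE2": 30},
    "C1": {"PE1": 15, "PE2": 40},
    "C2": {"PE1": 50, "PE2": 5},
}


def best_next_hop(router, next_hops):
    """Pick the next hop with the lowest IGP cost as seen from `router`.

    Assumes all earlier steps of BGP best-path selection ended in a tie.
    Equal costs are broken by name to keep the result deterministic.
    """
    return min(next_hops, key=lambda nh: (IGP_COST[router][nh], nh))


next_hops = ["PE1", "PE2"]
reflected = best_next_hop("RR", next_hops)  # the only path the clients receive

for client in ("C1", "C2"):
    own_choice = best_next_hop(client, next_hops)
    verdict = "optimal" if own_choice == reflected else "suboptimal"
    print(f"{client}: gets {reflected}, would prefer {own_choice} ({verdict})")
```

In this toy topology `C1` happens to agree with the reflector, while `C2` sits right next to `PE2` but never learns about that path.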
We’ve had a solution to that challenge for years: Advertisement of Multiple Paths in BGP (RFC 7911) aka BGP AddPath[3], and it’s available in most modern BGP implementations… but the remaining flies in the ointment still bother some people:
- BGP RR clients receive more information than needed, resulting in memory and CPU overhead.
- With most BGP AddPath implementations the operator can limit the number of alternate routes sent to the BGP RR clients… but what is the minimum number of alternate paths you need to get optimal end-to-end packet forwarding?
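The second question has no fixed answer because it depends on the topology. The following Python sketch illustrates why, under a simplifying assumption that is mine and not taken from any implementation: the reflector ranks paths by its own IGP cost and sends its N best ones. The cost table and router names are hypothetical.

```python
# Hypothetical IGP costs to three egress routers. All values are invented.
IGP_COST = {
    "RR": {"PE1": 10, "PE2": 20, "PE3": 30},
    "C1": {"PE1": 5, "PE2": 25, "PE3": 40},
    "C2": {"PE1": 40, "PE2": 25, "PE3": 5},
}


def ranked(router):
    """Next hops ordered from best to worst as seen from `router`."""
    costs = IGP_COST[router]
    return sorted(costs, key=lambda nh: (costs[nh], nh))


def paths_needed(rr, clients):
    """Smallest N for which the RR's top-N paths include every client's best."""
    rr_order = ranked(rr)
    return max(rr_order.index(ranked(client)[0]) + 1 for client in clients)


print(paths_needed("RR", ["C1", "C2"]))
```

Here `C2` prefers the path the reflector likes least, so the reflector would have to send all three paths to keep `C2` on its optimal exit.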
As you might have expected: whenever there’s a niche challenge to be solved, there’s an IETF draft or RFC solving it (sometimes in five different ways). This time, it’s BGP Optimal Route Reflection (BGP ORR) (RFC 9107).
Here’s the CliffsNotes version of that idea: the BGP route reflector imagines how it must feel to be its client, selects the best BGP paths from its client perspective and sends them to the client.
I hope you had two questions while reading the previous sentence:
- Are the best BGP paths calculated for every client (and how much overhead would that generate)? Fortunately, the BGP ORR implementations are smarter than that, and allow you to configure groups of clients. Also, you need to run the client-specific calculations only for otherwise-identical paths where IGP cost is the tie breaker, unless you want to support client-specific route selection policies – a morass into which we won’t look.
- How does the BGP RR know what it feels like to be a client? BGP RR and its clients could be part of the same link-state IGP area, or the RR clients could send their topology information to the reflector via BGP-LS[4].
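A rough Python sketch of that idea follows. It assumes the reflector already holds the link-state topology, and that each client group is represented by one IGP node used as the root of a shortest-path calculation. The topology, the group names, and the choice of roots are all hypothetical; real implementations differ in how groups and roots are configured.

```python
import heapq

# Hypothetical link-state topology: node -> {neighbor: IGP metric}.
TOPOLOGY = {
    "RR": {"PE1": 10, "C2": 50},
    "PE1": {"RR": 10, "C1": 15},
    "PE2": {"C2": 5},
    "C1": {"PE1": 15},
    "C2": {"RR": 50, "PE2": 5},
}


def spf(root):
    """Dijkstra shortest-path costs from `root` to every reachable node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist[node]:
            continue  # stale queue entry, a shorter path was already found
        for neighbor, metric in TOPOLOGY[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


def best_next_hop(root, next_hops):
    """Lowest-IGP-cost next hop as seen from `root` (ties broken by name)."""
    dist = spf(root)
    return min(next_hops, key=lambda nh: (dist.get(nh, float("inf")), nh))


# Client groups and the IGP node whose viewpoint each group uses (assumed).
ORR_GROUPS = {"west": "C1", "east": "C2"}
next_hops = ["PE1", "PE2"]

print("reflector's own view:", best_next_hop("RR", next_hops))
for group, root in ORR_GROUPS.items():
    print(f"group {group} (root {root}):", best_next_hop(root, next_hops))
```

The reflector runs one extra SPF per group rather than per client, which is what keeps the overhead manageable in this sketch.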
Has anyone implemented BGP ORR? I found IOS XR and Junos implementations, and someone has been promising to implement it in FRR for a year, so it might happen in the not-too-distant future.
Is it useful? In theory, you could use it whenever a BGP RR is far enough outside of the optimal ingress-to-egress forwarding path[5]. In practice, I prefer structured network designs that can work without extra magic.
You might want to explore what these RFCs have to say on the subject:
- Border Gateway Protocol (BGP) Persistent Route Oscillation Condition (RFC 3345)
- Distribution of Diverse BGP Paths (RFC 6774)
- BGP Wedgies (RFC 4264)
Figuring out what happens if the BGP next hop is reachable through another BGP route is left as an exercise for the reader. ↩︎
What else did you expect? IETF has a hammer for every nail. ↩︎