BGP « ipSpace.net blog

Thursday, October 14, 2021 06:23 UTC

BGP Optimal Route Reflection 101

Almost a decade ago I described a scenario in which a perfectly valid IBGP topology could result in a permanent routing loop. While one wouldn’t expect to see such a scenario in a well designed network, it’s been known for ages¹ that using BGP route reflectors could result in suboptimal forwarding.

Here’s a simple description of how that could happen:

Must Read: BGP Private AS Range

We all know that you have to use an AS number between 64512 and 65535 for private BGP autonomous systems, right? Well, we’re all wrong – the high end of the range is 65534, and Chris Parker wrote a nice blog post explaining the reasons behind that change.

see 2 comments

BGP

Wednesday, June 23, 2021 06:56 UTC

Why Do We Need BGP-LS?

One of my readers sent me this interesting question:

I understand that an SDN controller needs network topology information to build traffic engineering paths with PCE/PCEP… but why would we use BGP-LS to extract the network topology information? Why can’t we run OSPF with controller by simulating a software based OSPF instance in every area to get topology view?

There are several reasons to use BGP-LS:

Unexpected Interactions Between OSPF and BGP

It started with an interesting question tweeted by @pilgrimdave81

I’ve seen on Cisco NX-OS that it’s preferring a (ospf->bgp) locally redistributed route over a learned EBGP route, until/unless you clear the route, then it correctly prefers the learned BGP one. Seems to be just ooo but don’t remember this being an issue?

Ignoring the “why would you get the same route over OSPF and EBGP, and why would you redistribute an alternate copy of a route you’re getting over EBGP into BGP” aspect, Peter Palúch wrote a detailed explanation of what’s going on and allowed me to copy into a blog post to make it more permanent:

Questions about BGP in the Data Center (with a Whiff of SRv6)

Henk Smit left numerous questions in a comment referring to the Rethinking BGP in the Data Center presentation by Russ White:

In Russ White’s presentation, he listed a few requirements to compare BGP, IS-IS and OSPF. Prefix distribution, filtering, TE, tagging, vendor-support, autoconfig and topology visibility. The one thing I was missing was: scalability.

I noticed the same thing. We kept hearing how BGP scales better than link-state protocols (no doubt about that) and how you couldn’t possibly build a large data center fabric with a link-state protocol… and yet this aspect wasn’t even mentioned.

Unequal-Cost Multipath with BGP DMZ Link Bandwidth

In the previous blog post in this series, I described why it’s (almost) impossible to implement unequal-cost multipathing for anycast services (multiple servers advertising the same IP address or range) with OSPF. Now let’s see how easy it is to solve the same challenge with BGP DMZ Link Bandwidth attribute.

I didn’t want to listen to the fan noise generated by my measly Intel NUC when simulating a full leaf-and-spine fabric, so I decided to implement a slightly smaller network:

Worth Reading: Running BGP in Large-Scale Data Centers

Here’s one of the major differences between Facebook and Google: one of them publishes research papers with helpful and actionable information, the other uses publications as recruitment drive full of we’re so awesome but you have to trust us – we’re not sharing the crucial details.

Recent data point: Facebook published an interesting paper describing their data center BGP design. Absolutely worth reading.

Just in case you haven’t realized: Petr Lapukhov of the RFC 7938 fame moved from Microsoft to Facebook a few years ago. Coincidence? I think not.

see 5 comments

Wednesday, May 26, 2021 07:06 UTC

Packet Forwarding and Routing over Unnumbered Interfaces

In the previous blog posts in this series, we explored whether we need addresses on point-to-point links (TL&DR: no), whether it’s better to have interface or node addresses (TL&DR: it depends), and why we got unnumbered IPv4 interfaces. Now let’s see how IP routing works over unnumbered interfaces.

The Challenge

A cursory look at an IP routing table (or at CCNA-level materials) tells you that the IP routing table contains prefixes and next hops, and that the next hops are IP addresses. How should that work over unnumbered interfaces, and what should we use for the next-hop IP address in that case?

Worth Watching: Rethinking BGP in the Data Center

Ever since draft-lapukhov was first published almost a decade ago, we all knew BGP was the only routing protocol suitable for data center networking… or at least Thought Leaders and vendor marketers seem to be of that persuasion.

BGP-Free MPLS Core with Segment Routing

After I created the Segment Routing lab to test the relationship between Node Segment ID (SID) and MPLS labels, I was just a minor step away from testing BGP-free core with SR-MPLS.

I added two nodes to my lab setup, this time using IOSv as those nodes need nothing more than EBGP support (and IOSv is tiny compared to IOS XE on CSR):

Category: BGP

The Challenge