Must Read: Redistributing Full BGP Feed into OSPF

Wednesday, October 14, 2020 07:42 UTC

The idea of redistributing the full Internet routing table (840,000 routes at this moment) into OSPF sounds as ridiculous as it is, but when fat fingers strike, it should be relatively easy to recover, right? Just turn off redistribution (assuming you can still log into the offending device) and move on.

Wrong. As Dmytro Shypovalov explained in an extensive blog post, you might have to restart all routers in your OSPF domain to recover.

And that, my friends, is why OSPF is a single failure domain, and why you should never run OSPF between your data center fabric and servers or VM appliances.

BGP
OSPF

3 comments:

Minh Ha 17 October 2020 02:33

I've re-read Dmytro's blog several times to make sure I didn't miss some important details. One question that came up: since MaxAge LSAs are meant to be purged anyway, wouldn't it be better, after a flapping adj comes back up, to program OSPF so that its neighbors know not to send it MaxAge LSAs, since they're useless wrt building the LS topology? That way all the adjs can go down once, due to the massive flooding of MaxAge LSAs, but not twice, and the network can get back to a clean slate after some time. How come vendors don't implement OSPF this way? Can you elaborate on this, Ivan?

I've also had this question for quite some time, and his blog reminded me of it again. Assuming we have a network made up of only high-end routers, is there any reason that a good implementation of IS-IS with well-designed timers, running in such a network, cannot scale to, say, 100k nodes with a single-area design? Sheer flooding in dense topologies has always been a big issue, but it can be alleviated to some extent using IS-IS mesh groups. Another reason was routers' inadequate control-plane processing power and memory resources. But even so, there were networks consisting of 1k+ IS-IS routers in a single area back in the day. So surely our current routers can handle way more, can't they? If they still can't, is it because IGPs still lack the dynamic flow control mechanism that BGP has, thanks to its use of TCP?

And what about EIGRP? Given the same kind of network as above, can it handle a single routing domain of 200k-300k routers and 500k routes (the 500k route figure I found in Russ White's Complexities book)? EIGRP is very much like BGP, in that it's distance-vector, incremental, partial, and bounded. So overall, EIGRP is a lot more stateless than LS IGPs, and should be more scalable, right Ivan?

Ivan Pepelnjak 17 October 2020 04:59

The whole idea of OSPF is to have the eventually-consistent topology database synchronized across all routers in the area, and if you want to do that, you simply have to synchronize the DELETE operations as well as INSERT or UPDATE operations. There's no way around that.

As for scaling link-state protocols, keep in mind that while flooding does burn CPU cycles, in the end the Dijkstra algorithm isn't O(n), and that's what will eventually kill you no matter how fast the CPU is.
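To make the "isn't O(n)" point concrete, here is a minimal sketch of a shortest-path-first computation using a binary heap. It is not how any particular router implements SPF; the `spf` function, the `topology` dictionary, and the router names are all made up for illustration. With this data structure the run time grows roughly as O((N + E) log N) for N routers and E adjacencies, so it is superlinear in the size of the area.

```python
import heapq


def spf(graph, root):
    """Return the lowest total cost from root to every reachable node.

    graph is a hypothetical adjacency map: node -> list of (neighbor, cost).
    """
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        # Skip stale heap entries left behind by an earlier, worse path.
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Illustrative four-router topology with symmetric link costs (assumed values).
topology = {
    "R1": [("R2", 10), ("R3", 5)],
    "R2": [("R1", 10), ("R3", 3), ("R4", 1)],
    "R3": [("R1", 5), ("R2", 3), ("R4", 9)],
    "R4": [("R2", 1), ("R3", 9)],
}

costs = spf(topology, "R1")
print(costs)
```

Every router in the area has to rerun a computation like this whenever the topology database changes, which is why the cost grows with the whole area and not just with the local neighborhood.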

EIGRP scales much better because its computational complexity depends on the number of routes and neighbors.

Minh Ha 18 October 2020 04:34

Thx a lot for the clear answers Ivan!!! Indeed I wasn't thinking from the DB aspect of OSPF. And only a select few like yourself have the deep knowledge encompassing many fields, to discern this kind of issue. And thanks for confirming EIGRP, as an advanced distance-vector IGP, can indeed scale much better.

Re the Dijkstra algorithm/SPF computation complexity, it looks like even routers some 15 yrs back could finish the computation in about a second or so for very large areas, and it was no longer considered the scaling bottleneck. Apparently the worst-case complexity of Dijkstra is O(N^2) (I could be wrong here), so it's 10 billion calculations for a 100k-node area. This could be an issue if the control plane is already busy with other stuff.
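As a rough check of the arithmetic, the sketch below compares two commonly quoted bounds for a 100k-node area: N^2 for a naive implementation and (N + E) log N for a heap-based one. The function name and the average of four adjacencies per router are assumptions picked purely for illustration, and the results are order-of-magnitude step counts, not measured run times.

```python
import math


def spf_work_estimates(nodes, avg_degree):
    """Return (naive, heap_based) step-count estimates for one SPF run."""
    edges = nodes * avg_degree  # assumed: every router has avg_degree adjacencies
    naive = nodes ** 2  # O(N^2) bound
    heap_based = (nodes + edges) * math.log2(nodes)  # O((N + E) log N) bound
    return naive, heap_based


naive, heap_based = spf_work_estimates(nodes=100_000, avg_degree=4)
print(f"naive O(N^2):          {naive:.1e} steps")
print(f"heap O((N+E) log N):   {heap_based:.1e} steps")
```

Under these assumptions the naive bound comes out at 10^10 steps, while the heap-based bound is on the order of 10^7, so the answer depends heavily on which implementation and topology density one has in mind.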

The time to transfer RIB to FIB for very large RIB was considered a bigger issue I think, but even that was found to be acceptable for 1m prefixes, hence my question re the capability of current routers :)) . Part of my curiosity wrt the limit of modern IGPs, particularly of the better ones like IS-IS and EIGRP, using current-day hardware, stemmed from the way BGP has been hailed as the best IGP in DC leaf-spine fabrics for quite some time now, and the bold claim that BGP is needed there because IGPs have trouble scaling there. I find it absolutely ridiculous, and strongly believe there's massive vested interest in spreading this kind of BS propaganda to the uninitiated.

IMO, just because some big guys fail to scale their networks using an IGP and resort to BGP doesn't mean it's the right way to do things. Size doesn't necessarily mean they know/care what they're doing. MS, despite all their size and money, couldn't deliver a proper microkernel and a stable Windows OS to begin with. I've been using Windows for 18 yrs, and my XP was having less trouble than my current Win10 box (no joke), and XP was never great to begin with. Google can't write a proper Chrome browser that doesn't hog memory like there's no tomorrow, and the list goes on. How come these guys suddenly become the sages of networking and their choice of routing protocol becomes gospel?
