Why Is OSPF (and BGP) More Complex than STP?

Thursday, April 27, 2023 06:26 UTC

Why Is OSPF (and BGP) More Complex than STP?

I got this question from one of my readers:

Why are OSPF and BGP are more complex than STP from a designer or administrator point of view? I tried everything to come to a conclusion but I couldn’t find a concluded answer, ChatGPT gave a circular loop answer.

There are numerous reasons why a protocol, a technology or a solution might be more complex than another seemingly similar one (or as Russ White would have said, “if you haven’t found the tradeoffs, you haven’t looked hard enough”):

TL;DR generated by ChatGPT GPT-4 ?: OSPF and BGP are more complex than STP because they solve different problems and have different performance goals. STP focuses on preventing loops in forwarding topology, while OSPF and BGP find optimal forwarding topologies and propagate edge information. OSPF’s complexity arises from various optimizations, while BGP complexity comes from implementing routing policies in large-scale environments.

Someone failed to find a simpler solution. Every now and then, someone finds an amazingly simple solution to what seemed to be a hard problem. Considering the long history of all three protocols¹, the time spent on theoretical foundations of routing (example: graph theory), and numerous routing protocols developed in the past, we can probably conclude this one doesn’t apply.

Trading performance for complexity. Faster implementations or implementations with lower memory requirements or stricter performance guarantees tend to be more complex than simplistic implementations (assuming we looked hard enough for a simpler solution). The long history of sorting algorithms is an excellent example. Cache-based forwarding and vector packet processing versus simple packet-by-packet forwarding is another one.

They are solving a different problem. This is obviously the case for STP, OSPF, and BGP.

The only job STP has is to ~~shut down~~ block² interfaces to prevent loops in a forwarding topology, and it does that in the simplest possible way: if a bridge hears about a more prominent bridge³ through multiple interfaces, it ~~shuts down~~ blocks all but one of them. STP does not advertise edge prefixes⁴, has no concept of neighbors and no reliability⁵, and the only way to make such a simplistic protocol work is to wait a while after each change to make sure things settle down.

OSPF has to find the optimal forwarding topology from the perspective of any node in the network and propagate edge information (IP subnets) to all forwarding nodes to allow them to build deterministic optimal forwarding tables. While simpler routing protocols like RIP get the same job done, OSPF does it faster and more reliably. To reach all those goals, OSPF has to:

Find adjacent routers (neighbors)
Exchange its network topology information with adjacent routers
Figure out when the adjacent routers are no longer reachable and track other changes in local network topology.
Propagate those changes to all its neighbors, and flood changes generated by remote routers
Use the network topology information and edge reachability information to build optimal forwarding table.

That sounds reasonable, but the designers of OSPF wanted to get it implemented on 16-bit processors with 2MB of memory running at 4 MHz clock rate. To get there, they had to implement tons of optimizations like areas, inter-area summarization, and stub areas. Like that wouldn’t be hard enough, people started using OSPF in weird scenarios and added features like not-so-stubby-areas, totally not-so-stubby areas, and on-demand circuits to make those scenarios work. Long story short, exploring all the potential nooks in the solution space turned OSPF into bloatware⁶.

Finally, the designers of BGP wanted to implement routing policies in large-scale environments, and they did a marvelous job – BGP can carry millions of routes advertised by tens of thousands of autonomous systems. To get there, they:

Decided to use a well-understood transport protocol (TCP),
Used distance-vector⁷ approach because one cannot build an Internet-wide link state graph⁸, and
Traded speed of convergence for scalability.

You’ll learn more about the routing protocol basics in the Routing Protocols part of How Networks Really Work webinar. I also collected links to several open-source networking textbooks in the Textbooks and Other Resources part of Networking Fundamentals roadmap.

As for ChatGPT gave a circular answer: there’s still no substitute for hard work and understanding how things really work.

Revision History

2023-04-28: Correction: STP does not shut down interfaces but puts them into blocking state.

And assuming networking engineers working in IETF aren’t stupid or blindsided ;) ↩︎
As an anonymous commenter pointed out, STP does not shut down the interfaces (or it wouldn’t be able to listen to BPDUs) but puts them into a BLOCKING state in which they don’t receive or send anything that is not a layer-2 control-plane traffic. ↩︎
Sometimes called ’the root of the spanning tree' ↩︎
Relying instead on dynamic MAC learning that does not scale beyond a few hundred attached nodes ↩︎
It happily forms a forwarding loop if a neighbor forgets to announce what it’s doing. ↩︎
The same thing happened to STP with Rapid STP, per-VLAN Spanning Tree, and Multiple STP. ↩︎
OK, path vector for the pedants. ↩︎
Plus nobody wants their competitors to see the innards of their networks or their external connectivity. ↩︎

Revision History

Recent posts in the same categories

bridging

OSPF

BGP

3 comments: