
Routing Protocols: a Perfect Example of RFC 1925 Rule 5

In case you’re not familiar with RFC 1925, its Rule 5 states:

It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases this is a bad idea.

Most routing protocols are a perfect demonstration of this rule.

A typical routing protocol tries to handle:

  • Neighbor discovery (which nodes can I see on the link)
  • Failure detection (is the path to my neighbor working)
  • Health check and byzantine neighbor failure detection (is the neighbor sane)
  • Dissemination of information (aka flooding)
  • Collecting and distributing topology information (who is connected to whom)
  • Collecting and distributing endpoint reachability information (what endpoints are connected to the network)

Most routing protocols reinvent every single wheel listed above, and also try to bundle several features with sometimes conflicting requirements into a single protocol. For example, OSPF, EIGRP, and IS-IS use hello or keepalive messages to discover neighbors, detect link failures, and check the neighbor's health. BGP uses keepalive messages to detect both link and neighbor failures.

Is there anything wrong with that approach? Of course: links usually fail more often than nodes or routing protocol processes. It's therefore crucial to detect link failures quickly, while node health checks don't have to be nearly as aggressive.

Not surprisingly, we got BFD a while back: a lightweight protocol with a single mission, detecting the loss of the path between two adjacent unicast IP addresses. Nonetheless, years later we're still discussing BGP keepalives.
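To put rough numbers on the difference, here's a minimal Python sketch comparing BFD detection time with the BGP hold timer. The formulas follow RFC 5880 (asynchronous mode: the remote Detect Mult multiplied by the agreed transmit interval) and RFC 4271 (the negotiated hold time is the smaller of the values in the two OPEN messages); the function names and the timer values (3 × 300 ms for BFD, 90 and 180 seconds for BGP) are illustrative assumptions, not recommendations.

```python
# Compare failure-detection times: BFD (RFC 5880 asynchronous mode) vs BGP hold timer.
# Function names and timer values are illustrative assumptions, not recommendations.


def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Local BFD detection time per RFC 5880, section 6.8.4.

    The remote system transmits at the agreed interval: the larger of what the
    local system is willing to receive and what the remote system wants to send.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


def bgp_hold_time_s(local_hold_s: int, remote_hold_s: int) -> int:
    """Negotiated BGP hold time: the smaller of the two OPEN values (RFC 4271).

    A result of 0 means keepalives and the hold timer are not used at all.
    """
    return min(local_hold_s, remote_hold_s)


if __name__ == "__main__":
    bfd_ms = bfd_detection_time_ms(remote_detect_mult=3,
                                   local_required_min_rx_ms=300,
                                   remote_desired_min_tx_ms=300)
    bgp_ms = bgp_hold_time_s(local_hold_s=90, remote_hold_s=180) * 1000
    print(f"BFD detects a dead path in {bfd_ms} ms")
    print(f"BGP hold timer expires after {bgp_ms} ms ({bgp_ms // bfd_ms}x slower)")
```

With these assumed values, the script reports that BFD declares the path down after 900 ms, while the BGP session waits 100 times longer for its hold timer to expire.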

While more and more network deployments use BFD, we haven't even touched the information dissemination problem. Why does every single routing protocol have to reinvent flooding when we have so many production-tested message queue and eventually-consistent database products?
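Stripped of protocol-specific details, the flooding wheel every link-state protocol reinvents looks roughly like the following sketch: accept an update only if its sequence number is newer than the stored copy, then forward it to every neighbor except the one it came from. This is a simplified model written for illustration (the `flood` function and its data structures are hypothetical); real OSPF and IS-IS flooding also deals with acknowledgments, aging, checksums, and database synchronization.

```python
from collections import deque
from typing import Dict, List, Tuple

Adjacency = Dict[str, List[str]]                 # node -> list of neighbor names
Databases = Dict[str, Dict[str, Tuple[int, str]]]  # node -> {originator: (seq, payload)}


def flood(adjacency: Adjacency, databases: Databases,
          origin: str, seq: int, payload: str) -> int:
    """Flood an update from `origin` and return the number of messages sent.

    Hypothetical, simplified model: no acknowledgments, aging, or checksums.
    """
    messages = 0
    databases[origin][origin] = (seq, payload)   # the originator installs its own update
    queue = deque([(origin, None)])              # (node that accepted it, node it came from)
    while queue:
        node, came_from = queue.popleft()
        for nbr in adjacency[node]:
            if nbr == came_from:
                continue                         # never send an update back to its sender
            messages += 1
            known = databases[nbr].get(origin)
            if known is not None and known[0] >= seq:
                continue                         # stale or duplicate copy: drop it, stop flooding
            databases[nbr][origin] = (seq, payload)
            queue.append((nbr, node))            # newer copy: install it and keep flooding
    return messages


if __name__ == "__main__":
    triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
    dbs = {name: {} for name in triangle}
    print(flood(triangle, dbs, "A", 1, "A-links-v1"), "messages sent")
```

Dropping stale or duplicate copies is what makes the flood terminate: in this three-router triangle, the update from A generates four messages, two of which are discarded duplicates.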

Finally, agglutinating routing and reachability information led to bloated LSPs in IS-IS and a zillion tiny LSAs in OSPF.

As always, you can build a more stable and scalable network with a sensible architecture, even though the tools are suboptimal. A typical large-scale network design uses the right tool for each job:

  • OSPF or IS-IS to discover network topology and shortest paths;
  • BGP to collect endpoint information and map it to egress next hops.
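A toy model of that division of labor, assuming a simplified world without BGP best-path selection or ECMP: the IGP runs SPF over a topology that contains only routers, and BGP maps endpoint prefixes to egress routers, which are then resolved recursively through the IGP results. All names in the sketch (`spf`, `build_fib`, the router names, and the documentation prefixes) are hypothetical.

```python
import heapq
from typing import Dict, Optional, Tuple

Graph = Dict[str, Dict[str, int]]  # router -> {neighbor: link cost}


def spf(graph: Graph, source: str) -> Tuple[Dict[str, int], Dict[str, Optional[str]]]:
    """Dijkstra SPF: IGP cost and first hop from `source` to every reachable router."""
    dist: Dict[str, int] = {source: 0}
    first_hop: Dict[str, Optional[str]] = {}     # doubles as the set of settled routers
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in first_hop:
            continue                             # already settled via a shorter path
        first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Neighbors of the source are their own first hop; others inherit it.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop


def build_fib(graph: Graph, source: str,
              bgp_routes: Dict[str, str]) -> Dict[str, Tuple[Optional[str], int]]:
    """Resolve each BGP prefix to (IGP first hop, IGP cost) through its egress router."""
    dist, first_hop = spf(graph, source)
    fib = {}
    for prefix, egress in bgp_routes.items():
        if egress in dist:                       # unreachable next hop: prefix not installed
            fib[prefix] = (first_hop[egress], dist[egress])
    return fib


if __name__ == "__main__":
    # The IGP only knows routers (think loopbacks), not endpoint prefixes.
    igp = {
        "R1": {"R2": 1, "R3": 5},
        "R2": {"R1": 1, "R3": 1},
        "R3": {"R1": 5, "R2": 1},
    }
    # BGP maps endpoint prefixes (documentation ranges) to egress routers.
    bgp = {"192.0.2.0/24": "R3", "198.51.100.0/24": "R2"}
    for prefix, (hop, cost) in build_fib(igp, "R1", bgp).items():
        print(f"{prefix}: via {hop}, IGP cost {cost}")
```

Adding or removing an endpoint prefix changes only the BGP table; the IGP topology and its SPF results stay untouched, which is one reason this design scales better than stuffing endpoint prefixes into the IGP.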

However, while we've known how to design scalable networks for ages, many engineers deploying large enterprise networks have consistently ignored that insight, resulting in way too many OSPF Band-Aids.


Comments:

  1. Will you still use BFD even if the IP address is configured directly on the interface (with no L2 transport in the middle), just to detect some kind of software failure on the other side? I usually avoid it.

  2. Yes, I do. Too many things can go wrong even on a simple point to point link. The latest ones I have observed are a router not properly detecting when a link goes down, and one of the SFPs on a DAC cable failing while keeping the link up towards the router. BFD detects all of these.

