Is OSPF Unpredictable or Just Unexpected?
I was listening to a very interesting Future of Networking episode with Fred Baker a long while ago, and enjoyed Fred’s perspectives and historical insight until Greg Ferro couldn’t resist the usual bashing of traditional routing protocols and praising of intent-based (or flow-based or SDN or…) whatever.
Here’s what I understood he said around 35:17:
The problem with the dynamic or distributed algorithms is that they quite often do unexpected things.
You might think it was a Freudian slip of the tongue, but it seems to be a persistent mantra. Recently it became “a fallacy that a network will ever be reliable or predictable.”
Well, I totally believe that routing algorithms like OSPF would surprise Greg or myself (as I often admit during my network automation workshops), but that only means that with all the nerd knobs we added they became too complex for mere mortals to intuitively grasp their behavior.
Anyway, let’s move from subjective unexpected to objective unpredictable or non-deterministic.
Interestingly, with the clear split between information distribution (LSA flooding) and route computation (SPF algorithm), link-state routing protocols are among the most predictable distributed algorithms out there. The worst they can do is cause temporary forwarding loops while the topology databases on individual routers are not yet in sync (eventual consistency).
Assuming you have infinite patience, it’s quite easy to predict what an OSPF network will look like:
- Take the topology database;
- Follow all the intricate rules in various OSPF-related RFCs;
- Get the final forwarding table.
Nobody in their right mind would do that by hand, but once the steps to a solution are well-defined, it’s trivial (from the perspective of a mathematical proof, not the actual implementation) to carry them out… and there are tools like Cariden’s MATE that do exactly that.
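To make the determinism argument a bit more tangible, here's a toy sketch of the "topology database in, forwarding table out" idea. It's nowhere near a real OSPF implementation: it assumes a single area, strictly positive link costs, and a made-up topology database format (`LSDB`, a dictionary of router names and link costs), and it ignores LSA types, external routes and every nerd knob out there. The names `spf` and `LSDB` are hypothetical and not taken from any product or library.

```python
import heapq


def spf(lsdb, root):
    """Run Dijkstra from root; return {destination: (cost, sorted next hops)}.

    lsdb maps each router to {neighbor: link cost}; costs must be > 0.
    Equal-cost paths are all kept, so the result does not depend on the
    order in which links appear in the database.
    """
    dist = {root: 0}
    next_hops = {root: set()}
    heap = [(0, root)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry, node already finalized
        done.add(node)
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            # Directly connected neighbors are their own next hop;
            # everything else inherits the next hops of its parent.
            hops = {neighbor} if node == root else next_hops[node]
            if neighbor not in dist or new_cost < dist[neighbor]:
                dist[neighbor] = new_cost
                next_hops[neighbor] = set(hops)
                heapq.heappush(heap, (new_cost, neighbor))
            elif new_cost == dist[neighbor]:
                next_hops[neighbor] |= hops  # equal-cost multipath
    return {d: (dist[d], sorted(next_hops[d])) for d in sorted(dist) if d != root}


# Hypothetical five-router topology with two equal-cost paths from A to D.
LSDB = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 10, "D": 10},
    "C": {"A": 10, "D": 10},
    "D": {"B": 10, "C": 10, "E": 5},
    "E": {"D": 5},
}

# The same database with routers and links listed in reverse order.
REVERSED_LSDB = {
    router: dict(reversed(list(links.items())))
    for router, links in reversed(list(LSDB.items()))
}

TABLE = spf(LSDB, "A")
for destination, (cost, hops) in TABLE.items():
    print(destination, cost, hops)
```

Feeding the function the same links in a different order (`REVERSED_LSDB`) yields the same table, which is the whole point: once every router has the same topology database, the outcome is a pure function of that database.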
However, because it’s easier to not spend money on something that would prevent an event with uncertain probability (network going down due to misconfigured OSPF, or losing customer data due to an intrusion), vendors like Cariden have relatively few customers, resulting in expensive tools.
Of course, there’s another way of dealing with the “unexpectedness” of OSPF: stop being a MacGyver, forget the nerd knobs, keep your network design as simple as possible, and use the absolute minimum subset of features you need to get the job done.
Unfortunately, it seems like only a small minority of engineers or architects want to follow this particular advice. It’s so much easier to believe in yet another technology wonder.
Speaking of "persistent oscillations" (in BGP, for example): most examples I have seen boil down to "we have added more knobs, and people build brittleness by twisting too many of them". BGP and EIGRP are not as deterministic as OSPF (since "on which link was the update heard first?" becomes relevant), but *stable* they are...
Rare? Sure. Impossible? Definitely not.
RFC 1925 Rule 7a is more applicable: we chose to have a Fast routing protocol on Cheap processors and sacrificed Good (or simple) in the process.