BGP or OSPF? Does Topology Visibility Matter?
One of the comments added to my Using BGP in Data Centers blog post said:
With symmetric fabric… does it make sense for a node to know every bit of fabric info or is reachability information sufficient?
Let’s ignore for the moment that the large non-redundant layer-3 fabrics where the BGP-in-the-data-center movement started need nothing more than endpoint reachability information, and focus on a bigger issue: is the knowledge of network topology (provided by OSPF but not by BGP) beneficial?
Before We Start
A few weeks after I published this blog post I received an email indicating that at least some readers understood the question I was trying to address as “Do we need to know the state of our network topology?”
That was never the intended scope of this blog post, and anyone who thinks the answer is not a resounding YES is a clear victim of section 2.4 of RFC 1925.
However, as some people advertise using OSPF instead of the now-more-hip BGP-only approach as the underlying protocol in large data center networks “because it gives you visibility into network topology,” this blog post addresses the routing protocol side of things, specifically: does visibility into network topology enable a router in a distributed system to build a better forwarding table?
Also, keep in mind that there are many ways to discover the current state of network topology, starting with reading the tables of LLDP, BGP, or OSPF neighbors with ancient methods like SNMP. BGP-LS and other means of extracting data from OSPF or IS-IS topology databases are not the only answer.
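If you want to see how little plumbing that takes, here’s a minimal Python sketch (assuming the pysnmp library; the device name and SNMP community are made-up illustrative values) that walks the LLDP-MIB remote-systems table to list a device’s neighbors:

```python
# A minimal sketch of one "ancient method": walking the LLDP-MIB remote
# systems table over SNMP to discover a device's neighbors. Assumes the
# pysnmp library, an SNMPv2c community of "public", and a reachable
# device called "leaf1" -- all illustrative values.
from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, nextCmd,
)

LLDP_REM_SYS_NAME = '1.0.8802.1.1.2.1.4.1.1.9'   # lldpRemSysName column

for error_indication, error_status, _, var_binds in nextCmd(
        SnmpEngine(),
        CommunityData('public'),
        UdpTransportTarget(('leaf1', 161)),
        ContextData(),
        ObjectType(ObjectIdentity(LLDP_REM_SYS_NAME)),
        lexicographicMode=False):        # stop at the end of the subtree
    if error_indication or error_status:
        break
    for oid, neighbor_name in var_binds:
        print(f'{oid} -> {neighbor_name}')
```

Collect that from every device and you have the physical topology without touching the routing protocol at all.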
The History
Link-state protocols (OSPF and IS-IS) were created as a reaction to the extremely slow convergence of early versions of RIP, which propagated reachability information in periodic updates.
EIGRP (which is just an optimized distance-vector protocol) was created as a reaction to the complexities of OSPF.
It looks like another case of a technology pendulum swinging back and forth between two extremes. Does one of them work better than the other?
The Results Are the Same
With equal link costs, OSPF, IS-IS and EIGRP produce the same forwarding topology (and so would RIP if it had link costs, or BGP if you encoded link costs in AS-path length).
No surprise there. As long as we stick with the hop-by-hop destination-only forwarding paradigm, there’s nothing a router could do based on the knowledge of the wider network topology, because it cannot influence the forwarding decisions made by the downstream next-hop routers.
The only difference between link-state and distance-vector protocols used in traditional IP routing is the method of information dissemination: flooding-and-computing (link state) or computing-and-propagating (distance vector).
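To make that tangible, here’s a small Python sketch (using a made-up four-node leaf-and-spine topology) that computes shortest-path distances the link-state way (Dijkstra’s SPF over the full topology) and the distance-vector way (Bellman-Ford-style iteration over neighbor updates). With equal link costs the results, and therefore the forwarding tables derived from them, are identical:

```python
import heapq

# A tiny made-up topology: two leaves (L1, L2) and two spines (S1, S2),
# every link with cost 1 (the usual equal-cost data center fabric).
LINKS = {
    'L1': {'S1': 1, 'S2': 1},
    'L2': {'S1': 1, 'S2': 1},
    'S1': {'L1': 1, 'L2': 1},
    'S2': {'L1': 1, 'L2': 1},
}

def dijkstra(source):
    """Link-state view: full topology in, SPF run locally."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float('inf')):
            continue
        for neighbor, cost in LINKS[node].items():
            if d + cost < dist.get(neighbor, float('inf')):
                dist[neighbor] = d + cost
                heapq.heappush(queue, (d + cost, neighbor))
    return dist

def bellman_ford(source):
    """Distance-vector view: iterate 'updates from neighbors' to a fixpoint."""
    dist = {node: float('inf') for node in LINKS}
    dist[source] = 0
    for _ in range(len(LINKS) - 1):          # at most N-1 rounds of updates
        for node in LINKS:
            for neighbor, cost in LINKS[node].items():
                dist[neighbor] = min(dist[neighbor], dist[node] + cost)
    return dist

# Same distances -> same shortest-path trees -> same forwarding tables.
assert dijkstra('L1') == bellman_ford('L1')
print(dijkstra('L1'))   # {'L1': 0, 'S1': 1, 'S2': 1, 'L2': 2}
```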
Are Convergence Speeds Different?
Short answer: No.
Early implementations of distance-vector protocols were excruciatingly slow. Modern implementations using triggered (flash) updates and poison reverse are approximately as fast as link-state protocols.
In any case, you don’t want the network to overreact to every change, so every routing protocol includes all sorts of dampening knobs (LSA origination timers, flooding timers, SPF intervals…), making link-state and distance-vector protocols even more similar in performance.
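As an illustration of what those knobs do, here’s a back-of-the-envelope Python sketch of the exponential SPF throttling most link-state implementations use; the timer values are illustrative, not any vendor’s defaults:

```python
# A sketch of exponential SPF throttling: the first event after a quiet
# period triggers SPF quickly, repeated churn stretches the wait toward
# a maximum. Timer values are illustrative, not any vendor's defaults.
INITIAL_WAIT_MS = 50      # delay before the first SPF run
HOLD_MS = 200             # initial hold between consecutive runs
MAX_WAIT_MS = 5000        # upper bound on the hold time

def spf_delays(num_events):
    """Return the delay applied to each of a burst of topology events."""
    delays, hold = [], HOLD_MS
    for event in range(num_events):
        if event == 0:
            delays.append(INITIAL_WAIT_MS)
        else:
            delays.append(hold)
            hold = min(hold * 2, MAX_WAIT_MS)   # exponential backoff
    return delays

# A flapping link triggers six SPF runs; note how the wait grows:
print(spf_delays(6))   # [50, 200, 400, 800, 1600, 3200]
```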
Finally, in large networks the routing protocol convergence time becomes insignificant compared to the time needed to install the changes in the forwarding hardware.
Is There Any Use for Network Topology?
Is there something a link-state protocol can do that a distance-vector one cannot? As long as we’re using the hop-by-hop forwarding paradigm, the answer is NO.
Topology information becomes important when the forwarding technology used in the network supports paths (virtual circuits) between network devices, the typical examples being MPLS Traffic Engineering and Segment Routing. In these cases the nodes can use the knowledge of the wider network topology to build alternate paths (MPLS Fast Reroute) or redirect traffic away from the failure point (Remote LFA).
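To make the alternate-path idea more tangible, here’s a Python sketch of the basic loop-free alternate check from RFC 5286: a neighbor N of source S can protect traffic toward destination D if dist(N,D) < dist(N,S) + dist(S,D), a test you can only run if you know the full topology. The topology and link costs below are made up:

```python
# Sketch of the basic loop-free alternate (LFA) test from RFC 5286:
# neighbor N of source S protects destination D when
#     dist(N, D) < dist(N, S) + dist(S, D)
# i.e. N will not loop the packet back through S. Topology is made up.
LINKS = {
    'S':  {'N1': 1, 'N2': 1},
    'N1': {'S': 1, 'N2': 1, 'D': 1},
    'N2': {'S': 1, 'N1': 1, 'D': 10},
    'D':  {'N1': 1, 'N2': 10},
}

def all_pairs_distances(links):
    """Floyd-Warshall over the full topology (the link-state advantage)."""
    nodes = list(links)
    dist = {a: {b: (0 if a == b else float('inf')) for b in nodes} for a in nodes}
    for a in links:
        for b, cost in links[a].items():
            dist[a][b] = min(dist[a][b], cost)
    for k in nodes:
        for a in nodes:
            for b in nodes:
                dist[a][b] = min(dist[a][b], dist[a][k] + dist[k][b])
    return dist

def loop_free_alternates(source, dest, links):
    dist = all_pairs_distances(links)
    return [n for n in links[source]
            if dist[n][dest] < dist[n][source] + dist[source][dest]]

# Which of S's neighbors can carry traffic to D without looping it back?
# N1 is the primary next hop; N2 also passes the test, so S can
# pre-install it as a backup without waiting for the network to converge.
print(loop_free_alternates('S', 'D', LINKS))   # ['N1', 'N2']
```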
Back to the Data Center
Is there any use for detailed knowledge of network topology in a data center switch? Not for IP routing… unless you’re deploying MPLS-TE in your data center, in which case I would like to remind you that additional bandwidth might be cheaper than the engineers needed to operate such a network.
Could the data center switches use network topology for other purposes? For example, a leaf switch might decide to change its load balancing algorithm on leaf-to-spine links based on utilization of downstream spine-to-leaf links.
While some proprietary fabrics do that, no traditional routing protocol propagates such information, for a good reason: it could quickly lead to widespread instabilities.
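Purely as an illustration (and emphatically not something any standard routing protocol supports), here’s a hypothetical Python sketch of a leaf switch weighting its uplinks by reported downstream utilization; every name and number in it is made up:

```python
import random

# Hypothetical congestion-aware uplink selection on a leaf switch:
# weight each leaf-to-spine uplink by the spare capacity reported for
# the downstream spine-to-leaf link. All names and numbers are made up;
# no standard routing protocol propagates this information.
downstream_utilization = {   # fraction of the spine->leaf link in use
    'uplink-to-spine1': 0.20,
    'uplink-to-spine2': 0.90,
}

def pick_uplinks(flows=1):
    """Assign each new flow to an uplink, biased toward spare capacity."""
    weights = {u: 1.0 - util for u, util in downstream_utilization.items()}
    uplinks = list(weights)
    return random.choices(uplinks, weights=[weights[u] for u in uplinks], k=flows)

# Roughly 8 of 9 new flows land on the less-loaded spine...
print(pick_uplinks(flows=10))
# ...but if every leaf reacts to the same stale numbers at once, the
# load sloshes to spine1, its utilization spikes, everyone flips back,
# and the fabric oscillates -- exactly the instability mentioned above.
```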
The final answer is thus NO. From the packet forwarding perspective it doesn’t matter whether you use OSPF or BGP in your data center fabric. For other relevant aspects, watch the Leaf-and-Spine Fabric Designs webinar.
From the Comments
Apart from that, not all dampening timers in BGP can be tuned to levels as low as what’s possible with link-state protocols. In theory they could be, but there are race conditions in which BGP can miss its triggered compute time and wait another 60 seconds, unless the network is built on the BGP IPv4 AFI only, there are no protocol dependencies, and the network vendor allows subsecond tuning. I agree, though, that for large data centers the most significant factor is prefix insertion into the FIB; there’s no need to fight for every 50 ms.
Regarding ACI:
http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/guide-c07-733236.html#_Toc406851690
See also the CONGA paper: http://simula.stanford.edu/~alizade/papers/conga-sigcomm14.pdf
Contrary to some SDN evangelists, I believe in learning from history and past mistakes, and if you want to do that, you have to know them first.
Do you think that RIFT is barking up the right tree?
https://tools.ietf.org/html/draft-przygienda-rift-02