Must Read: OSPF Protocol Analysis (RFC 1245)

Daniel Dib found the ancient OSPF Protocol Analysis (RFC 1245) that includes the Router CPU section. Please keep in mind the RFC was published in 1991 (35 years ago):

Steve Deering presented results for the Dijkstra calculation in the “MOSPF meeting report” in [3]. Steve’s calculation was done on a DEC 5000 (10 mips processor), using the Stanford internet as a model. His graphs are based on numbers of networks, not number of routers. However, if we extrapolate that the ratio of routers to networks remains the same, the time to run Dijkstra for 200 routers in Steve’s implementation was around 15 milliseconds.

Talking about Millions of Instructions per Second (MIPS) makes as much sense as measuring the outside temperature with a wet finger, because the amount of work you can do in an instruction depends on the CPU architecture, and memory access quickly becomes the bottleneck anyway. However, while the Cisco 2500 router had a CPU comparable to the one mentioned in RFC 1245, the recommendation at the time was “30 routers per area”, giving more credence to the rumors about the suboptimal OSPF implementation (as opposed to IS-IS) in Cisco routers.
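To put those figures in perspective, here's a back-of-the-envelope illustration (mine, not the RFC's) of what 15 milliseconds on a 10-MIPS CPU actually buys you:

```python
# Back-of-the-envelope sketch using only the figures quoted above:
# a ~10-MIPS CPU (DEC 5000 class) spending 15 ms on SPF for ~200 routers.
mips = 10
spf_runtime_s = 0.015

instruction_budget = mips * 1_000_000 * spf_runtime_s
print(f"SPF budget: ~{instruction_budget:,.0f} instructions for a 200-router topology")
# -> SPF budget: ~150,000 instructions for a 200-router topology
```

In other words, the whole 200-router SPF run fit into roughly 150,000 instructions, which even a Cisco 2500-class CPU should have handled without breaking a sweat.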

Anyhow, modern CPUs are a bit faster than their 1991 counterparts. Ten-year-old Intel CPUs have ~100 times the CPU frequency and ~8000 times the CPU performance of the machine described in RFC 1245. The claims that one needs to replace OSPF (or IS-IS) with EBGP to make a data center fabric scale (as opposed to wanting to play with a new toy or pad their resume) are obviously total BS, unless you bought networking software from a vendor who decided to implement OSPF in JavaScript or Prolog 😜
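If you don't want to trust anyone's MIPS figures, it's trivial to check for yourself. The following toy sketch (my illustration, using a made-up random topology and ignoring LSA flooding, LSDB handling, and route installation, which is where real implementations spend much of their time) times a single Dijkstra run over a synthetic 200-router graph:

```python
# Toy sketch: time Dijkstra (the core of OSPF SPF) on a synthetic 200-router
# topology. The topology generator and link costs are invented for illustration.
import heapq, random, time

def build_topology(n_routers=200, links_per_router=4, seed=42):
    """Random connected graph: each router gets a few links with random costs."""
    random.seed(seed)
    graph = {r: [] for r in range(n_routers)}
    for r in range(1, n_routers):
        peer = random.randrange(r)          # link to an earlier router keeps the graph connected
        cost = random.randint(1, 100)
        graph[r].append((peer, cost))
        graph[peer].append((r, cost))
    for r in range(n_routers):
        for _ in range(links_per_router - 1):
            peer = random.randrange(n_routers)
            if peer == r:
                continue
            cost = random.randint(1, 100)
            graph[r].append((peer, cost))
            graph[peer].append((r, cost))
    return graph

def spf(graph, source=0):
    """Plain Dijkstra with a binary heap: cost from 'source' to every other router."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue                        # stale queue entry
        for peer, cost in graph[node]:
            nd = d + cost
            if nd < dist.get(peer, float("inf")):
                dist[peer] = nd
                heapq.heappush(pq, (nd, peer))
    return dist

topology = build_topology()
start = time.perf_counter()
dist = spf(topology)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"SPF over {len(topology)} routers "
      f"({sum(len(v) for v in topology.values()) // 2} links): "
      f"{elapsed_ms:.2f} ms, {len(dist)} routers reachable")
```

Even in interpreted Python on a run-of-the-mill laptop this lands in the same ballpark as the 15 ms measured on 1991 hardware with compiled code, and a compiled SPF implementation on the same laptop would be orders of magnitude faster still.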

1 comment:

  1. As far as I understand it, BGP is sometimes preferred over OSPF because of administrative domain separation. Usually, the people managing the hosts and the people managing the network are in different departments, with a lot of nasty politics involved. BGP gives you more control; a single OSPF domain creates too much interdependency between those departments.

    Replies
    1. You're absolutely right and I would never recommend anything else (see https://blog.ipspace.net/2013/08/virtual-appliance-routing-network.html).

      Thanks a million for pointing out another detail I keep forgetting.

    2. Oh, no... now all our networking is in different ADs. How could we live in even a single AS with different routing domains?

    3. > How could we live in even a single AS with different routing domains?

      I'll start off with this: I do not personally agree with the eBGP design described in RFC 7938 (eBGP everywhere, forever/always).

      However, I've worked in a large DC network where this “segregation” between net eng and “server” people existed; the politics were not particularly terrible, but I ran into them a few times.

      We used an eBGP-driven network design, and the “AD” (in the layer-8 and configuration sense, not the routing-protocol sense) was a single domain from the perspective of the NOC team, the C-suite, and our adjacent system ops team. ASN numbering was pre-defined; it was eBGP all the way down, with iBGP between adjacent intra-rack neighbours over an LACP bond interface (or Aggregated Ethernet, as Juniper calls it), and OSPF (or IS-IS) only for the inter-rack or inter-site iBGP overlay (there's a toy sketch of this peering plan at the end of this comment). No MC-LAG/layer-2 spanning/stacking nonsense anywhere, no route reflectors, and no iBGP full mesh either. 100% layer-3-only networking.

      Each network device (everything is 100% routed; there's no layer-2-only switching device) had only a default route pointing upwards, and routes were always cleanly aggregated, with blackhole routes, on the way down (our IPAM was IPv6-heavy and based on a scheme I created, similar to what I wrote in my IPv6 Architecture blog post). There was no such thing as a million routes flooding our core routers (which sit downstream of the edge routers) or the routed layer-3 spine/leaf: each device had at most a few routes (fewer than a hundred or so, because everything else is covered by the default route for egress back up), and a leaf switch with N ports had only N /64s for link-interface addressing, and so on (see the addressing sketch at the end of this comment).

      It in fact simplified configuration and traffic engineering, because we could deploy ECMP/UCMP network-wide, all the way to the host itself (the physical server hypervisor ran eBGP with FRR and peered with the leaf switch), using BGP multipathing. Augment that with pre-defined custom BGP communities and you're golden for keeping all links evenly utilized most of the time (see the UCMP sketch at the end of this comment).

      OSPF/IS-IS simply can't do advanced and complex traffic engineering, granular route filtering, injection of business policy, and so on, to the level BGP can. IMO, BGP is a policy-driven routing protocol, as opposed to the link-state IGPs.

      Unfortunately, there's no public documentation on this specific eBGP design (it matches neither RFC 7938 nor a Clos topology; I know I mentioned spine/leaf, but we interconnected them with each other, which is, to my knowledge, not a feature of a Clos topology and a very different paradigm). I've been doing my part in sharing my knowledge of this specific design whenever I get some bandwidth.

      You can find an example below of the specific design I tried my best to describe in this ultra-short comment; the very same design was used for Ceph networking (no layer-2 nonsense, 100% layer-3-only): https://blog.widodh.nl/2024/05/using-l3-bgp-routing-for-your-ceph-storage/

      I successfully ported this eBGP-driven DC networking design to SP networks as well, and talked a bit about it here: https://blog.ipspace.net/2024/04/repost-ebgp-only-sp-network.html

      I do intend to write properly on the eBGP-driven SP network design in the future, when my bandwidth permits.
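
      To make the peering plan above slightly more concrete, here's a toy sketch of how the per-device BGP sessions fall out of it: eBGP on every uplink, iBGP to the adjacent intra-rack neighbour over the bond. All device names, interface names and (private 32-bit) ASNs below are invented for illustration; this is not our production tooling.

```python
# Toy sketch of the peering plan described above. Device names, interface
# names and ASNs are invented; the real design used its own pre-defined
# ASN numbering plan.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Device:
    name: str
    asn: int                                          # pre-defined, per device
    uplinks: List[str] = field(default_factory=list)  # interfaces towards the layer above
    rack_peer: str = ""                               # adjacent intra-rack neighbour (over the LACP bond)

def peering_plan(device: Device, uplink_asns: Dict[str, int]) -> List[str]:
    """eBGP on every uplink, iBGP to the intra-rack peer over the bond."""
    sessions = [f"{device.name}: eBGP AS{device.asn} -> AS{uplink_asns[iface]} on {iface}"
                for iface in device.uplinks]
    if device.rack_peer:
        sessions.append(f"{device.name}: iBGP AS{device.asn} <-> {device.rack_peer} over bond0 (LACP)")
    return sessions

leaf = Device(name="leaf1-rack12", asn=4210000112,
              uplinks=["swp51", "swp52"], rack_peer="leaf2-rack12")
for session in peering_plan(leaf, {"swp51": 4210000001, "swp52": 4210000002}):
    print(session)
```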
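
      And the addressing point in code form: a leaf with N ports needs exactly N link /64s, all carved out of one aggregate, and only the aggregate (covered by a local blackhole/discard route) is ever announced upstream. The prefixes below are documentation-range examples, not our real plan.

```python
# Minimal sketch of per-leaf link addressing: one /64 per port, carved out of
# a single per-leaf aggregate. Prefixes are documentation-range examples.
import ipaddress

def leaf_link_prefixes(leaf_aggregate: str, n_ports: int):
    """Return the leaf aggregate and one /64 per port carved out of it."""
    aggregate = ipaddress.ip_network(leaf_aggregate)
    return aggregate, list(aggregate.subnets(new_prefix=64))[:n_ports]

aggregate, links = leaf_link_prefixes("2001:db8:12:ab00::/56", n_ports=48)

print(f"announced upstream: {aggregate} (plus a local blackhole covering it)")
for port, prefix in enumerate(links, start=1):
    print(f"  port {port:2}: {prefix}")
# The 48 link /64s never leave the leaf, which is why every device upstream
# holds just a handful of aggregates plus a default route.
```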
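
      Finally, a purely conceptual sketch of the ECMP/UCMP behaviour: hash each flow and pick a next hop in proportion to its weight (however you derive the weights, for example from link bandwidth or BGP communities). This is just the idea, not how FRR or any ASIC actually implements it; the uplink names and weights are made up.

```python
# Conceptual sketch of UCMP next-hop selection: hash each flow, then pick a
# next hop in proportion to its weight. Uplink names and weights are made up.
import hashlib
from collections import Counter

def pick_next_hop(flow: tuple, next_hops: dict) -> str:
    """next_hops maps next-hop name -> integer weight (e.g. relative link bandwidth)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()   # stable per-flow hash
    bucket = int.from_bytes(digest[:8], "big") % sum(next_hops.values())
    for hop, weight in sorted(next_hops.items()):
        if bucket < weight:
            return hop
        bucket -= weight
    raise AssertionError("unreachable")

uplinks = {"spine1": 4, "spine2": 4, "spine3": 1}   # e.g. 2x100G + 1x25G

flows = [("10.0.0.1", f"10.0.{i // 256}.{i % 256}", 6, 49152 + i % 16000, 443)
         for i in range(9000)]
print(Counter(pick_next_hop(f, uplinks) for f in flows))   # roughly a 4:4:1 split
```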
