Why Would I Use BGP and not OSPF between Servers and the Network?

While we were preparing for the Cumulus Networks’ Routing on Hosts webinar, Dinesh Dutt sent me a message along these lines:

You categorically reject the use of OSPF, but we have a couple of customers using it quite happily. I'm sure you have good reasons and the reasons you list [in the presentation] are ones I agree with. OTOH, why not use totally stubby areas with the hosts being in such an area?

How about:

Because OSPF stub areas would be a total mess to configure? Hmm… maybe not really, we could make it reasonably easy, particularly with network automation.

Because one host going crazy would impact at least all other hosts in the same area? Definitely not as bad as running servers in the backbone area, but still. Maybe it’s not so relevant if you’re running the same version of FRR on both ends.

Because you couldn’t filter the prefixes announced by the host? Well, you could control the summarization of prefixes from the totally stubby area into the backbone area (in theory, not sure many vendors actually implemented that), but within the area you’d still trust everyone. That might not be a problem if you control all the hosts, but would be a huge deal if you don’t… and it would be nirvana for any intruder trying to move laterally.

Because you can’t implement routing policies (like no transit) in OSPF? I’ve seen designs where an IBM mainframe was a single link failure away from becoming a transit router.

Finally, the server-to-network interface is usually a trust boundary, and I don’t believe in running OSPF across trust boundaries. Maybe that’s less of an issue if the same team controls the servers and the network, and runs the same routing software on both, but I definitely wouldn’t run OSPF with just any software that happens to be lying around on a host.
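
To make the filtering and no-transit points a bit more concrete, here is a minimal Python sketch of the per-neighbor policy a leaf switch could apply on a BGP session with a server. It is an illustration of the logic only, not router configuration: the host names, the address ranges, and the function names are all made up for this example.

```python
import ipaddress

# Hypothetical policy table: the prefixes each server may announce to its
# leaf switch. Host names and address ranges are invented for illustration.
ALLOWED_FROM_HOST = {
    "server-a": [ipaddress.ip_network("10.1.0.0/24")],
    "server-b": [ipaddress.ip_network("10.1.1.0/24"),
                 ipaddress.ip_network("2001:db8:1::/64")],
}

# Assumption: servers only need a default route from the fabric.
DEFAULT_ROUTES = {ipaddress.ip_network("0.0.0.0/0"),
                  ipaddress.ip_network("::/0")}


def accept_from_host(host: str, prefix: str) -> bool:
    """Inbound filter on the leaf: accept a prefix only if it falls inside
    a range assigned to that host. Unknown hosts get nothing accepted."""
    net = ipaddress.ip_network(prefix)
    return any(
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in ALLOWED_FROM_HOST.get(host, [])
    )


def advertise_to_host(prefix: str) -> bool:
    """Outbound filter on the leaf: send only a default route."""
    return ipaddress.ip_network(prefix) in DEFAULT_ROUTES


if __name__ == "__main__":
    for host, prefix in [("server-a", "10.1.0.5/32"),
                         ("server-a", "0.0.0.0/0"),
                         ("server-a", "10.1.1.7/32")]:
        verdict = "accept" if accept_from_host(host, prefix) else "reject"
        print(host, prefix, verdict)
```

The inbound check covers both concerns at once: a host announcing someone else's prefix is rejected, and so is a host re-announcing routes it learned elsewhere, which is what keeps it from becoming a transit router. Within an OSPF area there is no equivalent per-neighbor hook, because every router in the area must hold the same link-state database.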

Or maybe it’s just that I like BGP and keep inventing reasons why it’s the best tool for the $job.

Want to know more? There are over 20 hours of goodies waiting for you in the Leaf-and-Spine Fabric Architectures webinar.

  1. Totally agree, and all of those are reasons why we've moved from OSPF on servers... but not to BGP. RIP isn't RIP yet!

    RIP itself is simple (important for server admins), summarisation is simple, route filtering is simple, and convergence is even faster than with BGP because we don't announce a lot of prefixes. The trust boundary is respected, you can use BFD if you want, you can run it in unicast mode, VM mobility between L2 networks is doable, etc.

    What's not to like?

  2. RFC 7938 gives you pretty good ideas about what can be done with BGP and would be extremely complicated to do with an IGP. If I recall correctly, Ivan did a podcast with some academia guy who hacked OSPF.
    1. Jen Rexford's students worked on that IIRC. The idea was creative, but alas not too practical (well, academic :)). In production environments it's generally easier to do BGP route injection, given reasonably broad support in DC/WAN devices (which, btw Jen and Albert worked on at AT&T a while ago)
  3. IBM mainframes have been using RIP and OSPF for HA purposes for a long time.
  4. BGP with exaBGP, what else?
  5. This comment has been removed by the author.
  6. https://tools.ietf.org/html/draft-fang-mpls-hsdn-for-hsdc specifies a functional hierarchical BGP & MPLS design for 10 million hypervisors (500K servers x 20 datacenters). I've never seen a link-state IGP tested beyond 64K routes (more than 2 orders of magnitude below the scale in the draft).

    If they really want the convergence benefits of a link-state IGP, they should look into implementing BGP-LS/BGP-TE.
