Why Would I Use BGP and not OSPF between Servers and the Network?

While we were preparing for the Cumulus Networks’ Routing on Hosts webinar, Dinesh Dutt sent me a message along these lines:

You categorically reject the use of OSPF, but we have a couple of customers using it quite happily. I’m sure you have good reasons, and the reasons you list [in the presentation] are ones I agree with. OTOH, why not use totally stubby areas with hosts in such an area?

How about:

Because OSPF stub areas would be a total mess to configure? Hmm… maybe not really; we could make it reasonably easy, particularly with network automation.
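To be fair, a totally stubby area really is only a few lines of configuration. Here's a sketch in FRR syntax on a hypothetical ToR switch (the area number and prefix are made up for illustration):

```
! FRR on the ToR switch: put the server-facing subnet into a
! totally stubby area -- hosts in it see only a default route
router ospf
 area 0.0.0.51 stub no-summary
 network 192.0.2.0/24 area 0.0.0.51
```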

One host going crazy would impact at least all other hosts in the same area. It’s not as bad as running servers in the backbone area, but still. Maybe it’s irrelevant if you’re running the same version of FRR on both ends (on the Top-of-Rack switch and the server).

Because you couldn’t filter the prefixes announced by the host? Well, you could control the summarization of prefixes from the totally stubby area into the backbone area (at least in theory; I’m not sure how many data center switch vendors implemented that). However, within the area, you’d still trust everyone. That might not be a problem if you control all the hosts, but it would be a huge deal if you don’t… and it would be a nirvana for any intruder trying to move laterally.
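With BGP, on the other hand, per-neighbor ingress filtering is trivial. Here's a sketch in FRR syntax (the addresses and AS numbers are hypothetical): the ToR switch accepts only the host's own loopback prefix and caps the total number of prefixes as a safety net.

```
! Accept only the host's own loopback /32, reject everything else
ip prefix-list HOST-1 seq 10 permit 198.51.100.11/32
ip prefix-list HOST-1 seq 20 deny any
!
router bgp 65001
 neighbor 192.0.2.11 remote-as 65101
 address-family ipv4 unicast
  neighbor 192.0.2.11 prefix-list HOST-1 in
  neighbor 192.0.2.11 maximum-prefix 10
```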

Because you can’t implement routing policies (like no transit) in OSPF? I’ve seen designs where an IBM mainframe was a single link failure away from becoming a transit router.
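For comparison, a no-transit policy takes a few lines of BGP configuration on the ToR switch. In this hypothetical FRR sketch (made-up AS numbers), the host gets only a default route, and anything it announces with another AS in the path is dropped, so it can never become a transit router:

```
! Accept only paths originated by the host's own AS (65101)
bgp as-path access-list HOST-ONLY permit ^65101$
!
router bgp 65001
 neighbor 192.0.2.11 remote-as 65101
 address-family ipv4 unicast
  neighbor 192.0.2.11 default-originate
  neighbor 192.0.2.11 filter-list HOST-ONLY in
```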

Finally, the server-to-network interface is usually a trust boundary, and I don’t believe in running OSPF across trust boundaries. Maybe that’s less of an issue if the same team controls the servers and the network and runs the same routing software on both, but I definitely wouldn’t run OSPF with just any software that happens to be lying around on a host.
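If you do run a routing protocol across that boundary, BGP at least lets you harden the session. A hypothetical FRR sketch (addresses, AS numbers, and the shared secret are placeholders) combining an MD5 session password with GTSM:

```
router bgp 65001
 neighbor 192.0.2.11 remote-as 65101
 ! MD5-protect the TCP session
 neighbor 192.0.2.11 password SomeSharedSecret
 ! GTSM: drop packets that didn't originate one hop away
 neighbor 192.0.2.11 ttl-security hops 1
```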

Or maybe it’s just that I like BGP and keep inventing reasons why it’s the best tool for the $job.



  1. Totally agree, and all of those are reasons why we've moved from OSPF on servers... but not to BGP. RIP isn't RIP yet!

    RIP itself is simple (important for server admins), summarisation is simple, route filtering is simple, and convergence is even faster than with BGP because we don't announce many prefixes. The trust boundary is preserved, you can use BFD if you want, you can run it in unicast mode, VM mobility between L2 networks is doable, etc.

    What's not to like?

  2. RFC 7938 gives you a pretty good idea of what can be done with BGP that would be extremely complicated to do with an IGP. If I recall correctly, Ivan did a podcast with an academic who hacked OSPF.
    1. Jen Rexford's students worked on that IIRC. The idea was creative, but alas not too practical (well, academic :)). In production environments it's generally easier to do BGP route injection, given reasonably broad support in DC/WAN devices (which, btw Jen and Albert worked on at AT&T a while ago)
  3. IBM mainframes have been using RIP and OSPF for HA purposes for a long time.
  4. BGP with exaBGP, what else?
  6. https://tools.ietf.org/html/draft-fang-mpls-hsdn-for-hsdc specifies a functional hierarchical BGP & MPLS design for 10 million hypervisors (500K servers x 20 datacenters). I've never seen a link-state IGP tested beyond 64K routes (more than 2 orders of magnitude below the scale in the draft).

    If they really want the convergence benefits of a link-state IGP, they should look into implementing BGP-LS/BGP-TE.
