Is BGP Really that Complex?

Anyone following the popular networking blogs and podcasts is probably familiar with the claim that BGP is way too complex to be used in whatever environment. On the other hand, more and more smart people use it when building their data center or WAN infrastructure. There’s something wrong with this picture.

BGP is complex

There’s no doubt about that. BGP is complex enough to make a week-long course out of it (trust me, I created at least two of them). However, most of that complexity comes from the initial BGP use case: inter-domain routing driven primarily by the needs of routing policies.

To make matters worse, various service providers pushed vendors to implement ever-more-intricate nerd knobs instead of solving the problem the right way: building a policy server (call it an SDN controller to make marketers ecstatic) that acts as a BGP route reflector and tweaks the updates sent to individual BGP routers to enforce the desired routing policy.
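As a rough illustration of that idea, the sketch below models a policy server that reflects routes and rewrites (or suppresses) them per client. It is a toy model in Python, not a BGP implementation: the names `Route`, `PolicyReflector`, `prefer`, and `drop`, as well as the prefixes and client names, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Route:
    """A drastically simplified BGP route (hypothetical model)."""
    prefix: str
    next_hop: str
    local_pref: int = 100


# A policy returns a (possibly modified) route, or None to suppress it.
Policy = Callable[[Route], Optional[Route]]


class PolicyReflector:
    """Toy route reflector that applies per-client policies to each update."""

    def __init__(self) -> None:
        self.policies: Dict[str, List[Policy]] = {}

    def add_policy(self, client: str, policy: Policy) -> None:
        self.policies.setdefault(client, []).append(policy)

    def update_for(self, client: str, route: Route) -> Optional[Route]:
        """Return the update this client would receive, or None if suppressed."""
        current: Optional[Route] = route
        for policy in self.policies.get(client, []):
            current = policy(current)
            if current is None:
                return None
        return current


def prefer(prefix: str, local_pref: int) -> Policy:
    """Raise or lower local preference for one prefix."""
    def policy(route: Route) -> Optional[Route]:
        if route.prefix == prefix:
            return replace(route, local_pref=local_pref)
        return route
    return policy


def drop(prefix: str) -> Policy:
    """Do not advertise one prefix to the client."""
    def policy(route: Route) -> Optional[Route]:
        return None if route.prefix == prefix else route
    return policy


rr = PolicyReflector()
rr.add_policy("D", prefer("10.0.0.0/8", 200))
rr.add_policy("E", drop("10.0.0.0/8"))

route = Route(prefix="10.0.0.0/8", next_hop="192.0.2.1")
for client in ("D", "E", "F"):
    print(client, rr.update_for(client, route))
```

The point of the sketch is that the policy lives in one place (the reflector) and the individual routers need no policy configuration of their own; a real deployment would of course have to speak BGP to its clients.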

However, when you start using BGP as a simple endpoint reachability distribution mechanism (example: running BGP over a large DMVPN cloud), most of its intricacies become unnecessary, and you’re left with an elegant and simple protocol, combining the simplicity of RIP with the versatility of OSPF or IS-IS.

Know your tools

A good craftsman knows his tools and uses the tool that’s best suited for the job at hand (a screw needs a screwdriver even if the closest tool is a hammer). A networking engineer should do no less, even though some people believe in the universality of their preferred tool. Well, while there are universal tools out there, they tend not to be good at any particular job.

Back to the routing protocols:

  • If you have a complex haphazard mesh of links of various speeds, use OSPF or IS-IS. They were designed for that job.
  • If you have to carry a large set of prefixes within your routing domain, use IBGP on top of OSPF or IS-IS. IBGP will have no problems carrying the prefixes, and OSPF or IS-IS will do a good job finding the optimal path.
  • If you have a large highly symmetrical fabric, BGP is a perfect tool for the job, particularly when combined with BFD.

Those pesky implementation details

You might have a network design that’s a perfect match for BGP’s capabilities, and yet you’re hesitating because BGP configuration quickly becomes a nightmare. Time to shop around; there are vendors who realized that BGP configuration tools designed for the initial inter-AS BGP use case don’t cut it when we want to deploy BGP in data centers.

For example:

For more details, start with the Simplifying BGP Configurations video in which Dinesh Dutt explains how easy it is to run BGP in a leaf-and-spine fabric, and continue with the rest of the Leaf-and-Spine Fabrics webinar.
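To give a rough sense of how small a data center BGP configuration can be, here is a minimal Python sketch that generates a per-leaf configuration. It assumes FRR-style syntax with interface-based EBGP neighbors (`neighbor <interface> interface remote-as external`); the function name `leaf_bgp_config`, the AS numbers, router IDs, and interface names are hypothetical, and the output is not taken from the video mentioned above.

```python
from typing import List


def leaf_bgp_config(asn: int, router_id: str, uplinks: List[str]) -> str:
    """Build an FRR-style BGP stanza for one leaf switch (illustrative only)."""
    lines = [
        f"router bgp {asn}",
        f" bgp router-id {router_id}",
    ]
    # One line per uplink: no neighbor IP addresses or remote AS numbers needed.
    for ifname in uplinks:
        lines.append(f" neighbor {ifname} interface remote-as external")
    lines += [
        " address-family ipv4 unicast",
        "  redistribute connected",
        " exit-address-family",
    ]
    return "\n".join(lines)


# Assumed numbering plan: one private ASN per leaf, two uplinks to the spines.
for index in range(2):
    print(leaf_bgp_config(65001 + index, f"10.0.0.{index + 1}", ["swp51", "swp52"]))
    print("!")
```

Only the AS number and router ID differ between leaves in this sketch, which is what makes such configurations easy to template.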

Finally, do read the BGP: Application Networking Dream by Tom Hollingsworth.


  1. Agree with this completely. When we built out a few data centers and had the opportunity to do greenfield architecture, we looked very closely at our internal MPLS design and decided to run BGP over it. There was a fair amount of hesitation from the support staff, but we got through it. Primarily, we showed them a sample config and they were surprised at how simple it was... if I recall, they had been looking at a carrier's example of a BGP/MPLS config, which was far more complex. Since then, the entire WAN/LAN/edge is BGP. And against general recommendations, each campus or data center zone (production, development, etc.) has its own ASN. We found the configs and hassles of EBGP peering to be much simpler than IBGP peering. And we found it much easier to shift traffic around via prepend statement modifications during maintenance. With OSPF, messing with costs and priorities is fine, but you need to know the cost of all your paths, otherwise you'll get unintended results. With BGP, we rarely run into unexpected behavior. Just resist the urge to use all of the knobs BGP provides and the configs are relatively simple. It's not the solution for every problem, but we have no regrets now that it's been in place for several years.

  2. Always a good idea to know your tools; by that I mean experiment with them and actually take the time to read about them. Case in point: I've been using ping and traceroute for years, but all I knew about them was how to run them and read the results. When you take the time to read and learn about them, you really appreciate what they can and cannot do. BTW, I'm terrified of any routing protocol, and actually more terrified of printing out the show commands and having to interpret them. But I only look at that maybe once a year.

  3. Hi Ivan,

    "If you have to carry a large set of prefixes without your routing domain..."
    I think here you meant 'within your routing domain'.


    1. I'm guessing there was a battle between the words "within" and "throughout" with both sides taking casualties, eh?

  4. I think that OSPF is more complex than BGP. I always need to look up the LSA types on Google to remember them, and the way to filter a prefix can differ depending on the LSA type.

  5. "... instead of solving the problem the right way: building a policy server (call it an SDN controller to make marketers ecstatic) that acts as a BGP route reflector ..."

    I don't think this would work.
    At least not as simple as you might think it would be.

    Suppose this topology:
    foreign peer A - my eBGP speaker B - my Route Reflector C, acting as policy server - my iBGP speakers D, E and F.

    Your idea would work fine for routers C, D, E and F. But what about eBGP speaker B? It receives a prefix P from A and advertises P to RR C. RR C applies policies to P and advertises P to D, E and F. That works fine.

    But router B would not be subject to the policies of RR C. That would only work if B would not install prefix P in its BRIB, but only advertise it to RR C. Then RR C applies policies, and should advertise back to B. Then B can install the prefix P it got from RR C into its BRIB.

    A bit convoluted. And you'll need to really change the behaviour of BGP implementations for this to work. Which might mean this can't be introduced gradually, but requires all routers to be upgraded. Or using flag-days. Or a migration-scheme that is even harder than plain BGP.

    Nice rough idea. But it would require some extra thinking to make it work.

    1. Of course RR can influence B - by sending identical prefix with higher local preference or two more-specific prefixes with no-advertise community. In any case, it helps if you configure "bgp advertise-best-external".

      IIRC some service providers implemented ideas along these lines more than a decade ago, and there was at least a product (if not more) out there doing something similar. Alas, the memories faded...

  6. I believe BGP is one of the easiest and simplest routing protocols, and it's also very predictable. I prefer to use it over others when and where possible.

  7. Is there a good (historical) summary of all the BGP RFCs, knobs and features? I'm interested in fully learning the protocol. I'd also like to get a perspective on what makes BGP in the DC simpler (so a diff between all the rules required for peering and transit networks and the smaller needs of an L3 DC).

