Estimating BGP Convergence Time

One of my readers sent me this question:

I have an Internet edge setup with two routers connected to two upstream ISPs and receiving the full BGP routing table from them. I’m running iBGP between my Internet routers. Is there a formula to estimate convergence time if one of my uplinks fails? How many updates would it take to transfer all 512K routes in the BGP table, and how long would that take?

As always, the answer is it depends.

If you’re trying to estimate how large the updates would be, look into how much memory the BGP process is consuming. BGP updates are pretty tightly packed, and the Cisco IOS in-memory structures probably closely reflect the BGP data model. In one of my experiments BGP used approximately 250 bytes per prefix, or 128 MB for 512K routes.

On the other hand, a BGP update message carrying a prefix with 3 AS numbers in the AS path is less than 100 bytes long (based on a sample BGP Wireshark capture), so you might need closer to 50-60 MB to send 512K routes in BGP updates.
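To sanity-check the “less than 100 bytes” figure and get a rough answer to the “how many updates” question, here’s a minimal Python sketch that computes UPDATE message sizes from the RFC 4271 wire format. It assumes IPv4 unicast, 4-byte AS numbers, a single AS_SEQUENCE segment, only the ORIGIN, AS_PATH, NEXT_HOP and (on iBGP) LOCAL_PREF attributes, and the 4096-byte maximum message size; real updates often carry MED, communities and other attributes, and TCP/IP headers are ignored. The function names are made up for illustration, and grouping by AS path alone is a simplification (an implementation can only pack prefixes that share every attribute into one UPDATE).

```python
from collections import defaultdict
from math import ceil

MAX_MESSAGE = 4096          # RFC 4271 maximum message size (no Extended Message support)
FIXED_UPDATE = 19 + 2 + 2   # header (marker, length, type) + withdrawn-routes length + path-attribute length

def attr_size(value_len):
    """Attribute flags + type code + 1-byte length + value."""
    return 3 + value_len

def attributes_size(as_path_len, ibgp=True, asn_bytes=4):
    size = attr_size(1)                              # ORIGIN
    size += attr_size(2 + as_path_len * asn_bytes)   # AS_PATH: one AS_SEQUENCE segment
    size += attr_size(4)                             # NEXT_HOP (IPv4)
    if ibgp:
        size += attr_size(4)                         # LOCAL_PREF (iBGP only)
    return size

def nlri_size(prefix_len):
    """Length octet + the minimum number of octets holding the prefix."""
    return 1 + ceil(prefix_len / 8)

def count_updates(routes, ibgp=True, group=True):
    """routes: iterable of (prefix_len, as_path) tuples.

    Returns (number_of_updates, total_bytes). With group=True, prefixes sharing
    an AS path are packed into as few UPDATEs as the size limit allows; with
    group=False every prefix gets its own UPDATE.
    """
    groups = defaultdict(list)
    for n, (prefix_len, as_path) in enumerate(routes):
        key = tuple(as_path) if group else n
        groups[key].append((prefix_len, len(as_path)))

    updates = total = 0
    for members in groups.values():
        base = FIXED_UPDATE + attributes_size(members[0][1], ibgp)
        msg = base
        for prefix_len, _ in members:
            nlri = nlri_size(prefix_len)
            if msg + nlri > MAX_MESSAGE and msg > base:
                updates += 1        # current UPDATE is full: send it
                total += msg
                msg = base
            msg += nlri
        updates += 1                # send the last (partially filled) UPDATE
        total += msg
    return updates, total

path = (65001, 65002, 65003)
print(count_updates([(24, path)]))                      # (1, 62)
print(count_updates([(24, path)], ibgp=False))          # (1, 55)
print(count_updates([(24, path)] * 1000))               # (1, 4058)
print(count_updates([(24, path)] * 1000, group=False))  # (1000, 62000)
```

Under these assumptions a single /24 with a three-AS path needs 62 bytes over iBGP, comfortably below 100 bytes. An implementation that doesn’t group prefixes with identical attributes sends one UPDATE per prefix, multiplying both the message count and the per-message overhead.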

Do you have more precise numbers? Please share them in the comments!

Next, estimate how long it would take to transfer that information to the other router. With a 1 Gbps link between the two boxes the answer should be “less than a second” (60 MB is 480 Mbit), all the more so because you have negligible latency and hopefully no packet drops between them.
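The “less than a second” estimate is simple arithmetic: bytes × 8 divided by link bandwidth. The Python sketch below adds one assumed constraint, namely that a single TCP session can’t move more than one receive window per round trip, to show why low latency matters. The parameter names and the 64 KB default window are illustrative assumptions; the model ignores slow start, packet loss, TCP/IP header overhead, and how fast the receiving router actually reads the data.

```python
def transfer_time(total_bytes, link_bps, rtt_s=0.0, window_bytes=65535):
    """Seconds needed to push total_bytes through one TCP session (simplified model)."""
    rate_bps = link_bps
    if rtt_s > 0:
        # At most one window can be in flight per round trip.
        rate_bps = min(link_bps, window_bytes * 8 / rtt_s)
    return total_bytes * 8 / rate_bps

print(transfer_time(60e6, 1e9))               # 60 MB over 1 Gbps, negligible RTT: ~0.48 s
print(transfer_time(60e6, 1e9, rtt_s=0.001))  # 1 ms RTT with a 64 KB window: ~0.92 s
```

Even the pessimistic variant stays around one second, which is why the transfer itself is rarely the bottleneck between two adjacent routers.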

However, you probably won’t get anywhere close to that: the two routers have to generate the updates, process them, choose new best paths, and install them in the IP routing table and FIB… and depending on your router’s CPU, that can take significantly longer than one second.
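One way to reason about the CPU-bound part is to treat convergence as a chain of per-prefix stages (generating updates, processing them, best-path selection, FIB installation). If the stages overlap perfectly, the slowest stage sets the pace; if they run strictly one after another, their times add up. Real routers sit somewhere in between. The Python sketch below computes both bounds; the stage names and per-second rates are hypothetical placeholders, not measurements of any platform, so plug in numbers you measured yourself.

```python
def convergence_bounds(prefixes, rates):
    """Return (best_case, worst_case) seconds for processing `prefixes`.

    rates maps a stage name to the number of prefixes per second that stage
    can handle. Best case: stages fully overlap, so the slowest one dominates.
    Worst case: each stage finishes before the next one starts.
    """
    best = prefixes / min(rates.values())
    worst = sum(prefixes / rate for rate in rates.values())
    return best, worst

# Hypothetical per-stage rates (prefixes/second); replace with your own measurements.
rates = {
    "generate_updates": 100_000,
    "process_updates": 50_000,
    "best_path": 80_000,
    "fib_install": 20_000,
}
best, worst = convergence_bounds(512 * 1024, rates)
print(f"between {best:.0f} and {worst:.0f} seconds")
```

With these made-up rates the estimate lands between roughly 26 and 48 seconds, dominated by the slowest stage; the transfer time computed from link bandwidth barely registers.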

Some platforms with dismal CPUs can take minutes to converge, resulting in unpleasant brownouts. I described these problems and potential solutions in the BGP Convergence Optimization case study (part of the ExpertExpress Case Studies and the Data Center Design Case Studies book).

In any case, if you’re worried about BGP convergence time in this simple scenario, use the BGP Best External and BGP Prefix Independent Convergence (PIC) features to ensure the local FIB converges immediately after the link or BGP neighbor loss, not after all the BGP updates have been processed.
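BGP Prefix Independent Convergence helps through indirection: instead of every prefix carrying its own next hop, prefixes point to a shared next-hop object that already holds a precomputed backup (Best External makes sure the other router’s external path is advertised, and thus known, before the failure). The toy Python model below contrasts a flat FIB with such a hierarchical one. It’s a conceptual sketch of the idea, not how any vendor implements PIC; the class names and the “isp-a”/“isp-b” next hops are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PathList:
    """Shared next-hop object referenced by many prefixes."""
    primary: str
    backup: Optional[str] = None   # precomputed alternate path
    primary_up: bool = True

    def next_hop(self) -> Optional[str]:
        return self.primary if self.primary_up else self.backup

class FlatFib:
    """Each prefix stores its own next hop; a failover rewrites every affected entry."""
    def __init__(self, prefixes, next_hop: str):
        self.entries: Dict[str, str] = {p: next_hop for p in prefixes}
        self.writes = 0

    def fail_over(self, old: str, new: str) -> None:
        for prefix, nh in list(self.entries.items()):
            if nh == old:
                self.entries[prefix] = new   # one FIB write per prefix
                self.writes += 1

    def lookup(self, prefix: str) -> str:
        return self.entries[prefix]

class HierarchicalFib:
    """Each prefix points to a shared PathList; a failover flips one object."""
    def __init__(self, prefixes, path_list: PathList):
        self.entries: Dict[str, PathList] = {p: path_list for p in prefixes}
        self.writes = 0

    def fail_primary(self, path_list: PathList) -> None:
        path_list.primary_up = False   # a single write, independent of prefix count
        self.writes += 1

    def lookup(self, prefix: str) -> Optional[str]:
        return self.entries[prefix].next_hop()

prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(10_000)]

flat = FlatFib(prefixes, "isp-a")
flat.fail_over("isp-a", "isp-b")

shared = PathList(primary="isp-a", backup="isp-b")
hier = HierarchicalFib(prefixes, shared)
hier.fail_primary(shared)

print(flat.writes, hier.writes)                            # 10000 1
print(flat.lookup(prefixes[0]), hier.lookup(prefixes[0]))  # isp-b isp-b
```

In the flat model the number of FIB writes grows with the number of prefixes (512K in the reader’s scenario); in the hierarchical model it stays at one, which is why the forwarding state can switch over before BGP has finished processing updates.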

15 comments:

  1. This one definitely depends... Also, I assume we are discussing outbound traffic, as inbound convergence would be very difficult to measure: it depends on how far from our AS we consider the network converged.

    The first step is detecting the failure. Design for Loss of Signal (LoS): don't use media converters, MUXes, or anything else that hides a link failure. Ask the provider to run BFD.

    I'm assuming that under normal circumstances the secondary router would use the iBGP routes (higher local preference) but already have the eBGP routes in its BGP table. So when Upstream A fails, the primary router only has to send withdraw messages, and the secondary router starts using its external routes. The secondary router then has to send those routes to the primary router over iBGP.

    So some time could be shaved off the convergence if the secondary router attracted all the traffic instead of the traffic going to the primary router and then across the iBGP peering. Without knowing what the network looks like, this could be achieved by modifying HSRP priority based on interface or route tracking, conditionally advertising a default route, and so on.

    BGP PIC and Best External would definitely help, though, if the platform supports them.
  2. Hi,

    What Daniel describes in the third paragraph takes 15 seconds in our network (reconverging to the second router). We are considering BGP PIC and Best External. Is anyone using them in production on edge routers with a full BGP table?
    Replies
    1. How many routes do you get from the ISPs? Full routing table?
    2. On top of the previous advice, do you really need the full BGP table from your ISPs? Asking for a partial table should significantly improve your convergence time.
    3. Hi Ivan. Yes, the full routing table.
  3. You can use a tool like ExaBGP to simulate this fairly well. I usually put a real router in the middle, and then the real client router after that. ExaBGP can easily advertise a full table, whether you get it directly from a live network or build a config with 500K paths in it. We still use a full mesh with 57 routers, and each router sees approximately 3M unicast paths, with 500K+ best paths after it's all said and done. I simulate all 57 routers in ExaBGP and use Linux tc to simulate the latency between the peers.

    You can imagine that on a cold boot there is a lot of churn as best paths are replaced with better paths, forwarding tables have to be updated, etc. However, with modern CPUs convergence is still under 2 minutes once everything comes up.

    With an RR in the middle and just a single neighbor advertising the best 520K-odd paths, convergence takes 10-15 seconds.
  4. Phil:

    Why are you using a full mesh? Not here to flame you, just sincerely interested. The only reasons I can think of are to have more paths advertised and to optimize routing to the IGP next hops. At least the first one should be solvable by advertising the best path + N paths.
    Replies
    1. Simplicity more than anything else. In the case of a modern routing engine 57 peers is nothing and neither is 3M routes in the RIB. Everything is redundant so it's not like traffic is down for 2 minutes or even 15 seconds. There are layers of route reflection in the network of course, just not at that layer.
  5. You're forgetting about differences in software.

    My measurements show that when I get a full table from eBGP peers, the Juniper MX (with a recent routing engine) is the fastest, the 76xx is twice as slow (due to an older CPU, I guess), and the ASR9k is twice as slow as the 76xx.

    You'd probably expect the ASR9k to be faster than the C76xx, but Cisco broke something in their BGP implementation, and it no longer groups prefixes with the same BGP attributes into a single update message.

    Juniper MX960: 83k updates per 533k prefixes.
    Cisco 7600: 91k updates per 527k prefixes
    Cisco ASR9k: 506k updates per 531k prefixes
    (Just tested)

    And yes, it was reported, and after several months Cisco said that it's a software/hardware limitation (how obvious) and that the only workaround is to run eBGP multihop with clients through another route server. Well... no. Just no.

    Also, I believe you're downplaying the FIB programming time.
    Replies
    1. What revision of XR is that? Take a busy 7600, add another full-table peer to it, and you can probably run to Starbucks and get some coffee before it's done. The same goes for an MX80. Juniper, of course, had the famous KRT stuck queue issue where it could take quite some time to program the FIB even though the RIB was up to date.
  6. Ivan,

    It's my understanding that the BGP Best External and BGP PIC features are meant for MPLS/VPN environments and for deployment in the ISP network. I don't see how they would be used here.
    Replies
    1. That's true: they were designed for MPLS/VPN networks. However, you could (with plenty of smarts ;) use them in the above setup, and I think I've seen someone supporting Best External for plain IPv4 sessions.
  7. I have a question that might be weird. I have two upstream carriers running BGP; one of them currently sends me zero routes (they're fixing that) and the other sends me a default route, so basically the only route in my routing table comes from carrier #2. While carrier #1 was fixing things, they tore down their session, and I got an outage of a few minutes even though the default route (the only one) was still in the routing table because it came from #2. Is this normal? Should I lose all connectivity while a default route is still in the table?
    Replies
    1. No, that is definitely not normal.
    2. Any idea what it could be? A configuration problem or a bug?