Estimating BGP Convergence Time
One of my readers sent me this question:
I have an Internet edge setup with two routers connected to two upstream ISPs and receiving a full BGP routing table from each of them. I’m running iBGP between my Internet routers. Is there a formula to estimate the convergence time if one of my uplinks fails? How many updates would be needed to transfer all 512K routes in the BGP table, and how much time would it take?
As always, the answer is it depends.
If you’re trying to estimate how large the updates would be, look into how much memory the BGP process is consuming. BGP updates are pretty tightly packed, and the Cisco IOS in-memory structures probably closely reflect the BGP data model. In one of my experiments BGP used approximately 250 bytes per prefix, or 128 MB for 512K routes.
On the other hand, a BGP update message carrying a prefix with three AS numbers in the AS path is less than 100 bytes long (based on a sample Wireshark capture of a BGP session), so you might need closer to 50-60 MB to send 512K routes in BGP updates.
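The back-of-the-envelope arithmetic above can be sketched in a few lines of Python; the per-prefix sizes are the rough figures quoted in the text, not universal constants (real values vary with AS-path length and attribute diversity):

```python
# Rough BGP table size estimates using the per-prefix figures from the text.
PREFIXES = 512_000

mem_bytes_per_prefix = 250      # observed in-memory cost per prefix (one experiment)
update_bytes_per_prefix = 100   # upper bound from a sample Wireshark capture

bgp_process_memory_mb = PREFIXES * mem_bytes_per_prefix / 1e6
update_traffic_mb = PREFIXES * update_bytes_per_prefix / 1e6

print(f"BGP process memory: ~{bgp_process_memory_mb:.0f} MB")  # ~128 MB
print(f"BGP update traffic: ~{update_traffic_mb:.0f} MB")      # ~51 MB
```

The two results match the 128 MB and 50-60 MB figures mentioned above.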
Do you have more precise numbers? Please share them in the comments!
Next, estimate how long it would take to transfer that information to the other router. With a 1 Gbps link between the two boxes the answer should be “less than a second”, especially since you have negligible latency and hopefully no packet drops between them.
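As a quick sanity check, the raw serialization time for the update stream is easy to compute (this ignores TCP dynamics, update generation, and processing time; 55 MB is simply the mid-point of the 50-60 MB estimate above):

```python
# Ideal wire-transfer time for the full-table BGP update stream.
update_bytes = 55e6     # mid-point of the 50-60 MB estimate
link_bps = 1e9          # 1 Gbps iBGP link between the two routers

transfer_s = update_bytes * 8 / link_bps
print(f"Ideal transfer time: {transfer_s:.2f} s")  # well under a second
```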
However, you very probably won’t get anywhere close to that - the two routers have to generate the updates, process the updates, choose new best paths, and install them in the IP routing table and FIB… and depending on the CPU used in your router this can take significantly longer than a second.
Some platforms with dismal CPUs could take minutes to converge, resulting in unpleasant brownouts. I described these problems and potential solutions in the BGP Convergence Optimization case study (part of ExpertExpress Case Studies and the Data Center Design Case Studies book).
In any case, if you’re worried about BGP convergence time in this simple scenario, use BGP Best External and BGP Prefix Independent Convergence features to ensure the local FIB converges immediately after the link or BGP neighbor loss, not after all the BGP updates have been processed.
The first step is to detect the failure. Design for Loss of Signal (LoS): don't use any media converters or MUXes that do stupid things, and ask the provider to run BFD.
I'm assuming that under normal circumstances the secondary router prefers the iBGP routes (due to higher local preference) but still keeps its own eBGP routes in the BGP table. When Upstream A fails, the primary router only has to send withdraw messages and the secondary router starts using its external routes. The secondary router then has to advertise those routes to the primary router over the iBGP session.
Some time could thus be shaved off the convergence if the secondary router attracted all the traffic directly, instead of traffic going to the primary router and then across the iBGP peering. Without knowing what the network looks like, this could be achieved by modifying HSRP priority based on interface or route tracking, conditionally advertising a default route, and so on.
BGP PIC and BGP Best External would definitely be helpful, though, if they're supported on the platform.
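The failover behavior described in this comment can be illustrated with a toy best-path model; the class and function names are purely illustrative, and real BGP decision logic has many more tie-breakers (weight, AS-path length, origin, MED, ...) than the single local-preference comparison sketched here:

```python
# Toy model: the secondary router holds both the iBGP route (higher local
# preference, preferred) and its own eBGP route. When the primary router
# withdraws the iBGP route, best-path selection falls back to the eBGP route.
from dataclasses import dataclass

@dataclass
class Path:
    source: str       # "iBGP" or "eBGP"
    local_pref: int

def best_path(paths):
    # Grossly simplified decision process: highest local preference wins.
    return max(paths, key=lambda p: p.local_pref) if paths else None

table = [Path("iBGP", 200), Path("eBGP", 100)]
print(best_path(table).source)                      # prints "iBGP"

table = [p for p in table if p.source != "iBGP"]    # primary sends a withdraw
print(best_path(table).source)                      # prints "eBGP"
```

The point of BGP PIC is that this fallback is precomputed, so the FIB can switch to the backup path without waiting for the best-path run.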
What Daniel describes in the third paragraph takes 15 seconds in our network to re-converge to the second router. We are considering using BGP PIC and Best External. Is anyone using these in a production environment on edge routers with a full BGP table?
You can imagine that on a cold boot there is a lot of churn going on as best paths are replaced with better paths, forwarding tables have to be updated, and so on. However, with modern CPUs convergence still completes in under two minutes once everything comes up.
With a route reflector in the middle and just a single neighbor advertising the best 520K-odd paths, the convergence is 10-15 seconds.
Why are you using a full mesh? Not trying to flame you, just sincerely interested. The only reason I can think of is to have more paths advertised and optimize routing toward the IGP next hops. At least the first one should be solvable by advertising best path + N paths.
My measurements show that when I get a full table from eBGP peers, Juniper MX (with a recent routing engine) is the fastest, 76xx is twice as slow (due to an older CPU I guess), then ASR9k is twice as slow compared to 76xx.
You'd probably think that ASR9k should be faster than C76xx but Cisco broke something in their BGP implementation and now it no longer groups prefixes with the same BGP attributes in a single update message.
Juniper MX960: 83k updates for 533k prefixes
Cisco 7600: 91k updates for 527k prefixes
Cisco ASR9k: 506k updates for 531k prefixes
(Just tested)
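Dividing the reported prefix counts by the update counts shows the average packing efficiency, which is exactly the grouping-of-prefixes-per-update behavior discussed above (the numbers are the commenter's measurements, restated):

```python
# Average prefixes packed per BGP update message, from the reported counts.
# An efficient implementation groups prefixes sharing identical attributes
# into one update; a ratio near 1 means almost no grouping at all.
measurements = {
    "Juniper MX960": (533_000, 83_000),
    "Cisco 7600":    (527_000, 91_000),
    "Cisco ASR9k":   (531_000, 506_000),
}
for platform, (prefixes, updates) in measurements.items():
    print(f"{platform}: {prefixes / updates:.1f} prefixes per update")
```

The ASR9k ratio of roughly one prefix per update is what makes its convergence so much slower than the other two platforms.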
And yes, it was reported, and after several months Cisco said it's a software-hardware limitation (how obvious) and that the only workaround is to do eBGP multihop with clients through another route server. Well... no. Just no.
Also I believe you downplay the FIB programming time.
It's my understanding that the BGP Best External and BGP PIC features are meant for MPLS/VPN environments and would have to be implemented in the ISP network. I don't see how they would be used here.