Rich sent me a question about temporary traffic blackholing in networks where every router is running an IGP (OSPF or IS-IS) and iBGP.
He started with a very simple network diagram:
    +-------+ C1 +----+ C2 +-------+
    |                              |
 +--+---+                      +---+--+
 |  E1  |                      |  E2  |
 +--+---+                      +---+--+
    |                              |
    +-------+ C3 +----+ C4 +-------+
The routers are configured as follows:
- The network does not have a default route;
- All routers run OSPF and iBGP;
- iBGP sessions are established between loopback interfaces;
- E1 and E2 insert external prefixes into iBGP;
- BGP next hop is not changed across the autonomous system;
- There’s a full mesh of iBGP sessions between the routers, or a route reflector somewhere (it doesn’t really matter which).
Now imagine C1 crashes. No problem. IGP detects the topology change, and changes the IP routing tables accordingly. BGP next hop is unchanged, so there’s no need for BGP convergence. Life is good.
A few minutes later C1 recovers. The IGP establishes adjacencies between C1 and its neighbors. BGP sessions are established only after the IGP has already changed the routing tables (C1’s loopback was not reachable prior to that), and it takes a while for C1 to populate its BGP table and copy its contents into its routing table.
In the meantime, E1 sees two equal-cost paths toward E2 and starts sending part of the traffic for external destinations to C1. C1 has no BGP routes yet (and the network has no default route), so it immediately drops that traffic. The result is a temporary traffic black hole that lasts until C1 receives all the BGP updates and installs the BGP prefixes into its IP routing and forwarding tables.
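To make the failure mode concrete, here is a minimal Python sketch of the scenario. Everything in it is an assumption made for illustration: the link list mirrors the diagram, every link has an OSPF cost of 10, and the `bgp_ready` set stands in for "routers that already have the external prefix in their forwarding table". It is a toy model, not how any router implements SPF or forwarding.

```python
import heapq
from collections import defaultdict

# Hypothetical topology matching the diagram; all OSPF costs assumed to be 10.
LINKS = [("E1", "C1"), ("C1", "C2"), ("C2", "E2"),
         ("E1", "C3"), ("C3", "C4"), ("C4", "E2")]


def build_graph(links, cost=10):
    """Undirected graph as {node: {neighbor: cost}}."""
    graph = defaultdict(dict)
    for a, b in links:
        graph[a][b] = cost
        graph[b][a] = cost
    return graph


def spf_distances(graph, root):
    """Plain Dijkstra: cost from root to every reachable node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(queue, (d + cost, nbr))
    return dist


def ecmp_next_hops(graph, src, dst):
    """Neighbors of src that sit on a shortest path toward dst."""
    to_dst = spf_distances(graph, dst)  # costs are symmetric in this model
    best = to_dst[src]
    return sorted(n for n, c in graph[src].items()
                  if c + to_dst.get(n, float("inf")) == best)


def trace(graph, bgp_ready, first_hop, bgp_next_hop):
    """Follow one path hop by hop; a router without the BGP route drops the packet."""
    node, path = first_hop, [first_hop]
    while node != bgp_next_hop:
        if node not in bgp_ready:
            return path, "dropped"
        node = ecmp_next_hops(graph, node, bgp_next_hop)[0]
        path.append(node)
    return path, "delivered"


if __name__ == "__main__":
    graph = build_graph(LINKS)
    # C1 is back in OSPF but has not received its BGP updates yet.
    bgp_ready = {"E1", "E2", "C2", "C3", "C4"}
    for hop in ecmp_next_hops(graph, "E1", "E2"):
        print(hop, trace(graph, bgp_ready, hop, "E2"))
```

In this model the IGP happily offers C1 as a next hop while the set of routers that can actually forward the packet says otherwise. That mismatch between the two data sources is the black hole.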
You’ll experience the same problem any time you’re trying to use functionality (IP forwarding) that relies on information supplied by two independent eventually-consistent systems (OSPF and BGP). MPLS forwarding using LDP exhibits very similar behavior; see also this blog post.
Rich’s question: how can I fix that?
As always, it depends. The “canonical” answer (probably expected in the CCIE lab) is the max-metric router-lsa on-startup wait-for-bgp OSPF router configuration command (there’s a similar command for IS-IS).
The max-metric router-lsa command makes a router advertise the links in its Type-1 (router) LSA with the maximum metric allowed by OSPF, making paths through it less preferred than anything else. The on-startup option tells the router to do that after a reload (instead of immediately), and the next parameter tells the router how long it should advertise the maximum metric: you can specify the interval in seconds, or tell the OSPF routing process to wait for BGP to converge (or at most 10 minutes).
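A small Python sketch of why the maximum metric helps, under loudly stated assumptions: the topology and the cost of 10 per link are the same illustrative values as in the diagram discussion, each router's LSA is modeled as a dictionary of advertised link costs, and 0xFFFF is used as the maximum link metric. The helper names are hypothetical. The point is that only the costs advertised by C1 change, so transit paths through C1 become unattractive while C1 itself stays reachable.

```python
import heapq

MAX_LINK_METRIC = 0xFFFF  # 65535, the largest link cost a router LSA can carry

LINKS = [("E1", "C1"), ("C1", "C2"), ("C2", "E2"),
         ("E1", "C3"), ("C3", "C4"), ("C4", "E2")]


def router_lsas(links, cost=10, max_metric_routers=()):
    """Model each router LSA as {neighbor: advertised cost}.

    Routers listed in max_metric_routers advertise every link with the
    maximum metric; their neighbors keep advertising the normal cost.
    """
    lsas = {}
    for a, b in links:
        lsas.setdefault(a, {})[b] = MAX_LINK_METRIC if a in max_metric_routers else cost
        lsas.setdefault(b, {})[a] = MAX_LINK_METRIC if b in max_metric_routers else cost
    return lsas


def distances(lsas, root):
    """Dijkstra over directed costs: leaving a node costs what that node advertises."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue
        for nbr, cost in lsas[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(queue, (d + cost, nbr))
    return dist


def next_hops(lsas, src, dst):
    """Neighbors of src on a lowest-cost path to dst."""
    total = distances(lsas, src)[dst]
    return sorted(n for n, c in lsas[src].items()
                  if c + distances(lsas, n)[dst] == total)


if __name__ == "__main__":
    normal = router_lsas(LINKS)
    startup = router_lsas(LINKS, max_metric_routers={"C1"})
    print("normal :", next_hops(normal, "E1", "E2"))
    print("startup:", next_hops(startup, "E1", "E2"))
    print("cost E1 -> C1 during startup:", distances(startup, "E1")["C1"])
```

Because the cost of reaching C1 is taken from what its neighbors advertise, C1's loopback stays reachable in this model, which is what lets the iBGP sessions come up while transit traffic takes the C3-C4 path.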
The interesting question at this point should be: how does the router know when the BGP routing process has converged? The Cisco IOS XE documentation is totally mum on the topic, but I remember seeing something along the lines of “we assume BGP has converged when we receive a BGP keepalive message from all peers” (which means they have nothing more to tell us).
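Here is what that half-remembered heuristic could look like as a tiny Python state tracker. This is purely a sketch of the recollection above, not documented vendor behavior: the class name, its methods, and the idea of feeding it keepalive events are all hypothetical, and the 600-second ceiling is simply the "at most 10 minutes" limit mentioned earlier.

```python
from dataclasses import dataclass, field

WAIT_FOR_BGP_TIMEOUT = 600  # seconds; the "at most 10 minutes" ceiling


@dataclass
class StartupMaxMetric:
    """Hypothetical model: advertise max metric until BGP looks converged."""
    peers: frozenset
    keepalive_seen: set = field(default_factory=set)

    def on_keepalive(self, peer):
        # Assumption: a keepalive means the peer has no more updates queued.
        if peer in self.peers:
            self.keepalive_seen.add(peer)

    def bgp_converged(self):
        # Converged once every configured peer has sent a keepalive.
        return self.keepalive_seen == set(self.peers)

    def advertise_max_metric(self, seconds_since_boot):
        # Stop on convergence or when the timer expires, whichever is first.
        return (not self.bgp_converged()
                and seconds_since_boot < WAIT_FOR_BGP_TIMEOUT)


if __name__ == "__main__":
    state = StartupMaxMetric(peers=frozenset({"E1", "E2"}))
    print(state.advertise_max_metric(5))    # no keepalives yet
    state.on_keepalive("E1")
    state.on_keepalive("E2")
    print(state.advertise_max_metric(40))   # all peers have reported in
```

The timeout matters as much as the keepalive check: a single BGP peer that never comes up must not keep the router out of the forwarding path forever.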
And now for the fun answers:
- Turn the BGP+OSPF synchronization challenge into an LDP+OSPF synchronization challenge by deploying MPLS forwarding in a BGP-free core. See also RFC 1925 rule 6.
- Build a BGP-only network. Not necessarily a good idea if you care about convergence times and your network is not highly symmetrical. The proof is left as an exercise for the reader.
- Use a BGP-free core with MPLS forwarding based on segment routing instead of LDP.
Want to know more? Explore the Segment Routing 101 webinar.