IBGP Migrations Can Generate Forwarding Loops
A group of researchers presented an “interesting” result at IETF 87: migrating from IBGP full mesh to IBGP route reflectors can introduce temporary forwarding loops. OMG, really?
Don’t panic, the world is not about to become a Vogon hyperspace bypass. Let’s put their results in perspective.
Disclaimer: IBGP loops weren’t the main focus of the IETF 87 paper (do go through the whole slide deck, it’s interesting), but I hate the big fuss some people make out of corner cases.
Can it really happen? Sure it can. You can always find a pathological case where following best practices (assuming they deserve the name) can lead you into a quagmire. Route reflectors are no exception.
Is the migration from full mesh to route reflectors a relevant use case? You tell me – I always tell my clients to use BGP route reflectors whenever they have more than four BGP routers in an AS ... but I’m also positive there are still some neglected networks out there running IBGP full mesh (more probably partial mesh because they forgot to configure a few sessions) on tens of boxes.
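If you're one of those networks and want to see what the target state looks like, here's a minimal sketch of a route-reflector session in classic IOS-style syntax. The AS number, addresses, and interface name are made up for illustration:

```
! On the route reflector (hypothetical AS 65000):
router bgp 65000
 neighbor 10.0.0.2 remote-as 65000
 neighbor 10.0.0.2 update-source Loopback0
 ! This one line turns an ordinary IBGP peer into an RR client:
 neighbor 10.0.0.2 route-reflector-client
!
! On the client, the session is plain IBGP -- one session to the RR
! (two for redundancy) replaces the full mesh of sessions.
```

The migration pain the researchers describe happens in between: while some routers still have full-mesh sessions and others already rely on the reflector, they may disagree on the best path.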
Are best practices broken? No. They are just that – a procedure that will cause the least harm (as compared to random ramblings and cut-and-paste of Google-delivered configuration snippets) when executed by people who don’t know what they’re doing.
Or, as John Sonmez put it more politely in his Principles are Timeless, Best Practices are Fads blog post:
If you were to blindly follow any best practice and not apply that best practice in a way that brings out the underlying principle, you would be very unlikely to actually receive any benefit.
Does that make BGP a bad protocol? Contrary to some vocal beliefs, it doesn’t. Every tool (including BGP) can be misused, and a properly focused researcher can generate an NP-hard problem out of every real-life situation. Is a screwdriver a bad tool because I have to spend so much energy when hammering nails with it? Maybe not.
Is there a way around the problem? Sure. Deploy MPLS-based forwarding in your network (aka: MPLS is the answer … what was the question?)
Lacking any better idea, use a network simulation tool like Cariden to see what will happen with your network prior to reconfiguring it. More about better ideas in a follow-up blog post ... and if you have one, share it in the comments.
- In critical networks (e.g., financial institutions, SWIFT), **any** traffic disruption is bad.
- In large networks---spanning multiple countries---composed of hundreds (if not more) BGP routers, a non-trivial BGP reconfiguration can take a long time. For comparison, it took AOL a week to migrate from OSPF to ISIS: http://meetings.ripe.net/ripe-47/presentations/ripe47-eof-ospf.pdf. It is therefore crucial to know that the network will stay consistent in all the intermediate states (you don't want a loop in your network lasting for days).
Also, some *non-neglected* networks still run an iBGP full mesh, just to get better path diversity in order to load-balance BGP traffic or to feed fast-convergence mechanisms like BGP PIC with backup paths without using fancy BGP extensions like ADD-PATH.
It does not happen only in pathological cases. In the presentation, I give results on an *actual* Tier-1 topology, not on crazy academic ones---using best current practices---and we found that numerous forwarding loops can be created. I assume pervasive BGP though, not MPLS.
You're right that MPLS does guarantee forwarding correctness. It does not, however, guarantee signaling correctness. Your BGP network can still oscillate during the reconfiguration. This is annoying, as you might send your eBGP traffic to different eBGP next-hops, potentially connected to different ASes. Your customers will wonder why they see perpetual changes in their paths' performance and where these strange traceroute outputs come from...
To finish on a funny note, yes, most researchers confronted with a non-trivial BGP problem can probably show that it is NP-hard (-complete). Actually, we went one step further and proved that some BGP problems are Turing-complete by building AND/OR/NOT logic gates, as well as memory and clock circuits using *only* BGP configurations. Check out http://vanbever.eu/pdfs/vanbever_turing_icnp_2013.pdf if you want more details ;-)
There are large providers today who simply use a full iBGP mesh, even with 50+ routers. The reality is that even though RP CPUs aren't the beefiest in the computing world, handling 50 BGP sessions and the related updates is nothing these days. In a lot of instances route reflection just complicates matters.
There are solutions as well like using dual-plane topologies where you can effectively shut off a plane for maintenance, make your changes, and then turn it back up.
"Micro-loops" can turn into "mega-loops" when they span multiple intermediate reconfiguration steps. For instance, I reconfigure (manually, sic) router A and, in doing so, create a loop for a destination D because A starts sending traffic to B, which is not reconfigured yet. That loop will persist until I reconfigure (at least) router B. But if you don't know that in advance, you may reconfigure router B at the very end, keeping the loop alive for a large part of the reconfiguration process. Of course, as Ivan correctly pointed out, MPLS removes that problem.
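The A/B scenario above is easy to check mechanically: model each router's next hop toward destination D as a pointer and follow the chain. A toy sketch (not a real BGP model; router names and the two next-hop tables are invented to match the example):

```python
def forwarding_path(next_hop, src, dst):
    """Follow next-hop pointers from src toward dst.

    Returns (path, looped): looped is True if we revisit a router,
    i.e. the intermediate configuration contains a forwarding loop.
    """
    path, seen = [src], {src}
    node = src
    while node != dst:
        node = next_hop[node]
        if node in seen:           # revisited a router -> forwarding loop
            return path + [node], True
        path.append(node)
        seen.add(node)
    return path, False

# Before the reconfiguration: A -> B -> D, consistent forwarding.
before = {"A": "B", "B": "D"}
print(forwarding_path(before, "A", "D"))   # (['A', 'B', 'D'], False)

# Mid-migration: A already prefers the path via B, while the
# not-yet-reconfigured B still points back at A -> transient loop.
during = {"A": "B", "B": "A"}
print(forwarding_path(during, "A", "D"))   # (['A', 'B', 'A'], True)
```

Running such a check over every planned intermediate state is essentially what a migration-ordering tool has to do for every destination prefix.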
The BGP reconfiguration framework we describe at the end of the presentation leverages a similar idea of running two BGP control planes, although we do have scalability in mind (i.e., avoiding duplicating all BGP routes and the associated churn).
However, a backbone network consisting of maybe a few dozen iBGP peers or fewer I would normally configure as an iBGP full mesh.
I believe that the manageability problem is overstated, because most networks do not change the topology of their BGP backbone each day.
I have wondered how far the iBGP full mesh would scale. I guess nobody knows, because nobody ever really tried. People follow conventional wisdom, which is that you should use route reflectors.
It does not appear to me that there is a lot of burden for a router in an iBGP session, and I believe it would scale to hundreds or thousands of peers.
So if the manageability problem for iBGP sessions could be mitigated or solved, then I believe that even fairly large networks could do without route reflectors.
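For what it's worth, the manageability argument is mostly about session count: a full mesh of n routers needs n(n-1)/2 sessions network-wide (n-1 configured on every box), while clients of route reflectors need only one session per reflector. A quick back-of-the-envelope check (the topology numbers are just examples):

```python
def full_mesh_sessions(n):
    """Network-wide IBGP sessions in an n-router full mesh."""
    return n * (n - 1) // 2

def rr_sessions(n_clients, n_reflectors=1):
    """Sessions when every client peers with every reflector,
    plus a full mesh among the reflectors themselves."""
    return (n_clients * n_reflectors
            + n_reflectors * (n_reflectors - 1) // 2)

print(full_mesh_sessions(50))   # 1225 sessions to configure and monitor
print(rr_sessions(48, 2))       # 97 with two redundant reflectors
```

So the per-router load (49 sessions in the full-mesh case) is indeed trivial for a modern RP, as argued above; what grows quadratically is the number of configuration stanzas someone has to keep consistent across the network.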