… updated on Saturday, December 26, 2020 08:49 UTC
LDP-IGP Synchronization in MPLS Networks
A reader of my blog planning to migrate his network from a traditional BGP-everywhere design to a BGP-over-MPLS one wondered about potential unexpected consequences. The MTU implications of introducing MPLS in a running network are usually well understood (even though you could get some very interesting behavior); if you can, increase the MTU size by at least 16 bytes (4 labels of 4 bytes each) and check whether the configured MTU includes the L2 header. Another, somewhat more mysterious beast is the interaction between IGP and LDP, which can cause traffic disruptions after the physical connectivity has been reestablished.
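The MTU arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name, the 1500-byte IP MTU and the 14-byte untagged Ethernet header are assumptions chosen for the example, not values taken from any particular platform.

```python
MPLS_LABEL_BYTES = 4  # one MPLS label stack entry is 32 bits


def required_mtu(ip_mtu: int, max_labels: int) -> int:
    """Return the MTU needed to carry an ip_mtu-byte IP packet
    underneath a stack of max_labels MPLS labels."""
    if ip_mtu <= 0 or max_labels < 0:
        raise ValueError("ip_mtu must be positive and max_labels non-negative")
    return ip_mtu + max_labels * MPLS_LABEL_BYTES


# Assumed example values: 1500-byte IP packets, up to 4 labels.
mpls_mtu = required_mtu(1500, 4)

# Assumption: the platform counts the L2 header in its MTU setting and the
# link uses untagged Ethernet (14-byte header, FCS not counted).
ETHERNET_HEADER_BYTES = 14
l2_inclusive_mtu = mpls_mtu + ETHERNET_HEADER_BYTES

print(mpls_mtu, l2_inclusive_mtu)
```

Whether you have to configure the first or the second number depends on how your platform defines the interface MTU, which is why it is worth checking.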
Here’s a typical BGP-over-MPLS design (applies equally well to MPLS/VPN, 6PE, 6VPE, VPLS or pseudowires):
- Edge routers (PE-routers) run BGP between themselves to exchange external (customer) prefixes;
- Edge and core (P) routers run an IGP (usually OSPF or IS-IS) to find the optimum path toward BGP next hops;
- P- and PE-routers use LDP to exchange labels for known IP prefixes (including BGP next hops). LDP indirectly builds end-to-end LSPs across the network core.
IP packets can be forwarded across a BGP-free core even though the core routers don’t know how to forward them. The ingress PE-router labels incoming IP packets with MPLS labels for BGP next hops. Labeled packets are sent across the core (core routers don’t perform an IP lookup). The last P-router pops the top label (penultimate hop popping), and the egress PE-router performs an IP lookup and sends the datagram toward an external destination. The process is slightly different when you use technologies like MPLS/VPN that need a two-label stack.
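The forwarding walk-through above can be modeled as a toy label-switching trace. The topology (PE1, P1, P2, PE2), the label values 100 and 200, and the `trace` helper are all hypothetical; the only real-world constant is the implicit-null label value 3, which an egress router advertises to request penultimate hop popping.

```python
IMPLICIT_NULL = 3  # reserved label value: "pop the label before sending to me"

# Hypothetical forwarding tables: router -> {incoming label: (outgoing label, next router)}.
# An incoming label of None stands for an unlabeled IP packet at the ingress PE.
LFIB = {
    "PE1": {None: (100, "P1")},            # ingress PE: push label for the BGP next hop
    "P1": {100: (200, "P2")},              # core router: swap, no IP lookup
    "P2": {200: (IMPLICIT_NULL, "PE2")},   # penultimate hop: pop
}


def trace(ingress: str, egress: str) -> list:
    """Follow a packet hop by hop and record the label operation at each router."""
    hops = []
    router, label = ingress, None
    while router != egress:
        out_label, next_router = LFIB[router][label]
        if out_label == IMPLICIT_NULL:
            action = "pop" if label is not None else "forward unlabeled"
        elif label is None:
            action = f"push {out_label}"
        else:
            action = f"swap {label}->{out_label}"
        hops.append((router, action))
        label = None if out_label == IMPLICIT_NULL else out_label
        router = next_router
    hops.append((router, "ip lookup"))  # the egress PE sees a plain IP packet
    return hops


for hop in trace("PE1", "PE2"):
    print(hop)
```

Note that only the two PE-routers ever look at the IP header; P1 and P2 make their decisions purely on the incoming label.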
If a core link fails, IGP quickly finds an alternate path. LDP does not need to converge; when using independent label distribution and liberal label retention mode (default settings on most modern routers), every LSR saves labels advertised by all its neighbors. In our network, when A discovers that it can use B to reach the egress router (EG), it already has the label B assigned to the EG prefix in its Label Information Base (LIB). LDP thus causes no interruption in traffic flow.
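A minimal sketch of liberal label retention, assuming a hypothetical `Lsr` class and made-up label values (30 from D, 40 from B): the LIB keeps the label bindings received from every neighbor, so a change of IGP next hop only changes which stored binding is used.

```python
class Lsr:
    """Toy label switch router with liberal label retention."""

    def __init__(self):
        self.lib = {}       # (prefix, neighbor) -> label, kept for ALL neighbors
        self.next_hop = {}  # prefix -> neighbor currently selected by the IGP

    def receive_mapping(self, prefix, neighbor, label):
        """Store a label binding advertised by an LDP neighbor."""
        self.lib[(prefix, neighbor)] = label

    def igp_update(self, prefix, neighbor):
        """Record the next hop the IGP has chosen for the prefix."""
        self.next_hop[prefix] = neighbor

    def outgoing_label(self, prefix):
        """Label to impose, or None if the current next hop gave us no binding."""
        return self.lib.get((prefix, self.next_hop.get(prefix)))


a = Lsr()
a.receive_mapping("EG", "D", 30)   # binding from the primary next hop
a.receive_mapping("EG", "B", 40)   # binding from the backup neighbor, retained anyway
a.igp_update("EG", "D")
before_failure = a.outgoing_label("EG")

a.igp_update("EG", "B")            # link A-D fails, IGP reroutes via B
after_failure = a.outgoing_label("EG")
print(before_failure, after_failure)
```

With conservative retention the binding from B would not have been stored, and A would have to wait for LDP before it could label traffic on the backup path.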
The situation is completely different after the physical link is restored. IGP quickly discovers new neighbors and reconverges; LDP is slower. In the short interval between IGP convergence and LDP synchronization router A sends IP packets through the new shortest path toward the egress router with no label (it hasn’t received a label from D yet and thus has no outgoing label to use). Router D, not running BGP, has no idea what to do with those packets and thus drops them.
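The black-hole window described above is simply the gap between two events. The sketch below uses hypothetical helper names and made-up timings (IGP converged 1 second after link restoration, label from D usable after 6 seconds) to show what happens to a packet sent by A at a given moment.

```python
def packet_fate(t: float, igp_converged_at: float, ldp_ready_at: float) -> str:
    """Fate of a packet A sends toward the egress router t seconds after
    the link is restored (all times are hypothetical inputs)."""
    if t < igp_converged_at:
        return "labeled, old path"
    if t < ldp_ready_at:
        # IGP already points at D, but A has no label from D yet.
        return "unlabeled, new path: dropped by the BGP-free router"
    return "labeled, new path"


def blackhole_seconds(igp_converged_at: float, ldp_ready_at: float) -> float:
    """Length of the interval in which traffic is sent unlabeled."""
    return max(0.0, ldp_ready_at - igp_converged_at)


for t in (0.5, 3.0, 7.0):
    print(t, packet_fate(t, igp_converged_at=1.0, ldp_ready_at=6.0))
print(blackhole_seconds(1.0, 6.0))
```

If LDP happened to be ready before the IGP converged, the window would be zero, which is exactly what the solutions below try to guarantee.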
There are three solutions to this problem:
Segment routing: MPLS-based segment routing uses IGP to propagate globally unique labels. There’s no need for LDP in an SR-MPLS environment. For more details watch the Segment Routing with MPLS part of MPLS Essentials webinar. I would use this one in new network designs.
LDP-IGP synchronization: Link cost of a newly established adjacency is set to the maximum value until LDP tells IGP it’s OK to use the link. You can configure the LDP-IGP synchronization in OSPF or IS-IS, either with the mpls ldp sync command in the routing protocol configuration or with the mpls ldp igp sync interface configuration command where you can also specify an additional delay (to ensure both IGP and LDP are completely stable before you start using the new link for packet forwarding).
LDP session protection: A router tries to retain LDP sessions with all its neighbors even if they are currently not directly connected (A and D in our link failure scenario). Since the router no longer receives multicast LDP hellos from such neighbors, it uses targeted LDP hellos (unicast UDP packets sent to neighbor’s LDP transport address) to prevent session timeouts.
Targeted LDP sessions allow the routers to retain LIB information that can be used immediately after IGP convergence. There’s also no need to reestablish LDP sessions with the newly-adjacent neighbors as they were never disconnected.
To configure this feature, use the mpls ldp session protection global configuration command. You can use an ACL to specify the neighbors you’re interested in (it makes more sense to use this feature on core links than on non-redundant access links) and the duration of the session protection (default: 24 hours).
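The LDP-IGP synchronization idea from the list above (advertise maximum cost until LDP is ready) can be illustrated with a toy path selection. Everything here is an assumption made for the example: the function names, the per-link cost of 10, the two candidate paths, and the use of 65535 as the maximum cost (an OSPF-style value; other protocols use different maxima).

```python
MAX_LINK_COST = 65535  # assumption: OSPF-style maximum interface cost


def advertised_cost(configured_cost: int, ldp_synced: bool, sync_enabled: bool = True) -> int:
    """Cost the IGP advertises for a link, given the LDP state on that link."""
    if sync_enabled and not ldp_synced:
        return MAX_LINK_COST  # keep the link as a path of last resort
    return configured_cost


def best_path(paths: dict) -> str:
    """Pick the path with the lowest total cost."""
    return min(paths, key=lambda name: sum(paths[name]))


def choose(ldp_synced: bool, sync_enabled: bool = True) -> str:
    """Path A uses toward the egress router after link A-D comes back."""
    a_d = advertised_cost(10, ldp_synced, sync_enabled)
    paths = {
        "via D": [a_d, 10],      # restored link, shorter path
        "via B": [10, 10, 10],   # longer path that already has labels
    }
    return best_path(paths)


print(choose(ldp_synced=False))                       # LDP not ready, sync enabled
print(choose(ldp_synced=True))                        # LDP ready
print(choose(ldp_synced=False, sync_enabled=False))   # no sync: the black-hole case
```

The link stays usable if it is the only path left, because a very high cost is still a finite cost; it is merely unattractive while LDP catches up.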
More background information
Enterprise MPLS/VPN Deployment webinar describes the basics of LDP, LSP establishment and packet forwarding across MPLS networks (including label stack used by MPLS/VPN and penultimate hop popping).
If you need a bit more in-depth details, buy one of the MPLS books (unfortunately none of my books covers the new features like LDP session protection).
Revision history
- 2020-12-26: Added information about SR-MPLS making LDP (and LDP/IGP synchronization) obsolete.
http://ripe63.ripe.net/archives/video/184/
Worth watching, real data from Randy (IIJ)
As Ivan wrote "when using independent label distribution and liberal label retention mode (default settings on most modern routers)", I assumed independent label distribution and liberal retention mode. In this case black holes are surely possible: you get a black hole as soon as the LDP session on a link goes down (in a VPN service, this happens even if the LDP session goes down on the last link of the path).
http://www.juniper.net/techpubs/en_US/junos11.3/topics/reference/standards/ldp.html
max-metric router-lsa on-startup 120
Also, for the edge devices BGP PIC can again accelerate repair for backup paths. I would like to see an article on BGP PIC, especially the "no bgp recursion host" piece.
JUNOS defaults:
- Downstream Unsolicited label distribution (as opposed to Downstream on Demand);
- Ordered label distribution control (as opposed to Independent);
- Liberal label retention (as opposed to Conservative).
Anyway, my claim on traffic black-hole remains valid, independently of label allocation mode.
What about the impact of NSF and BFD in the IGP? If used, they also have an effect on the process. Any thoughts?
Also, what about "downstream on demand" vs. "unsolicited downstream" label distribution methods as part of the equation? Any thoughts?
Thanks.
I am wondering whether the various vendors' implementations allow the IGP and LDP to always be synchronised in an ECMP scenario, making this feature unnecessary in that case. I'm also wondering whether this is true for label imposition as well as for label swapping. I will be investigating soon, but should anyone already have any info and be willing to share, it'd be highly appreciated!
Cheers/Ciao
Andrea