… updated on Friday, January 10, 2025 11:57 +0100
Is BGP PIC Edge an Oxymoron?
This blog post discusses an old arcane question that has been nagging me from the bottom of my Inbox for almost exactly four years. Please skip it if it sounds like Latin to you, but if you happen to be one of those readers who know what I’m talking about, I’d appreciate your comments.
Terminology first:
- Prefix Independent Convergence allows entries in the forwarding table to point to shared next hops (or next-hop groups), reducing the FIB update bottleneck when changing the next hop for a large number of prefixes (for example, when dealing with a core link failure). More details in the initial blog post and PIC applicability to fast reroute.
- PIC Edge (as defined by vendor marketing) is the ability to switch to a backup CE route advertised to a backup PE router before the network convergence is complete.
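To make the "shared next hops" idea a bit more tangible, here is a toy sketch comparing a flat forwarding table with one whose entries point to a shared next-hop object. Everything in it (the `NextHopGroup` class, the helper functions, the prefixes, and the `P1`/`P2` next-hop names) is made up for illustration; real FIB implementations are far more involved.

```python
# Toy model: flat FIB versus FIB entries pointing to a shared next-hop object.
# All names are hypothetical; this is not how any particular router stores its FIB.

class NextHopGroup:
    """A shared, mutable next-hop object referenced by many prefixes."""

    def __init__(self, next_hop):
        self.next_hop = next_hop


def reroute_flat(fib, old, new):
    """Rewrite every entry that uses the old next hop; return the number of writes."""
    writes = 0
    for prefix in list(fib):
        if fib[prefix] == old:
            fib[prefix] = new
            writes += 1
    return writes


def reroute_shared(group, new):
    """Rewrite the shared object once; return the number of writes."""
    group.next_hop = new
    return 1


prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]

# Flat FIB: every prefix carries its own copy of the next hop.
flat_fib = {prefix: "P1" for prefix in prefixes}

# Shared FIB: every prefix points to the same next-hop group.
group = NextHopGroup("P1")
shared_fib = {prefix: group for prefix in prefixes}

flat_writes = reroute_flat(flat_fib, "P1", "P2")
shared_writes = reroute_shared(group, "P2")
print(flat_writes, shared_writes)
```

The number of writes in the flat table grows with the number of prefixes, while the shared table needs a single write regardless of how many prefixes point to the group.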
Here’s (in a nutshell) how PIC Edge is supposed to work:
- Backup PE router receives a route from a CE router that it does not use (because it has a better route from the primary PE router).
- The backup PE router nonetheless advertises the CE route (BGP Best External functionality I described in this video).
- The primary PE router eventually receives the backup CE route and stores it (yet again, without using it). Obviously, if our network uses route reflectors, we need a bit of extra magic (BGP Add Path or per-PE VRF route distinguishers) to make this work.
- The backup CE route (as advertised through the MPLS/VPN core) carries a label that points straight to the PE-CE interface. This allows the primary PE router to send traffic straight to the backup PE-CE interface before the backup PE router knows it should use the backup PE-CE interface to reach the CE router.
Now for the PIC Edge trick: when the primary PE-CE link fails, the primary PE router rewrites its LFIB entry to send the traffic for the now-unreachable destinations to the backup PE router and straight through the PE-CE interface to the CE router. That roundabout forwarding path works immediately, even before the primary PE router sends a BGP update saying, “I lost the CE prefixes.” Once the BGP updates are propagated, everyone installs new forwarding entries and stops sending the traffic to the (previous) primary PE router. Eventually, the (former) primary PE router cleans up its LFIB table.
At this point, we’re ready for the crux of the blog post: PIC Edge needs per-prefix (or per-CE) VPN labels. With the per-VRF labels, we’d get a temporary micro-loop between the primary and the backup PE routers (the details are left as an exercise for the reader). That’s why we can’t get PIC Edge in most EVPN implementations.
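One possible way to work through that exercise is the following toy forwarding model. It assumes that a per-VRF label makes the backup PE router do an IP lookup in the VRF (where the best route still points to the primary PE router until BGP converges), while a per-CE label sends the packet straight out of the PE-CE interface. The router names `PE1` (primary) and `PE2` (backup) and all function names are hypothetical.

```python
# Toy model of the convergence window after the primary PE-CE link fails.
# PE1 = primary PE (already switched to the backup path), PE2 = backup PE.
# Hypothetical sketch under the assumptions stated above, not vendor behavior.

def pe1_forward():
    # PIC Edge: PE1 redirects traffic toward the backup PE router.
    return "PE2"


def pe2_forward(label_mode, pe2_converged):
    if label_mode == "per-ce":
        # The label points straight to the PE-CE interface; no IP lookup.
        return "CE"
    # per-vrf: the label means "do an IP lookup in the VRF".
    # Until BGP converges, PE2's best route still points to PE1.
    return "CE" if pe2_converged else "PE1"


def trace(label_mode, pe2_converged, max_hops=4):
    """Follow a packet arriving at PE1; stop at the CE or after max_hops."""
    path = ["PE1"]
    while path[-1] != "CE" and len(path) <= max_hops:
        if path[-1] == "PE1":
            path.append(pe1_forward())
        else:
            path.append(pe2_forward(label_mode, pe2_converged))
    return path


per_ce_path = trace("per-ce", pe2_converged=False)
per_vrf_path = trace("per-vrf", pe2_converged=False)
per_vrf_after = trace("per-vrf", pe2_converged=True)

print(per_ce_path)
print(per_vrf_path)
print(per_vrf_after)
```

In this model, the per-VRF packet bounces between the two PE routers until the backup PE router converges, while the per-CE packet reaches the CE router right away.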
However, using per-prefix VPN labels (the default on Cisco IOS, where we first encountered the PIC Edge idea) effectively blocks the Prefix Independent Convergence part of the PIC Edge as each prefix uses a different VPN MPLS label. The only way to reduce the number of FIB updates seems to be the per-CE label allocation mode. That’s the default setting on Junos and available on IOS XR, FRRouting, and newer IOS XE releases (but not on Nexus OS).
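A rough way to see the difference between the label allocation modes is to count how many distinct backup labels the primary PE router would have to deal with. The sketch below is a simplification with made-up prefixes, a made-up starting label value, and a hypothetical `allocate_labels` helper; it only models label assignment, not the actual LFIB structures.

```python
from itertools import count

# Toy label allocator for the three allocation modes discussed above.
# Label values and prefixes are invented for illustration.

def allocate_labels(routes, mode, first_label=100):
    """routes: list of (prefix, ce) tuples; returns {prefix: label}."""
    next_label = count(first_label)
    label_by_key = {}
    labels = {}
    for prefix, ce in routes:
        # The allocation key decides which routes share a label.
        key = {"per-prefix": prefix, "per-ce": ce, "per-vrf": "vrf"}[mode]
        if key not in label_by_key:
            label_by_key[key] = next(next_label)
        labels[prefix] = label_by_key[key]
    return labels


# 200 prefixes, all advertised by the same CE router.
routes = [(f"192.0.2.{i}/32", "CE1") for i in range(200)]

distinct_labels = {
    mode: len(set(allocate_labels(routes, mode).values()))
    for mode in ("per-prefix", "per-ce", "per-vrf")
}
print(distinct_labels)
```

With per-prefix allocation, every prefix ends up with its own backup label, so there is nothing to share between the entries. The per-CE mode collapses all prefixes behind one CE router into a single label.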
Finally, Cisco’s documentation for the IOS XE release 16.6 claims that using PIC with per-CE labels is not supported, and PIC Edge without per-CE labels sounds like an oxymoron to me. What am I missing?
Revision History
- 2024-12-06
- As pointed out by Thomas, you can use per-PE RD instead of BGP Add Path functionality to avoid information loss on BGP route reflectors.
- 2025-01-10
- Harold fixed my lack of Google-Fu. Per-CE label allocation is available on IOS XE, at least from release 16.6 (with some interesting limitations). Updated the last paragraph.
Just copy your last paragraph and upload it to ChatGPT. The answer is pretty impressive (though I'm not an advocate of ChatGPT). Finally, you could build a corresponding topology with netlab and test it.
It's still the "sloppy intern bullshit" (as in "some things are factually wrong") but I agree it sounds pretty impressive.
As for "netlab topology", we might be able to inspect the forwarding tables, but who knows what would really happen in ASICs. I try to stay away from testing data-plane or real-time features in virtual labs.
I think with a test lab (IOS XE with per-prefix VPN labels) you would be able to prove the oxymoron with the help of debug outputs. I see no need for support of data-plane features in this case.
You will never know the internals of ASICs anyway unless you sign an NDA.
And how could you potentially monitor it? BMP Local RIB (RFC 9069) with Path Marking: https://datatracker.ietf.org/doc/html/draft-ietf-grow-bmp-path-marking-tlv
There is no mass-withdrawal mechanism for the L3VPN PE-CE link failure scenario, so BGP PIC Edge is kind of impossible in this use case.
There was a draft to introduce such a mechanism and enable BGP PIC Edge for L3VPN (https://www.ietf.org/archive/id/draft-raszuk-aggr-withdraw-00.txt), but nobody cares.
The PE failure scenario is still relevant, but that is another story, even if the per-VRF label allocation mode is used. You just need to advertise BestExternal route labels in a per-NH fashion.
mpls label mode {vrf vrf-name | all-vrfs} protocol {bgp-vpnv4 | bgp-vpnv6 | all-afs} {per-ce}