Is BGP PIC Edge an Oxymoron? « ipSpace.net blog

Wednesday, December 4, 2024 09:09 +0100… updated on Friday, January 10, 2025 11:57 +0100

Is BGP PIC Edge an Oxymoron?

This blog post discusses an old arcane question that has been nagging me from the bottom of my Inbox for almost exactly four years. Please skip it if it sounds like Latin to you, but if you happen to be one of those readers who know what I’m talking about, I’d appreciate your comments.

Terminology first:

Prefix Independent Convergence allows entries in the forwarding table to point to shared next hops (or next-hop groups), reducing the FIB update bottleneck when changing the next hop for a large number of prefixes (for example, when dealing with a core link failure). More details in the initial blog post and PIC applicability to fast reroute.
PIC Edge (as defined by vendor marketing) is the ability to switch to a backup CE route advertised to a backup PE router before the network convergence is complete.
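
To make the "shared next hop" idea concrete, here is a minimal Python sketch (not any vendor's FIB implementation). The `NextHopGroup` class, the helper functions, the prefix list, and the `P1`/`P2` next-hop names are all made up for illustration. It compares the number of writes needed to reroute 1000 prefixes in a flat table with the single write needed when all prefixes point to one shared object.

```python
from dataclasses import dataclass


@dataclass
class NextHopGroup:
    """Hypothetical shared forwarding object that many prefixes point to."""
    next_hop: str


def reroute_flat(fib, old, new):
    """Flat FIB: every prefix holds its own next hop, so we rewrite each one."""
    writes = 0
    for prefix, next_hop in fib.items():
        if next_hop == old:
            fib[prefix] = new  # value update only, the key set is unchanged
            writes += 1
    return writes


def reroute_shared(group, new):
    """Hierarchical FIB: one write, no matter how many prefixes use the group."""
    group.next_hop = new
    return 1


# Illustrative prefixes and next-hop names (assumptions, not from the post)
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]

flat_fib = {p: "P1" for p in prefixes}
group = NextHopGroup("P1")
shared_fib = {p: group for p in prefixes}

flat_writes = reroute_flat(flat_fib, "P1", "P2")
shared_writes = reroute_shared(group, "P2")
print(flat_writes, shared_writes)
```

The second number stays at one regardless of the table size, which is the "prefix independent" part of the name.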

Here’s (in a nutshell) how PIC Edge is supposed to work:

Backup PE router receives a route from a CE router that it does not use (because it has a better route from the primary PE router).
The backup PE router nonetheless advertises the CE route (BGP Best External functionality I described in this video).
The primary PE router eventually receives the backup CE route and stores it (yet again, without using it). Obviously, if our network uses route reflectors, we need a bit of extra magic (BGP Add Path or per-PE VRF route distinguishers) to make this work.
The backup CE route (as advertised through the MPLS/VPN core) carries a label that points straight to the PE-CE interface. This allows the primary PE router to send traffic straight to the backup PE-CE interface before the backup PE router knows it should use the backup PE-CE interface to reach the CE router.

Now for the PIC Edge trick: when the primary PE-CE link fails, the primary PE router rewrites its LFIB entry to send the traffic for the now-unreachable destinations to the backup PE router and straight through the PE-CE interface to the CE router. That roundabout forwarding path works immediately, even before the primary PE router sends a BGP update saying, “I lost the CE prefixes.” Once the BGP updates are propagated, everyone installs new forwarding entries and stops sending the traffic to the (previous) primary PE router. Eventually, the (former) primary PE router cleans up its LFIB table.
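
The local repair described above could be modeled roughly like this. It is a sketch under simplifying assumptions: the `Path` and `LfibEntry` classes, the interface names, and the label values are hypothetical, and the transport label needed to reach the backup PE router is left out. Each LFIB entry carries a pre-installed backup path learned from the backup PE router; a link failure flips the entry to that path without waiting for BGP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Path:
    out_interface: str         # where the packet is sent
    push_label: Optional[int]  # VPN label to impose, None when sending IP to the CE


@dataclass
class LfibEntry:
    primary: Path
    backup: Path               # learned from the backup PE, stored but unused
    use_backup: bool = False

    def active(self) -> Path:
        return self.backup if self.use_backup else self.primary


def on_link_down(lfib, failed_interface):
    """Local repair: switch every entry whose primary path used the failed link."""
    repaired = 0
    for entry in lfib.values():
        if entry.primary.out_interface == failed_interface:
            entry.use_backup = True
            repaired += 1
    return repaired


# Hypothetical local labels (100, 101) and backup PE labels (200, 201)
lfib = {
    100: LfibEntry(Path("ce-link", None), Path("core-to-PE2", 200)),
    101: LfibEntry(Path("ce-link", None), Path("core-to-PE2", 201)),
}

repaired = on_link_down(lfib, "ce-link")
print(repaired, lfib[100].active())
```

Note that this naive version touches every affected entry, so the repair time still grows with the number of entries.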

At this point, we’re ready for the crux of the blog post: PIC Edge needs per-prefix (or per-CE) VPN labels. With the per-VRF labels, we’d get a temporary micro-loop between the primary and the backup PE routers (the details are left as an exercise for the reader). That’s why we can’t get PIC Edge in most EVPN implementations.
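
Here is one way to work through that exercise, as a toy Python trace rather than a statement about any particular implementation. It assumes the primary PE router (`PE1`) has already done its local repair, while the backup PE router (`PE2`) has not yet processed the BGP withdrawal and still prefers the path through `PE1`. The function name, the node names, and the hop limit are all illustrative.

```python
def trace(label_mode, max_hops=6):
    """Follow one packet that arrives at PE1 after the PE1-CE link failed."""
    node = "PE1"
    path = [node]
    for _ in range(max_hops):
        if node == "PE1":
            # Local repair: impose PE2's VPN label and send across the core
            node = "PE2"
        elif label_mode == "per-vrf":
            # Per-VRF label: PE2 pops it and does an IP lookup in the VRF.
            # PE2 has not converged, so its best route still points to PE1.
            node = "PE1"
        else:
            # Per-CE or per-prefix label: points straight to the PE-CE interface
            node = "CE"
        path.append(node)
        if node == "CE":
            return path, "delivered"
    return path, "looping"


per_ce_result = trace("per-ce")
per_vrf_result = trace("per-vrf", max_hops=4)
print(per_ce_result)
print(per_vrf_result)
```

With the per-VRF label, the packet bounces between the two PE routers until the backup PE router converges (or the TTL expires); with a label tied to the PE-CE interface, it reaches the CE router in two hops.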

However, using per-prefix VPN labels (the default on Cisco IOS, where we first encountered the PIC Edge idea) effectively blocks the Prefix Independent Convergence part of the PIC Edge as each prefix uses a different VPN MPLS label. The only way to reduce the number of FIB updates seems to be the per-CE label allocation mode. That’s the default setting on Junos and available on IOS XR, FRRouting, and newer IOS XE releases (but not on Nexus OS).
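
A quick way to see the difference is to count how many distinct backup rewrites the primary PE router would need under each label allocation mode. The sketch below is illustrative only: the `allocate_labels` and `backup_objects` helpers, the label range, and the 200 prefixes behind a single `CE1` are assumptions. Prefixes can share a backup forwarding object only when the whole rewrite (egress PE router plus VPN label) is identical.

```python
from itertools import count


def allocate_labels(routes, mode, first_label=1000):
    """routes: list of (prefix, ce) tuples. Returns {prefix: label}."""
    next_label = count(first_label)
    label_for_key = {}
    labels = {}
    for prefix, ce in routes:
        # per-prefix: one label per prefix; per-ce: one label per CE router
        key = prefix if mode == "per-prefix" else ce
        if key not in label_for_key:
            label_for_key[key] = next(next_label)
        labels[prefix] = label_for_key[key]
    return labels


def backup_objects(labels, backup_pe="PE2"):
    """Distinct (egress PE, VPN label) rewrites needed for the backup path."""
    return {(backup_pe, label) for label in labels.values()}


# 200 hypothetical prefixes, all behind the same CE router
routes = [(f"192.168.{i}.0/24", "CE1") for i in range(200)]

per_prefix_objects = backup_objects(allocate_labels(routes, "per-prefix"))
per_ce_objects = backup_objects(allocate_labels(routes, "per-ce"))
print(len(per_prefix_objects), len(per_ce_objects))
```

In this toy model, per-prefix allocation needs one backup rewrite per prefix, while per-CE allocation collapses them into a single shared object.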

Finally, Cisco’s documentation for IOS XE release 16.6 claims that using PIC with per-CE labels is not supported, and PIC Edge without per-CE labels sounds like an oxymoron to me. What am I missing?

Revision History

2024-12-06: As pointed out by Thomas, you can use per-PE RD instead of BGP Add Path functionality to avoid information loss on BGP route reflectors.
2025-01-10: Harold fixed my lack of Google-Fu. Per-CE label allocation is available on IOS XE, at least from release 16.6 (with some interesting limitations). Updated the last paragraph.

5 comments:

Anonymous 04 December 2024 07:53

Just copy your last paragraph and upload it to ChatGPT. The answer is pretty impressive (though I'm not an advocate of ChatGPT). Finally you could build a corresponding topology with netlab and test it.

Ivan Pepelnjak 05 December 2024 10:22

It's still the "sloppy intern bullshit" (as in "some things are factually wrong") but I agree it sounds pretty impressive.

As for "netlab topology", we might be able to inspect the forwarding tables, but who knows what would really happen in ASICs. I try to stay away from testing data-plane or real-time features in virtual labs.

Anonymous 05 December 2024 08:14

I think with a test lab (IOS XE with per-prefix VPN labels) you would be able to prove the oxymoron with the help of debug outputs. I see no need for support of data-plane features in this case.
You will never know the internals of ASICs anyway unless you sign an NDA.

Thomas 05 December 2024 07:07

Correction: Add path on RR or unique RD on PE. Both ensure that path hiding at RR doesn’t occur.

As for how you could potentially monitor this: BMP Local RIB (RFC 9069) with Path Marking, https://datatracker.ietf.org/doc/html/draft-ietf-grow-bmp-path-marking-tlv.

Roman 07 December 2024 01:29

You have no mass-withdrawal mechanism for the L3VPN PE-CE link failure scenario, so BGP PIC Edge is kind of impossible in this use case.

There was a draft to introduce such a thing and enable BGP PIC Edge for L3VPN, https://www.ietf.org/archive/id/draft-raszuk-aggr-withdraw-00.txt, but nobody cares.

The PE failure scenario is still relevant, but that is another story, even if the per-VRF label allocation mode is used. You just need to advertise Best External route labels in a per-next-hop fashion.

Harold Ritter 09 January 2025 06:55

per-ce label allocation mode is actually supported in IOS-XE through the following configuration:

mpls label mode {vrf vrf-name | all-vrfs} protocol {bgp-vpnv4 | bgp-vpnv6 | all-afs} {per-ce}

Blake 22 March 2025 12:00

Great stuff Ivan. Totally agree with Thomas on unique RDs per PE. I just have JunOS auto-generate type 1 RDs based on the PE loopback, which has the added bonus of not having to configure it manually for each VRF. https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/route-distinguisher-id-edit-routing-options.html

Here are some other "fun" JunOS L3VPN PE protection/multipath tidbits I've learned recently:

PE-CE link protection requires vrf-table-label as of JunOS 12.3. https://www.juniper.net/documentation/us/en/software/junos/vpn-l3/topics/topic-map/l3-vpns-pe-link-protection.html I tried it in the lab without it, & it causes a circular VPN label generation ping-pong fight between the PEs... (bug or feature?)

chained-composite-nexthop and vpn-unequal-cost equal-external-internal nerd knobs are mutually exclusive: https://www.juniper.net/documentation/us/en/software/junos/vpn-l3/topics/topic-map/l3-vpns-load-balancing.html

Ivan Pepelnjak 22 March 2025 08:48

Thank you! It's always "lovely" to discover these unexpected details.
