Internet-in-a-VRF and LFIB explosion

Matthew Stone encountered another unintended consequence of the full-Internet-routing-in-a-VRF design: the TCAM on his 6500 was 80% utilized even though he has the new Sup modules that support one million IPv4 routes.

A closer look revealed the first clue: L3 forwarding resources on a Cat6500 are shared between IPv4 routes and MPLS labels (don’t know about you, but I was not aware of that) and almost half of the used entries were consumed by MPLS labels:

L3 Forwarding Resources
  FIB TCAM usage:                 Total     Used   %Used
    72 bits (IPv4, MPLS, EoM)   1048576   843727     80%
   144 bits (IP mcast, IPv6)     524288    11654      2%
   288 bits (IPv6 mcast)         262144        3      1%

  detail:   Protocol               Used   %Used
            IPv4                 433781     41%
            MPLS                 409945     39%
            EoM                       1      1%

            IPv6                  11639      2%
            IPv4 mcast               15      1%
            IPv6 mcast                3      1%
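
A quick sanity check of the numbers in the printout above can be sketched in a few lines of Python. The figures are copied from the 72-bit rows; the variable names are only illustrative.

```python
# Figures copied from the 72-bit rows of the printout above.
TOTAL_72BIT = 1_048_576
used = {"IPv4": 433_781, "MPLS": 409_945, "EoM": 1}

# The three protocols share the same pool of 72-bit TCAM entries.
total_used = sum(used.values())
pct_used = 100 * total_used / TOTAL_72BIT
mpls_share = 100 * used["MPLS"] / total_used

print(f"72-bit entries used: {total_used} of {TOTAL_72BIT} ({pct_used:.1f}%)")
print(f"MPLS share of the used entries: {mpls_share:.1f}%")
```

The per-protocol counters add up exactly to the 843727 entries reported as used, which confirms that IPv4 routes and MPLS labels compete for the same pool.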

What’s up?

There’s a fundamental difference in the way MPLS assigns labels to BGP routes in different routing tables:

  • MPLS labels are not assigned to BGP routes in the global routing table. When the router copies BGP routes from the RIB into the FIB, it uses the labels its downstream neighbor allocated to the BGP next hop. All BGP routes advertised by the same BGP next hop thus get the same label.
  • A unique MPLS label is assigned to every VRF route when it’s imported into the VPNv4 address family. In the Internet-in-the-VRF design, the Internet edge PE-routers receive Internet routing through EBGP sessions running in a VRF, and those routes automatically appear in the VPNv4 address family (and get their labels) even if they are never propagated to other PE-routers.

Net result: if you have plenty of BGP routes in the global routing table (for example, around 450,000), your router allocates a local MPLS label for each BGP next hop. If those routes move to a VRF, your router allocates a local MPLS label for each route.
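
The difference in label consumption can be illustrated with a toy Python model. This is a deliberately simplified sketch, not a description of how any router implements label allocation: the `Route` tuple, the three counting functions, the integer prefix IDs and the four next-hop addresses are all made up for the example.

```python
from collections import namedtuple

# Hypothetical, minimal route record: a prefix identifier and its BGP next hop.
Route = namedtuple("Route", ["prefix", "next_hop"])

def labels_per_next_hop(routes):
    """Global-table model: routes sharing a BGP next hop share one label."""
    return len({r.next_hop for r in routes})

def labels_per_prefix(routes):
    """Default VRF model: every VPNv4 prefix gets its own local label."""
    return len({r.prefix for r in routes})

def labels_per_vrf(routes):
    """Per-VRF model: a single label for the whole VRF (if it has routes)."""
    return 1 if routes else 0

# Synthetic table: 450,000 prefixes learned from 4 upstream next hops (assumed).
routes = [Route(prefix=i, next_hop=f"192.0.2.{i % 4 + 1}") for i in range(450_000)]

print("global table, per next hop:", labels_per_next_hop(routes))
print("VRF, per prefix:           ", labels_per_prefix(routes))
print("VRF, per VRF:              ", labels_per_vrf(routes))
```

With these assumed inputs the model needs 4 labels in the global table, 450,000 labels with per-prefix allocation in a VRF, and 1 label with per-VRF allocation.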

Why all the fuss?

To make a long story short: the creators of the MPLS architecture wanted to minimize forwarding hardware requirements and thus created a solution that ensures LSRs (including PE-routers) forward the packets (both IPv4 and labeled packets) with a single lookup in a single table.

The proof is left as an exercise for the reader. I know a really good one, but it wouldn’t fit in the sidebar of this blog post.

Can we fix it? Yes we can!

Wherever there’s a challenge, there’s a kludge. In this particular case, the magic command is mpls label mode vrf Internet protocol all-afs per-vrf. This command changes the label allocation mechanism from one-label-per-prefix to one-label-per-VRF.

With the changed label allocation model, the incoming label no longer uniquely identifies the outgoing interface and IP next hop. The egress PE-router thus has to perform two lookups: label lookup to identify the next lookup table (VRF FIB), and IPv4 destination address lookup in the VRF FIB.
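
The one-lookup versus two-lookup behavior on the egress PE-router can be sketched in Python. Everything in this example is hypothetical (label values, interface names, prefixes, next hops and function names); it only models the lookup sequence and says nothing about how the hardware performs it.

```python
import ipaddress

# Per-prefix allocation: the label alone selects outgoing interface and next hop.
lfib_per_prefix = {
    1001: ("Gi1/1", "198.51.100.1"),
    1002: ("Gi1/2", "198.51.100.5"),
}

def forward_per_prefix(label, dst_ip):
    """Single lookup: the destination address is never inspected."""
    return lfib_per_prefix[label]

# Per-VRF allocation: the label only identifies the VRF FIB to search next.
lfib_per_vrf = {2001: "Internet"}
vrf_fib = {
    "Internet": {
        ipaddress.ip_network("203.0.113.0/24"): ("Gi1/1", "198.51.100.1"),
        ipaddress.ip_network("203.0.113.128/25"): ("Gi1/2", "198.51.100.5"),
    }
}

def forward_per_vrf(label, dst_ip):
    """Two lookups: label -> VRF FIB, then longest-prefix match on dst_ip."""
    fib = vrf_fib[lfib_per_vrf[label]]             # lookup 1: label table
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in fib if addr in net]  # lookup 2: IPv4 lookup
    if not matches:
        return None                                # no route in the VRF
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]

print(forward_per_prefix(1001, "203.0.113.10"))
print(forward_per_vrf(2001, "203.0.113.10"))
print(forward_per_vrf(2001, "203.0.113.200"))
```

In the per-VRF case the same incoming label (2001) leads to different outgoing interfaces depending on the destination address, which is exactly why the second lookup is needed.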

The performance hit on the Cat 6500 seems to be minimal (at least the documentation claims so), but you lose the ability to do EIBGP multipathing (IPv4 lookup in the egress PE-router could lead to forwarding loops) and Carrier’s Carrier functionality (IPv4 lookup in the egress PE-router breaks the end-to-end LSP between CE-routers) in the VRFs for which you’ve configured per-VRF label allocation.

More information

You’ll find most of what you need to know about MPLS/VPN design and deployment in enterprise networks in my Enterprise MPLS/VPN Deployment webinar, and there are plenty of great books if you need in-depth technical details. Last but definitely not least, I’m always available for short consulting sessions or design reviews.

10 comments:

  1. Don't forget about the VPN CAM when using per-VRF labels, or you will hit recirculation for overflowing VRFs (more than 512).
  2. Love these types of solid and concrete posts!
  3. How does JUNOS handle this?
  4. Assigning an MPLS label per next hop would be a nice middle-of-the-road solution. No need to do double lookups, and it would save a lot of memory.
  5. Already happened to face and solve the issue, but the post is great for sure.
  6. On ASR9K/XR there's a third solution that avoids the extra lookup: you can do per-next-hop label allocation.

    Quote from: http://www.cisco.com/en/US/docs/ios_xr_sw/iosxr_r3.6/routing/configuration/guide/rc36bgp.html#wpmkr1456095

    label-allocation-mode per-ce

    Configures the per-CE label allocation mode to avoid an extra lookup on the PE router and conserve label space (per-prefix is the default label allocation mode). In this mode, the PE router allocates one label for every immediate next-hop (in most cases, this would be a CE router). This label is directly mapped to the next hop, so there is no VRF route lookup performed during data forwarding. However, the number of labels allocated would be one for each CE rather than one for each VRF. Because BGP knows all the next hops, it assigns a label for each next hop (not for each PE-CE interface). When the outgoing interface is a multiaccess interface and the media access control (MAC) address of the neighbor is not known, Address Resolution Protocol (ARP) is triggered during packet forwarding.
  7. This is available from 12.2(33)SXH and later on Cat 6500 and the command is the following:
    mpls label mode all-vrfs protocol bgp-vpnv4 per-vrf

    On 12.2SR code the command is:
    mpls label mode all-vrfs protocol all-afs per-vrf
  8. IOS XE3.10S (ASR1k) "per-CE label allocation":

    http://www.cisco.com/en/US/docs/routers/asr1000/release/notes/asr1k_feats_important_notes_310s.html#wp3378629
  9. Juniper does per next-hop (like per-ce) as well. Configure your export policy like this:
    then {
    label-allocation per-nexthop;
    community add vpn1;
    accept;
    }

    I don't know why this wasn't the default/standard from all vendors right from the start.