Penultimate Hop Popping (PHP) demystified

I got an interesting question after writing the Asymmetric MPLS MTU Problem post: “Why does PHP happen only on directly-connected interfaces but not on other non-MPLS routes?” Obviously it’s time for a deep dive into Penultimate Hop Popping (PHP) mysteries (warning label: read the MPLS books if you plan to get seriously involved with MPLS).

When developing the MPLS architecture, its designers had to consider two fundamental facts:

  • Label lookup is simpler than IP lookup;
  • Two lookups (label + IP) are more expensive (in whatever terms) than one lookup.

The “label lookup is simpler” is no longer true for hardware-based L3 switches, although the label lookup might still incur slightly lower latency as it can be done as a simple table lookup, not a TCAM search. Furthermore, some hardware-based switching platforms don’t support two lookups at all.

The goals they tried to achieve with PHP were thus:

  • Use implicit null whenever an IP lookup would have to be performed anyway to prevent two lookups;
  • Use explicit labels whenever you can.

There is a very clear case that always requires an IP lookup: a summary route generated by the router (for example, an OSPF ABR). Consider the following OSPF configuration:

router ospf 1
 log-adjacency-changes
 area 11 range 10.2.0.0 255.255.0.0

The area range command causes the IP prefix 10.2.0.0/16 to be advertised into the backbone area (assuming at least one more specific prefix in that range is in area 11). However, the router does not know in advance where to send the packets forwarded toward that prefix from the OSPF backbone; it has to perform a full IP lookup on them.

The area range command generates a summary route pointing to Null0 in the IP routing table and the CEF table to prevent forwarding loops:

C1#show ip route 10.2.0.0
Routing entry for 10.2.0.0/16
  Known via "ospf 1", distance 110, metric 60, type intra area
  Routing Descriptor Blocks:
  * directly connected, via Null0
      Route metric is 60, traffic share count is 1
C1#show ip cef 10.2.0.0 detail
10.2.0.0/16, epoch 0
  attached to Null0

The corresponding entry in the LFIB contains the implicit null label (displayed as None) and punt as the outgoing interface (meaning: do a full IP lookup).

C1#show mpls forwarding-table 10.2.0.0 detail
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
None       No Label   10.2.0.0/16      0             punt
        MAC/Encaps=0/0, MRU=0, Label Stack{}
        No output feature configured

On the other hand, packets forwarded toward a prefix received from another router don’t need an IP lookup: we know who the next hop is, we can build the full L2 header in advance, and afterwards forward IP packets toward the destination without inspecting the IP header.

Let’s look at an OSPF route received from another router in the same area. As expected, it’s associated with an IP next hop and an outgoing interface:

C1#show ip route 10.2.2.0
Routing entry for 10.2.2.0/24
  Known via "ospf 1", distance 110, metric 60, type intra area
  Last update from 10.0.7.5 on Serial1/0, 00:09:46 ago
  Routing Descriptor Blocks:
  * 10.0.7.5, from 10.0.1.1, 00:09:46 ago, via Serial1/0
      Route metric is 60, traffic share count is 1
C1#show ip cef 10.2.2.0 detail
10.2.2.0/24, epoch 0
  local label info: global/1004
  nexthop 10.0.7.5 Serial1/0

The corresponding LFIB entry contains the outgoing interface and the full L2 header (in our case, the PPP header associated with IP datagrams).

C1#show mpls forwarding-table 10.2.2.0 detail
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
1004       No Label   10.2.2.0/24      0             Se1/0      point2point
        MAC/Encaps=4/4, MRU=1504, Label Stack{}
        FF030021
        No output feature configured

The outgoing label associated with an IP prefix received from another router could be:

  • No Label if MPLS is not enabled on the outgoing interface or if the next-hop router has not advertised a label for the prefix;
  • Pop label if the next-hop router has advertised implicit null label for the prefix;
  • Label advertised by the next-hop router for the IP prefix through LDP.
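
The three cases above can be sketched as a small decision function. This is only an illustration of the logic, not how IOS implements it: the function name, its parameters, and the returned strings are assumptions made for the example. The only protocol fact used is that implicit null is the reserved label value 3.

```python
from typing import Optional

IMPLICIT_NULL = 3  # reserved label value signalling "pop the label" (RFC 3032)


def outgoing_label_action(mpls_enabled: bool, advertised_label: Optional[int]) -> str:
    """Return the text expected in the Outgoing Label column (hypothetical helper).

    mpls_enabled     -- is MPLS enabled on the outgoing interface?
    advertised_label -- label received from the next-hop router, or None
    """
    if not mpls_enabled or advertised_label is None:
        return "No Label"      # forward as an unlabeled IP packet
    if advertised_label == IMPLICIT_NULL:
        return "Pop Label"     # penultimate hop popping
    return str(advertised_label)  # swap to the label advertised through LDP


print(outgoing_label_action(True, None))   # no label advertised by the next hop
print(outgoing_label_action(True, 3))      # next hop advertised implicit null
print(outgoing_label_action(True, 1004))   # regular label swap
```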

Going one step further, let’s look at an LFIB entry containing an outgoing label. The L2 information associated with the LFIB entry contains not only the L2 header (yet again, PPP header in our printout), but also the outgoing MPLS label stack. The Label Switch Router (LSR) performing the label lookup thus fetches the full outgoing header (L2 header + MPLS label stack) during the lookup operation.

C2#show mpls forwarding 10.2.2.0 detail
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
2014       1004       10.2.2.0/24      0             Se1/1      point2point
        MAC/Encaps=4/8, MRU=1500, Label Stack{1004}
        FF030281 003EC000
        No output feature configured

When comparing the above printout with the previous one, note the difference in PPP protocol type (0x0021 for IP versus 0x0281 for labeled packets). The 0x03EC value in the MPLS label header is the outgoing label (1004) displayed in hex.
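
If you want to check the hex arithmetic yourself, here is a minimal Python sketch that splits the 32-bit MPLS label stack entry from the printout into its fields (20-bit label, 3-bit EXP, bottom-of-stack bit, 8-bit TTL). The function name and the dictionary keys are made up for this example.

```python
def decode_label_entry(hex_word: str) -> dict:
    """Split one 32-bit MPLS label stack entry (given as hex) into its fields."""
    value = int(hex_word, 16)
    return {
        "label": value >> 12,                   # top 20 bits
        "exp": (value >> 9) & 0x7,              # 3 experimental/traffic-class bits
        "bottom_of_stack": (value >> 8) & 0x1,  # S bit
        "ttl": value & 0xFF,                    # low 8 bits
    }


# Label stack entry from the C2 printout
print(decode_label_entry("003EC000"))
# {'label': 1004, 'exp': 0, 'bottom_of_stack': 0, 'ttl': 0}
```

Only the label field is populated in the LFIB printout; the remaining fields are shown as zero there.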

Just in case you’re wondering what an LFIB entry looks like for LAN next hops, here’s a sample printout. As expected, the full MAC header is included in the LFIB entry (0x8847 at the end of the MAC header is the MPLS Ethertype).

PE-A#show mpls forwarding-table 10.0.1.3 detail
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
25         16         10.0.1.3/32      0             Fa0/0      10.2.2.2
        MAC/Encaps=14/18, MRU=1500, Label Stack{16}
        CA0107880008CA00078800088847 00010000
        No output feature configured
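
The encapsulation string in the printout above can be taken apart the same way. The sketch below assumes a plain 14-byte Ethernet II header (matching MAC/Encaps=14/18) followed by a single 4-byte label stack entry; the helper names are hypothetical.

```python
def format_mac(raw: bytes) -> str:
    """Format six bytes as a colon-separated MAC address."""
    return ":".join(f"{octet:02x}" for octet in raw)


def split_ethernet_encaps(mac_hex: str, label_hex: str) -> dict:
    """Split the cached Ethernet header and one label stack entry into fields."""
    header = bytes.fromhex(mac_hex)
    if len(header) != 14:
        raise ValueError("expected a 14-byte Ethernet II header")
    return {
        "dst_mac": format_mac(header[0:6]),
        "src_mac": format_mac(header[6:12]),
        "ethertype": int.from_bytes(header[12:14], "big"),
        "label": int(label_hex, 16) >> 12,  # top 20 bits of the label stack entry
    }


fields = split_ethernet_encaps("CA0107880008CA00078800088847", "00010000")
print(fields["dst_mac"], fields["src_mac"], hex(fields["ethertype"]), fields["label"])
# ca:01:07:88:00:08 ca:00:07:88:00:08 0x8847 16
```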

To summarize what we’ve discovered so far:

  • Summary routes always require an IP lookup. The routers always advertise the implicit null label (or the explicit null label, if so configured) for them.
  • We don’t need to perform IP lookups for packets sent toward prefixes associated with explicit IP next hops; we can build the outgoing L2 header in advance and fetch it during label lookup.

Last step: the connected interfaces. Obviously, there’s no next hop associated with a connected interface. In fact, the router has to use a different L2 header for every directly connected host and has to perform an IP lookup on inbound packets to figure out which directly connected host is the final destination (and which L2 header to fetch from the ARP table on LAN interfaces). LDP therefore advertises null labels (implicit or explicit) for directly connected IP prefixes, causing penultimate hop popping (when the implicit null is advertised) on the upstream router.

A directly connected subnet could also be considered a summary route for a bunch of host routes. In fact, that’s exactly how L3 switching for directly connected subnets is implemented in many hardware-based L3 switches.
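
The advertising rules described in this post boil down to a simple decision: advertise a null label whenever an IP lookup is unavoidable, and a regular label otherwise. Here is a hedged Python sketch of that decision. RouteKind, label_to_advertise, and the allocate_label callback are invented for the illustration and do not correspond to any real LDP implementation; the reserved values (3 for implicit null, 0 for IPv4 explicit null) come from RFC 3032.

```python
from enum import Enum
from typing import Callable

IMPLICIT_NULL = 3       # reserved label: ask the upstream router to pop
EXPLICIT_NULL_IPV4 = 0  # reserved label: keep a label header, do an IP lookup


class RouteKind(Enum):
    CONNECTED = "connected"  # directly connected subnet
    SUMMARY = "summary"      # locally generated summary pointing to Null0
    LEARNED = "learned"      # received from another router, has an IP next hop


def label_to_advertise(kind: RouteKind,
                       allocate_label: Callable[[], int],
                       explicit_null: bool = False) -> int:
    """Pick the label a router would advertise for a prefix (illustrative only)."""
    if kind in (RouteKind.CONNECTED, RouteKind.SUMMARY):
        # An IP lookup is needed anyway, so a label lookup would be wasted work.
        return EXPLICIT_NULL_IPV4 if explicit_null else IMPLICIT_NULL
    # The next hop is known and the L2 header can be prebuilt: use a real label.
    return allocate_label()


print(label_to_advertise(RouteKind.SUMMARY, lambda: 1004))    # 3
print(label_to_advertise(RouteKind.CONNECTED, lambda: 1004))  # 3
print(label_to_advertise(RouteKind.LEARNED, lambda: 1004))    # 1004
```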

More information

You’ll find in-depth description of MPLS/VPN technology and enterprise network deployment hints in my Enterprise MPLS/VPN Deployment webinar (register for a live session or buy its recording). For more VPN webinars, check my VPN webinar roadmap. You get access to all those webinars when you buy the yearly subscription.

4 comments:

  1. Great post. Thanks for extensive explanation!

  2. Excellent explanation, but I still have a doubt lingering in my mind.

    Can PHP be requested by a router in the core of the SP network (it might be doing some summarization)?

  3. Doubt no more! Whenever a router has to perform an L3 lookup, it will signal implicit (or explicit) null. ABR is no exception (which, as it happens, is also documented above ;) ). That's the reason you should never summarize BGP next hops in MPLS/VPN networks.

  4. Hello Ivan,

    Suppose I have a scenario such as PE1 ---- P1 ---- PE2.
    If, instead of advertising labels for the loopback interfaces, we advertise them for the directly connected interfaces between these routers, will it work (assuming PE1 and PE2 have reachability to one another)?


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.