Junos Day One: MPLS Behind The Scenes

When I started making my first wobbling steps into the Junos MPLS world, Dan (@Johansfo) Backman took time to explain the differences between Cisco IOS and Junos MPLS implementations (and some of the reasons they are so different). This is my feeble attempt at describing what I understood from his explanation.

A bit of history first

The fundamental reason for the widely different MPLS implementations is the first use case: Cisco IOS started with tag switching targeted at IP-over-ATM transport, where every IP prefix needs an end-to-end LSP; Junos started with layer-3 MPLS/VPNs, where you need LSPs only toward BGP next hops (or was it MPLS-TE? See the comments).

MPLS in Cisco IOS

Let’s revisit how LDP-based MPLS works in Cisco IOS and what its data structures are:

  • Every routing protocol has its own data structures (OSPF or IS-IS topology databases) and its own routing table (SPF results in OSPF or IS-IS or Routing Information Base – RIB – in BGP);
  • Routes from per-protocol routing tables are copied into the main IP routing table using administrative distance as the criterion to prefer routes from one routing protocol over routes from another one;
  • Fully evaluated entries from the main routing table are copied into IP forwarding table (FIB or CEF table);
  • LDP assigns a label to every non-BGP entry in the FIB, stores those labels in LFIB and in its LDP database and advertises them to LDP neighbors;
  • CEF table and LDP database are combined to find outbound labels (labels assigned to IP prefixes by next-hop routers) that are then used in CEF table (FIB) and LFIB.
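
The steps in the list above can be sketched as a toy Python model. This is only an illustration of the data flow, not how IOS is implemented: the function name, the dictionary layout and the sample addresses and labels are all assumptions, the starting label value of 16 is just a convenient default, and BGP entries are simply skipped (recursive resolution of BGP next hops is not modeled).

```python
from itertools import count


def build_ios_tables(rib, ldp_bindings, first_label=16):
    """Toy model of the IOS-style label allocation described above.

    rib          maps prefix -> (protocol, next_hop)
    ldp_bindings maps (neighbor, prefix) -> label advertised by that neighbor
    Only non-BGP entries are modeled; BGP recursion is left out on purpose.
    """
    next_label = count(first_label)
    fib, lfib = {}, {}
    for prefix in sorted(rib):
        protocol, next_hop = rib[prefix]
        if protocol == "bgp":
            continue  # no local label is assigned to BGP prefixes
        # Outbound label = label assigned to the prefix by the next-hop router
        out_label = ldp_bindings.get((next_hop, prefix))
        # Local label = label this router advertises to its LDP neighbors
        local_label = next(next_label)
        fib[prefix] = (next_hop, out_label)
        lfib[local_label] = (next_hop, out_label)
    return fib, lfib


# Hypothetical sample data
rib = {
    "10.0.255.5/32": ("ospf", "10.0.2.2"),
    "10.0.255.7/32": ("ospf", "10.0.2.2"),
    "172.17.20.0/23": ("bgp", "172.17.31.2"),
}
ldp_bindings = {("10.0.2.2", "10.0.255.7/32"): 299888}
fib, lfib = build_ios_tables(rib, ldp_bindings)
```

Every IGP prefix gets a local label whether or not the next-hop router has advertised a label for it; the outbound label is filled in only when a matching binding exists.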

The following diagram illustrates the protocols, data structures and their relationships.

MPLS in Junos

Junos has a completely different approach to MPLS. Let’s start with IP routing tables:

  • Some routing protocols still have their own data structures (OSPF or IS-IS topology database), others don't (BGP and RIP).
  • There are no per-protocol IP routing tables (or BGP RIB); entries from different routing protocols are stored directly in the main IP routing table (inet.0);
  • Active routes from the IP routing table are copied into the IP forwarding table (because inet.0 serves both as IP routing table and BGP RIB, you might have inactive routes in the inet.0 table);

LDP and other label distribution protocols (for example, MPLS-TE) create local labels and FEC-to-label mappings:

  • Labels received from LDP neighbors are stored in the LDP database;
  • Labels received from next-hop routers are also stored in the FEC mapping table (inet.3);
  • Local LDP labels are created for all entries in the inet.3 table (thus implementing ordered label distribution control) and stored in the LDP database;
  • Local labels are also created for loopback interfaces (default behavior) or IP prefixes matched by the egress-policy routing policy;
  • Local-to-next-hop label mappings are stored in Label Routing Table (mpls.0) and copied into Label Forwarding Table (LFIB).

Finally, Junos uses the FEC mapping table to insert outbound labels into the IP routing table (not just the FIB). By default, the FEC mapping table is used only for BGP destinations. Traffic sent to the BGP next hop itself (for example, SNMP traffic sent to a PE-router’s loopback interface) is thus not labeled, whereas traffic for BGP destinations using that same next hop is.
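
A rough way to picture this default behavior is the following Python sketch. It is a simplification under stated assumptions: the function names and table layout are made up for illustration, and the real Junos resolver is more involved than "try inet.3 first, then fall back to inet.0". The sample entries are taken from the R3 outputs shown later in this post.

```python
import ipaddress


def longest_match(table, address):
    """Return the most specific prefix in `table` covering `address`, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return str(best) if best is not None else None


def resolve(protocol, target, inet0, inet3):
    """Return a list of (next_hop, outbound_label_or_None).

    For BGP routes `target` is the BGP protocol next hop; for everything
    else it is the destination address itself.
    Assumption: only BGP routes consult inet.3 (the default behavior).
    """
    if protocol == "bgp":
        match = longest_match(inet3, target)
        if match is not None:
            return list(inet3[match])
    match = longest_match(inet0, target)
    if match is None:
        return []
    return [(next_hop, None) for next_hop in inet0[match]]


inet0 = {"172.17.31.0/30": ["10.0.2.6", "10.0.2.2"]}
inet3 = {"172.17.31.0/30": [("10.0.2.6", 299872), ("10.0.2.2", 299888)]}

bgp_paths = resolve("bgp", "172.17.31.2", inet0, inet3)
igp_paths = resolve("ospf", "172.17.31.2", inet0, inet3)
```

The same address resolves to labeled paths when it is used as a BGP next hop and to unlabeled paths when it is the destination of the traffic.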

The interactions between OSPF, BGP, LDP, and various Junos data structures are shown in the following diagram:

Default behavior

If you enable MPLS (using the default settings) in a Cisco IOS-based network, every router generates labels for all non-BGP IP prefixes, and all the traffic is labeled by the first-hop routers.

If you enable MPLS (yet again using default settings) in a Junos-based network, the routers generate labels only for the loopback interfaces, and label only the traffic sent toward BGP destinations reachable through loopback-based BGP next hops.

In a multi-vendor network, you’ll get a mixture of both behaviors:

  • Labels will be assigned to most prefixes by most routers. Once a Cisco IOS router allocates a label to an IGP prefix, all upstream Junos routers will allocate labels to the same prefix;
  • IP traffic received by a Cisco IOS router will be labeled if at all possible (outbound label is entered in the FIB whenever there’s a corresponding mapping in LDP database);
  • Junos routers will label only IP traffic for BGP destinations.

Yet again, remember that this section describes default behavior; you can change it on both Cisco IOS and Junos.

An example is worth more than a thousand words

To illustrate the Junos MPLS behavior, let’s look at data structures in a small OSPF/BGP/LDP network. I took a sample MPLS network created by Dan Backman, disabled RSVP, enabled LDP, and added a global EBGP connection (to test global BGP behavior):

The IP routing table on R3 contains all directly connected, IGP and BGP destinations:

IP routing table @ R3

root@R3> show route table inet.0 terse

inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
* 10.0.2.0/30        D   0                       >ge-0/0/5.0
* 10.0.2.1/32        L   0                        Local
* 10.0.2.4/30        D   0                       >ge-0/0/4.0
* 10.0.2.5/32        L   0                        Local
* 10.0.2.8/30        O  10          2             10.0.2.6
                                                 >10.0.2.2
* 10.0.2.12/30       D   0                       >ge-0/0/6.0
* 10.0.2.13/32       L   0                        Local
* 10.0.2.16/30       O  10          2            >10.0.2.6
* 10.0.4.0/30        D   0                       >ge-0/0/3.0
* 10.0.4.2/32        L   0                        Local
* 10.0.4.4/30        O  10          2             10.0.4.13
                                                 >10.0.4.1
* 10.0.4.8/30        O  10          2            >10.0.4.1
                                                  10.0.2.6
* 10.0.4.12/30       D   0                       >ge-0/0/2.0
* 10.0.4.14/32       L   0                        Local
* 10.0.4.16/30       O  10          2            >10.0.4.13
                                                  10.0.2.6
* 10.0.8.0/30        O  10          2            >10.0.2.14
* 10.0.8.4/30        O  10          2            >10.0.2.2
                                                  10.0.2.14
* 10.0.8.8/30        O  10          2            >10.0.2.2
* 10.0.8.12/30       O  10          3             10.0.2.6
                                                 >10.0.2.2
                                                  10.0.2.14
* 10.0.255.1/32      O  10          1            >10.0.4.13
* 10.0.255.2/32      O  10          1            >10.0.4.1
* 10.0.255.3/32      D   0                       >lo0.0
* 10.0.255.4/32      O  10          1            >10.0.2.6
* 10.0.255.5/32      O  10          1            >10.0.2.2
* 10.0.255.6/32      O  10          1            >10.0.2.14
* 10.0.255.7/32      O  10          2            >10.0.2.6
                                                  10.0.2.2
* 10.0.255.8/32      O  10          2            >10.0.2.14
* 10.233.240.0/20    D   0                       >ge-0/0/0.0
* 10.233.255.239/32  L   0                        Local
* 172.17.20.0/23     B 170        100             10.0.2.6        65022 I
                                                 >10.0.2.2
* 172.17.30.0/23     B 170        100            >10.0.2.6        65022 I
                                                  10.0.2.2
* 172.17.31.0/30     O  10          3             10.0.2.6
                                                 >10.0.2.2
* 192.168.0.0/24     O  10          3            >10.0.2.14
* 192.168.1.0/24     O  10          3            >10.0.2.14
* 192.168.2.0/24     O  10          3            >10.0.2.14
* 192.168.3.0/24     O  10          3            >10.0.2.14
* 224.0.0.5/32       O  10          1             MultiRecv

FEC mapping table (inet.3) is much shorter than the IP routing table. It contains only the loopback addresses (the /32 prefixes), and an IP prefix for external BGP next hop that I added to LDP using egress-policy on R7 (the /30 prefix). Prefixes advertised by adjacent routers (R1, R2, R4 and R5) don’t have an outbound label (due to penultimate hop popping), R7’s loopback and EBGP next hop do.

FEC mapping table @ R3

root@R3> show route table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.0.255.1/32      *[LDP/9] 00:04:52, metric 1
                    > to 10.0.4.13 via ge-0/0/2.0
10.0.255.2/32      *[LDP/9] 03:52:19, metric 1
                    > to 10.0.4.1 via ge-0/0/3.0
10.0.255.4/32      *[LDP/9] 00:02:32, metric 1
                    > to 10.0.2.6 via ge-0/0/4.0
10.0.255.5/32      *[LDP/9] 03:52:17, metric 1
                    > to 10.0.2.2 via ge-0/0/5.0
10.0.255.7/32      *[LDP/9] 00:02:32, metric 1
                      to 10.0.2.6 via ge-0/0/4.0, Push 299872
                    > to 10.0.2.2 via ge-0/0/5.0, Push 299888
172.17.31.0/30     *[LDP/9] 00:02:32, metric 1
                      to 10.0.2.6 via ge-0/0/4.0, Push 299872
                    > to 10.0.2.2 via ge-0/0/5.0, Push 299888

Local labels are created for all prefixes in the inet.3 table. Locally-originated prefixes don’t need labels of their own; they are advertised with label 3 (implicit null), which tells the neighbors to pop the label (POP).

LDP-generated part of the LFIB table on R3

root@R3> show route table mpls.0 terse protocol ldp

mpls.0: 22 destinations, 22 routes (21 active, 0 holddown, 1 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
* 299776             L   9          1            >10.0.4.1
* 299776(S=0)        L   9          1            >10.0.4.1
* 299824             L   9          1            >10.0.2.2
* 299824(S=0)        L   9          1            >10.0.2.2
* 299888             L   9          1            >10.0.4.13
* 299888(S=0)        L   9          1            >10.0.4.13
* 299904             L   9          1            >10.0.2.6
* 299904(S=0)        L   9          1            >10.0.2.6
* 299920             L   9          1             10.0.2.6
                                                 >10.0.2.2

If you display detailed information from the mpls.0 table, you can also see the IP prefixes associated with the label entry (note that there are two IP prefixes associated with the same label):

A single entry in the LFIB table

root@R3> show route table mpls.0 protocol ldp detail | find 299920
299920 (1 entry, 1 announced)
        *LDP    Preference: 9
                Next hop type: Router
                Next-hop reference count: 1
                Next hop: 10.0.2.6 via ge-0/0/4.0
                Label operation: Swap 299872
                Next hop: 10.0.2.2 via ge-0/0/5.0, selected
                Label operation: Swap 299888
                State: <Active Int>
                Local AS: 65412
                Age: 6:49       Metric: 1
                Task: LDP
                Announcement bits (1): 0-KRT
                AS path: I
                Prefixes bound to route: 10.0.255.7/32
                                         172.17.31.0/30
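
The way such an entry comes together from local and received label bindings can be sketched in Python. This is an illustrative model only: the function names and dictionary layout are assumptions, and the sample values are copied from the R3 outputs in this post (neighbors are identified by their next-hop addresses).

```python
IMPLICIT_NULL = 3  # label 3 asks the upstream router to pop the label


def label_operation(received_label):
    """Translate a label received from the next-hop router into an operation."""
    if received_label == IMPLICIT_NULL:
        return "Pop"
    return f"Swap {received_label}"


def build_mpls0(local_labels, received, next_hops):
    """Combine local and received bindings into local-label -> operations.

    local_labels maps prefix -> local label
    received     maps (next_hop, prefix) -> label advertised by that neighbor
    next_hops    maps prefix -> list of next-hop addresses
    """
    table = {}
    for prefix, local in local_labels.items():
        entry = table.setdefault(local, [])
        for next_hop in next_hops[prefix]:
            label = received.get((next_hop, prefix))
            if label is None:
                continue  # no binding received from this neighbor
            operation = (next_hop, label_operation(label))
            if operation not in entry:
                entry.append(operation)
    return table


local_labels = {
    "10.0.255.5/32": 299824,
    "10.0.255.7/32": 299920,
    "172.17.31.0/30": 299920,
}
received = {
    ("10.0.2.2", "10.0.255.5/32"): 3,
    ("10.0.2.2", "10.0.255.7/32"): 299888,
    ("10.0.2.2", "172.17.31.0/30"): 299888,
    ("10.0.2.6", "10.0.255.7/32"): 299872,
    ("10.0.2.6", "172.17.31.0/30"): 299872,
}
next_hops = {
    "10.0.255.5/32": ["10.0.2.2"],
    "10.0.255.7/32": ["10.0.2.6", "10.0.2.2"],
    "172.17.31.0/30": ["10.0.2.6", "10.0.2.2"],
}
mpls0 = build_mpls0(local_labels, received, next_hops)
```

Because both prefixes behind R7 share the local label 299920, they collapse into a single entry with two swap operations, one per next-hop router.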

The LDP database contains all the information received from LDP neighbors or advertised to them; you can inspect the whole LDP database or only the entries received from or sent to an individual neighbor. As expected, the output label database contains labels for all entries in the inet.3 table and label 3 (POP) for all locally-originated LDP prefixes.

Parts of LDP database on R3 (limited to R5)

root@R3> show ldp database session 10.0.255.5
Input label database, 10.0.255.3:0--10.0.255.5:0
  Label     Prefix
 299936     10.0.255.1/32
 299872     10.0.255.2/32
 299856     10.0.255.3/32
 299952     10.0.255.4/32
      3     10.0.255.5/32
 299888     10.0.255.7/32
 299888     172.17.31.0/30

Output label database, 10.0.255.3:0--10.0.255.5:0
  Label     Prefix
 299888     10.0.255.1/32
 299776     10.0.255.2/32
      3     10.0.255.3/32
 299904     10.0.255.4/32
 299824     10.0.255.5/32
 299920     10.0.255.7/32
 299920     172.17.31.0/30
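
If you want to check that claim mechanically, a short Python sketch like the one below could parse the printout and compare it with the inet.3 prefixes. The parser is an assumption tailored to the exact output format shown above (it is not an official tool and has not been tested against other Junos releases), and the function names are made up for this example.

```python
LDP_OUTPUT = """\
Input label database, 10.0.255.3:0--10.0.255.5:0
  Label     Prefix
 299936     10.0.255.1/32
 299872     10.0.255.2/32
 299856     10.0.255.3/32
 299952     10.0.255.4/32
      3     10.0.255.5/32
 299888     10.0.255.7/32
 299888     172.17.31.0/30

Output label database, 10.0.255.3:0--10.0.255.5:0
  Label     Prefix
 299888     10.0.255.1/32
 299776     10.0.255.2/32
      3     10.0.255.3/32
 299904     10.0.255.4/32
 299824     10.0.255.5/32
 299920     10.0.255.7/32
 299920     172.17.31.0/30
"""

INET3_PREFIXES = [
    "10.0.255.1/32", "10.0.255.2/32", "10.0.255.4/32",
    "10.0.255.5/32", "10.0.255.7/32", "172.17.31.0/30",
]


def parse_ldp_database(text):
    """Return {"input": {prefix: label}, "output": {prefix: label}}."""
    tables = {"input": {}, "output": {}}
    current = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] in ("Input", "Output") and words[1:3] == ["label", "database,"]:
            current = tables[words[0].lower()]
        elif current is not None and len(words) == 2 and words[0].isdigit():
            current[words[1]] = int(words[0])
    return tables


def prefixes_by_label(bindings):
    """Group prefixes that share the same label."""
    grouped = {}
    for prefix, label in bindings.items():
        grouped.setdefault(label, []).append(prefix)
    return grouped


database = parse_ldp_database(LDP_OUTPUT)
missing = [p for p in INET3_PREFIXES if p not in database["output"]]
shared = prefixes_by_label(database["output"])
```

An empty `missing` list means every inet.3 prefix has a local label in the output label database; `shared` shows which prefixes were bound to the same label.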

Last but definitely not least, let’s inspect the routing table entry for the external BGP next hop. You won’t find a label in this entry.

Route toward BGP next-hop on R3

root@R3> show route 172.17.31.0/30 extensive

inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
172.17.31.0/30 (1 entry, 1 announced)
TSI:
KRT in-kernel 172.17.31.0/30 -> {10.0.2.2}
        *OSPF   Preference: 10
                Next hop type: Router, Next hop index: 262148
                Next-hop reference count: 6
                Next hop: 10.0.2.6 via ge-0/0/4.0
                Next hop: 10.0.2.2 via ge-0/0/5.0, selected
                State: <Active Int>
                Local AS: 65412
                Age: 3:44:50    Metric: 3
                Area: 0.0.0.0
                Task: OSPF
                Announcement bits (3): 0-KRT 3-LDP 5-Resolve tree 2
                AS path: I

Just to ensure there’s no other magic going on behind the scenes, let’s inspect the forwarding table entry for the same prefix. Yet again, no label.

Forwarding entry for BGP next-hop on R3

root@R3> show route forwarding-table destination 172.17.31.0/30 extensive
Routing table: default.inet [Index 0]
Internet:

Destination:  172.17.31.0/30
  Route type: user
  Route reference: 0                   Route interface-index: 0
  Flags: sent to PFE, rt nh decoupled
  Nexthop: 10.0.2.2
  Next-hop type: unicast               Index: 631      Reference: 16
  Next-hop interface: ge-0/0/5.0

On the other hand, the routing table entry for a BGP destination using that same BGP next hop has two labels, because R3 can reach the EBGP next hop through two next-hop routers (R4 and R5).

BGP route on R3

root@R3> show route 172.17.20.0/23 extensive

inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
172.17.20.0/23 (1 entry, 1 announced)
TSI:
KRT in-kernel 172.17.20.0/23 -> {indirect(262146)}
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Next-hop reference count: 6
                Source: 10.0.255.7
                Next hop type: Router, Next hop index: 262154
                Next hop: 10.0.2.6 via ge-0/0/4.0
                Label operation: Push 299872
                Next hop: 10.0.2.2 via ge-0/0/5.0, selected
                Label operation: Push 299888
                Protocol next hop: 172.17.31.2
                Indirect next hop: 90d23c0 262146
                State: <Active Int Ext>
                Local AS: 65412 Peer AS: 65412
                Age: 3:50:47    Metric2: 1
                Task: BGP_65412.10.0.255.7+179
                Announcement bits (2): 0-KRT 5-Resolve tree 2
                AS path: 65022 I Aggregator: 65022 172.17.21.2
                Accepted
                Localpref: 100
                Router ID: 10.0.255.7
                Indirect next hops: 1
                        Protocol next hop: 172.17.31.2 Metric: 1
                        Indirect next hop: 90d23c0 262146
                        Indirect path forwarding next hops: 2
                                Next hop type: Router
                                Next hop: 10.0.2.6 via ge-0/0/4.0
                                Next hop: 10.0.2.2 via ge-0/0/5.0
                        172.17.31.0/30 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 2
                                Nexthop: 10.0.2.6 via ge-0/0/4.0
                                Nexthop: 10.0.2.2 via ge-0/0/5.0

The outgoing label is also copied into the forwarding table. By default, Junos doesn’t perform load balancing toward BGP destinations; the forwarding table thus contains only a single outgoing label.

Forwarding entry for a BGP route on R3

root@R3> show route forwarding-table destination 172.17.20.0/23 extensive
Routing table: default.inet [Index 0]
Internet:

Destination:  172.17.20.0/23
  Route type: user
  Route reference: 0                   Route interface-index: 0
  Flags: sent to PFE, prefix load balance
  Next-hop type: indirect              Index: 262146   Reference: 3
  Nexthop: 10.0.2.2
  Next-hop type: Push 299888           Index: 668      Reference: 1
  Next-hop interface: ge-0/0/5.0
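
The relationship between the two printouts (two labeled next hops in the routing table, one in the forwarding table) can be mimicked with a small Python sketch. It only mirrors the "selected" marker from the routing table output; how Junos actually picks that next hop is not modeled, and the parser and function names are assumptions written for the excerpt below.

```python
ROUTE_EXCERPT = """\
        *BGP    Preference: 170/-101
                Next hop type: Indirect
                Next-hop reference count: 6
                Source: 10.0.255.7
                Next hop type: Router, Next hop index: 262154
                Next hop: 10.0.2.6 via ge-0/0/4.0
                Label operation: Push 299872
                Next hop: 10.0.2.2 via ge-0/0/5.0, selected
                Label operation: Push 299888
                Protocol next hop: 172.17.31.2
"""


def parse_next_hops(text):
    """Extract next hops and their label operations from a route excerpt."""
    hops = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Protocol next hop:"):
            break  # later sections repeat the next hops without labels
        if line.startswith("Next hop: "):
            body = line[len("Next hop: "):]
            selected = body.endswith(", selected")
            if selected:
                body = body[:-len(", selected")]
            address, _, interface = body.partition(" via ")
            hops.append({"address": address, "interface": interface,
                         "selected": selected, "operation": None})
        elif line.startswith("Label operation: ") and hops:
            hops[-1]["operation"] = line[len("Label operation: "):]
    return hops


def forwarding_entry(hops):
    """Without load balancing, only the selected next hop is installed."""
    for hop in hops:
        if hop["selected"]:
            return hop
    return None


hops = parse_next_hops(ROUTE_EXCERPT)
installed = forwarding_entry(hops)
```

With the excerpt above, `installed` is the next hop marked as selected in the routing table, which is the one that appears in the forwarding table output.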

4 comments:

  1. I found your blog post very interesting. I think you may have missed a point in the history of MPLS and the differences between Cisco's and Juniper's approaches to MPLS. I believe your history on the Cisco side is correct: they were looking for a way to do switching through the network without doing an IP lookup at each hop, and Cisco was pushing LDP as the standard for MPLS.

    I believe that Juniper’s first routers on the other hand were built on ASICs so they did not have the same constraints as Cisco on IP lookups. The thing that Juniper was trying to address with initial MPLS was a way to replace the TE capabilities that were present with ATM. I think ISPs wanted to do away with the ATM switches but still have Traffic Engineering capabilities. My understanding is this is the reason that Juniper was pushing RSVP as the standard of MPLS.

    Since in RSVP you have an LSP going to a single address (the loopback of the remote device), it makes sense to have only that single IP address associated with that LSP. I think that Juniper's LDP behaviour is just applying their first implementation approach (RSVP) to what they implemented later in LDP. This is just a guess at the reason for the implementation difference, but I know that Cisco pushed LDP and Juniper pushed RSVP as the standard when MPLS first came around.

    Grumpy

  2. Thank you! You're probably right, Juniper came out with edge routers years after the core ones, so their first MPLS application was most likely MPLS-TE (yet again, focusing on transporting traffic toward BGP destinations)

    BTW, Cisco also had RSVP-based MPLS-TE very early on, but it took them a while to get the headend CB-SPF sorted out.

  3. And your handwritten pictures? (I prefer this, probably made by Visio) :-P

  4. These ones were actually made in PowerPoint (I dropped Visio years ago when Microsoft bought and bloated it). The hand-drawn ones make sense for simple things that I'll probably never reuse.



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.