Junos Day One: MPLS Behind The Scenes

When I started making my first wobbling steps into the Junos MPLS world, Dan (@Johansfo) Backman took time to explain the differences between Cisco IOS and Junos MPLS implementations (and some of the reasons they are so different). This is my feeble attempt at describing what I understood he told me.

A bit of a history first

The fundamental reason for the widely different MPLS implementations is the first use case each vendor targeted: Cisco IOS started with tag switching aimed at IP-over-ATM transport, where every IP prefix needs an end-to-end LSP; Junos started with layer-3 MPLS/VPNs, where you need LSPs only toward BGP next hops (or was it MPLS-TE? See the comments).

MPLS in Cisco IOS

Let’s revisit how LDP-based MPLS works in Cisco IOS and what its data structures are:

  • Every routing protocol has its own data structures (OSPF or IS-IS topology databases) and its own routing table (SPF results in OSPF or IS-IS or Routing Information Base – RIB – in BGP);
  • Routes from per-protocol routing tables are copied into the main IP routing table using administrative distance as the criterion to prefer routes from one routing protocol over routes from another one;
  • Fully evaluated entries from the main routing table are copied into the IP forwarding table (FIB or CEF table);
  • LDP assigns a label to every non-BGP entry in the FIB, stores those labels in LFIB and in its LDP database and advertises them to LDP neighbors;
  • CEF table and LDP database are combined to find outbound labels (labels assigned to IP prefixes by next-hop routers) that are then used in CEF table (FIB) and LFIB.
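
The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of how IOS is implemented: the function names and dictionary layouts are made up for illustration, recursive next-hop resolution is skipped, and every non-BGP prefix gets an ordinary local label (real routers advertise implicit null for connected prefixes). The administrative distances are the usual IOS defaults.

```python
# Usual Cisco IOS default administrative distances for the protocols used here.
ADMIN_DISTANCE = {"connected": 0, "ebgp": 20, "ospf": 110, "isis": 115, "ibgp": 200}


def build_main_rib(per_protocol_tables):
    """Per prefix, keep the route from the protocol with the lowest administrative distance."""
    rib = {}
    for protocol, routes in per_protocol_tables.items():
        for prefix, next_hop in routes.items():
            best = rib.get(prefix)
            if best is None or ADMIN_DISTANCE[protocol] < ADMIN_DISTANCE[best[0]]:
                rib[prefix] = (protocol, next_hop)
    return rib


def assign_local_labels(rib, first_label=16):
    """Hand out a local label to every non-BGP entry (labels 0-15 are reserved)."""
    labels = {}
    next_label = first_label
    for prefix in sorted(rib):
        protocol, _next_hop = rib[prefix]
        if protocol in ("ebgp", "ibgp"):
            continue  # BGP prefixes get no LDP label of their own
        labels[prefix] = next_label
        next_label += 1
    return labels


def outbound_labels(rib, ldp_database):
    """Outbound label = the label the next-hop router advertised for the same prefix.

    ldp_database maps a neighbor address to {prefix: label} (hypothetical layout).
    """
    result = {}
    for prefix, (_protocol, next_hop) in rib.items():
        label = ldp_database.get(next_hop, {}).get(prefix)
        if label is not None:
            result[prefix] = label
    return result


tables = {
    "ospf": {"10.0.255.7/32": "10.0.2.2", "10.0.2.8/30": "10.0.2.2"},
    "ibgp": {"172.17.20.0/23": "10.0.255.7", "10.0.2.8/30": "10.0.255.7"},
}
rib = build_main_rib(tables)
print(assign_local_labels(rib))
print(outbound_labels(rib, {"10.0.2.2": {"10.0.255.7/32": 299888}}))
```

The point to notice is that labels are driven by the contents of the IP routing table: whatever ends up there (apart from BGP routes) gets a label.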

The following diagram illustrates the protocols, data structures and their relationships.

MPLS in Junos

Junos has a completely different approach to MPLS. Let’s start with IP routing tables:

  • Some routing protocols still have their own data structures (OSPF or IS-IS topology database), others don't (BGP and RIP).
  • There are no per-protocol IP routing tables (or BGP RIB); entries from different routing protocols are stored directly in the main IP routing table (inet.0);
  • Active routes from the IP routing table are copied into the IP forwarding table (because inet.0 serves both as IP routing table and BGP RIB, you might have inactive routes in the inet.0 table);

LDP and other label distribution protocols (for example, MPLS-TE) create local labels and FEC-to-label mappings:

  • Labels received from LDP neighbors are stored in the LDP database;
  • Labels received from next-hop routers are also stored in the FEC mapping table (inet.3);
  • Local LDP labels are created for all entries in the inet.3 table (thus implementing ordered label distribution control) and stored in the LDP database;
  • Local labels are also created for loopback interfaces (default behavior) or IP prefixes matched by the egress-policy routing policy;
  • Local-to-next-hop label mappings are stored in Label Routing Table (mpls.0) and copied into Label Forwarding Table (LFIB).

Finally, Junos uses the FEC mapping table to insert outbound labels into the IP routing table (not just the FIB). The FEC mapping table is (by default) used only for BGP destinations. Traffic sent to a BGP next hop itself (for example, SNMP traffic sent to a PE-router’s loopback interface) is thus not labeled, whereas traffic for BGP destinations using the same next hop is.
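
The difference between the two lookups can be sketched as follows. This is a simplified model, assuming that a BGP route resolves its protocol next hop in inet.3 first and falls back to inet.0, while all other traffic only ever consults inet.0; the real resolver's route selection is not modelled. The table contents are taken from the R3 printouts later in this post, and the function and field names are hypothetical.

```python
import ipaddress


def longest_match(table, address):
    """Return the entry of the most specific prefix in `table` covering `address`, or None."""
    addr = ipaddress.ip_address(address)
    best_net, best_entry = None, None
    for prefix, entry in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_entry = net, entry
    return best_entry


# inet.0: plain IP routes (no labels). inet.3: FEC mappings learned through LDP.
INET0 = {
    "172.17.31.0/30": {"next_hop": "10.0.2.2", "push": None},
    "10.0.255.7/32": {"next_hop": "10.0.2.6", "push": None},
}
INET3 = {
    "172.17.31.0/30": {"next_hop": "10.0.2.2", "push": 299888},
    "10.0.255.7/32": {"next_hop": "10.0.2.2", "push": 299888},
}


def resolve(lookup_address, for_bgp_route):
    """Resolve a BGP protocol next hop (inet.3, then inet.0) or a plain destination (inet.0)."""
    tables = [INET3, INET0] if for_bgp_route else [INET0]
    for table in tables:
        entry = longest_match(table, lookup_address)
        if entry is not None:
            return entry
    return None


# A BGP route with protocol next hop 172.17.31.2 picks up the label ...
print(resolve("172.17.31.2", for_bgp_route=True))
# ... while a packet addressed to 172.17.31.2 itself is forwarded unlabeled.
print(resolve("172.17.31.2", for_bgp_route=False))
```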

The interactions between OSPF, BGP, LDP, and various Junos data structures are shown in the following diagram:

Default behavior

If you enable MPLS (using the default settings) in a Cisco IOS-based network, every router generates labels for all non-BGP IP prefixes, and all the traffic is labeled by the first-hop routers.

If you enable MPLS (yet again using default settings) in a Junos-based network, the routers generate labels only for the loopback interfaces, and label only the traffic sent toward BGP destinations reachable through loopback-based BGP next hops.

In a multi-vendor network, you’ll get a mixture of both behaviors:

  • Labels will be assigned to most prefixes by most routers. Once a Cisco IOS router allocates a label to an IGP prefix, all upstream Junos routers will allocate labels to the same prefix;
  • IP traffic received by a Cisco IOS router will be labeled if at all possible (outbound label is entered in the FIB whenever there’s a corresponding mapping in LDP database);
  • Junos routers will label only IP traffic for BGP destinations.
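
The default labeling decision on each platform boils down to a very small rule. The sketch below only restates the defaults described in this section; the function name, the platform strings, and the `lsp_available` flag are invented for the example.

```python
def labels_ip_traffic(platform, destination_is_bgp, lsp_available):
    """Return True if the router pushes a label onto an incoming IP packet (default settings).

    platform: "ios" or "junos" (names chosen for this sketch).
    destination_is_bgp: the destination prefix was learned through BGP.
    lsp_available: a usable (not implicit-null) label exists: a matching LDP mapping
        from the next-hop router on IOS, or an inet.3 entry for the BGP next hop on Junos.
    """
    if not lsp_available:
        return False
    if platform == "ios":
        return True  # labeled if at all possible
    if platform == "junos":
        return destination_is_bgp  # inet.3 is consulted only for BGP destinations
    raise ValueError(f"unknown platform: {platform}")


for platform in ("ios", "junos"):
    for is_bgp in (False, True):
        print(platform, "BGP" if is_bgp else "IGP", labels_ip_traffic(platform, is_bgp, True))
```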

Yet again, remember that this section describes default behavior; you can change it on both Cisco IOS and Junos.

An example is worth more than a thousand words

To illustrate the Junos MPLS behavior, let’s look at data structures in a small OSPF/BGP/LDP network. I took a sample MPLS network created by Dan Backman, disabled RSVP, enabled LDP, and added a global EBGP connection (to test global BGP behavior):

The IP routing table on R3 contains all directly connected, IGP and BGP destinations:

IP routing table @ R3

root@R3> show route table inet.0 terse

inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination P Prf Metric 1 Metric 2 Next hop AS path
* 10.0.2.0/30 D 0 >ge-0/0/5.0
* 10.0.2.1/32 L 0 Local
* 10.0.2.4/30 D 0 >ge-0/0/4.0
* 10.0.2.5/32 L 0 Local
* 10.0.2.8/30 O 10 2 10.0.2.6
>10.0.2.2
* 10.0.2.12/30 D 0 >ge-0/0/6.0
* 10.0.2.13/32 L 0 Local
* 10.0.2.16/30 O 10 2 >10.0.2.6
* 10.0.4.0/30 D 0 >ge-0/0/3.0
* 10.0.4.2/32 L 0 Local
* 10.0.4.4/30 O 10 2 10.0.4.13
>10.0.4.1
* 10.0.4.8/30 O 10 2 >10.0.4.1
10.0.2.6
* 10.0.4.12/30 D 0 >ge-0/0/2.0
* 10.0.4.14/32 L 0 Local
* 10.0.4.16/30 O 10 2 >10.0.4.13
10.0.2.6
* 10.0.8.0/30 O 10 2 >10.0.2.14
* 10.0.8.4/30 O 10 2 >10.0.2.2
10.0.2.14
* 10.0.8.8/30 O 10 2 >10.0.2.2
* 10.0.8.12/30 O 10 3 10.0.2.6
>10.0.2.2
10.0.2.14
* 10.0.255.1/32 O 10 1 >10.0.4.13
* 10.0.255.2/32 O 10 1 >10.0.4.1
* 10.0.255.3/32 D 0 >lo0.0
* 10.0.255.4/32 O 10 1 >10.0.2.6
* 10.0.255.5/32 O 10 1 >10.0.2.2
* 10.0.255.6/32 O 10 1 >10.0.2.14
* 10.0.255.7/32 O 10 2 >10.0.2.6
10.0.2.2
* 10.0.255.8/32 O 10 2 >10.0.2.14
* 10.233.240.0/20 D 0 >ge-0/0/0.0
* 10.233.255.239/32 L 0 Local
* 172.17.20.0/23 B 170 100 10.0.2.6 65022 I
>10.0.2.2
* 172.17.30.0/23 B 170 100 >10.0.2.6 65022 I
10.0.2.2
* 172.17.31.0/30 O 10 3 10.0.2.6
>10.0.2.2
* 192.168.0.0/24 O 10 3 >10.0.2.14
* 192.168.1.0/24 O 10 3 >10.0.2.14
* 192.168.2.0/24 O 10 3 >10.0.2.14
* 192.168.3.0/24 O 10 3 >10.0.2.14
* 224.0.0.5/32 O 10 1 MultiRecv

FEC mapping table (inet.3) is much shorter than the IP routing table. It contains only the loopback addresses (the /32 prefixes), and an IP prefix for external BGP next hop that I added to LDP using egress-policy on R7 (the /30 prefix). Prefixes advertised by adjacent routers (R1, R2, R4 and R5) don’t have an outbound label (due to penultimate hop popping), R7’s loopback and EBGP next hop do.

FEC mapping table @ R3

root@R3> show route table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.0.255.1/32 *[LDP/9] 00:04:52, metric 1
    > to 10.0.4.13 via ge-0/0/2.0
10.0.255.2/32 *[LDP/9] 03:52:19, metric 1
    > to 10.0.4.1 via ge-0/0/3.0
10.0.255.4/32 *[LDP/9] 00:02:32, metric 1
    > to 10.0.2.6 via ge-0/0/4.0
10.0.255.5/32 *[LDP/9] 03:52:17, metric 1
    > to 10.0.2.2 via ge-0/0/5.0
10.0.255.7/32 *[LDP/9] 00:02:32, metric 1
      to 10.0.2.6 via ge-0/0/4.0, Push 299872
    > to 10.0.2.2 via ge-0/0/5.0, Push 299888
172.17.31.0/30 *[LDP/9] 00:02:32, metric 1
      to 10.0.2.6 via ge-0/0/4.0, Push 299872
    > to 10.0.2.2 via ge-0/0/5.0, Push 299888

Local labels are created for all prefixes in the inet.3 table. Locally originated prefixes don’t need a real label; they are advertised with label 3 (implicit null, shown as POP), which tells the upstream neighbor to pop the label.

LDP-generated part of the LFIB table on R3

root@R3> show route table mpls.0 terse protocol ldp

mpls.0: 22 destinations, 22 routes (21 active, 0 holddown, 1 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination P Prf Metric 1 Metric 2 Next hop AS path
* 299776 L 9 1 >10.0.4.1
* 299776(S=0) L 9 1 >10.0.4.1
* 299824 L 9 1 >10.0.2.2
* 299824(S=0) L 9 1 >10.0.2.2
* 299888 L 9 1 >10.0.4.13
* 299888(S=0) L 9 1 >10.0.4.13
* 299904 L 9 1 >10.0.2.6
* 299904(S=0) L 9 1 >10.0.2.6
* 299920 L 9 1 10.0.2.6
>10.0.2.2

If you display detailed information from the mpls.0 table, you can also see the IP prefixes associated with the label entry (note that there are two IP prefixes associated with the same label):

A single entry in the LFIB table

root@R3> show route table mpls.0 protocol ldp detail | find 299920
299920 (1 entry, 1 announced)
*LDP Preference: 9
Next hop type: Router
Next-hop reference count: 1
Next hop: 10.0.2.6 via ge-0/0/4.0
Label operation: Swap 299872
Next hop: 10.0.2.2 via ge-0/0/5.0, selected
Label operation: Swap 299888
State: <Active Int>
Local AS: 65412
Age: 6:49 Metric: 1
Task: LDP
Announcement bits (1): 0-KRT
AS path: I
Prefixes bound to route: 10.0.255.7/32
172.17.31.0/30

The LDP database contains all the information received from LDP neighbors or advertised to them; you can inspect the whole LDP database or just the entries received from or sent to an individual neighbor. As expected, the output label database contains labels for all entries in the inet.3 table and label 3 (POP) for all locally originated LDP prefixes.

Parts of LDP database on R3 (limited to R5)

root@R3> show ldp database session 10.0.255.5
Input label database, 10.0.255.3:0--10.0.255.5:0
Label Prefix
299936 10.0.255.1/32
299872 10.0.255.2/32
299856 10.0.255.3/32
299952 10.0.255.4/32
3 10.0.255.5/32
299888 10.0.255.7/32
299888 172.17.31.0/30

Output label database, 10.0.255.3:0--10.0.255.5:0
Label Prefix
299888 10.0.255.1/32
299776 10.0.255.2/32
3 10.0.255.3/32
299904 10.0.255.4/32
299824 10.0.255.5/32
299920 10.0.255.7/32
299920 172.17.31.0/30
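
If you want to check printouts like this one programmatically, a few lines of Python are enough. The sketch below is only an illustration: it parses the output label database shown above (pasted in as a string), then lists the locally originated prefixes (label 3) and any label bound to more than one prefix. The parsing function is hypothetical and assumes the simple two-column "label prefix" layout of this printout.

```python
from collections import defaultdict

# Output label database from the printout above (R3 toward R5).
OUTPUT_LABEL_DATABASE = """\
Label Prefix
299888 10.0.255.1/32
299776 10.0.255.2/32
3 10.0.255.3/32
299904 10.0.255.4/32
299824 10.0.255.5/32
299920 10.0.255.7/32
299920 172.17.31.0/30
"""

IMPLICIT_NULL = 3  # reserved label value used to request penultimate hop popping


def parse_label_database(text):
    """Map each label to the list of prefixes bound to it."""
    bindings = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip the header and blank lines
        bindings[int(fields[0])].append(fields[1])
    return dict(bindings)


bindings = parse_label_database(OUTPUT_LABEL_DATABASE)
local_prefixes = bindings.get(IMPLICIT_NULL, [])
shared_labels = {label: p for label, p in bindings.items() if len(p) > 1}

print(local_prefixes)  # ['10.0.255.3/32']
print(shared_labels)   # {299920: ['10.0.255.7/32', '172.17.31.0/30']}
```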

Last but definitely not least, let’s inspect the routing table entry for the external BGP next hop. You won’t find a label in this entry.

Route toward BGP next-hop on R3

root@R3> show route 172.17.31.0/30 extensive

inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
172.17.31.0/30 (1 entry, 1 announced)
TSI:
KRT in-kernel 172.17.31.0/30 -> {10.0.2.2}
*OSPF Preference: 10
Next hop type: Router, Next hop index: 262148
Next-hop reference count: 6
Next hop: 10.0.2.6 via ge-0/0/4.0
Next hop: 10.0.2.2 via ge-0/0/5.0, selected
State: <Active Int>
Local AS: 65412
Age: 3:44:50 Metric: 3
Area: 0.0.0.0
Task: OSPF
Announcement bits (3): 0-KRT 3-LDP 5-Resolve tree 2
AS path: I

Just to ensure there’s no other magic going on behind the scenes, let’s inspect the forwarding table entry for the same prefix. Yet again, no label.

Forwarding entry for BGP next-hop on R3

root@R3> show route forwarding-table destination 172.17.31.0/30 extensive
Routing table: default.inet [Index 0]
Internet:

Destination: 172.17.31.0/30
Route type: user
Route reference: 0 Route interface-index: 0
Flags: sent to PFE, rt nh decoupled
Nexthop: 10.0.2.2
Next-hop type: unicast Index: 631 Reference: 16
Next-hop interface: ge-0/0/5.0

On the other hand, the routing table entry for a BGP destination using that same BGP next hop has two labels (because R3 can reach the EBGP next hop through two next-hop routers, R4 and R5).

BGP route on R3

root@R3> show route 172.17.20.0/23 extensive

inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
172.17.20.0/23 (1 entry, 1 announced)
TSI:
KRT in-kernel 172.17.20.0/23 -> {indirect(262146)}
*BGP Preference: 170/-101
Next hop type: Indirect
Next-hop reference count: 6
Source: 10.0.255.7
Next hop type: Router, Next hop index: 262154
Next hop: 10.0.2.6 via ge-0/0/4.0
Label operation: Push 299872
Next hop: 10.0.2.2 via ge-0/0/5.0, selected
Label operation: Push 299888
Protocol next hop: 172.17.31.2
Indirect next hop: 90d23c0 262146
State: <Active Int Ext>
Local AS: 65412 Peer AS: 65412
Age: 3:50:47 Metric2: 1
Task: BGP_65412.10.0.255.7+179
Announcement bits (2): 0-KRT 5-Resolve tree 2
AS path: 65022 I Aggregator: 65022 172.17.21.2
Accepted
Localpref: 100
Router ID: 10.0.255.7
Indirect next hops: 1
Protocol next hop: 172.17.31.2 Metric: 1
Indirect next hop: 90d23c0 262146
Indirect path forwarding next hops: 2
Next hop type: Router
Next hop: 10.0.2.6 via ge-0/0/4.0
Next hop: 10.0.2.2 via ge-0/0/5.0
172.17.31.0/30 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 2
Nexthop: 10.0.2.6 via ge-0/0/4.0
Nexthop: 10.0.2.2 via ge-0/0/5.0

The outgoing label is also copied into the forwarding table. By default, Junos doesn’t perform load balancing toward BGP destinations; the forwarding table thus contains only a single outgoing label.

Forwarding entry for a BGP route on R3

root@R3> show route forwarding-table destination 172.17.20.0/23 extensive
Routing table: default.inet [Index 0]
Internet:

Destination: 172.17.20.0/23
Route type: user
Route reference: 0 Route interface-index: 0
Flags: sent to PFE, prefix load balance
Next-hop type: indirect Index: 262146 Reference: 3
Nexthop: 10.0.2.2
Next-hop type: Push 299888 Index: 668 Reference: 1
Next-hop interface: ge-0/0/5.0

13 comments:

  1. I found your blog post very interesting. I think you may have missed a point in the history of MPLS and the differences in the approach of Cisco and Juniper as far as MPLS. I believe your history on the Cisco side is correct and that they were looking for a way to do switching through the network without doing an ip lookup at each hop and Cisco was pushing LDP as the standard for MPLS.

    I believe that Juniper’s first routers on the other hand were built on ASICs so they did not have the same constraints as Cisco on IP lookups. The thing that Juniper was trying to address with initial MPLS was a way to replace the TE capabilities that were present with ATM. I think ISPs wanted to do away with the ATM switches but still have Traffic Engineering capabilities. My understanding is this is the reason that Juniper was pushing RSVP as the standard of MPLS.

    Since in RSVP you have an LSP going to a single address (the loopback of the remote device), it makes sense to have only that single IP address associated with that LSP. I think that Juniper's LDP behaviour is just their first implementation approach (RSVP) applied to what they implemented later in LDP. This is just a guess at the reason for the implementation difference, but I know that Cisco pushed LDP and Juniper pushed RSVP as the standard when MPLS first came around.

    Grumpy
  2. Thank you! You're probably right, Juniper came out with edge routers years after the core ones, so their first MPLS application was most likely MPLS-TE (yet again, focusing on transporting traffic toward BGP destinations)

    BTW, Cisco also had RSVP-based MPLS-TE very early on, but it took them a while to get the headend CB-SPF sorted out.
  3. And your handwritten pictures? (I prefer this, probably made by Visio) :-P
  4. These ones were actually made in PowerPoint (I dropped Visio years ago when Microsoft bought and bloated it). The hand-drawn ones make sense for simple things that I'll probably never reuse.
  5. Ivan,

    I spent a complete 24 hours banging my head against the wall, wondering why non-BGP routes (in fact, I was not running BGP in my core at all) were not in the mpls table of JUNOS, and why IGP-learnt routes were not being labelled / tagged.

    Until I found this post :)

    No doubt, there is an immeasurable gap between the knowledge base write-ups of Juniper and those of Cisco.

    Cheers.

    JM
  6. Ivan - do you know if there's a way to make JUNOS load balance between inet.0 and inet.3?
    Replies
    1. Unfortunately I don't know enough about Junos to answer this one.
    2. Vade, Why would you want to do that? Does configuring 'mpls traffic-engineering bgp-igp' not address your requirement?
  7. Thanks for the reply Chris.
    The reason: I want to limit the number of LSPs. (I want to load balance to a neighbour via 2 links, of which only one is directly connected to the neighbour. On Cisco I use forwarding adjacency, and require only one TE tunnel)
    I did see 'mpls traffic-engineering bgp-igp', but did not try it because I was under the impression it is only needed where you want LSPs to be used for IGP destinations. In my case all destinations are BGP destinations (which automatically use the LSP if the next hop exists in the inet.3 table and has the preferred preference and metric there). But I have since discovered that I might have to use that, in conjunction with
    'isis traffic-engineering family inet shortcuts'
    and
    'isis traffic-engineering multipath lsp-equal-cost'
    Will try and remember to post here if I ever get a chance to test it
  8. Hi all

    "IP traffic received by a Cisco IOS router will be labeled if at all possible (outbound label is entered in the FIB whenever there’s a corresponding mapping in LDP database)"

    Since this is the default behaviour with MPLS enabled, Cisco normally uses labels to forward as soon as it receives labels from its LDP neighbors; if you want IP forwarding, you have to disable LDP on the egress (next-hop) interface.

    Is there a way you could have IP forwarding on an interface that is LDP enabled without having to disable LDP?
    Replies
    1. You can control which prefix-to-label mappings are accepted with LDP filters, but (IIRC) once you accept a mapping from downstream neighbor, it goes into your FIB.
  9. Hi Ivan,

    Thanks for the reference

    I read about LDP filters, seems like the way to go. Since I haven't put that in practice, please help me understand this

    Say you have a series of PE routers namely (PE1, PE2...PE100). Now at PE1 you decided to allocate label for PE1 loopback (IP 1.1.1.1/32) with below command

    #ip prefix-list PE1-LOOPBACK permit 1.1.1.1/32
    #mpls ldp label
    allocate global prefix-list PE1-LOOPBACK

    Does this mean PE1 will only allocate a label for the loopback IP 1.1.1.1 and ignore any other IGP learned routes?

    If yes does it mean if we need to have PE1 assign labels for loopback IPs of other PE routers from PE2,PE3,PE4....PE100 we have to use the command below

    #mpls ldp neighbor 2.2.2.2 labels accept 99
    #access-list 99 permit host 2.2.2.2
    #access-list 99 permit host 3.3.3.3
    #access-list 99 permit host 3.3.3.3

    Therefore access-list 99 will contain the loopback IPs of the other PE routers (PE2, PE3...PE100), but this does not scale very well: with 100 PE routers you need about 100 entries in the access-list. Is there a way to reference all /32 prefixes in a non-contiguous IP block?

    Or what is the work around in this scenario?
    Replies
    1. Can't see one, apart from automating stuff.