
Anycast Works Just Fine with MPLS/LDP

I stumbled upon an article praising the beauties of SR-MPLS that claimed:

Yet MPLS, until recently, was deprived of anycast routing. This is because MPLS is not a pure packet switching technology, but has a control plane based on virtual circuit switching.

My first reaction was “that’s not how MPLS works,”1 followed by “that would be fun to test” a few seconds later.

I created a tree network to test the anycast with MPLS idea:

Anycast test network

The whole network is running OSPF, and MPLS/LDP is enabled on all links. A1, A2 and A3 will advertise the same prefix (10.0.0.42/32) into OSPF. According to the “no anycast with MPLS” claim, L1 should not be able to reach all three anycast nodes.

You probably know I prefer typing CLI commands over chasing rodents, so I used netsim-tools to build the lab. Here’s the topology file (I don’t think it can get any simpler than that):

module: ospf
defaults:
  device: iosv
nodes: [ l1, l2, l3, s1, a1, a2, a3 ]
links: [ s1-l1, s1-l2, s1-l3, l2-a1, l2-a2, l3-a3 ]

I created the network diagram with the netlab create -o graph command followed by dot -Grankdir=RL -T png -o graph.ospf.png graph.dot (using the rankdir trick Jeroen van Bemmel taught me).

Next step: starting the lab with netlab up and waiting a minute or so.

Now for the fun part: netsim-tools don’t support MPLS/LDP or anycast yet. Time for some custom Jinja2 templates.

I used netlab create -o yaml to get the final data structure that would be passed to Ansible playbooks in YAML format – there’s a links element in every lab node describing its links. Alternatively, you could look into the Ansible inventory created with the netlab create command.

Ansible inventory data for S1
---
box: cisco/iosv
links:
- ifindex: 1
  ifname: GigabitEthernet0/1
  ipv4: 10.1.0.2/30
  linkindex: 1
  name: s1 -> l1
  neighbors:
    l1:
      ifname: GigabitEthernet0/1
      ipv4: 10.1.0.1/30
  remote_id: 1
  remote_ifindex: 1
  type: p2p
- ifindex: 2
  ifname: GigabitEthernet0/2
  ipv4: 10.1.0.6/30
  linkindex: 2
  name: s1 -> l2
  neighbors:
    l2:
      ifname: GigabitEthernet0/1
      ipv4: 10.1.0.5/30
  remote_id: 2
  remote_ifindex: 1
  type: p2p
...

Using the links element to configure MPLS with LDP is a piece of cake:

mpls ldp explicit-null
mpls ldp router-id Loopback 0
{% for l in links %}
!
interface {{ l.ifname }}
 mpls ip
{% endfor %}
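
For reference, here’s roughly what this template renders to on S1, which has three links (GigabitEthernet0/1 through GigabitEthernet0/3 toward L1, L2, and L3) – a sketch based on the topology above, not a capture from the lab:

mpls ldp explicit-null
mpls ldp router-id Loopback 0
!
interface GigabitEthernet0/1
 mpls ip
!
interface GigabitEthernet0/2
 mpls ip
!
interface GigabitEthernet0/3
 mpls ip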

The netlab config command allows you to configure lab devices with a custom Jinja2 template. netlab config mpls-ldp.j2 was all I needed to configure MPLS in my lab.


Please note that the above template configures two LDP parameters:

  • Advertise explicit NULL to make the LFIB table on L2 and L3 look nicer;
  • Set the LDP router ID to a loopback interface with a unique IP address (more about that at the end of the blog post).
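
To double-check both settings, a couple of show commands should do the trick – I’m only listing the commands here; the exact output format varies across IOS releases:

! the local LDP Identifier should be the Loopback0 address
show mpls ldp discovery
! on L2, the anycast nodes should show up advertising explicit-null for 10.0.0.42/32
show mpls ldp bindings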

Configuring anycast was even easier – add another loopback interface:

interface loopback 42
 ip address 10.0.0.42 255.255.255.255
 ip ospf 1 area 0

I had to be careful when running netlab config: the loopback interface should be added only to A1, A2, and A3. Fortunately, I thought about that use case when writing the netlab config code – any parameter after the template name is passed to the internal Ansible playbook. Presto: netlab config ospf-anycast-loopback.j2 --limit a1,a2,a3

Smoke Test

Let’s inspect the routing tables first (hint: netlab connect is an easy way to connect to lab devices without bothering with their IP addresses or the /etc/hosts file).

Here’s the routing table entry for 10.0.0.42 on L2:

Anycast routing entry on L2
l2#show ip route 10.0.0.42
Routing entry for 10.0.0.42/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 10.1.0.17 on GigabitEthernet0/3, 08:38:23 ago
  Routing Descriptor Blocks:
    10.1.0.17, from 10.0.0.6, 08:38:23 ago, via GigabitEthernet0/3
      Route metric is 2, traffic share count is 1
  * 10.1.0.13, from 10.0.0.5, 08:38:23 ago, via GigabitEthernet0/2
      Route metric is 2, traffic share count is 1

Likewise, S1 has two paths to the anycast prefix (through L2 and L3):

Anycast routing entry on S1
s1#show ip route 10.0.0.42
Routing entry for 10.0.0.42/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 10.1.0.9 on GigabitEthernet0/3, 08:41:20 ago
  Routing Descriptor Blocks:
    10.1.0.9, from 10.0.0.7, 08:41:20 ago, via GigabitEthernet0/3
      Route metric is 3, traffic share count is 1
  * 10.1.0.5, from 10.0.0.5, 08:41:20 ago, via GigabitEthernet0/2
      Route metric is 3, traffic share count is 1

What about the MPLS forwarding table? Here’s the LFIB entry for 10.0.0.42 on S1. Please note that a single incoming label maps into two outgoing labels, interfaces, and next hops.

Anycast MPLS entry on S1
s1#show mpls forwarding-table 10.0.0.42 detail
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
25         25         10.0.0.42/32     0             Gi0/2      10.1.0.5
	MAC/Encaps=14/18, MRU=1500, Label Stack{25}
	525400D2EC095254000324028847 00019000
	No output feature configured
    Per-destination load-sharing, slots: 0
           26         10.0.0.42/32     0             Gi0/3      10.1.0.9
	MAC/Encaps=14/18, MRU=1500, Label Stack{26}
	525400474CE15254008857F68847 0001A000
	No output feature configured
    Per-destination load-sharing, slots: 1
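
The same two-way split should also show up in S1’s CEF table; a command along these lines would list both next hops together with the labels LDP installed (command only, output not captured here):

show ip cef 10.0.0.42 detail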

And here’s the corresponding LFIB entry from L2. Please note that the anycast nodes advertise the anycast prefix with the explicit-null label because I configured mpls ldp explicit-null.

Anycast MPLS entry on L2
l2#show mpls forwarding-table 10.0.0.42 detail
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
25         explicit-n 10.0.0.42/32     0             Gi0/2      10.1.0.13
	MAC/Encaps=14/18, MRU=1500, Label Stack{}
	525400AE15B4525400A4C6208847 00000000
	No output feature configured
    Per-destination load-sharing, slots: 0
           explicit-n 10.0.0.42/32     0             Gi0/3      10.1.0.17
	MAC/Encaps=14/18, MRU=1500, Label Stack{}
	5254006D57E7525400CAEC468847 00000000
	No output feature configured
    Per-destination load-sharing, slots: 1

The final test: traceroute from L1 to the anycast IP address. I had to configure ip cef load-sharing algorithm include-ports source destination to change the IOS load-balancing algorithm to 5-tuple load balancing. After that, individual traceroute probes ended up on different anycast nodes:

Traceroute from L1 to anycast IP
l1#traceroute 10.0.0.42 port 80
Type escape sequence to abort.
Tracing the route to 10.0.0.42
VRF info: (vrf in name/id, vrf out name/id)
  1 s1 (10.1.0.2) [MPLS: Label 25 Exp 0] 1 msec 1 msec 1 msec
  2 l2 (10.1.0.5) [MPLS: Label 25 Exp 0] 1 msec 1 msec
    l3 (10.1.0.9) [MPLS: Label 26 Exp 0] 1 msec
  3 a3 (10.1.0.21) 1 msec *
    a2 (10.1.0.17) 1 msec
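
For reference, the CEF load-sharing tweak mentioned above is a single global configuration command on L1:

ip cef load-sharing algorithm include-ports source destination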

After increasing the probe count (as suggested by Anonymous in the comments), the trace reaches all three anycast servers:

Increasing the probe count on traceroute from L1 to anycast IP
l1#traceroute 10.0.0.42 port 80 probe 10
Type escape sequence to abort.
Tracing the route to 10.0.0.42
VRF info: (vrf in name/id, vrf out name/id)
  1 s1 (10.1.0.2) [MPLS: Label 25 Exp 0] 1 msec 1 msec...
  2 l3 (10.1.0.9) [MPLS: Label 26 Exp 0] 1 msec 1 msec
    l2 (10.1.0.5) [MPLS: Label 25 Exp 0] 1 msec 1 msec...
  3 a2 (10.1.0.17) 1 msec *
    a3 (10.1.0.21) 1 msec *
    a1 (10.1.0.13) 1 msec *
    a3 (10.1.0.21) 1 msec *
    a1 (10.1.0.13) 1 msec *

Myth busted. Traditional MPLS offers more than P2P virtual circuits. MPLS forwarding entries follow the IP routing table entries. While I like SR-MPLS (as opposed to its ugly cousin SRv6), you don’t need it to run anycast services; LDP works just fine.

The Curse of Duplicate Addresses

While anycast works with MPLS/LDP (as demonstrated), LDP is not completely happy with the setup.

In the worst case, an anycast server chooses the anycast IP address as its LDP Identifier, and adjacent devices try to connect to the anycast IP address when establishing the LDP TCP session. That can’t end well. To fix this one, use mpls ldp router-id on Cisco IOS (or an equivalent command on your platform of choice).
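
The MPLS/LDP template above already takes care of this; here it is again as a minimal standalone sketch (assuming Loopback0 carries the node’s unique IP address, as it does in this lab – the optional force keyword makes IOS switch to the new identifier immediately, resetting LDP sessions established with the old one):

mpls ldp router-id Loopback0 force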

LDP also advertises all local IP addresses to LDP neighbors to help them map FIB next hops to LDP neighbors2. Multiple LDP neighbors advertising the same IP address make L2 decidedly unhappy3, resulting in syslog messages like these:

%TAGCON-3-DUP_ADDR_RCVD: Duplicate Address 10.0.0.42 advertised by peer 10.0.0.6:0 is already bound to 10.0.0.5:0
%TAGCON-3-TDPID: peer 10.0.0.6:0, TDP Id/Addr mapping problem (rcvd TDP address PIE, bind failed)

The setup still works, but the extraneous syslog messages might upset an overly fastidious networking engineer. To make LDP happy, run BGP (not OSPF) with the anycast servers and distribute labels for the anycast addresses with the IPv4/IPv6 labeled unicast (BGP-LU) address family.

Yeah, I know I have to set up another lab to prove that ;) Mañana…
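
Until that lab materializes, here’s a very rough (and untested) sketch of what the BGP-LU session from L2 toward A1 might look like on Cisco IOS – A1’s address (10.1.0.13) is taken from the lab, the AS numbers are made up for illustration, and the key bit is the send-label knob that turns a regular IPv4 session into a labeled-unicast one:

router bgp 65000
 neighbor 10.1.0.13 remote-as 65042
 !
 address-family ipv4
  neighbor 10.1.0.13 activate
  neighbor 10.1.0.13 send-label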

Revision History

2021-11-17
The curse of duplicate addresses section has been added based on feedback from Dmytro Shypovalov. Thanks a million for keeping me on the straight and narrow!
2021-11-18
Added a traceroute printout with larger probe count as suggested by an anonymous commenter.

  1. At least not on any device I worked with. ↩︎

  2. You can also use the list of local addresses to identify parallel links. ↩︎

  3. The error messages appear only on devices that have more than one LDP session to anycast servers (L2 in our lab topology). ↩︎


3 comments:

  1. Wondering if traceroute on "s1" would also reach "a1" (on third hop) as it's not shown in the output. Maybe by increasing probe count for traceroute?

    Minor typo: title "Anycast MPLS entry on S1" should be "Anycast MPLS entry on L2"

  2. Very interesting topic Ivan : )). While those who study transport mechanics in internetwork quickly realize the claim is incorrect based on first principles, it's always nice to see a conclusive proof laid out with step-by-step instructions and empirical data.

    MPLS is Transport, and Transport is more concerned with addressing, routing, synchronization etc., while Anycast, taken to its root, is a naming issue, so one can quickly realize there's no fundamental reason that prevents MPLS from supporting anycast. Any limitation is strictly implementational, not technological. So it's not too surprising that even old LDP code can take on Anycast.

    "MPLS is not a pure packet switching technology, but has a control plane based on virtual circuit switching." while I can see why Dmytro might have come to this conclusion, it's not really true either. It's true that MPLS is an evolution of ATM, and therefore, is not tunnel, but VC -- never understand the pointless debates about whether MPLS is tunnel or VC as the people who claim it's tunnel obviously don't understand its history -- but MPLS is more like loose VC. MPLS relies on a functional routing protocol to discover the paths, and LDP -- if it can be called MPLS control plane as MPLS control plane can consist of way more than LDP -- assigns the labels for the prefixes, so MPLS in an IP network is essentially packet-switching, not VC. If a core DCE breaks down in a VC network, the VC breaks, but in MPLS IP network, the LSP adjusts itself based on the underlying routing topology. LDP itself is a simplistic label assignment protocol, AFAIK it doesn't perform any function central to a VC control plane, so all in all MPLS deviates a fair bit from traditional VC network.

    But the most important part is: MPLS is more general than both VC and packet switching, because it's Transport, and this point is crucial. MPLS can therefore be generalized to support non-packet-switched networks, in VC style or whatnot. So all of these points make it obvious that nothing would stop MPLS from supporting Anycast routing, or any form of routing, for that matter. And the reason this needs to be clearly identified is so vendors can't use that as an excuse to sell SR for the wrong reason.

    And I never understood what the excitement is with SR, given that it's a product of SDN, itself a misconception (or a flight of fantasy) from the start. SR is more like a point solution for a few special cases, and it tries to overcome the state problems by introducing more constructs into the network, like SRGB and global segments, that can lead to tight binding and confusion, esp. at large scale.

    In fact, Dmytro's post already mentioned 3 deficiencies of SRGB mismatch. Introducing more moving parts, esp. global ones, into big distributed systems always leads to complexity and unintended consequences. SR will run into problems with MPLS networks where the label is no longer just an abstract concept with no physical reality, but has to match the underlying resources. This 'fitting the data' requirement and SRGB can contradict each other, leading to much headache. And if one has to add a controller to SR to do TE, then one will run into all the scaling problems of centralized routing that SDN proponents have learnt the hard way. So personally I agree 100% with Greg Ferro's remark here:

    https://packetpushers.net/srx6-snake-oil-or-salvation

  3. Brilliant topic Ivan and the usual gem from Minh : )

    Just a couple of comments:

    Some years ago I attended a vendor's course on Segment Routing tailored for my company and the SR advocate made his debut saying that LDP didn't support ECMP ..... :(..... some of us nearly fell off our chairs but it was such a macroscopic idiocy that we didn't want to embarrass him and so we said nothing... the worrying bit though is that most of the audience didn't fall off any chair as it went through pretty unnoticed... this is just to say that there's always a lot of fertile ground around for any new technology...

    Regarding the operational fragility SRGB offers us ... I'd recommend the following reading: https://datatracker.ietf.org/doc/html/draft-ietf-spring-conflict-resolution

    Cheers/Ciao

    Andrea
