Is MPLS/VPN Too Complex?

Henk Smit made the following claim in one of his comments:

I think BGP-MPLS-VPNs are over-complicated. And you don’t get enough return for that extra complexity.

TL&DR: He’s right (and I just violated Betteridge’s law of headlines)

The history of how we got to the current morass might be interesting for engineers who want to look behind the curtain, so here we go…

To understand the MPLS/VPN design concepts, one has to realize what the alternatives were in the late 1990s:

  • Nobody was crazy enough to think about bridging over WAN; we were still recovering from the nightmares caused by people trying to do that with brouters or Ethernet bridges. Carrier Ethernet was nowhere to be seen – WAN Ethernet circuits were simply not a thing yet.
  • The only alternatives to the shaky, semi-reliable public Internet were various circuit-based WAN technologies like ATM or Frame Relay. There were even people building VPNs over X.25[1].

Circuit-based WAN technologies were a perfect fit for traditional hierarchical networks, but became an absolute nightmare when you were trying to build an any-to-any network. You had to:

  • Figure out which sites are most likely to communicate and order a separate virtual circuit for every pair of sites. The traffic between other sites would still be exchanged over hub sites.
  • Guesstimate how much traffic you might see between a pair of sites and specify that bandwidth when provisioning the virtual circuit.
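To put rough numbers on the any-to-any problem, here is a minimal Python sketch that counts the point-to-point virtual circuits needed for a full mesh versus a hub-and-spoke layout. The function names and the sample site counts are illustrative assumptions, not anything a service provider actually used for provisioning.

```python
def full_mesh_circuits(sites: int) -> int:
    """Virtual circuits needed to connect every pair of sites directly."""
    if sites < 0:
        raise ValueError("sites must be non-negative")
    return sites * (sites - 1) // 2


def hub_and_spoke_circuits(sites: int, hubs: int = 1) -> int:
    """Circuits needed when every spoke connects to every hub,
    and the hubs are fully meshed among themselves."""
    if hubs < 1 or hubs > sites:
        raise ValueError("hubs must be between 1 and sites")
    spokes = sites - hubs
    return spokes * hubs + full_mesh_circuits(hubs)


# Illustrative site counts only.
for n in (10, 50, 200):
    print(n, full_mesh_circuits(n), hub_and_spoke_circuits(n))
```

The full-mesh count grows quadratically with the number of sites, which is why customers ended up ordering circuits only for the site pairs they guessed would talk the most.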

Customers hated that, and even smart service providers weren’t exactly happy. They had to deal with a spaghetti mess of point-to-point virtual circuits that were burning precious hardware forwarding resources on every switch on the way[2].

Now picture a vendor coming along and saying “imagine you could build an Internet-like service for every customer”:

  • There would be no explicit virtual circuits – routing protocols would sort out where the traffic should go.
  • You could specify ingress and egress bandwidth per node, do ingress/egress policing or marking on the PE-routers, and let differentiated queuing/dropping in the network core do its job.
  • The customer would need a single routing adjacency on the PE-CE link instead of a complex setup supporting a partially-meshed Frame Relay WAN.

No wonder everyone was so enthusiastic about MPLS/VPN.

The early MPLS/VPN implementations were sane (because the nerd knobs weren’t there yet). For example, AT&T offered MPLS/VPN on Frame Relay access links using EBGP as the PE-CE routing protocol. You could use existing EBGP mechanisms (MED or AS Path prepending) to implement primary/backup links, and it all just worked.
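As a rough illustration of why the EBGP-only design "just worked" for primary/backup links, the sketch below models a heavily simplified slice of BGP best-path selection: shortest AS path first, then lowest MED. It is an assumption-laden toy, not a BGP implementation: real routers evaluate other attributes (such as local preference) before these two, and by default compare MED only between routes received from the same neighboring AS. The `Route` class, link labels, and AS number 65001 are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    link: str                 # hypothetical label for the PE-CE link
    as_path: Tuple[int, ...]  # AS numbers as received by the PE-router
    med: int = 0              # multi-exit discriminator; lower is preferred


def best_route(routes: Iterable[Route]) -> Route:
    """Simplified tie-breaking: shortest AS path, then lowest MED."""
    return min(routes, key=lambda r: (len(r.as_path), r.med))


CUSTOMER_AS = 65001  # hypothetical private AS number

# The backup link prepends the customer AS twice to look less attractive.
primary = Route("primary", (CUSTOMER_AS,))
backup = Route("backup", (CUSTOMER_AS,) * 3)
print(best_route([primary, backup]).link)  # primary
print(best_route([backup]).link)           # backup, once the primary is gone

# With equal AS path lengths, the lower MED wins.
low_med = Route("primary", (CUSTOMER_AS,), med=10)
high_med = Route("backup", (CUSTOMER_AS,), med=100)
print(best_route([high_med, low_med]).link)  # primary
```

The point is that the customer controls the outcome purely with attributes on its own advertisements, so no MPLS/VPN-specific knob is needed.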

And then the feature requests started:

  • I want to use OSPF between PE routers and CE routers (OSPF down bits, extra VPNv4 extended communities)
  • My customer has OSPF backup links, so we need OSPF sham links between PE-routers
  • I want to use EIGRP and I don’t want you to redistribute site routes back into the same site on PE-routers (requiring site-of-origin community)
  • My customer has more than 1000 sites (resulting in crazy AS path handling hacks[3])
  • I want differentiated QoS for every customer (resulting in all sorts of “solutions” too ugly to mention)
  • I want fast failover (which eventually brought us PIC Edge and associated BGP hacks)
  • My customers want to reinvent Frame Relay with MPLS/VPN, because some people wanted a lower-priced service that supports the complex scenarios of the higher-priced service[4]. Welcome to hub-and-spoke VPN designs.
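To make one of those knobs concrete, here is a minimal sketch of the idea behind the site-of-origin check mentioned in the list above: a PE-router skips routes whose site-of-origin tag matches the site it is about to advertise them to, so a site never gets its own routes back. The data layout, the `routes_for_site` helper, and the `65000:10x` values are hypothetical and only illustrate the filtering logic, not any vendor's implementation.

```python
from typing import Dict, List


def routes_for_site(vpn_routes: List[Dict[str, str]],
                    site_soo: str) -> List[Dict[str, str]]:
    """Return the routes a PE-router would send to a site, dropping
    those that carry the site's own site-of-origin value."""
    return [r for r in vpn_routes if r.get("soo") != site_soo]


# Hypothetical VPN routes tagged with the site they originated from.
vpn_routes = [
    {"prefix": "10.1.0.0/16", "soo": "65000:101"},
    {"prefix": "10.2.0.0/16", "soo": "65000:102"},
]

# Only the route from the other site is advertised to site 65000:101.
print([r["prefix"] for r in routes_for_site(vpn_routes, "65000:101")])
```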

The list of MPLS/VPN features goes on and on. We have VRF selection using PBR, half-duplex VRF, RT rewrite, Inter-AS MPLS/VPN, RT-based ORF… There’s a story behind every one of these features, often starting with a large service provider forcing a vendor to implement a nerd knob to fix a broken design.

On top of that, service providers lacking the technical skills needed to run an outsourced core IP network added to the bad rap of MPLS/VPN technology. I had several customers who went directly from Frame Relay to Internet-based IPsec VPNs because they didn’t trust the service provider to do a decent job. Those same service providers want to offer managed SD-WAN services today. I wish their customers plenty of luck; they might need it 🥴

In the meantime (it’s been over 20 years since my MPLS/VPN book was published), Carrier Ethernet became a viable alternative, and as much as it hurts me, I’m usually recommending customer-managed routers attached to Carrier Ethernet as a better (and safer) alternative to MPLS/VPN. You’ll find more details in the Choose the Optimal VPN Service webinar.

Revision History

2022-04-05
Added a link to “SD-WAN: A Service Provider Perspective” blog post.

  1. …and paying dearly for every byte transferred. ↩︎

  2. The ATM Virtual Path concept introduced some hierarchy, but ATM circuits were awfully expensive (and wasted bandwidth due to cell tax). On the other hand, Frame Relay switches didn’t offer high-speed interfaces; the service providers had to deal with all sorts of complex FR-to-ATM interworking solutions. ↩︎

  3. Instead of implementing 4-byte ASNs. The networking industry loves to solve things with hacks. See also: NAT and CGN instead of IPv6. ↩︎

  4. And the SP account team would never say “we don’t do that because we’d be losing money on your stupid design” (aka “we sold it, it’s an Ops problem now”). It’s more convenient to wave a large P/O in front of the vendor account team, yell about missing features, and then blame the vendor for the resulting complexity. ↩︎

3 comments:

  1. > Carrier Ethernet became a viable alternative, and as much as it hurts me, I’m usually recommending customer-managed routers attached to Carrier Ethernet as a better (and safer) alternative to MPLS/VPN

    If I may ask, how did Carrier Ethernet become a viable alternative that obsoleted L(2|3)VPNs? To my understanding it doesn't really bring much to the table except maybe a somewhat templated design on how to sell certain types of edge handoffs between the SP and customer.

    I take it by offloading the MPLS L(2|3)VPNs to the provider, and the provider handing off using Carrier Ethernet that it's easier to have a standardized set of interface configurations?

    Replies
    1. Carrier Ethernet is L2VPN and might utilize MPLS as the transport layer. See https://en.wikipedia.org/wiki/Carrier_Ethernet

      As for the ease of consuming L2VPN versus L3VPN services (from the end-user perspective): I covered that in detail in the Choose the Optimal VPN Service webinar and in the January 2022 Design Clinic. There might be a blog post or two covering that topic as well.

    2. It is not just about consuming. Provisioning by the service provider becomes much simpler with Carrier Ethernet. It can also be fully automated, even over a long chain of multiple service providers. Meanwhile, prices in a competitive market cannot cover the cost of complex provisioning schemes, so Carrier Ethernet wins on simple cost-effectiveness.

    3. Carrier Ethernet reduces complexity for the customer, reliance on a specific carrier, and cost.

      In terms of security, it comes with less overhead and complexity, if the primary objective is to encrypt the entire IP packet and to have full integrity protection and line-rate DoS resistance (in case you don't use MACsec). If you want to have high assurance security, there are products that also offer tunneling of the Ethernet header and advanced traffic flow security. Some of the solutions are also quantum-safe right out of the box (including multipoint support), and a number of solutions offer group key systems for meshed configurations. There are also solutions that support multi-tenant scenarios over single and multiple ports. Not with MACsec, though, not even with NSA Type 1 devices.

  2. This is akin to people complaining to the IRS about how the US tax code is too darn complex (features). Manufacturers (IRS) didn’t dream up this complexity for fun. If I may borrow the newest meme fad: My brothers in Christ, We wanted these features. And now it’s too complex? Maybe it’s too complex because We lacked the agency and fortitude to reject unreasonable business demands. We still do; stretched L2, anyone?

    Replies
    1. > people complaining

      Is this directed to me? This is not a complaint, this is just my opinion. As I wrote in my original reply, "I am just another idiot with an opinion".

      > Manufacturers didn’t dream up this complexity for fun.

      FYI, I've spent half of my career working for vendors. So I am not complaining about manufacturers. (The other half was a very long sabbatical). The last network that I had to maintain had 4 transparent bridges, and 1 AGS. That was more than 3 decades ago.

      The rise and deployment of MPLS was during my long sabbatical. When I came back, MPLS had just been declared dead, and SR was on the rise. I have perfect timing. :)

      About complexity. I like simple. I really like simple. I happen to be an author of RFC3784 (ISIS extensions for TE). At the time (1998-1999) I thought to myself: "I hardly understand how this stuff is supposed to work myself. And I am a co-author of this draft. How are operators going to design, deploy and troubleshoot this?"

      During the next years (during my sabbatical) I saw that people were writing books about MPLS, MPLS-RSVP-TE and BGP-MPLS-VPNs. There were blog posts, and new features and extensions in the IETF. Clearly I was wrong. MPLS-RSVP-TE was being deployed and was a success.

      However, last year I watched a presentation by Clarence Filsfils about SR and TE. Clarence said: "2.5% of ISPs use(d) MPLS-RSVP-TE. 0% of enterprises use MPLS-RSVP-TE". So maybe my original thought wasn't so wrong after all. :)

  3. Well, if it's an enterprise-size application, Extreme SPB-M is easy and can run L2, L3 & MC. Normally that runs on a single IS-IS area, but now it can be multi-area.

    In the service provider arena:

    MPLS SR MPLS with ISIS ?

    Replies
    1. In the age of such brilliance in enterprise network solutions, such as:
      • using NAT to route traffic because dynamic routing is too hard to grasp and static routing too cumbersome
      • firewalls on just about any network segment because of reasons

      simple concepts like VRF-Lite are already very difficult for some IT teams to grasp, so a basic MPLS design, which is a great concept, is viewed as a very complex/expensive solution.

      Some providers in Europe still find success selling MPLS to end customers as a VPN solution, and it beats the currently popular automated IPsec tunneling solutions (aka SD-something) on most customer requirements. Cost is where it usually falls short; however, even there it can be cost-effective at scale if well designed and managed.
