On MPLS Forwarding Performance Myths
Whenever I claim that the initial use case for MPLS was improved forwarding performance (using the RFC that matches the IETF MPLS BoF slides as supporting evidence), someone inevitably comes up with a source claiming something along these lines:
The idea of speeding up the lookup operation on an IP datagram turned out to have little practical impact.
That might be true [1], although I do remember how hard it was for Cisco to build the first IP forwarding hardware in the AGS+ CBUS controller. Switching labels would be much faster (or at least cheaper), but the time it takes to do a single forwarding table lookup was never the main consideration. It was all about the aggregate forwarding performance of core devices.
Anyhow, Duty Calls. It’s time for another archeology dig. Unfortunately, most of the primary sources irrecoverably went to /dev/null, and personal memories are never reliable; comments are most welcome.
It was the mid-1990s, the Internet was taking off, and a few large US ISPs had a bit of a problem. They had too much traffic to use routers as core network devices; ATM switches were the only alternative.
To qualify that, you have to understand how “fast” routers were in those days:
- AGS+ had a 533 Mbps backplane (Cbus, source)
- Cisco 7000 was just a better implementation of the same concepts, and it looks like the CxBus had the same speed as the original Cbus (based on this source).
- Cisco 7500, launched in 1995, had CyBus providing 1.067 Gbps. The entry-level model (7505) had a single bus; higher-end models (7507/7513) had dual-CyBus backplanes, yielding 2.1 Gbps of forwarding performance.
- Line cards were laughable (by today’s standards). The fastest linecard had two Fast Ethernet ports (200 Mbps or 400 Mbps of marketing bandwidth).
On the other hand, Cisco’s LightStream 1010 ATM switch (also launched in 1995) had 5 Gbps of bandwidth (source). Keep in mind that although LS1010 was an excellent product, it was a late-to-market me-too entry-level switch. Other ATM switches had even higher performance.
The service providers were thus forced to combine routers at the network edge (because they needed IP forwarding) with ATM switches at the network core (to get sufficient aggregate forwarding bandwidth). There were “just” two problems with that approach:
- The only (working) way to interconnect routers and ATM switches was to build point-to-point ATM virtual circuits (VCs) between routers (using a network management system) and then run routing protocols over the full mesh of ATM VCs. In a word, a spaghetti-mess nightmare.
- The customers were spending money (that Cisco wanted to get) buying ATM switches from other vendors.
Now, imagine you had a technology that would:
- Allow a seamless integration of all devices in the network
- Have a unified control plane running IP routing protocols
- Use labels in the core (reusing ATM VPI/VCI header) to retain the benefits of high-speed ATM forwarding while mapping IP packets into those labels at the slower network edge
- Be available from a single vendor [2]
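A minimal Python sketch of that division of labor (all table contents, labels, and interface names are hypothetical, and longest-prefix match is simplified to an exact dictionary match): the edge router performs one IP lookup to impose a label, and every core device only does an exact-match label swap.

```python
# Hypothetical edge/core tables -- real devices would build these
# from IP routing protocols and TDP/LDP label advertisements.

# Edge: destination prefix -> outgoing label
# (the real lookup is longest-prefix match; a dict stands in for it here)
edge_fib = {
    "10.0.0.0/8": 17,
    "192.168.0.0/16": 42,
}

# Core: incoming label -> (outgoing label, outgoing interface),
# analogous to an ATM VPI/VCI cross-connect table
core_lfib = {
    17: (30, "if1"),
    42: (31, "if2"),
}

def edge_impose(prefix):
    """One (expensive) IP lookup at the network edge."""
    return edge_fib[prefix]

def core_swap(label):
    """One (cheap) exact-match lookup in every core device."""
    return core_lfib[label]

label = edge_impose("10.0.0.0/8")
out_label, out_if = core_swap(label)
```

The IP lookup happens exactly once, at the edge; every subsequent hop touches only the small exact-match label table, which is precisely what ATM switching hardware could do.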
And that’s (according to how this was explained to me when I asked, “Who would ever need this?”) why we got Tag Switching. Read the early Tag Switching information (or even the LDP RFC), and you’ll see how heavily biased toward ATM (cell-mode MPLS) it was [3]. Most of the TDP/LDP complexity comes from dealing with hardware that:
- Could not do IP lookups
- Could not preallocate labels to every prefix in the network because the VPI/VCI forwarding table was limited
- Could not even merge two data streams into a single one. Streams from two ingress routers had to be kept separate until they reached the egress router [4].
Not surprisingly, someone quickly figured out that one could use the same concepts on Frame Relay [5] and point-to-point links (merging multiple layer-2 transport technologies into a single label space). When the labels were no longer limited to ATM headers, it wasn’t too hard to think of the label stack [6], and then get really creative and use the label stack to implement services on top of the transport label-switched paths.
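The label stack itself is simple enough to sketch. The encoding below follows the generic MPLS label stack entry format (RFC 3032: 20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL); the label values are made up.

```python
import struct

def encode_label_entry(label, tc, bottom, ttl):
    """Pack one 32-bit MPLS label stack entry (RFC 3032):
    20-bit label, 3-bit traffic class, 1-bit bottom-of-stack, 8-bit TTL."""
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def encode_stack(entries):
    """Entries are stacked back to back; only the last (innermost)
    entry carries the bottom-of-stack bit."""
    last = len(entries) - 1
    return b"".join(
        encode_label_entry(label, tc, i == last, ttl)
        for i, (label, tc, ttl) in enumerate(entries)
    )

# Transport label 100 on top, service (e.g. VPN) label 200 at the bottom
stack = encode_stack([(100, 0, 255), (200, 0, 255)])
```

Core devices look only at the top (transport) entry; the service label underneath is delivered untouched to the egress router, which is what makes services built on top of transport label-switched paths possible.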
MPLS/VPN was the first such service (and the first time most people heard of MPLS), and the rest is history. ATM is long gone, cell-mode MPLS died even before that, and we’re still using frame-mode MPLS transport and MPLS/VPN technologies.
Revision History
- 2026-02-06
- Fixed the CyBus section based on a comment by Emanuele and LinkedIn comments.
- Although that same source once claimed SDN was a success 🤷‍♂️. A grain of salt might be advised. ↩︎
- And, even better, only work on gear from that single vendor. Why do you think everyone else was so interested in standardizing something that was notably different from Cisco’s initial implementation? ↩︎
- My MPLS/VPN book had two chapters on MPLS-over-ATM describing cell-mode MPLS (the “real” ATM MPLS) and the frame-mode MPLS over ATM VCs (router-to-router MPLS over ATM). If you don’t have the book on your dusty bookshelf, I’m sure you’ll find a stolen PDF in a dark corner of the Internet. ↩︎
- Unless your ATM switch supported the VC Merge feature, which requires heavy buffering in transit switches. Yeah, we’re back to the shallow/deep buffer discussion. Some things never change. ↩︎
- ChatGPT claims the idea is usually credited to Yakov Rekhter. That sounds about right. ↩︎
LS-1010 was a further-developed version of the A100; Cisco acquired a smaller company (LightStream) to get this product.
LS-1010 was very different from its competitors, since it had SVC support.
Thanks a million!
"CyBus—Cisco Extended Bus. A 1.067-gigabits-per-second (Gbps) data bus for interface processors (two CyBuses are used in the Cisco 7507)"
For the Cisco 7513 "The dual-CyBus backplane has 13 slots: six interface processor slots, 0 through 5 (CyBus 0), five interface processor slots, 8 through 12 (CyBus 1)"
I still remember one of my first Tag Switching POCs shortly after I joined Cisco: a 7200 with a special software release (control plane) connected back-to-back to an ATM LS switch (data plane).
Thank you. Fixed.
Cell-mode MPLS was heavily dependent on ATM PNNI to enable routers to dynamically set up switched virtual circuits (SVCs) within the ATM network.
Not sure when exactly PNNI was invented & implemented and who came up with it, but I guess it must have been in the mid-90s...?
PNNI itself is, IIRC, in essence nothing more than the ATM incarnation of CLNS, using OSI addressing and IS-IS routing - one of the rare cases (if not the only one?) where you’d encounter CLNS in the wild outside of SONET/SDH networks back in the late 1990s / early-to-mid 2000s...
In frame-mode MPLS, ATM SVCs basically became LSPs and LDP (or RSVP) would take care of the "circuit setup".
There's a whole blog post sitting somewhere around here ;) Without going into too many details:
> Cell-mode MPLS was heavily dependent of ATM PNNI
No. Cell-mode MPLS was a replacement (an alternate control plane) for ATM PNNI. The SVCs were effectively set up with LDP.
> Not sure when exactly PNNI was invented & implemented and who came up with it
ATM Forum, mid-1990s
Never looked into the details of PNNI, but it was supposed to go beyond IS-IS in terms of hierarchical levels and have some sort of constraint-based routing.
> In frame-mode MPLS, ATM SVCs basically became LSPs
No. ATM PVCs or SVCs became end-to-end point-to-point links over which the routers attached to ATM would run LDP and use SNAP/AAL5 encapsulation of MPLS frames to exchange data. ATM switches were not involved in frame-mode MPLS at all.
I'm sorry, you're absolutely right, of course. I somewhat mixed up PNNI & cell-mode MPLS.... I vaguely remember that you could configure NSAP addresses on ATM interfaces of some Cisco routers (never really looked into it), maybe that's why...? Or it's simply that I'm getting old... :-/ To be fair, I've never seen cell-mode MPLS in the wild and it's probably 15+ years since I had to deal with PNNI (with Fore ASX, Nortel Passport 7k & 15k), but now I remember there was a chapter about MPLS in the respective Fore configuration guide.
> Never looked into the details of PNNI, but it was supposed to go beyond IS-IS in terms of hierarchical levels and have some sort of constraint-based routing.

From what I can remember, it looked pretty similar / almost identical to CLNS in terms of addressing & areas. Not sure if it just looks the same or if it was really the same address space. There was some constraint-based routing, at least for the ATM service type & bandwidth requirements, but I don’t think you could do traffic engineering in terms of path control or link coloring like you can do e.g. with MPLS-TE.
It's been a few years, but yeah, I'm pretty sure it was Yakov who was behind Tag Switching. I've also seen a lot of people claiming that MPLS didn't provide faster forwarding, because it is all done in hardware. However, a longest match in CEF required four lookups: three in the 16-8-8 stride m-trie, with the adjacency table access being the fourth. A single exact-match lookup on a label was much faster. In the end, if the forwarding ASIC is fast enough to run wire rate while still performing the four lookups, it becomes moot.
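The lookup-count argument can be illustrated with a toy model (hypothetical Python structures, not actual CEF code): a worst-case longest match walks three 16-8-8 trie levels plus the adjacency table (four memory accesses), while a label needs a single exact-match access.

```python
import ipaddress

def mtrie_lookup(dst, root, adjacency):
    """Toy 16-8-8 multibit trie: one dict level per stride.
    Real CEF uses contiguous arrays, not dicts."""
    addr = int(ipaddress.IPv4Address(dst))
    strides = (addr >> 16, (addr >> 8) & 0xFF, addr & 0xFF)
    node, reads = root, 0
    for s in strides:
        node = node[s]   # one memory access per trie level
        reads += 1
    reads += 1           # plus the adjacency-table access
    return adjacency[node], reads

def label_lookup(label, lfib):
    """A label is resolved with a single exact-match access."""
    return lfib[label], 1

# Hand-built worst-case path for 10.0.0.1 (a real m-trie would
# expand or compress prefixes; this only illustrates access counts)
root = {0x0A00: {0x00: {0x01: "adj1"}}}
adjacency = {"adj1": ("Ethernet0", "10.0.0.1")}
lfib = {17: ("swap", 30)}

_, ip_reads = mtrie_lookup("10.0.0.1", root, adjacency)
_, lbl_reads = label_lookup(17, lfib)
# ip_reads is 4, lbl_reads is 1
```

Whether those four accesses matter depends entirely on whether the forwarding ASIC can hide them at wire rate.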
I'm just leaving these software forwarding results here... Doing stuff in hardware makes a power-consumption difference: first a data plane with MPLS (12.51 W), then the same with ACLs and no MPLS (13.32 W)...
mc36@noti:/safe/misc/native$ ./p4emu_bench.sh
---------------------------
benchmarking ipv4 packets
code=42192, int=4, long=8, ptr=8, order=lsb, arch=x86_64, openssl version: OpenSSL 3.5.5 27 Jan 2026
input=1514, rounds=50000000, output=1514, packets=50000000, bytes=75700000000, time=1.877838
pps=26626365.000602, 26.626365 mpps, bps=322498532887.288513, 322.498533 gbps
---------------------------
benchmarking ipv6 packets
code=42192, int=4, long=8, ptr=8, order=lsb, arch=x86_64, openssl version: OpenSSL 3.5.5 27 Jan 2026
input=1514, rounds=50000000, output=1514, packets=50000000, bytes=75700000000, time=2.135136
pps=23417712.033332, 23.417712 mpps, bps=283635328147.715149, 283.635328 gbps
---------------------------
benchmarking vlan packets
code=42192, int=4, long=8, ptr=8, order=lsb, arch=x86_64, openssl version: OpenSSL 3.5.5 27 Jan 2026
input=1518, rounds=50000000, output=1514, packets=50000000, bytes=75700000000, time=2.161812
pps=23128745.700366, 23.128746 mpps, bps=280135367922.835144, 280.135368 gbps
---------------------------
benchmarking pppoe packets
code=42192, int=4, long=8, ptr=8, order=lsb, arch=x86_64, openssl version: OpenSSL 3.5.5 27 Jan 2026
input=1522, rounds=50000000, output=1514, packets=50000000, bytes=75700000000, time=2.222202
pps=22500204.751863, 22.500205 mpps, bps=272522479954.567627, 272.522480 gbps
---------------------------
benchmarking mpls packets
code=42192, int=4, long=8, ptr=8, order=lsb, arch=x86_64, openssl version: OpenSSL 3.5.5 27 Jan 2026
input=1518, rounds=50000000, output=1518, packets=50000000, bytes=75900000000, time=1.360383
pps=36754355.207320, 36.754355 mpps, bps=446344889637.697632, 446.344890 gbps
+-----------------------------------------------------------+
| Compiler version: 9.13.4
| Created: Thu Feb 5 17:14:29 2026
| Run ID: 90424320bf0040b8
+-----------------------------------------------------------+
ingress MAU Features by Stage
| Stage | Exact | Ternary | Statistics | Meter | Selector | Stateful | Dependency |
| Number | | | | LPF or WRED | (max words) | | to Previous |
| 0 | Yes | Yes | No | No | No (0) | No | match |
| 1 | Yes | No | No | No | No (0) | No | match |
| 2 | Yes | No | No | No | No (0) | No | match |
| 3 | Yes | No | No | No | No (0) | No | match |
| 4 | Yes | No | Yes | No | No (0) | No | match |
| 5 | Yes | No | Yes | No | No (0) | No | action |
| 6 | No | Yes | No | No | No (0) | No | match |
| 7 | Yes | Yes | No | No | No (0) | No | match |
| 8 | Yes | Yes | No | No | No (0) | No | match |
| 9 | Yes | Yes | Yes | No | Yes (1) | Yes | match |
| 10 | No | No | No | No | No (0) | No | concurrent |
| 11 | No | No | No | No | No (0) | No | concurrent |
egress MAU Features by Stage
| Stage | Exact | Ternary | Statistics | Meter | Selector | Stateful | Dependency |
| Number | | | | LPF or WRED | (max words) | | to Previous |
| 0 | Yes | Yes | No | No | No (0) | No | match |
| 1 | No | No | No | No | No (0) | No | match |
| 2 | Yes | No | No | No | No (0) | No | match |
| 3 | Yes | Yes | Yes | No | No (0) | No | match |
| 4 | Yes | No | No | No | No (0) | No | match |
| 5 | No | No | No | No | No (0) | No | match |
| 6 | No | No | No | No | No (0) | No | match |
| 7 | No | No | No | No | No (0) | No | match |
| 8 | No | No | No | No | No (0) | No | concurrent |
| 9 | No | No | No | No | No (0) | No | concurrent |
| 10 | No | No | No | No | No (0) | No | concurrent |
| 11 | No | No | No | No | No (0) | No | concurrent |
ingress MAU Latency
| Stage | Clock | Predication | Dependency | Cycles Add |
| Number | Cycles | Cycle | to Previous | to Latency |
| 0 | 22 | 13 | match | 22 |
| 1 | 20 | 11 | match | 20 |
| 2 | 20 | 11 | match | 20 |
| 3 | 20 | 11 | match | 20 |
| 4 | 20 | 11 | match | 20 |
| 5 | 20 | 11 | action | 2 |
| 6 | 22 | 13 | match | 22 |
| 7 | 22 | 13 | match | 22 |
| 8 | 22 | 13 | match | 22 |
| 9 | 30 | 13 | match | 30 |
| 10 | 30 | 13 | concurrent | 1 |
| 11 | 30 | 13 | concurrent | 1 |
Total latency for ingress: 206
egress MAU Latency
| Stage | Clock | Predication | Dependency | Cycles Add |
| Number | Cycles | Cycle | to Previous | to Latency |
| 0 | 22 | 13 | match | 22 |
| 1 | 20 | 11 | match | 20 |
| 2 | 20 | 11 | match | 20 |
| 3 | 22 | 13 | match | 22 |
| 4 | 20 | 11 | match | 20 |
| 5 | 20 | 11 | match | 20 |
| 6 | 20 | 11 | match | 20 |
| 7 | 20 | 11 | match | 20 |
| 8 | 20 | 11 | concurrent | 1 |
| 9 | 20 | 11 | concurrent | 1 |
| 10 | 20 | 11 | concurrent | 1 |
| 11 | 20 | 11 | concurrent | 1 |
Total latency for egress: 172
Worst case ingress table flow
|Stg| Table Name | Run |Weight| Prcnt|
Worst case power for ingress: 9.63 W
Worst case egress table flow
|Stg| Table Name | Run |Weight| Prcnt|
Worst case power for egress: 2.88 W
Total worst case power (4 pipes) : 12.51 W
Total worst case power (per pipe) : 3.13 W
Input packets per second load : 100%
+-----------------------------------------------------------+
| Compiler version: 9.13.4
| Created: Thu Feb 5 17:17:28 2026
| Run ID: 4034d778067b44f3
+-----------------------------------------------------------+
ingress MAU Features by Stage
| Stage | Exact | Ternary | Statistics | Meter | Selector | Stateful | Dependency |
| Number | | | | LPF or WRED | (max words) | | to Previous |
| 0 | Yes | No | No | No | No (0) | No | match |
| 1 | Yes | Yes | Yes | No | No (0) | No | match |
| 2 | Yes | No | No | No | No (0) | No | action |
| 3 | Yes | Yes | No | No | No (0) | No | match |
| 4 | Yes | Yes | No | No | No (0) | No | match |
| 5 | Yes | No | No | No | No (0) | No | action |
| 6 | Yes | Yes | Yes | No | No (0) | No | match |
| 7 | No | Yes | No | No | No (0) | No | match |
| 8 | Yes | No | No | No | Yes (1) | Yes | match |
| 9 | Yes | Yes | No | No | No (0) | No | match |
| 10 | No | No | No | No | No (0) | No | concurrent |
| 11 | No | No | No | No | No (0) | No | concurrent |
egress MAU Features by Stage
| Stage | Exact | Ternary | Statistics | Meter | Selector | Stateful | Dependency |
| Number | | | | LPF or WRED | (max words) | | to Previous |
| 0 | Yes | Yes | No | No | No (0) | No | match |
| 1 | Yes | No | No | No | No (0) | No | match |
| 2 | Yes | Yes | Yes | No | No (0) | No | match |
| 3 | Yes | No | No | No | No (0) | No | match |
| 4 | No | No | No | No | No (0) | No | match |
| 5 | No | No | No | No | No (0) | No | match |
| 6 | No | No | No | No | No (0) | No | match |
| 7 | No | No | No | No | No (0) | No | match |
| 8 | No | No | No | No | No (0) | No | concurrent |
| 9 | No | No | No | No | No (0) | No | concurrent |
| 10 | No | No | No | No | No (0) | No | concurrent |
| 11 | No | No | No | No | No (0) | No | concurrent |
ingress MAU Latency
| Stage | Clock | Predication | Dependency | Cycles Add |
| Number | Cycles | Cycle | to Previous | to Latency |
| 0 | 20 | 11 | match | 20 |
| 1 | 22 | 13 | match | 22 |
| 2 | 22 | 13 | action | 2 |
| 3 | 22 | 13 | match | 22 |
| 4 | 22 | 13 | match | 22 |
| 5 | 22 | 13 | action | 2 |
| 6 | 22 | 13 | match | 22 |
| 7 | 22 | 13 | match | 22 |
| 8 | 30 | 13 | match | 30 |
| 9 | 22 | 13 | match | 22 |
| 10 | 22 | 13 | concurrent | 1 |
| 11 | 22 | 13 | concurrent | 1 |
Total latency for ingress: 192
egress MAU Latency
| Stage | Clock | Predication | Dependency | Cycles Add |
| Number | Cycles | Cycle | to Previous | to Latency |
| 0 | 22 | 13 | match | 22 |
| 1 | 20 | 11 | match | 20 |
| 2 | 22 | 13 | match | 22 |
| 3 | 20 | 11 | match | 20 |
| 4 | 20 | 11 | match | 20 |
| 5 | 20 | 11 | match | 20 |
| 6 | 20 | 11 | match | 20 |
| 7 | 20 | 11 | match | 20 |
| 8 | 20 | 11 | concurrent | 1 |
| 9 | 20 | 11 | concurrent | 1 |
| 10 | 20 | 11 | concurrent | 1 |
| 11 | 20 | 11 | concurrent | 1 |
Total latency for egress: 172
Worst case ingress table flow
|Stg| Table Name | Run |Weight| Prcnt|
Worst case power for ingress: 11.01 W
Worst case egress table flow
|Stg| Table Name | Run |Weight| Prcnt|
Worst case power for egress: 2.30 W
Total worst case power (4 pipes) : 13.32 W
Total worst case power (per pipe) : 3.33 W
Input packets per second load : 100%
Well, I was not familiar with SP networks back then (or now), but I used to operate an ATM/LANE based enterprise network with LS1010, 6500, 5500 etc.
I think MPLS was closer to the technologies Cisco had developed/acquired and could support. The telecom vendors offering robust ATM were too expensive for many SPs and ATM might have been a bit difficult to support operationally.
If I remember correctly, doing a hardware lookup on the whole IP header was difficult, so labels made switching faster with the available hardware.
So, with MPLS you could get a lot of bandwidth, cheaper and reuse equipment.
Try comparing, though, the software stack of e.g. AXD301 vs IOS. Was Cisco even capable of providing such software?
I believe it was the time when cheap IP/Ethernet/IT killed the quality and reliability of the telecom world in favor of quantity, for better and worse.
What I see nowadays in EVPN/VXLAN is not that different from LANE 25y ago.
Running PNNI instead of IS-IS for connecting the routers inside an SP network does not sound that different either.