More than a year ago, I explained why end-to-end flow-based forwarding doesn’t scale (and Doug Gourlay did the same using way more colorful language) and what the real-life limitations are. Not surprisingly, the gurus who started the whole OpenFlow movement came to the same conclusions and presented them at the HotSDN conference in August 2012 ... but even that hasn’t stopped some people from evangelizing the second coming.
Contrary to what some pundits claim, flow-based forwarding will never scale. If you’ve been around long enough to experience the ATM-to-the-desktop failure, the Multi-Layer Switching (MLS) kludges, the demise of end-to-end X.25, or the cost of traditional circuit-switched telephony, you know what I’m talking about. If not, they say it’s best to learn from your own mistakes – be my guest.
Before someone starts Moore’s Law incantations: software-based forwarding will always be more expensive than predefined hardware-based forwarding. Yes, you can push tens of gigabits through a highly optimized multi-core Intel server. You can also push 1.2 Tbps through a Broadcom chipset at a comparable price. The ratios haven’t changed much over the last few decades, and I don’t expect them to change in the near future.
The scalability challenges of flow-based forwarding were well understood (at least within the IETF; the ITU is living on a different planet) decades ago. That’s why we have destination-only forwarding, variable-length subnet masks and summarization, and Diffserv (with a limited number of traffic classes) instead of Intserv (with per-flow QoS).
The limitations of destination-only hop-by-hop forwarding have also been well understood for at least two decades, and resulted in the MPLS architecture and various MPLS-based applications (including MPLS Traffic Engineering).
There’s a huge difference between the MPLS TE forwarding mechanism (which is the right tool for the job) and the distributed MPLS TE control plane (which sucks big time). Traffic engineering is ultimately an NP-complete knapsack problem best solved with centralized end-to-end visibility.
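To see why centralized visibility matters, here’s a deliberately naive sketch of the placement problem: fit traffic demands onto candidate paths without exceeding link capacities. The topology, capacities, and demand sizes are all invented for illustration; the brute-force search over orderings hints at why the general problem (a knapsack/bin-packing variant) is NP-hard, and why a node with only local knowledge can’t solve it.

```python
from itertools import permutations

# Toy topology: two candidate paths from A to B, invented numbers.
LINK_CAPACITY = {"A-B": 10, "A-C": 10, "C-B": 10}
CANDIDATE_PATHS = {"A->B": [["A-B"], ["A-C", "C-B"]]}

def place(demands):
    """demands: list of (flow_id, 'A->B', size).
    Returns a flow_id -> path mapping, or None if no ordering fits.
    Brute force over demand orderings -- fine for a toy, hopeless at scale."""
    for order in permutations(demands):
        used = {link: 0 for link in LINK_CAPACITY}
        placement = {}
        for fid, pair, size in order:
            for path in CANDIDATE_PATHS[pair]:
                if all(used[l] + size <= LINK_CAPACITY[l] for l in path):
                    for l in path:
                        used[l] += size
                    placement[fid] = path
                    break
            else:
                break  # this demand fits nowhere; try another ordering
        else:
            return placement
    return None
```

With two 8-unit demands, neither fits next to the other on the direct link, so a correct answer has to spread them across both paths – something only a solver that sees the whole network can guarantee.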
MPLS architecture solves the forwarding rigidity problems while maintaining core network scalability by recognizing that while each flow might be special, numerous flows share the same forwarding behavior.
Edge MPLS routers (edge LSRs) thus sort the incoming packets into forwarding equivalence classes (FECs), and use a different Label Switched Path (LSP) across the network for each forwarding equivalence class.
Please note that this is a gross oversimplification. I’m trying to explain the fundamentals, and (following a great example of physicists) ignore all the details... oops, take the ideal case.
The simplest classification, implemented in all MPLS-capable devices today, is destination prefix-based classification (equivalent to traditional IP forwarding), but nothing in the MPLS architecture prevents you from using N-tuples to classify traffic based on source addresses, port numbers, or any other packet attribute (yet again, ignoring the reality of having to use PBR with the infinitely disgusting route-map CLI to achieve that).
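The two classification styles can be sketched in a few lines. Everything here is hypothetical – real edge LSRs learn their labels via LDP or RSVP-TE (or, in the SDN model, download them from a controller); the prefixes, labels, and the HTTPS rule below are made up for illustration.

```python
import ipaddress

# Hypothetical FEC table: destination prefix -> outgoing label.
PREFIX_FEC_TABLE = [
    (ipaddress.ip_network("10.1.0.0/16"), 100),  # label 100: LSP toward POP A
    (ipaddress.ip_network("10.0.0.0/8"), 200),   # label 200: LSP toward POP B
]

def classify_by_prefix(dst_ip):
    """Destination-prefix FEC: longest-match lookup, like plain IP forwarding."""
    matches = [(net, label) for net, label in PREFIX_FEC_TABLE
               if ipaddress.ip_address(dst_ip) in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]  # longest prefix wins

def classify_by_ntuple(src_ip, dst_ip, proto, dport):
    """N-tuple FEC: nothing in the MPLS architecture stops the edge LSR from
    classifying on source address, protocol, or port as well."""
    if proto == "tcp" and dport == 443:
        return 300  # e.g. steer HTTPS onto a low-latency LSP
    return classify_by_prefix(dst_ip)
```

The point of the sketch: only the edge performs this (potentially arbitrarily complex) classification; everything downstream sees just the label.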
MPLS is just a tool
Always keep in mind that every single network technology is a tool, not a solution (some of them might be solutions looking for a problem, but that’s another story), and some tools are more useful in some scenarios than others ... which still doesn’t make them good or bad, but applicable or inapplicable.
Also, after more than a decade of tinkering, the vendor MPLS implementations leave a lot to be desired. If you hate a particular vendor’s CLI or implementation kludges, blame them, not the technology.
Edge and Core OpenFlow
After this short MPLS digression, let’s come back to the headline topic. Large-scale OpenFlow-based solutions face two significant challenges:
- It’s hard to build resilient networks with centralized control plane and unreliable transport between the controller and controlled devices (this problem was well known in the days of Frame Relay and ATM);
- You must introduce layers of abstraction in order to scale the network.
Martin Casado, Teemu Koponen, Scott Shenker and Amin Tootoonchian addressed the second challenge in their Fabric: A Retrospective on Evolving SDN paper, where they propose two layers in an SDN architectural framework:
- Edge switches, which classify the packets, perform network services, and send the packets across core fabric toward the egress edge switch;
- Core fabric, which provides end-to-end transport.
Not surprisingly, they’re also proposing to use MPLS labels as the fabric forwarding mechanism.
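The edge/core split they propose can be modeled in a few lines. This is a toy, not their implementation: the FEC names, labels, and port numbers are invented, and the edge tables stand in for state a controller would install via OpenFlow or a similar mechanism.

```python
# Edge: controller-installed classification (all values hypothetical).
EDGE_FEC_TO_LABEL = {"voice": 16, "bulk": 17}

# Core: per-switch exact-match label tables, label -> (out_port, new_label).
CORE_LABEL_TABLE = {
    "core1": {16: ("port2", 24), 17: ("port3", 25)},
    "core2": {24: ("port1", 16)},
}

def edge_ingress(packet):
    """Edge switch: the only place where (complex) classification happens."""
    fec = "voice" if packet.get("dport") == 5060 else "bulk"
    return {"label": EDGE_FEC_TO_LABEL[fec], "payload": packet}

def core_forward(switch, labeled):
    """Core switch: exact-match label lookup and swap -- no per-flow state,
    no knowledge of what the payload is or why it was classified that way."""
    out_port, new_label = CORE_LABEL_TABLE[switch][labeled["label"]]
    return out_port, {"label": new_label, "payload": labeled["payload"]}
```

Note the asymmetry: the edge tables grow with the policy, while the core tables grow only with the number of LSPs – which is exactly what makes the core scale.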
Where’s the beef?
The fundamental difference between typical MPLS networks we have today and the SDN Fabric proposed by Martin Casado et al. is the edge switch control/management plane: FEC classification is downloaded into the edge switches through OpenFlow (or some similar mechanism).
Existing MPLS implementations or protocols have no equivalent mechanism, and a mechanism for a consistent implementation of a distributed network edge policy would be highly welcome (all of my enterprise OpenFlow use cases fall into this category).
Finally, is MPLS NAT?
Now that we’ve covered MPLS fundamentals, I have to mention another pet peeve that annoys me: let’s see why it’s ridiculous to compare MPLS to NAT.
As explained above, MPLS edge routers classify ingress packets into FECs, and attach a label signifying the desired treatment to each packet. The original packet is not changed in any way; any intermediate node can get the raw packet content if needed.
NAT, on the other hand, always changes the packet content (at least the layer-3 addresses, sometimes also layer-4 port numbers), or it wouldn’t be NAT.
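The difference is easy to demonstrate with a toy model (all addresses and ports below are documentation examples, not real configuration): a label push wraps the packet and is perfectly reversible, while source NAT rewrites the packet itself.

```python
import copy

def mpls_push(packet, label):
    """Label push: wraps the packet; the original bytes are untouched."""
    return {"label": label, "inner": packet}

def mpls_pop(labeled):
    """Label pop: the original packet comes back out, bit-for-bit."""
    return labeled["inner"]

def nat_translate(packet, new_src, new_sport):
    """Source NAT: rewrites the packet itself; the original source
    address and port are no longer recoverable from the packet."""
    translated = copy.deepcopy(packet)
    translated["src"], translated["sport"] = new_src, new_sport
    return translated

pkt = {"src": "10.1.1.1", "dst": "192.0.2.10", "sport": 40000, "dport": 80}
assert mpls_pop(mpls_push(pkt, 100)) == pkt   # push/pop round-trip is lossless
natted = nat_translate(pkt, "203.0.113.5", 55001)
assert natted != pkt                          # NAT destroyed the original header
```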
NAT breaks transparent end-to-end connectivity, MPLS doesn’t. MPLS is similar to lossless compression (ZIP), NAT is similar to lossy compression (JPEG). Do I need to say more?