Edge and Core OpenFlow (and why MPLS is not NAT)

More than a year ago, I explained why end-to-end flow-based forwarding doesn’t scale (and Doug Gourlay did the same using way more colorful language) and what the real-life limitations are. Not surprisingly, the gurus who started the whole OpenFlow movement came to the same conclusions and presented them at the HotSDN conference in August 2012 ... but even that hasn’t stopped some people from evangelizing the second coming.

The Problem

Contrary to what some pundits claim, flow-based forwarding will never scale. If you’ve been around long enough to experience the ATM-to-the-desktop failure, the Multi-Layer Switching (MLS) kludges, the demise of end-to-end X.25, or the cost of traditional circuit-switched telephony, you know what I’m talking about. If not, they say it’s best to learn from your own mistakes – be my guest.

Before someone starts Moore’s Law incantations: software-based forwarding will always be more expensive than predefined hardware-based forwarding. Yes, you can push tens of gigabits through a highly optimized multi-core Intel server. You can also push 1.2 Tbps through a Broadcom chipset at a comparable price. The ratios haven’t changed much in the last few decades, and I don’t expect them to change in the near future.

Scalable architectures

The scalability challenges of flow-based forwarding were well understood (at least within the IETF; the ITU is living on a different planet) decades ago. That’s why we have destination-only forwarding, variable-length subnet masks and summarization, and Diffserv (with a limited number of traffic classes) instead of Intserv (with per-flow QoS).

The limitations of destination-only hop-by-hop forwarding have also been well understood for at least two decades, and resulted in the MPLS architecture and various MPLS-based applications (including MPLS Traffic Engineering).

There’s a huge difference between the MPLS TE forwarding mechanism (which is the right tool for the job) and the distributed MPLS TE control plane (which sucks big time). Traffic engineering is ultimately an NP-complete knapsack problem best solved with centralized end-to-end visibility.
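To make the centralized-computation point concrete, here’s a toy CSPF-style sketch: a controller with full topology visibility prunes links that can’t carry the requested bandwidth, then runs a shortest-path search on what’s left. The topology, costs, and bandwidth figures are all invented for illustration; real TE computation is far messier.

```python
from heapq import heappush, heappop

def cspf(links, src, dst, bw):
    """Constrained shortest path: prune links below the requested
    bandwidth, then run Dijkstra on the remaining topology.
    links: {(a, b): (cost, available_bw)}, treated as bidirectional."""
    adj = {}
    for (a, b), (cost, avail) in links.items():
        if avail >= bw:                      # constraint-pruning step
            adj.setdefault(a, []).append((b, cost))
            adj.setdefault(b, []).append((a, cost))
    queue, seen = [(0, src, [src])], set()
    while queue:
        dist, node, path = heappop(queue)
        if node == dst:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in adj.get(node, []):
            if nxt not in seen:
                heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None                              # no feasible path

# Cheap A-B-C path exists, but B-C can't carry 5 units of bandwidth,
# so the computation falls back to the expensive direct A-C link.
links = {("A", "B"): (1, 10), ("B", "C"): (1, 2), ("A", "C"): (5, 10)}
print(cspf(links, "A", "C", bw=5))           # (5, ['A', 'C'])
```

A distributed control plane makes this decision with each head-end router seeing only its own reservations; the centralized version sees all demands at once, which is exactly the argument for solving it with end-to-end visibility.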

MPLS architecture solves the forwarding rigidity problems while maintaining core network scalability by recognizing that while each flow might be special, numerous flows share the same forwarding behavior.

Edge MPLS routers (edge LSRs) thus sort incoming packets into forwarding equivalence classes (FECs), and use a different Label Switched Path (LSP) across the network for each forwarding class.

Please note that this is a gross oversimplification. I’m trying to explain the fundamentals, and (following a great example of physicists) ignore all the details... oops, take the ideal case.

The simplest classification, implemented in all MPLS-capable devices today, is destination-prefix-based classification (equivalent to traditional IP forwarding), but there’s nothing in the MPLS architecture that would prevent you from using N-tuples to classify traffic based on source addresses, port numbers, or any other packet attribute (yet again, ignoring the reality of having to use PBR with the infinitely disgusting route-map CLI to achieve that).
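A minimal sketch of what FEC classification at an edge LSR boils down to: walk an ordered rule table, return the label (LSP) of the first match, and fall back to a default. The prefix, port number, and label values below are invented; the N-tuple rule just illustrates that the architecture allows richer classifiers than destination prefixes.

```python
import ipaddress

# Ordered FEC table: (match predicate, label identifying the LSP to use).
# Rules and labels are made up for the example.
FEC_TABLE = [
    (lambda pkt: pkt.get("dport") == 5060, 201),       # N-tuple rule: SIP -> low-latency LSP
    (lambda pkt: ipaddress.ip_address(pkt["dst"])
        in ipaddress.ip_network("10.1.0.0/16"), 100),  # destination-prefix FEC
]
DEFAULT_LABEL = 999                                     # best-effort LSP

def classify(pkt):
    """Return the label of the first matching FEC, else the default."""
    for match, label in FEC_TABLE:
        if match(pkt):
            return label
    return DEFAULT_LABEL

print(classify({"dst": "10.1.2.3", "dport": 80}))      # 100 (prefix match)
print(classify({"dst": "192.0.2.1", "dport": 5060}))   # 201 (N-tuple match)
print(classify({"dst": "192.0.2.1", "dport": 80}))     # 999 (default)
```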

MPLS is just a tool

Always keep in mind that every single network technology is a tool, not a solution (some of them might be solutions looking for a problem, but that’s another story), and some tools are more useful in some scenarios than others ... which still doesn’t make them good or bad, but applicable or inapplicable.

Also, after more than a decade of tinkering, the vendor MPLS implementations leave a lot to be desired. If you hate a particular vendor’s CLI or implementation kludges, blame them, not the technology.

Edge and Core OpenFlow

After this short MPLS digression, let’s come back to the headline topic. Large-scale OpenFlow-based solutions face two significant challenges:

  • It’s hard to build resilient networks with a centralized control plane and unreliable transport between the controller and the controlled devices (a problem well known since the days of Frame Relay and ATM);
  • You must introduce layers of abstraction in order to scale the network.

Martin Casado, Teemu Koponen, Scott Shenker and Amin Tootoonchian addressed the second challenge in their Fabric: A Retrospective on Evolving SDN paper, where they propose two layers in an SDN architectural framework:

  • Edge switches, which classify the packets, perform network services, and send the packets across core fabric toward the egress edge switch;
  • Core fabric, which provides end-to-end transport.

Not surprisingly, they also propose using MPLS labels as the fabric forwarding mechanism.
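The edge/core split above can be sketched in a few lines: the edge switch classifies the packet (using a policy that, in the Fabric proposal, would be downloaded from a controller) and pushes a label; core nodes forward on the top label alone and never touch the payload. Everything here (labels, ports, table entries) is invented for illustration.

```python
def edge_ingress(pkt, policy):
    """Edge switch: classify the packet and push a label.
    The payload is wrapped, not modified."""
    return {"label": policy(pkt), "payload": pkt}

def core_forward(frame, lfib):
    """Core node: label lookup only -- swap the label, pick the
    output port, never inspect the payload."""
    out_label, out_port = lfib[frame["label"]]
    return {"label": out_label, "payload": frame["payload"]}, out_port

# Hypothetical controller-installed policy and core label table.
policy = lambda pkt: 100 if pkt["dst"].startswith("10.") else 999
lfib = {100: (17, "port3"), 999: (42, "port1")}

frame = edge_ingress({"dst": "10.1.2.3", "data": "hello"}, policy)
frame, port = core_forward(frame, lfib)
print(frame["label"], port, frame["payload"]["data"])   # 17 port3 hello
```

The point of the split is visible in the code: all the policy complexity lives in `edge_ingress`, while `core_forward` stays a dumb, fast label lookup.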

Where’s the beef?

The fundamental difference between the typical MPLS networks we have today and the SDN Fabric proposed by Martin Casado et al. is the edge switch control/management plane: the FEC classification is downloaded into the edge switches through OpenFlow (or some similar mechanism).

Existing MPLS implementations or protocols have no equivalent mechanism, and a mechanism for a consistent implementation of a distributed network edge policy would be highly welcome (all of my enterprise OpenFlow use cases fall into this category).

Finally, is MPLS NAT?

Now that we’ve covered the MPLS fundamentals, I have to mention another of my pet peeves: let’s see why it’s ridiculous to compare MPLS to NAT.

As explained above, MPLS edge routers classify ingress packets into FECs and attach a label signifying the desired treatment to each packet. The original packet is not changed in any way; any intermediate node can get the raw packet content if needed.

NAT, on the other hand, always changes the packet content (at least the layer-3 addresses, sometimes also layer-4 port numbers), or it wouldn’t be NAT.

NAT breaks transparent end-to-end connectivity, MPLS doesn’t. MPLS is similar to lossless compression (ZIP), NAT is similar to lossy compression (JPEG). Do I need to say more?
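The lossless/lossy analogy can be shown in a toy sketch: MPLS encapsulation wraps the packet and round-trips cleanly, while NAT rewrites header fields in place, so the original addresses are gone once the translation state expires. All addresses and field names below are invented.

```python
def mpls_push(pkt, label):
    """Encapsulate: the original packet is carried intact."""
    return {"label": label, "inner": dict(pkt)}

def mpls_pop(frame):
    """Decapsulate: lossless, we get the identical packet back."""
    return frame["inner"]

def nat_translate(pkt, public_ip, new_sport):
    """Rewrite source address/port in place: the originals are lost."""
    pkt = dict(pkt)
    pkt["src"], pkt["sport"] = public_ip, new_sport
    return pkt

pkt = {"src": "192.168.1.10", "sport": 33000, "dst": "198.51.100.7"}
assert mpls_pop(mpls_push(pkt, 100)) == pkt      # ZIP-like: fully reversible
natted = nat_translate(pkt, "203.0.113.1", 40000)
print(natted["src"])                             # 203.0.113.1 -- JPEG-like: original gone
```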


  1. It just makes sense. SDN is MPLS but with centralization. This could be the thing that makes MPLS consumable on a broader scale.
    1. Nah, it doesn't matter. MPLS will remain prevalent in WAN, people like the two of us will use it for scalable VRFs (and the lazy ones will opt for EVN), but we don't need it all over the DC (did I just say that? ;).
    2. Ivan, would you please elaborate on this comment (maybe with another blog :) )?

      Why will this not be naturally attractive say to the traditional SPs becoming cloud providers (E.g. ATT..) who are familiar with MPLS/VPNs?

      Also, in your mind, what are the key differences between contrail systems(juniper) approach and this paper?
    3. Does it look to you that the "Fabric Controller" is nothing but the IGP in MPLS and the "Edge Controller" is MP-BGP?

      If that's the case.. doesn't it mean that the Fabric Controller should also control some aspect of the Edge node or somehow communicate with Edge Controller to provide IGP equivalent at those Edge nodes?

      BTW..Love your blogs, very educational for me.
  2. "but we don't need it all over the DC (did I just say that? ;)"

    careful, you're likely to offend Derick with that one.
    1. or one hop from the VM.
  3. Not sure if we are talking here about applying OpenFlow (or strictly speaking OpenFlow-based SDN) for service provider (carrier) networks, not datacenters. If so, then I think we should distinguish between access, metro and core part of the network:
    - For core, today MPLS is deployed, and will remain, since it is a very conservative part of the network.
    - For metro/access, today (H-)VPLS or Ethernet (PBB) or a combination is typically deployed in many carriers.
    OpenFlow-based SDN may make sense in the metro/access part (not in the core), as an alternative to H-VPLS or Ethernet. Dataplane could remain MPLS if required (no need to reinvent). Advantages ? No-vendor specific hardware (lower cost?) while providing flexibility on the control plane.
    This will not be a short-term change though, from my perspective, but it is an interesting topic.
  4. MPLS label also provides for scalability/performance and flexibility/evolution in the sense that no matter how many header fields would have to be dealt with for whatever reason, a single 20-bit label will do it now and in the future.
  5. One must also mention here Juniper's QFabric/ Stratus which came up with a similar overall approach around 4-5 years back (the core ideas behind the product would've been firmed up much earlier than the launch).


    There may be hiccups with the uptake of the product in the market for whatever reasons, but the basic model was spot-on.