Response: On the Death of OpenFlow

On November 7th SDx Central published an article saying “OpenFlow is virtually dead.” There’s a first time for everything, and it’s real fun reading a marketing blurb on a site sponsored by SDN vendors claiming that the shiny SDN parade unicorn is dead.

On a more serious note, Tom Hollingsworth wrote a blog post in which he effectively said “OpenFlow is just a tool. Can we please find the right problem for it?”

The Easy Part: What’s Wrong

It’s immediately obvious to anyone who survived the scrutiny of RFC 1925 Rule 4 that the idea of a centralized control plane has no merits on planet Earth (please note that centralized control is a totally different beast that makes perfect sense).

In particular, it’s really hard to:

  • Detect non-trivial link failures in milliseconds (that’s why we have BFD);
  • Respond to real-time events in a reasonable timeframe;
  • Respond to control-plane requests (ARP/ND) from a very large number of hosts;
  • Run chatty edge protocols (LLDP, LACP, STP …) on a large number of ports.

There’s a reason the number of STP instances any large modular switch can support is limited. I also had an interesting discussion recently with someone who was actually involved in building a switch control plane, and it’s amazing how many more hurdles and showstoppers are hidden behind the scenes. The things I listed above are just the tip of the iceberg.

Of course there were people who tried to prove grumpy old farts wrong and/or tried to change the laws of physics.

Some of them woke up from their hype-induced stupor, added the necessary extensions to OpenFlow, got a working product, and lost interoperability and purist control/data plane separation while doing that. Reality is hard.

What Can I Solve with OpenFlow?

OK, so what problems could I solve with OpenFlow? There are quite a few things that don’t require control-plane protocols, are not time-sensitive (as in “this has to be done in 2 msec”), and need no real-time response to failures. A few examples:

  • Programmable traffic tapping
  • Flexible endpoint (host) authentication
  • Per-user packet filters installed into edge devices
  • Interesting load balancing scenarios of long-lived elephant flows

You might have realized that most problems listed above fall into the “programmable ACL/PBR” category. You can use OpenFlow to solve them, but you could also use BGP FlowSpec or a number of vendor-specific tools.
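To make the “programmable ACL/PBR” idea a bit more concrete, here is a minimal Python sketch of a priority-ordered match-action table. It is a toy model and not the OpenFlow wire protocol or any controller's API: the `Rule` and `FlowTable` names, the three match fields, and the action strings are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical ACL/PBR entry: match fields plus an action string."""
    priority: int
    action: str                      # illustrative only, e.g. "drop" or "mirror:9"
    in_port: Optional[int] = None    # None means "wildcard"
    src_ip: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        # A field matches when it is wildcarded or equal to the packet's value.
        return all(
            want is None or pkt.get(name) == want
            for name, want in (
                ("in_port", self.in_port),
                ("src_ip", self.src_ip),
                ("dst_port", self.dst_port),
            )
        )


class FlowTable:
    """Rules are checked from highest to lowest priority; first match wins."""

    def __init__(self, default_action: str = "forward:normal"):
        self.rules = []
        self.default_action = default_action

    def install(self, rule: Rule) -> None:
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, pkt: dict) -> str:
        for rule in self.rules:
            if rule.matches(pkt):
                return rule.action
        return self.default_action


table = FlowTable()
# Per-user packet filter: block SMTP from one host.
table.install(Rule(priority=100, action="drop", src_ip="192.0.2.10", dst_port=25))
# Programmable tap: mirror everything arriving on port 3.
table.install(Rule(priority=50, action="mirror:9", in_port=3))
```

Nothing in this model is time-sensitive: rules are installed ahead of time and the lookup happens locally, which is why this class of problems fits OpenFlow (or FlowSpec, or a vendor-specific tool) reasonably well.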

Coho Data, which Tom mentioned in his blog post, is using OpenFlow to program a switch in front of its scale-out storage farm, a perfect example of load balancing of long-lived elephant flows. More details in the SDN Use Cases webinar.

You might think that DDoS mitigation falls into the same category. Well, it might, but the real challenge is the number of filtering rules you’d need, which usually precludes the use of a hardware solution.

In any case, people who know what they’re doing try to implement extremely fast packet drops as close to the server NIC as possible to solve the DDoS mitigation challenge. Others still try to solve the same problem with OpenFlow. I wish them luck.

What about Fabrics?

Network fabrics were a particularly alluring OpenFlow use case. My cynical take on that: it was easy to figure out the Total Addressable Market and get VC funding that way.

Before you start telling me how Google uses OpenFlow to build their fabrics, read this – they built their own HFR with OpenFlow, not a data center fabric.

However, let’s assume you want to build your network fabric with pure OpenFlow 1.3 (no extensions to make your life easier, so you can use switches from almost any vendor). What kind of fabric could you build? These would be the prerequisites:

  • No control-plane protocols;
  • No real-time response to topology change events;
  • No real-time response to link failures. You’d either use a single uplink or a pre-computed backup path.

So what did you just build? A fancy programmable patch panel (here’s another one), and that’s exactly what some service providers need in their access networks. No wonder they still talk about using OpenFlow in their deployments: it’s a perfect tool for their particular problem. Does that imply that it will solve all the problems you have? Probably not.
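The “pre-computed backup path” prerequisite mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four-node topology are hypothetical, and the backup is found by naively removing the primary path's links and searching again, which can miss a disjoint pair that a proper algorithm (for example Suurballe's) would find. The local selection step is similar in spirit to the fast-failover group type in the OpenFlow specification, where the switch picks the first live bucket without asking the controller.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def path_links(path):
    """Set of undirected links used by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def precompute_paths(links, src, dst):
    """Done ahead of time by the controller: a primary and a link-disjoint backup."""
    primary = shortest_path(links, src, dst)
    if primary is None:
        return None, None
    used = path_links(primary)
    remaining = [link for link in links if frozenset(link) not in used]
    return primary, shortest_path(remaining, src, dst)


def select_path(primary, backup, failed_links):
    """Local decision at failure time: first candidate with no failed link."""
    failed = {frozenset(link) for link in failed_links}
    for candidate in (primary, backup):
        if candidate is not None and not (path_links(candidate) & failed):
            return candidate
    return None


# Hypothetical two-spine topology.
links = [("leaf1", "spine1"), ("spine1", "leaf2"),
         ("leaf1", "spine2"), ("spine2", "leaf2")]
primary, backup = precompute_paths(links, "leaf1", "leaf2")
```

The point of the sketch is that no computation involving the controller happens when a link fails; everything was decided in advance, which is also why the result behaves like a patch panel and not like a routed fabric.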

Want to Know More?

Start exploring the SDN resources @ ipSpace.net, and watch SDN webinars.

6 comments:

  1. When I first got back to the US in 2011, I found that the SDN division of responsibilities in the network was all the rage in the academic research community: separating networking into a control plane and a data plane "SOLVED GREAT THINGS!". I was nonplussed. "Great. Now you have two wires. That helps, how?"

    A steaming pile of research dollars was out there chasing the concept.

    Cynically I put it down to finding a way to get a generation of grad students to write provably correct conf files, or, more likely, to everyone having lost the ability to program at a low enough level (hardware and software) to be able to make a difference at the dataplane. The algorithms we've developed in the #bufferbloat project - all seemingly impossible to fund in this environment - apply at the dataplane, are actually useful and deployable today, and I can see pouring them directly into hardware soon. I am glad to have mostly ignored the whole SDN thing.
    Replies
    1. And then some people say I'm cynical ;) Totally agree with you.
    2. You don't have to call yourself cynical, experienced would do ;-)
  2. Hybrid SDN would make sense. And yes, this dynamic ACL/PBR.
    This is not tied to OpenFlow, though. BGP FlowSpec is not widely implemented yet. PCEP is too specific to MPLS.
    So sometimes only OpenFlow is available.
    But implementations are poor and there is no progress. Specifications are coming out, but no one is taking care. Certification is impractical. The ONF is under restructuring, because it is actually dead now...

    The biggest problem with alternative PBR solutions is that you could also need efficient triggering and redirecting of traffic to the controller. This is provided by OpenFlow, but not by BGP FlowSpec. So they are not fully replacing each other...
    Replies
    1. Bela - BGP-FS is available in every major vendor's BGP stack as well as in the ODL BGP stack, and soon to be on ONOS (not sure about the details there, though). The focus of PCEP has always been path provisioning, not FEC bindings; however, there are some additions. Being co-author of all the relevant drafts, I can provide more info, if of interest.

      Wrt configuration - PBR is ephemeral state that could be perfectly provisioned using any existing or new configuration protocol - Netconf/Restconf/gRPC, etc.
    2. OF can work quite well when used properly, that is, for policy configuration (ACLs, PBR, etc.), and this is how it is being used in most products that run in production. When it is used to configure per-flow forwarding, it hits all the limitations described above by Ivan, and more, and is of absolutely no use. Not every problem needs a hammer ;-)