Response: On the Death of OpenFlow
On November 7th SDx Central published an article saying “OpenFlow is virtually dead.” There’s a first time for everything, and it’s great fun to read a marketing blurb on a site sponsored by SDN vendors claiming the shiny SDN parade unicorn is dead.
On a more serious note, Tom Hollingsworth wrote a blog post in which he effectively said “OpenFlow is just a tool. Can we please find the right problem for it?”
The Easy Part: What’s Wrong
It’s immediately obvious to anyone who survived the scrutiny of RFC 1925 Rule 4 that the idea of a centralized control plane has no merit on planet Earth (please note that centralized control is a totally different beast, and one that makes perfect sense).
In particular, it’s really hard to:
- Detect non-trivial link failures in milliseconds (that’s why we have BFD; see the back-of-envelope sketch after this list);
- Respond to real-time events in a reasonable timeframe;
- Respond to control-plane requests (ARP/ND) from a very large number of hosts;
- Run chatty edge protocols (LLDP, LACP, STP…) on a large number of ports.
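To put the “milliseconds” part in perspective, here’s a back-of-envelope sketch in Python. Every number in it is an illustrative assumption, not a measurement, but the orders of magnitude tell the story:

```python
# Back-of-envelope: local BFD-based failure detection vs. a controller
# detecting a failure and reprogramming the affected switches.
# Every number below is an illustrative assumption, not a measurement.

bfd_interval_ms = 50        # BFD transmit interval
bfd_multiplier = 3          # missed hellos before declaring the link dead
local_detect_ms = bfd_interval_ms * bfd_multiplier          # 150 ms

controller_rtt_ms = 10      # switch <-> controller round-trip time
flowmod_install_ms = 5      # time to install one flow entry update
affected_switches = 20      # switches that need new forwarding entries

central_react_ms = (local_detect_ms + controller_rtt_ms +
                    affected_switches * flowmod_install_ms)

print(f"local protection switchover: ~{local_detect_ms} ms")
print(f"centralized reconvergence:  ~{central_react_ms} ms")   # ~260 ms
```

And that’s the optimistic case: it assumes the controller isn’t busy, the control-plane network isn’t congested, and nothing else failed at the same time.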
There’s a reason the number of STP instances any large modular switch can support is limited. I also had an interesting discussion recently with someone who was actually involved in building a switch control plane, and it’s amazing how many more hurdles and showstoppers are hidden behind the scenes. The things I listed above are just the tip of the iceberg.
Of course there were people who tried to prove grumpy old farts wrong and/or tried to change the laws of physics.
Some of them woke up from their hype-induced stupor, added the necessary extensions to OpenFlow, got a working product, and lost interoperability and purist control/data plane separation while doing that. Reality is hard.
What Can I Solve with OpenFlow?
OK, so what problems could I solve with OpenFlow? There are quite a few things that don’t require control-plane protocols, are not time-sensitive (as in “this has to be done in 2 msec”), and need no real-time response to failures. A few examples:
- Programmable traffic tapping
- Flexible endpoint (host) authentication
- Per-user packet filters installed into edge devices
- Interesting load-balancing scenarios for long-lived elephant flows
You might have realized that most of the problems listed above fall into the “programmable ACL/PBR” category. You can use OpenFlow to solve them, but you could also use BGP FlowSpec or a number of vendor-specific tools.
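To make the “programmable ACL” idea concrete, here’s a minimal sketch using the Ryu OpenFlow 1.3 controller framework (my choice for illustration; the post doesn’t assume any particular controller). It’s not production code; the blocked address is made up, and the rest is standard Ryu boilerplate:

```python
# Minimal "programmable packet filter" sketch using Ryu (OpenFlow 1.3).
# Installs a drop rule for one (hypothetical) misbehaving host on every
# switch that connects to the controller.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class PerUserFilter(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        parser = dp.ofproto_parser

        # Match IPv4 traffic from one host (example address) ...
        match = parser.OFPMatch(eth_type=0x0800, ipv4_src='192.0.2.66')

        # ... and drop it: in OpenFlow 1.3 a flow entry with an empty
        # instruction list discards matching packets.
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                      match=match, instructions=[]))
```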
Coho Data, which Tom mentioned in his blog post, is using OpenFlow to program a switch in front of its scale-out storage farm – a perfect example of load balancing long-lived elephant flows. More details in the SDN Use Cases webinar.
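Staying with the same hedged Ryu sketch: steering a single long-lived flow is nothing more than a more specific match with an output action. The 5-tuple and port numbers below are made up:

```python
# Pin one long-lived (elephant) flow to a chosen uplink; assumes `dp`
# and `parser` are the Ryu datapath handles from the previous sketch.
def pin_elephant_flow(dp, parser, out_port=2):
    match = parser.OFPMatch(eth_type=0x0800, ip_proto=6,   # TCP over IPv4
                            ipv4_src='192.0.2.10',
                            ipv4_dst='198.51.100.20',
                            tcp_dst=445)                   # example tuple
    actions = [parser.OFPActionOutput(out_port)]
    inst = [parser.OFPInstructionActions(
        dp.ofproto.OFPIT_APPLY_ACTIONS, actions)]
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=200,
                                  match=match, instructions=inst))
```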
You might think that DDoS mitigation falls into the same category. Well, it might, but the real challenge is the number of filtering rules you’d need, which usually precludes the use of a hardware solution.
In any case, people who know what they’re doing try to implement extremely fast packet drops as close to the server NIC as possible to solve the DDoS mitigation challenge. Others still try to solve the same problem with OpenFlow. I wish them luck.
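For the record, one way to get “as close to the server NIC as possible” is XDP, which runs a filter at the driver layer before the kernel stack ever sees the packets. Here’s a hedged sketch using the BCC toolkit; it drops all ICMP as a stand-in for a real attack signature, and the interface name is an assumption:

```python
# Toy DDoS filter: drop all ICMP at the XDP/driver layer. Requires the
# bcc toolkit and a reasonably recent Linux kernel; "eth0" is assumed.
from bcc import BPF

prog = r"""
#include <uapi/linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>

int xdp_drop_icmp(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    /* Stand-in for a real attack signature */
    if (ip->protocol == IPPROTO_ICMP)
        return XDP_DROP;
    return XDP_PASS;
}
"""

b = BPF(text=prog)
b.attach_xdp("eth0", b.load_func("xdp_drop_icmp", BPF.XDP), 0)
print("XDP filter attached; Ctrl-C to detach")
try:
    b.trace_print()                  # block until interrupted
except KeyboardInterrupt:
    b.remove_xdp("eth0", 0)
```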
What about Fabrics?
Network fabrics were a particularly alluring OpenFlow use case. My cynical take on that: it was easy to figure out the Total Addressable Market and get VC funding that way.
Before you start telling me how Google uses OpenFlow to build their fabrics, read this – they built their own HFR with OpenFlow, not a data center fabric.
However, let’s assume you want to build your network fabric with pure OpenFlow 1.3 (no extensions to make your life easier, so you can use switches from almost any vendor). What kind of fabric could you build? These would be the constraints:
- No control-plane protocols;
- No real-time response to topology change events;
- No real-time response to link failures. You’d either use a single uplink or a pre-computed backup path (see the sketch below).
So what did you just build? A fancy programmable patch panel (here’s another one), and that’s exactly what some service providers need in their access networks. No wonder they still talk about using OpenFlow in their deployments; it’s a perfect tool for their particular problem. Does that imply it will solve all the problems you have? Probably not.
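If you want to see how simple that patch panel really is, here’s a hedged Ryu sketch: a static cross-connect with a pre-computed backup path, implemented as an OpenFlow 1.3 fast-failover group so the switch falls back to the second uplink on its own when the primary port goes down. Port numbers are made up:

```python
# Static "patch panel" entry with a pre-computed backup path (OF 1.3).
# Assumes `dp` is a Ryu datapath handle; port numbers are made up.
def cross_connect(dp, in_port=1, primary=2, backup=3):
    ofp, parser = dp.ofproto, dp.ofproto_parser

    # Fast-failover group: the switch uses the first bucket whose
    # watched port is up -- no controller round trip on failure.
    buckets = [
        parser.OFPBucket(watch_port=primary,
                         actions=[parser.OFPActionOutput(primary)]),
        parser.OFPBucket(watch_port=backup,
                         actions=[parser.OFPActionOutput(backup)]),
    ]
    dp.send_msg(parser.OFPGroupMod(dp, ofp.OFPGC_ADD, ofp.OFPGT_FF,
                                   group_id=1, buckets=buckets))

    # Everything arriving on in_port goes through the group.
    match = parser.OFPMatch(in_port=in_port)
    inst = [parser.OFPInstructionActions(
        ofp.OFPIT_APPLY_ACTIONS, [parser.OFPActionGroup(group_id=1)])]
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                  match=match, instructions=inst))
```

Note that even the “backup path” is the switch doing local failover between pre-installed buckets, not the controller reacting in real time; that’s exactly the constraint listed above.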
Want to Know More?
Start exploring the SDN resources @ ipSpace.net, and watch SDN webinars.
A steaming pile of research dollars was out there chasing the concept.
Cynically, I put it down to finding a way to get a generation of grad students to write provably correct config files, or, more likely, to everyone having lost the ability to program at a low enough level (hardware and software) to make a difference at the data plane. The algorithms we’ve developed in the #bufferbloat project (all seemingly impossible to fund in this environment) apply at the data plane, are actually useful and deployable today, and I can see pouring them directly into hardware soon. I am glad to have mostly ignored the whole SDN thing.
This is not tied to OpenFlow, though. BGP FlowSpec is not widely implemented yet. PCEP is too specific to MPLS.
So sometimes only OpenFlow is available.
But implementations are poor and there is no progress. Specifications keep coming out, but no one is paying attention. Certification is impractical. The ONF is being restructured, because it is effectively dead now...
The biggest problem with alternative PBR solutions is that you often also need efficient triggering and redirection of traffic to the controller. This is provided by OpenFlow, but not by BGP FlowSpec, so they don’t fully replace each other...
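For illustration, the punt-to-controller part is a single OpenFlow action. A hedged Ryu-style fragment (reusing the dp/parser handles from the sketches above):

```python
# Redirect matched traffic to the controller for inspection -- the piece
# the comment above says BGP FlowSpec lacks. Assumes Ryu handles dp/parser.
def punt_to_controller(dp, parser, match):
    ofp = dp.ofproto
    actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER,
                                      ofp.OFPCML_NO_BUFFER)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=50,
                                  match=match, instructions=inst))
```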
Wrt configuration: PBR is ephemeral state that could be perfectly provisioned using any existing or new configuration protocol (NETCONF/RESTCONF/gRPC, etc.)
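As an illustrative sketch of that provisioning angle, pushing an ACL entry over NETCONF with ncclient could look like the fragment below. Host, credentials, and the XML payload are all made-up assumptions; the payload has to match whatever YANG model the device actually implements:

```python
# Provision a PBR/ACL entry over NETCONF; purely illustrative. The XML
# payload pretends the device implements ietf-access-control-list and is
# deliberately incomplete -- adapt it to the device's actual data model.
from ncclient import manager

CONFIG = """
<config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <acls xmlns="urn:ietf:params:xml:ns:yang:ietf-access-control-list">
    <acl>
      <name>block-host</name>
      <!-- match/action entries go here, per the device's YANG model -->
    </acl>
  </acls>
</config>
"""

with manager.connect(host='192.0.2.1', port=830, username='admin',
                     password='secret', hostkey_verify=False) as m:
    # target='running' assumes a writable running datastore; use
    # 'candidate' plus m.commit() on devices that require it.
    m.edit_config(target='running', config=CONFIG)
```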