Scalability of OpenFlow Control Plane Network
I got an interesting question from one of my readers:
If every device talking to a centralized control plane uses an out-of-band channel to talk to the OpenFlow controller, isn’t this a scaling concern?
A year or so ago I would have said NO (arguing that the $0.02 CPU found in most networking devices is too slow to overload a controller or reasonably-fast control-plane network).
In the meantime, the latest generations of data center switches got decent CPUs (even multi-core ones) and way more bandwidth between forwarding ASICs and CPUs, so control-plane network overload might be a real consideration. Still, we’re probably far away from the point where a 1GE-per-device control-plane network would become a bottleneck.
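To get a feel for the 1GE number, here is a back-of-the-envelope sketch of how many messages per second a switch would have to send to fill such a link. The message sizes are assumptions chosen for illustration (a small 128-byte control message and a full 1500-byte punted packet), and the calculation ignores framing, TCP, and OpenFlow header overhead.

```python
def messages_per_second(link_bps, message_bytes):
    """Whole messages per second that fit on a link, ignoring all overhead."""
    return link_bps // (message_bytes * 8)


GIGABIT = 1_000_000_000  # 1GE control-plane link, in bits per second

# Assumed message sizes in bytes; not taken from any measurement.
for size in (128, 1500):
    print(size, messages_per_second(GIGABIT, size))
# 128 976562
# 1500 83333
```

Even with full-size punted packets, the switch CPU would need to sustain tens of thousands of messages per second toward the controller before the link itself became the limit.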
Well-designed solutions that use proactive flow setups exchange small amounts of information between an OpenFlow switch and a controller. On the other hand, if your controller sets up flows in response to unknown traffic punted to the controller (reactive mode), you’ll face so many scalability challenges that you won’t even notice the potential bandwidth limitations of the control-plane network.
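The difference between the two modes can be illustrated with a toy model. This is a sketch, not an OpenFlow implementation: the Packet type, the exact-match (source, destination) flow key, and the absence of flow timeouts are all simplifying assumptions. It only counts how many packets a switch would punt to the controller.

```python
from collections import namedtuple

# Hypothetical packet model: a flow is identified by (src, dst) only.
Packet = namedtuple("Packet", ["src", "dst"])


def reactive_punts(packets):
    """Count punts when flows are installed only after a table miss."""
    flow_table = set()
    punted = 0
    for pkt in packets:
        key = (pkt.src, pkt.dst)
        if key not in flow_table:
            punted += 1          # table miss: packet goes to the controller
            flow_table.add(key)  # controller installs a flow for this pair
    return punted


def proactive_punts(packets, preinstalled):
    """Count punts when the controller installed the flows ahead of time."""
    return sum(1 for pkt in packets if (pkt.src, pkt.dst) not in preinstalled)


hosts = ["h1", "h2", "h3", "h4"]
pairs = {(s, d) for s in hosts for d in hosts if s != d}
traffic = [Packet(s, d) for (s, d) in sorted(pairs)] * 100

print(reactive_punts(traffic))          # 12: one punt per distinct flow
print(proactive_punts(traffic, pairs))  # 0: every flow was preinstalled
```

In the reactive case the load on the controller follows the number of new flows, which is driven by the traffic (or by an attacker generating new flows). In the proactive case it follows topology and configuration changes.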
This does not mean that you can ignore the problem though. Some control-plane traffic has to be sent to the controller, and without proper control-plane policing it is a nice attack vector that can be used to bring down the switch or even the controller.
Finally, you might be concerned about controller scalability. Building a scale-out controller architecture in which each controller instance controls a subset of the network is a well-understood problem, and it is relatively easy to solve if you’re OK with eventual consistency across the controller cluster. But then your controller-based network won’t behave any differently from classical networks, so why bother?
A shipping example of this architecture is VMware NSX, and I’m not aware of any other OpenFlow-based controller that would have the same functionality – if you know more, please send me an email.
For more technical details, watch my OpenFlow Deep Dive webinar.
Happy to sit down with you (are you at Interop?) and chat about this in person, but I think there are a few things worth clarifying here:
1) Data plane to control plane policing is something that all switches and routers (not just OpenFlow switches and controllers) have to deal with. This is one of the reasons why all well-implemented switches rate-limit all data plane traffic that can cause load on the control plane (e.g., ICMP, IP options, routing protocol traffic, STP control traffic, etc.). Most folks have war stories of "they sent a lot of X traffic and then the supervisor CPU went to 100% and everything stopped responding". Mine was with packets with the IP Record Route option set :-) Fortunately, modern hardware is quite good at this and has lots and lots of knobs to tune exactly which traffic classes should have which priorities and rate limits.
2) In practice, the bottleneck is between the ASIC and the local switch CPU, not between the switch CPU and the OpenFlow controller. In theory one could try to optimize this system for higher performance, but as you correctly call out, with the appropriate control plane policing, high data rates from the data plane to the control plane are actually not needed.
3) Given the above two points, I'm fairly sure that all vendors of networking gear have some level of data <--> control plane policing. Big Switch definitely implements this and certainly spends a lot of time thinking about correct policing behavior and testing to verify robustness here -- I can only imagine other vendors do the same.
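The rate limiting described in the points above is commonly built on a token bucket. The following is a minimal software sketch of that idea, not any vendor's implementation: real switches do this in hardware per traffic class, and the class name, rate, and burst values here are made up for illustration.

```python
class TokenBucket:
    """Packet-per-second policer: admits bursts up to `burst`, then `rate_pps`."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous packet, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` may reach the CPU."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop instead of loading the CPU


def police(arrival_times, rate_pps, burst):
    """Count how many packets with the given arrival times are admitted."""
    bucket = TokenBucket(rate_pps, burst)
    return sum(1 for t in arrival_times if bucket.allow(t))


# 50 packets arriving at the same instant: only the burst allowance passes.
print(police([0.0] * 50, rate_pps=100, burst=10))  # 10
```

With a policer like this in front of the switch CPU, a flood of punted packets costs the attacker bandwidth but reaches the control plane only at the configured rate.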
Hope this helps clear things up a bit -- happy to talk more in person.
Thanks for an extensive comment. We're in perfect agreement apart from "other vendors doing the same" part.
I've only seen control-plane policing of OpenFlow traffic documented in Cisco IOS and NEC ProgrammableFlow (of course I might have missed something).
However, this means that you should have a minimum of 3 cluster members.
This architecture is extremely similar to the Cisco Grapevine platform used in APIC-EM. I suspect that Grapevine also uses the Raft algorithm (or Paxos) for full consistency.
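The three-member minimum mentioned above follows from simple quorum arithmetic, assuming the cluster relies on a majority-based consensus protocol such as Raft or Paxos (which the comment only suspects for Grapevine). A short sketch with hypothetical helper names:

```python
def quorum_size(members):
    """Smallest majority of a cluster with `members` nodes."""
    return members // 2 + 1


def tolerated_failures(members):
    """Nodes that can fail while a majority is still reachable."""
    return members - quorum_size(members)


for n in range(1, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

A two-member cluster tolerates no failures, so three members is the smallest size that survives the loss of one node. Adding a fourth member does not improve on that.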