Control-plane policing in OpenFlow networks
The Controller-Based Packet Forwarding in OpenFlow Networks post generated the obvious question: “does that mean we need some kind of Control-Plane Protection (CoPP) in OpenFlow controller?” Of course it does, but things aren’t as simple as that.
The weakest link in today’s OpenFlow implementations (like NEC’s ProgrammableFlow) is not the controller, but the dismal CPU used in the hardware switches. The controller could handle millions of packets per second (that’s the flow setup rate claimed by Floodlight developers), whereas the switches usually burn out at thousands of flow setups per second.
The CoPP function thus has to be implemented in the OpenFlow switches (like it’s implemented in linecard hardware in traditional switches), and that’s where the problems start: OpenFlow had no usable rate-limiting functionality until version 1.3, which added meters.
OpenFlow meters are a really cool concept: they have multiple bands, and you can apply either DSCP remarking or packet dropping at each band. That would allow an OpenFlow controller to closely mimic the CoPP functionality and apply different rate limits to different types of control-plane or punted traffic. Unfortunately, no hardware switch available on the market supports OpenFlow 1.3 yet, and even when the first OpenFlow 1.3 switches start appearing, they might not support meters (or meters on flows sent to the controller).
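To make the band idea a bit more concrete, here is a minimal Python sketch of how a two-band meter might treat traffic punted to the controller. It only models the band-selection rule as I understand it from the OpenFlow 1.3 specification (the band with the highest configured rate below the measured rate wins); it does not talk to a real switch. The names MeterBand, select_band, apply_meter and PUNT_METER, as well as the 500 and 1000 pps thresholds, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MeterBand:
    """One band of a hypothetical meter (illustrative model, not a real API)."""
    kind: str            # "drop" or "dscp_remark"
    rate_pps: int        # band becomes eligible once the measured rate exceeds this
    prec_level: int = 0  # drop-precedence increase, used only by "dscp_remark"


def select_band(bands: Sequence[MeterBand], measured_pps: int) -> Optional[MeterBand]:
    """Pick the band with the highest rate that is below the measured rate."""
    exceeded = [b for b in bands if measured_pps > b.rate_pps]
    return max(exceeded, key=lambda b: b.rate_pps, default=None)


def apply_meter(bands: Sequence[MeterBand], measured_pps: int) -> str:
    """Return what would happen to a packet at the given measured rate."""
    band = select_band(bands, measured_pps)
    if band is None:
        return "forward"  # below every band: the packet passes untouched
    if band.kind == "drop":
        return "drop"
    return f"remark(+{band.prec_level})"


# Assumed CoPP-style policy for packets sent to the controller:
# remark above 500 pps, drop above 1000 pps.
PUNT_METER = [
    MeterBand("dscp_remark", rate_pps=500, prec_level=1),
    MeterBand("drop", rate_pps=1000),
]

if __name__ == "__main__":
    for pps in (100, 700, 5000):
        print(pps, apply_meter(PUNT_METER, pps))
```

A controller mimicking CoPP would presumably install one such meter per class of punted traffic (ARP, unknown unicast, routing protocols and so on) and attach it to the flow entries that send packets to the controller, assuming the switch supports meters on those flows at all.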
In the meantime, proprietary extensions galore – NEC had to use one to limit unicast flooding in its ProgrammableFlow switches.
OpenFlow has the potential to add to the control plane work a switch has to perform. We really should use larger CPUs in our switches (always a cost/margin choice for a vendor) and I fully agree with you that a consistent mechanism to control control plane traffic is a must. OpenFlow or otherwise.
Ordered lists of TCAM requiring table re-writes/re-ordering depending on the spacing from the agent is comical to watch. L2 reactive forwarding as much as it pains me to think of it, is probably one of the few options. NPUs are putting up some pretty good numbers but I will believe it when I see it.
I've been trying to find time to proof a hashtable capturing high volume pps into the CP to trigger something. I dunno tho, Ops at the day job finds BUM traffic only when a problem has gone on long enough to trigger trouble and the pcap uncovers a crippling unicast flooding result. Maybe client agents are the right idea lol :0?
Cheers, great videos with NEC.
-Brent