OpenFlow/SDN is not a silver bullet
Last autumn Todd Hoff (the author of the fantastic High Scalability blog) asked me to write a short article explaining the scalability challenges SDN, and OpenFlow in particular, might be facing. It took me "a while", but I finally got it done: the OpenFlow/SDN Is Not a Silver Bullet for Network Scalability article was published last Monday.
I like the high-end router analogy: you have a CPU running all the control-plane tasks and downloading the results (the FIB) to the linecards. An OpenFlow controller effectively skips that CPU, downloading the information from a centralized location directly to the linecards. However, this "information" is usually a flow entry, something much more granular than a MAC address or an IP route, and the distance to the controller tends to be a bit larger than the distance to the local CPU :) as has been stressed so many times.
Moreover, every high-end router vendor will tell you how more and more tasks are being offloaded to the linecards. This includes control-plane functions as well as data-path information (e.g. downloading an alternate FIB entry along with the best FIB entry to speed up convergence).
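To make the granularity difference concrete, here is a minimal sketch with hypothetical data structures (my own illustration, not any vendor's actual format): a FIB entry covers a whole prefix and can carry a precomputed backup next hop, while a flow entry typically describes a single conversation.

```python
from dataclasses import dataclass

# Hypothetical structures illustrating granularity, not any vendor's actual format.

@dataclass
class FibEntry:
    """One prefix covering many flows; the backup next hop is precomputed
    so the linecard can switch over without waiting for the control plane."""
    prefix: str            # e.g. "192.0.2.0/24"
    next_hop: str
    backup_next_hop: str

@dataclass
class FlowEntry:
    """One entry per conversation -- far more state for the same traffic."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    output_port: str

# A single route...
route = FibEntry("192.0.2.0/24", next_hop="ge-0/0/1", backup_next_hop="ge-0/0/2")

# ...versus one flow entry for every TCP session heading into that prefix.
flows = [
    FlowEntry("10.1.1.5", "192.0.2.10", 49152 + i, 443, "tcp", "port1")
    for i in range(3)
]
print(route)
print(len(flows), "flow entries for traffic that one FIB entry already covers")
```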
OpenFlow will definitely find its niche, running at the access layer or handling a specific service (in hybrid mode, while the infrastructure tasks are handled by the plain old control plane). I don't see it going beyond that.
What would be great for my networks would be the ability to use ToR gear instead of traditional 6k/MX boxes. That could be done today, but for some reason vendors hate putting MPLS in most of their ToR gear, which drives me crazy.
I have been trying to write something on the native-to-SDN integration points: an SDN gateway to the real world. What are your thoughts on marrying the two worlds? Quagga/RouteFlow in software carrying the IGP/label base and advertising the SDN topologies? One crazy thing I had never thought of, which Nick Bastin was patient enough to walk me through on IRC, is advertising a default route: everything matches on it, so every bit of garbage from a host would match and could burn down the TCAM (see the sketch below). I reckon that's not much different from route servers today?
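Here is a minimal flow-table sketch illustrating the default-route problem; the priority-based matching is loosely modelled on OpenFlow table semantics and the entries are hypothetical, not anyone's actual controller code. A wildcard default entry matches every packet, so a controller that reacts by installing a per-flow entry ends up with one TCAM entry per garbage flow.

```python
import ipaddress

# Flow table as (priority, match_prefix, action) entries, highest priority wins --
# loosely modelled on OpenFlow table semantics, purely for illustration.
flow_table = [
    (100, ipaddress.ip_network("10.1.0.0/16"), "forward:port1"),
    (1,   ipaddress.ip_network("0.0.0.0/0"),   "send_to_controller"),  # default route
]

def lookup(dst_ip):
    """Return the action of the highest-priority matching entry."""
    dst = ipaddress.ip_address(dst_ip)
    for _, prefix, action in sorted(flow_table, key=lambda e: -e[0]):
        if dst in prefix:
            return action
    return "drop"

# Every destination not covered by a specific entry hits the default; if the
# controller installs a /32 flow for each one, random host garbage (scans,
# typos, malware) translates directly into TCAM entries.
for dst in ("10.1.2.3", "192.0.2.55", "203.0.113.9"):
    print(dst, "->", lookup(dst))
```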
There are a ton of challenges with any branch in the tree, but we already have a laundry list with what we have today. I have no idea how huge enterprises can keep managing distributed systems at our current rate of growth without adding an equal amount of resources. However, it is good for the job market: with every year that goes by without a coherent abstraction, the complexity keeps stacking up, which will give us lots of job security :) Maybe I lack operational vision, but more virtual chassis and CLI scripting doesn't do it for me because of weak interoperability.
It's North vs. South, mom vs. dad, PlayStation vs. Xbox, Captain Ahab vs. Moby Dick, History Channel vs. Discovery Channel, Jerry vs. Newman, some soccer/football rivalry in Europe... chaos. People will get so worked up over this in the next few years that there should be some amazing nerd rages to witness. I hope it gets recorded. Thanks!
Cheers,
Brad
The true revelation to many operators is finding out that a lot of traffic may not need any protection. This is valid for much "Google-like" traffic, e.g. background index synchronization. In that case you can push link utilization toward 100% using whatever TE method you prefer: IGP costing, MPLS TE, or centralized programming via OpenFlow. Some methods are, of course, more effective at optimizing link utilization than others.
So the real magic here is TE, which is mathematically a multicommodity max-flow problem (NP-complete in its unsplittable-flow form, of course, otherwise it would have been boring; see the formulation below). One serious challenge in doing TE efficiently is the dynamic nature of traffic demands, which ideally requires end-host applications to signal their intent (e.g. sending something like Tspecs to the controller). That's where Google took a huge step forward, implementing distributed bandwidth brokering and building it into (at least some of) their applications. Say hello to the new RSVP :)
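For readers who want to see the underlying problem, here is the standard textbook fractional multicommodity-flow LP a centralized TE controller would optimize (generic notation, not Google's actual formulation): minimize the worst-case link utilization subject to per-commodity flow conservation and link capacity constraints.

```latex
% Commodity k has demand d_k from source s_k to sink t_k; link (i,j) has capacity c_{ij};
% f^k_{ij} is the amount of commodity k on link (i,j); \theta is the worst-case utilization.
\begin{align*}
\min \quad & \theta \\
\text{s.t.} \quad
& \sum_{j:(i,j)\in E} f^{k}_{ij} - \sum_{j:(j,i)\in E} f^{k}_{ji} =
  \begin{cases}
    d_k  & i = s_k \\
    -d_k & i = t_k \\
    0    & \text{otherwise}
  \end{cases}
  && \forall k,\ \forall i \in V \\
& \sum_{k} f^{k}_{ij} \le \theta \, c_{ij} && \forall (i,j) \in E \\
& f^{k}_{ij} \ge 0 && \forall k,\ \forall (i,j) \in E
\end{align*}
```

The fractional version above is a plain linear program and thus polynomially solvable; it is the unsplittable (single-path-per-commodity) variant that is NP-hard, which is what makes practical TE interesting.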
OpenFlow is just a programming API for their own boxes; the real achievement is centralized scheduling based on matching application demands. Such scheduling has been known forever (recall crossbar fabric schedulers, for example), but it had never been implemented for the distributed "application-network" complex. Of course there was research in that direction, but hey, who cared, it was just research :) I think I posted links on bandwidth brokering before.
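As a toy illustration of the kind of demand matching a bandwidth broker performs, here is a minimal max-min fair allocation sketch over a single bottleneck link. The function name and the single-link simplification are mine, a hypothetical illustration rather than anything Google has published.

```python
def max_min_fair(capacity, demands):
    """Allocate `capacity` across `demands` (dict: app -> requested bandwidth)
    using max-min fairness: fully satisfy demands below the fair share and
    split the remaining capacity evenly among the rest."""
    alloc = {}
    remaining = dict(demands)
    cap = capacity
    while remaining:
        fair_share = cap / len(remaining)
        # Apps asking for less than the fair share get exactly what they asked for.
        satisfied = {a: d for a, d in remaining.items() if d <= fair_share}
        if not satisfied:
            # Everyone wants more than the fair share: split evenly and stop.
            for a in remaining:
                alloc[a] = fair_share
            break
        for a, d in satisfied.items():
            alloc[a] = d
            cap -= d
            del remaining[a]
    return alloc

# Example: a 10 Gbps link and three applications signaling their demands.
print(max_min_fair(10.0, {"index-sync": 8.0, "logs": 3.0, "user-facing": 2.0}))
# -> {'logs': 3.0, 'user-facing': 2.0, 'index-sync': 5.0}
```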