OpenFlow/SDN is not a silver bullet

Thursday, June 7, 2012 20:17 CEST

Last autumn Todd Hoff (the author of the fantastic High Scalability blog) asked me to write a short article explaining the scalability challenges that SDN, and OpenFlow in particular, might be facing. It took me “a while”, but I finally got it done – the OpenFlow/SDN Is Not a Silver Bullet for Network Scalability article was published last Monday.

4 comments:

Ofer 07 June 2012 22:12

Great article...

I like the high-end router analogy: you have a CPU running all the control-plane tasks and downloading the results (the FIB) to the linecards. An OpenFlow controller in effect skips that CPU and downloads the information from a centralized location directly to the linecards, but this information is (usually) a flow, something much more granular than a MAC address or an IP route, and the distance tends to be a bit larger than the distance to the local CPU :), as has been stressed so many times.
Moreover, every high-end router vendor will tell you how more and more tasks are being offloaded to the linecards. This includes control-plane functions as well as data-path information (e.g. downloading the alternate FIB entry along with the best FIB entry to speed up convergence).
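
To make the "alternate FIB entry" idea concrete, here is a minimal Python sketch of a linecard that holds a pre-installed backup next hop and switches to it locally when the primary fails, without asking the control plane. The names `FibEntry`, `Linecard`, and the interface names are hypothetical, and the lookup is an exact match on the prefix string rather than a real longest-prefix match.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class FibEntry:
    """One forwarding entry with a pre-computed alternate (hypothetical model)."""
    prefix: str
    primary: str
    backup: Optional[str] = None


class Linecard:
    """Toy linecard: exact-match lookup only, no longest-prefix match."""

    def __init__(self) -> None:
        self.fib: Dict[str, FibEntry] = {}
        self.down: Set[str] = set()

    def install(self, entry: FibEntry) -> None:
        # The control plane (router CPU or controller) downloads both next hops.
        self.fib[entry.prefix] = entry

    def link_down(self, next_hop: str) -> None:
        # Local failure detection; no round trip to the control plane.
        self.down.add(next_hop)

    def next_hop(self, prefix: str) -> Optional[str]:
        entry = self.fib[prefix]
        if entry.primary not in self.down:
            return entry.primary
        if entry.backup is not None and entry.backup not in self.down:
            return entry.backup
        return None  # nothing usable until the control plane reconverges


lc = Linecard()
lc.install(FibEntry("10.0.0.0/8", primary="ge-0/0/1", backup="ge-0/0/2"))
before = lc.next_hop("10.0.0.0/8")
lc.link_down("ge-0/0/1")
after = lc.next_hop("10.0.0.0/8")
print(before, after)
```

The point of the sketch is only that the failover decision is made where the failure is seen; the farther away the entity that has to compute the alternate, the longer traffic is dropped if no backup was pre-installed.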

OpenFlow will definitely find its niche, running at the access layer or for a specific service (in hybrid mode, while infrastructure tasks are controlled by the plain old control plane). I don't see it going beyond that.

Brent Salisbury 07 June 2012 23:44

Good read Ivan, not too grumpy sir but some good sprinkles for good measure :) I need an Ivan fanboy t-shirt or something. I keep following a similar path in my mind for enterprise. Taking the typical core/distribution/access model that has a pair of boxes acting as HA L3 gateways (or LERs if running MPLS) and integrating an SDN distribution block or island for the L2 that is south of that.

What would be great for my networks would be to then be able to use ToR gear instead of the traditional 6k/MX, which could be done today, but for some reason vendors hate MPLS in most of their ToR gear, which drives me crazy.

I have been trying to write something on the native-to-SDN integration points for an SDN gateway to the real world. What are your thoughts on marrying the two worlds? Quagga/RouteFlow in software carrying the IGP/label base and advertising the SDN topologies? One crazy thing I never thought of, which Nick Bastin was patient enough to help me through on IRC, was advertising a default route. Everything matches on that, so every bit of garbage from a host would match and could burn down the TCAM. Reckon not much difference with route servers today?

There are a ton of challenges to a branch in the tree, but we already have a laundry list with what we have today. I have no idea how huge enterprises can continue managing distributed systems at the rate of growth we are on without an equal amount of resources being added. However, it is good for the job market, and with every year that goes by without coherent abstraction, the complexity keeps stacking up, which will give us lots of job security :) Maybe I don't have an operational vision, but more virtual chassis and CLI scripting doesn't do it for me because of interop weakness.

It is North vs. South, mom vs. dad, Playstation vs. Xbox, Capt Ahab vs. Moby Dick, History channel vs. Discovery channel, Jerry vs Newman, some soccer/football rivalry in Europe, chaos. People will get so worked up over this for the next few years, there should be some amazing nerd rages to witness. I hope it gets recorded. Thanks!

Brad Hedlund 09 June 2012 02:29

On the topic of Nicira NVP being "scalable" -- where is the evidence of that? The NVP data sheet claims "the system is architected to scale to .. tens of thousands of servers" -- but that's just a marketing statement. Where is the actual number of supported servers stated as a real number? I'm having trouble finding that. VMware publishes such information in a way that's clear and easy to find. Why not Nicira? Or did I not look hard enough?

Cheers,
Brad

Petr Lapukhov 09 June 2012 19:52

To begin with, the "classic" statement "SONET protection halves capacity while stat-mux networks do not" is not entirely correct. You see IP network cores over-provisioned all the time for exactly the same purpose: to protect capacity against link failures. Take MPLS TE bandwidth protection, for example. This problem has nothing to do with SONET/SDH per se, but rather with the capacity planning process.

The true revelation to many operators is finding out that a lot of traffic may not need any protection. This is actually valid for a lot of "google-like" traffic, e.g. for background index synchronization. In this case, you can push link utilization to 100% using whatever TE method you prefer: IGP costing, MPLS TE, or centralized programming via OpenFlow. Some methods are more effective than others, of course, in terms of optimizing link utilization.
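
A back-of-the-envelope Python sketch of that argument, under assumptions that are mine and not from the article: a set of equal-capacity parallel links, a single link failure, and a network that drops unprotected traffic first when capacity is lost. The function name `usable_load` and its parameters are hypothetical.

```python
def usable_load(link_gbps: float, links: int, protected_fraction: float) -> float:
    """Largest total offered load (Gbps) for which all protected traffic
    still fits after any single link failure.

    Assumes equal parallel links and that unprotected traffic is dropped
    first when a link fails (a simplifying assumption of this sketch).
    """
    total = link_gbps * links
    if protected_fraction <= 0:
        return total  # nothing needs protecting: run the links full
    surviving = link_gbps * (links - 1)
    # Protected share of the load must fit into the surviving capacity.
    return min(total, surviving / protected_fraction)


for p in (1.0, 0.5, 0.0):
    load = usable_load(link_gbps=10, links=2, protected_fraction=p)
    print(f"protected={p:.0%} usable={load:g} Gbps of 20 Gbps")
```

With two 10 Gbps links, fully protected traffic caps the usable load at 10 Gbps (the familiar "half the capacity"), while a mix in which at most half the traffic needs protection can fill both links.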

So the real magic here is TE, which is mathematically a multicommodity max-flow problem (NP-complete, of course; otherwise it would have been boring). One serious challenge in doing TE efficiently is the dynamic nature of traffic demands, which optimally requires end-host applications to signal their intent (e.g. by sending something like Tspecs to the controller). That's where Google took a huge step forward, implementing distributed bandwidth brokering and binding it into (at least some of) their applications. Say hello to the new RSVP :)

OpenFlow is just a programming API for their own boxes; the real achievement is centralized scheduling based on application demand matching. Such scheduling has been known forever (e.g. recall crossbar fabric schedulers), but it was never implemented for the distributed "application-network" complex. Of course there was research done in that direction, but hey, who cared, it was just research :) I think I posted links on BW brokering before.
