Build the Next-Generation Data Center
6-week online course starting in spring 2017

There’s a Difference between Scaling and Not Being Stupid

I was listening to one of the HP SDN Packet Pushers podcasts, in which Greg made an interesting comment along the lines of “people say that OpenFlow doesn’t scale, but what HP does with its IMC is verify the amount of TCAM in the switches, check whether it can install new flows, and throw an alert if a switch runs out of TCAM.”

That’s plain common sense (and I’m glad at least one vendor came to the conclusion that it needs to be done); however, there’s a huge difference between being ignorant of limitations and not being scalable. For example, you can have the smartest controller in the world, but if you use it to install a flow per application, the solution won’t scale no matter what: a TCAM that holds a few thousand flow entries is no match for the number of application flows going over a 1 Tbps switch. Good luck with that ;)
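The common-sense check described above (track TCAM headroom, refuse flows when the table is full, alert before it gets there) can be sketched in a few lines. All names, thresholds, and capacities below are hypothetical; a real controller such as HP IMC exposes this through its own APIs:

```python
class Switch:
    """Hypothetical model of a switch with a fixed-size TCAM flow table."""

    def __init__(self, name: str, tcam_capacity: int):
        self.name = name
        self.tcam_capacity = tcam_capacity  # max flow entries the TCAM holds
        self.installed_flows = 0


def try_install_flow(switch: Switch, alert_threshold: float = 0.9) -> bool:
    """Install a flow only if TCAM headroom remains; warn near exhaustion."""
    if switch.installed_flows >= switch.tcam_capacity:
        # Table is full -- reject the flow and raise an alert instead of
        # silently failing on the switch.
        print(f"ALERT: {switch.name} TCAM full, flow rejected")
        return False
    switch.installed_flows += 1
    if switch.installed_flows >= alert_threshold * switch.tcam_capacity:
        print(f"WARNING: {switch.name} TCAM above {alert_threshold:.0%}")
    return True
```

Note that this only makes the controller *aware* of the limit; it does nothing to remove it, which is exactly the distinction the paragraph above is making.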

Or you might have a different controller that doesn’t allow you to do something that stupid, but cannot scale beyond 50 physical switches due to control-plane limitations. The solution might be more than good enough when you start your deployment, but you’ll inevitably hit some pretty rough roadblocks when you try to add capacity. Or the controller has a hard limit on the number of flows it can support, which will limit the number of endpoints (example: virtual machines) you can deploy in your environment.
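The flow-limit-to-endpoint-limit relationship above is simple arithmetic. With made-up (hypothetical) numbers, a hard controller limit translates directly into a ceiling on virtual machines:

```python
# Hypothetical figures -- substitute your controller's real limits.
CONTROLLER_FLOW_LIMIT = 500_000  # hard limit on flows the controller supports
FLOWS_PER_VM = 10                # average flow entries each VM consumes

# Maximum number of VMs you can deploy before hitting the controller limit.
max_vms = CONTROLLER_FLOW_LIMIT // FLOWS_PER_VM
print(max_vms)  # 50000
```

The point is that the ceiling exists regardless of how much switch or compute capacity you add later.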

In any case, sooner or later you’ll hit a fundamental limit (be it an implementation limit or an architectural one) of every single centralized architecture, and that’s the real scalability problem we should be discussing.

As I wrote before, loosely coupled architectures that stay away from centralizing the data or control plane scale much better than those that believe in centralized intelligence. Most vendors with real-life experience have already figured that out – check the architectural details of Cisco’s DFA or ACI, Juniper’s QFabric or Contrail, and to a slightly lesser extent Nuage VSP; even Big Switch Networks is going down that same route.

In the meantime, it will probably take a few more years before everyone realizes the inherent limits of centralized control-plane ideas – limits you simply cannot fix regardless of how brilliant your implementation is (see also RFC 1925, sections 2.4 and 2.11) – and refocuses from unicorns-in-the-sky dreams to real-life problems we can solve.

Want to know more?

Finally, if you happen to be attending Interop Las Vegas, drop by our SDN workshop!


  1. Can PLUMgrid be compared to Nuage VSP in terms of scalability, or does it have all the burdens of a totally centralized SDN?

    They don't seem to use the intermediate "VSC"-thing that Nuage does, but they push the flow paths into their vswitches like Nuage does.

    1. No idea. They have zero public documentation on their web site, which makes me totally uninterested in whatever it is they happen to be doing.

  2. Seems like there are 2 issues with the current line of thinking.

    1) Centralized Controller model.
    2) OF scaling.

    With #1, it seems like a centralized model such as ODL does not fly, while a well-distributed model is required. The container world has embraced the distributed model well, and OVN is taking the container-world approach for VMs. I am not sure how OVN is different from using a BGP RR model (again, a distributed model).

    With #2, OF scaling is a real problem for the underlay.


You don't have to log in to post a comment, but please do provide your real name/URL. Anonymous comments might get deleted.