With the Open Networking Foundation adamantly promoting its definition of SDN, and based on experience with previous (now mostly extinct) centralized architectures, one has to ask a simple question: does it make sense?

Here’s what I thought in May 2014; for more details, read the Packet Forwarding 101 blog posts and watch the How Networks Really Work webinar.

Also note that most of the solutions I listed as successful (limited) implementations of a centralized control plane in 2014 have withered away in the meantime, and we’re back to distributed networking (this time often coupled with automated configuration deployment).

Does Centralized Control Plane Make Sense?

A friend of mine sent me a challenging question:

You've stated a couple of times that you don't favor the OpenFlow version of SDN due to a variety of problems like scaling and latency. What model/mechanism do you like? Hybrid? Something else?

Before answering that question, let’s step back and ask another one: “Does a centralized control plane, as evangelized by the ONF, make sense?”

A Bit of History

As always, let’s start with one of the greatest teachers: history. We’ve had centralized architectures for decades, from SNA to various WAN technologies (SDH/SONET, Frame Relay and ATM). They all share a common problem: when the network partitions, the nodes cut off from the central intelligence either stop functioning (in the SNA case) or remain in a frozen state (WAN technologies).

One might be tempted to conclude that the ONF version of SDN won’t fare any better than the switched WAN technologies. Reality is far worse:

  • WAN technologies had little control-plane interaction with the outside world (example: Frame Relay LMI), and even those interactions were handled by the local devices, not by the centralized control plane;
  • WAN devices (SONET/SDH multiplexers, or ATM and Frame Relay switches) had local OAM functionality that allowed them to detect link or node failures and reroute around them using preconfigured backup paths. One could argue that those devices had a local control plane, although it was never as independent as the control planes used in today’s routers.

Interestingly, MPLS-TP wants to reinvent the glorious past and re-introduce centralized path management, yet again proving RFC 1925 section 2.11.

The last architecture (that I remember) that used a truly centralized control plane was SNA, and if you’re old enough, you know how well that ended.

Would Centralized Control Plane Make Sense in Limited Deployments?

A centralized control plane is obviously a single point of failure, and network partitioning is a nightmare if you have a central point of control. Large-scale deployments of the ONF variant of SDN are thus out of the question. But does it make sense to deploy a centralized control plane in smaller independent islands (campus networks, data center availability zones)?

Interestingly, numerous data center architectures already use a centralized control plane, so we can analyze how well they perform:

  • Juniper XRE can control up to four EX8200 switches, or a total of 512 10GE ports;
  • Nexus 7700 can control 64 fabric extenders with 3072 ports, plus a few hundred directly attached 10GE ports¹;
  • HP IRF can bind together two 12916 switches for a total of 1536 10GE ports;
  • QFabric Network Node Group could control eight nodes, for a total of 384 10GE ports.

NEC ProgrammableFlow seems to be an outlier – they can control up to 200 switches, for a total of over 9000 GE (not 10GE) ports… but they don’t run any control-plane protocol (apart from ARP and dynamic MAC learning) with the outside world. No STP, LACP, LLDP, BFD or routing protocols.

One could argue that we could get an order of magnitude beyond those numbers if only we were using proper control-plane hardware (Xeon CPUs, for example). I won’t buy that argument until I actually see a production deployment, and do keep in mind that the NEC ProgrammableFlow Controller already uses decent Intel-based hardware. Real-time distributed systems with fast feedback loops are way more complex than most people looking at them from the outside realize (see also RFC 1925, section 2.4).

Does Centralized Control Plane Make Sense?

It does in certain smaller-scale environments (see above)… as long as you can guarantee redundant connectivity between the controller and the controlled devices, or don’t care what happens after a link loss (see also wireless access points). Does it make sense to generate a huge hoopla while reinventing this particular wheel? I would rather spend my energy doing something else.
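To make the link-loss tradeoff concrete, here’s a minimal sketch using Open vSwitch (my example, not something any of the vendors mentioned above ship). It assumes a Linux host with Open vSwitch installed, a bridge named br0, and placeholder controller IP addresses; the interesting knob is the fail mode, which decides what a switch does once it loses all its controllers.

    # Minimal sketch: assumes Open vSwitch is installed and bridge br0 exists;
    # controller addresses below are placeholders.
    import subprocess

    def ovs(*args):
        """Run an ovs-vsctl command and fail loudly if it doesn't succeed."""
        subprocess.run(["ovs-vsctl", *args], check=True)

    # Point the switch at two redundant controllers (OpenFlow over TCP, port 6653)
    ovs("set-controller", "br0", "tcp:192.0.2.10:6653", "tcp:192.0.2.11:6653")

    # Choose what happens when both controllers become unreachable:
    #   "secure"     - keep the installed flows, set up nothing new (frozen state)
    #   "standalone" - fall back to independent MAC-learning switch behavior
    ovs("set-fail-mode", "br0", "secure")

Whichever value you pick, you’re deciding up front what a headless switch is allowed to do on its own once it loses the controller – the same tradeoff the WAN technologies and wireless access points mentioned above had to make.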

I absolutely understand why NEC went down this path – they did something extraordinary to differentiate themselves in a very crowded market. I also understand why Google decided to use this approach, and why they evangelize it as much as they do. I’m just saying that it doesn’t make that much sense for the rest of us.

Finally, do keep in mind that the whole world of IT is moving toward scale-out architectures. Netflix & Co are already there, and the enterprise world is grudgingly taking its first steps. In the meantime, OpenFlow evangelists talk about the immeasurable revolutionary merits of a centralized scale-up architecture. They must be living on a different planet.

More on SDN and OpenFlow

To learn more about the realities of OpenFlow and SDN, watch the ipSpace.net SDN webinars.


  1. These were obviously marketing numbers: the latest Nexus OS Verified Scalability Guide claims that a Nexus 9500 switch (with a much more powerful CPU than the Nexus 7700) supports up to 1536 Fabric Extender server interfaces. ↩︎


6 comments:

  1. OpenFlow does not make sense. The L2 and most L3 functions of switching need not be centrally processed at line rate. This also doesn't align with need or demand.

    NFV makes much more sense in conjunction with programmatic switch control. I would be very happy if a network vendor would build a management platform to which all switches could be registered and managed with a graphical and programmatic interface.

    Open the UI, create a data flow with VLAN, route, and QoS parameters, and click Go. The management platform issues the commands to the switches to configure them to support the designed flow. The management platform then tests the flow and provides notification that it has been successful. Of course, the management platform also monitors and reports on switch performance/activity.

    This management platform can then be integrated with hypervisors to allow provisioning of workloads and networks through the same wizard. The network doesn't need to be intelligent, it needs to be obedient.
  2. Of course, enterprise wireless networks have a centralized controller and can support thousands of access points.
    Replies
    1. At what speeds and aggregate bandwidth? And don't forget that in most cases all the traffic gets hauled back to the controller. See also

      http://blog.ipspace.net/2013/09/openflow-fabric-controllers-are-light.html
  3. Doesn't a big chassis have a construct similar to the centralized controller model you talk about here? The RP is the centralized controller, and the line cards are dumb switches programmed by the RP?
  4. You're absolutely right. And how many networks have you seen built with a single big chassis?
    Replies
    1. Not one, but at least two for redundancy purposes, where each chassis would typically have two RPMs (primary & standby) and both chassis talk to each other using some protocol (federation) or using vPC. Similar mechanisms can be applied to the centralized OpenFlow model as well, right? One could have two or more controllers in the network, each controller supporting HA, and throw in federation among the controllers.

      My point here is that the SPOF and network-partitioning problems of the centralized controller model could be solved by borrowing ideas from the chassis world.