More on Centralized Control and SDN

After I commented on a LinkedIn discussion in the Carrier Ethernet group (more details here), Vishal Sharma wrote an interesting response that went into more detail on the distinction between centralized control and a centralized control plane.

He started with a nice summary of my view:

What I understood from what you said is that it is OK to have a centralized entity with (to use a much overused phrase) a "single pane of glass" view of the network. And, presumably, the central controller may have obtained this view by amalgamating inputs from various sources.

Couldn’t agree more. Numerous SDN architectures use this approach.

Could it get the control plane details, for example, by acting as a peer of the CP running on the existing devices (switches/routers) in the network, so it has the same view of the network as they do, even if the control plane itself is not centralized in the controller per se?

That’s exactly what many SDN solutions are doing.

Most of them use plain BGP, for example Microsoft’s data center solution (see Centralized Routing Control in BGP Networks Using Link-State Abstraction for more details), Netflix’s traffic analysis solution, or Border6 Non-Stop Internet.

Some other solutions use BGP-LS (North-Bound Distribution of Link-State and TE Information Using BGP), for example Juniper’s NorthStar controller.
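A controller that peers with existing routers ends up with the same link-state view they have, and it can merge what several peers report into one "single pane of glass" topology. The Python sketch below illustrates only that merging step. It assumes a hypothetical, heavily simplified `LinkAdvert` record; the real BGP-LS NLRI encoding (RFC 7752) carries far more detail, and nothing here reflects how NorthStar or any other product stores its topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkAdvert:
    """Hypothetical, simplified link record (NOT the RFC 7752 wire format)."""
    local_node: str
    remote_node: str
    metric: int


def merge_adverts(feeds):
    """Merge link adverts from several peers into one directed topology.

    Each feed is an iterable of LinkAdvert. Because every peer sees the same
    flooded link-state database, a link reported by more than one peer
    normally carries identical data, so later copies simply overwrite
    earlier ones.
    """
    topology = {}
    for feed in feeds:
        for adv in feed:
            topology.setdefault(adv.local_node, {})[adv.remote_node] = adv.metric
            # Make sure the far-end node exists even if it advertised nothing yet.
            topology.setdefault(adv.remote_node, {})
    return topology


# Two peers report overlapping parts of the same network.
peer_a = [LinkAdvert("R1", "R2", 10), LinkAdvert("R2", "R3", 5)]
peer_b = [LinkAdvert("R2", "R3", 5), LinkAdvert("R3", "R1", 20)]
topo = merge_adverts([peer_a, peer_b])
```

The controller only listens here; the routers keep running their own distributed control plane.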

A centralized control plane, on the other hand, is the notion that all of the control computations be centralized in a single entity, which then programs elements in the (distributed) forwarding/data plane. And, your thought is that this latter entity does not make sense in the real world.

It’s not the notion of centralized computation that’s problematic. After all, tools like Cariden MATE or Juniper’s NorthStar controller use centralized computation, and you could argue that every BGP route reflector or route server (used by numerous Internet Exchange Points) does the same.
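Centralized computation on its own is ordinary graph work over a topology the controller already has. As an illustration only, here is a plain Dijkstra shortest-path run in Python over a topology dictionary. Real tools like Cariden MATE or NorthStar add traffic-engineering constraints, bandwidth, and much more; this sketch makes no claim about their algorithms, and pushing the results to devices is deliberately left out.

```python
import heapq


def shortest_paths(topology, source):
    """Plain Dijkstra. Returns {node: (cost, path)} for every reachable node.

    topology maps node -> {neighbor: metric}; metrics must be non-negative.
    """
    best = {source: (0, [source])}
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry; a cheaper path was already settled
        visited.add(node)
        for neighbor, metric in topology.get(node, {}).items():
            new_cost = cost + metric
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, path + [neighbor])
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return best


topology = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
paths = shortest_paths(topology, "A")
# paths["D"] is (3, ["A", "B", "C", "D"])
```

Nothing in this computation requires the controller to also send hellos, answer ARP, or detect link failures; that separation is the point of the paragraph above.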

The real problem is in the other tasks that the control plane has to do, like detecting Byzantine link failures, sending periodic messages to external devices, or running host-to-network protocols like ARP/ND. Those tasks don’t scale when you pull them into a central controller.
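A back-of-the-envelope calculation shows why the periodic tasks are the problem. The numbers below are assumptions chosen for illustration: a hypothetical fabric of 500 switches with 48 ports each, every port running a liveness check (think of a BFD-like hello) every 300 ms, with one hello sent and one received per port per interval.

```python
def controller_message_rate(switches, ports_per_switch, hello_interval_ms):
    """Messages per second handled by whoever runs the hellos.

    Counts one outgoing and one incoming hello per port per interval.
    Ignores ARP/ND, retransmissions, and control-channel overhead.
    """
    ports = switches * ports_per_switch
    return 2 * ports * 1000 / hello_interval_ms


central = controller_message_rate(500, 48, 300)    # 160000.0 messages/s at one controller
per_switch = controller_message_rate(1, 48, 300)   # 320.0 messages/s on each switch
```

Distributed across the switches, the load is trivial; concentrated in one box, every message also has to cross the control channel in both directions with tight timing, which is where centralized designs start to hurt.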

I have to admit that (if I understood what you said above correctly) this is certainly a contrarian viewpoint, since, for most people, SDN is about centralizing the control plane itself. Now, we do have the notion of a "logically centralized" control plane, but it's centralized nonetheless. So, some light on this would be much appreciated!

You might call my viewpoint contrarian, I call it realistic – and almost everyone who had to build and ship a production-grade product agrees with me.

For more details, go to the product-specific part of the previous blog post on this topic.

The real problem (as I see it) is that people who talk about centralized control plane don’t really understand all the implications of this concept. You either have centralized control plane (including all the complications I mentioned above and in the previous blog post) or you don’t. You can’t have it both ways.

You could, of course, offload the periodic control-plane functionality to edge nodes and still run central path computation. Juniper’s QFabric did exactly that, as did most Frame Relay, SONET/SDH, and ATM networks. The SDN Architecture document from ONF mentions this approach (and the real-life scalability concerns) very explicitly in sections 4.2 and 4.3. Let me quote straight from section 4.3.4 of that document, which more or less says the same things I’ve been saying for years:

Although a key principle of SDN is stated as the decoupling of control and data planes, it is clear that an agent in the data plane is itself exercising control, albeit on behalf of the SDN controller. Further, a number of functions with control aspects are widely considered as candidates to execute on network elements, for example OAM, ICMP processing, MAC learning, neighbor discovery, defect recognition and integration, protection switching.

A more nuanced reading of the decoupling principle allows an SDN controller to delegate control functions to the data plane, subject to a requirement that these functions behave in ways acceptable to the controller; that is, the controller should never be surprised. This interpretation is vital as a way to apply SDN principles to the real world.
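The "controller should never be surprised" idea can be illustrated with protection switching, one of the functions the ONF text lists. In the Python sketch below, the controller installs a primary and a backup next hop; the edge agent reacts to a local port failure on its own, but only by choosing among options the controller already approved, and it queues an event so the controller's view stays in sync. All class and field names are hypothetical; this is not OpenFlow, ForCES, or any vendor's agent API.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardingEntry:
    primary: str  # next-hop port chosen by the controller
    backup: str   # controller-approved alternative


@dataclass
class EdgeAgent:
    """Hypothetical data-plane agent that handles failures locally,
    but only within the options the controller installed."""
    table: dict  # prefix -> ForwardingEntry
    down_ports: set = field(default_factory=set)
    events: list = field(default_factory=list)  # reports queued for the controller

    def next_hop(self, prefix):
        entry = self.table[prefix]
        if entry.primary not in self.down_ports:
            return entry.primary
        if entry.backup not in self.down_ports:
            return entry.backup
        return None  # no sanctioned option left; wait for the controller

    def port_down(self, port):
        # Local defect detection: react immediately, then report the event
        # so the controller knows exactly what the switch is now doing.
        self.down_ports.add(port)
        self.events.append(("port_down", port))


prefix = "10.0.0.0/24"
agent = EdgeAgent(table={prefix: ForwardingEntry(primary="eth1", backup="eth2")})
first = agent.next_hop(prefix)   # "eth1"
agent.port_down("eth1")
after = agent.next_hop(prefix)   # "eth2", switched locally without asking the controller
```

The agent never invents a path of its own; if both approved options fail, it stops and leaves the decision to the controller.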

However, do keep in mind that the current set of tools you could use (primarily OpenFlow) doesn’t include a standard way of delegating control (at least not in OpenFlow 1.5), so anyone who solved this problem did it using proprietary extensions.


3 comments:

  1. Great write-up. An absolutely centralised control plane is difficult and just does not scale. Have you tried building an OpenFlow network and actually building out a management network in a constrained environment so that the nodes can talk to the controller? Have you tried running that network with meaningful information without chopping off the branch you're sitting on? Hah. Good luck. Not fun. My own view on this is 'make things easy'. So many answers for distributed architectures just aren't there yet. We have *some* of the required tools.

    I do happen to think that edge nodes will go more down the pub/sub route for sharing information with points of awareness scattered through an environment. Information en-masse is only relevant as per your requirement. In which case you can stream the information that is relevant to your scenario. It's no secret that some major products use pub/sub systems for telemetry and awareness of information. A centralised decision engine could subscribe to a set of events or streams and then push desired outcomes or decisions back in to the data bus instead of programming each node independently from a centralised control-plane controller and forwarding path management in the form of a centralised controller.

    This of course all points to proprietary mechanisms and I don't see any path for standardised control-plane architectures for fabrics or software networking yet. If there is work taking place in this field I would be keen to see it.

    Gah - spit balling here, but posts like this stir the mind!
    1. Not necessarily. There is a standardized control-forwarding plane protocol for this, ForCES. It's an IETF standardized protocol although currently not widely adopted. We have done a very limited implementation of subscribing to events and receiving only what is relevant to the "centralized control plane" and we have achieved very good results. (http://dx.doi.org/10.1109/NETSOFT.2015.7116181)
  2. Again, great article and follow-up to your previous post. The OpenFlow camp took the notion of a centralized control plane too literally and, like lemmings, followed blindly down that path. There is great value in centralized computation and service orchestration, but the notion that clearly localizable functions like local state should also be centralized is nonsensical and won't scale; we learn that in Distributed Computing 101.
    What you really want is central computation, with the controller knowing enough to give each device type the information it needs to act semi-autonomously for those localized functions, while ensuring that changes to topology and service, or events that affect them, are integrated back into the central computational controller. This is what we've done in the CPLANE NETWORKS Controller.