A long-time reader sent me a series of questions about the impact of WAN partitioning on an SDN-based network spanning multiple locations after watching the Architectures part of the Data Center Fabrics webinar. He focused on the specific case of a centralized control plane (read: an equivalent of a stackable switch) with a distributed controller cluster (read: a switch stack spread across multiple locations).
He started with…
You said that the Centralized control has one IP address per subnet and if the two DC’s lose connectivity, it will result in one DC completely unreachable. Is it because the Controller only exists in one DC?
That’s how most stackable switch solutions work, see for example HP IRF (mentioned here because they were one of the few vendors “brave” enough to advertise Cross-DC Switch Stack stupidity).
The location of the controller cluster and its operations (active/active or active/standby) are architectural decisions, and different architectures went different ways. For example, VMware NSX-V has controllers spread across multiple locations, while VMware NSX-T requires all controllers to be in one location.
The behavior under network partitioning (this is what a DCI link failure really is) also depends on the controller architecture (does it implement the control plane or only the management plane) and the quality of its implementation. Shutting down the minority part of the partitioned network is the most brutal approach to solving this problem. There are better solutions; here are some of the typical behaviors observed in the wild:
- Complete shutdown of the minority part of the network (most stackable switches).
- Complete shutdown of control plane with data plane operating as long as there are no topology changes or control-plane requests that cannot be handled locally. NSX-T uses this approach, and Big Switch made it a bit better by offloading LACP and ARP to the edge switches.
- Minority part of the network reverting to non-controller mode. This is the NSX-V approach: the minority site uses fabric-wide flood-and-learn.
- Minority part of the network becoming read-only. This is the Cisco ACI approach, and it works well only when the controllers remain a management-plane component. The moment you introduce control plane into the controller you’re almost forced to go back to one of the previous approaches.
- Minority part of the network losing write access to shared objects. This is how the Cisco ACI Multi-Site controller and NSX-T federation work. They both deploy a full-blown controller cluster on each site, and use an umbrella system to synchronize configurable objects across sites. Each location remains fully operational and manageable, and you can even create local objects during a network partition. I expect the Microsoft Azure orchestration system to work in a similar way.
- No impact when each location becomes an independent management-plane entity. This is how AWS regions are implemented.
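To make the first four behaviors a bit more tangible, here is a minimal Python sketch of a controller cluster stretched across sites. It assumes a simple strict-majority quorum rule and a total loss of inter-site connectivity; the names (`MinorityPolicy`, `has_quorum`, `site_behavior`) are made up for this illustration and do not correspond to any vendor's implementation. The last two behaviors are not modeled because every site runs its own controller cluster and the quorum question never arises.

```python
from enum import Enum
from typing import Dict


class MinorityPolicy(Enum):
    """What a site does when it cannot reach a majority of controllers."""
    SHUTDOWN = "complete shutdown"
    FROZEN_CONTROL_PLANE = "data plane keeps forwarding, no topology changes"
    FLOOD_AND_LEARN = "reverts to non-controller mode"
    READ_ONLY = "keeps running, configuration is read-only"


def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Strict majority: more than half of the controller nodes are reachable."""
    return reachable_nodes > cluster_size // 2


def site_behavior(nodes_per_site: Dict[str, int],
                  policy: MinorityPolicy) -> Dict[str, str]:
    """Behavior of each site once all inter-site links are down.

    After the partition a site can reach only its local controller nodes.
    """
    cluster_size = sum(nodes_per_site.values())
    behavior = {}
    for site, local_nodes in nodes_per_site.items():
        if has_quorum(local_nodes, cluster_size):
            behavior[site] = "fully operational"
        else:
            behavior[site] = policy.value
    return behavior


if __name__ == "__main__":
    # Three controllers: two in dc1, one in dc2
    for policy in MinorityPolicy:
        print(policy.name, site_behavior({"dc1": 2, "dc2": 1}, policy))
    # Even split: neither side has a strict majority
    print(site_behavior({"dc1": 2, "dc2": 2}, MinorityPolicy.READ_ONLY))
```

Under this rule an even split of controller nodes leaves no site with a majority, so every site falls back to the minority behavior. Whether a real product behaves that way is something to verify in its documentation.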
When choosing a multi-site controller solution always ask yourself “what happens when the inter-site link fails?” and “am I OK with that behavior?”
You won’t find the answer to the first question in vendor whitepapers for obvious reasons. You’ll have to dig deep into the product documentation, or you could find the answer for the most common data center controller products in the VMware NSX, Cisco ACI or Standard-Based EVPN webinar.
More to Explore
A quick search for controller failure on my blog resulted in these blog posts:
- Impact of Controller Failures in Software-Defined Networks
- Controller Cluster Is a Single Failure Domain
- How Hard Is It to Think about Failures?
- On SDN Controllers, Interconnectedness and Failure Domains
- OpenFlow Fabric Controllers Are Light-years Away from Wireless Ones
You will also find the impacts of controller failures discussed in these webinars: