I’m still convinced that architectures with centralized control planes (and that includes solutions relying on OpenFlow controllers) cannot scale. On the other hand, Big Switch Networks is shipping Big Cloud Fabric, and they claim they solved the problem. Obviously I wanted to figure out what’s going on, and Andy Shaw and Rob Sherwood were kind enough to explain the interesting details of their solution.
This is a deep dive article focusing on scalability. If you’re looking for information on what Big Switch launched last week, read the excellent summary by Ethan Banks.
Every data center fabric solution trying to use a centralized control plane faces (at least) three significant showstoppers on the path to true scalability:
- Linecard protocols. Running STP and LACP on thousands of interfaces is hard when you have to do it in real time using the dismal CPUs in existing hardware devices;
- Fast failure detection. Relying on light loss to detect link failure is overly simplistic. Eventually you’ll hit a faulty transceiver that will blackhole the traffic until someone figures out what the problem is, particularly since the OpenFlow control plane network usually doesn’t share fate with the data plane. The only way to solve this one is to run some OAM protocol between adjacent switches, and doing that through the controller every 100 msec with packet-out and packet-in messages won’t get you very far in terms of scalability.
- ARP. The OpenFlow protocol includes no mechanism that would allow packet generation (or automatic responses) in the controlled switches, so the controller has to deal with all control-plane protocols, including generating the ARP responses.
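To get a feel for the fast failure detection problem, here's a back-of-the-envelope sketch of the message load a controller would face if it had to relay every liveness probe itself. The fabric size (100 switches with 8 fabric links each) is a made-up assumption for illustration; only the 100 msec interval comes from the text above.

```python
def controller_oam_load(switches, fabric_links_per_switch, probe_interval_ms):
    """Estimate controller message rates when every link-liveness probe
    is relayed through the controller (hypothetical helper, not a real API).

    Returns (packet_out_per_second, packet_in_per_second).
    """
    probes_per_second_per_port = 1000 / probe_interval_ms
    # One probe per fabric-facing port per interval, across the whole fabric.
    probes_per_second = switches * fabric_links_per_switch * probes_per_second_per_port
    # Each probe costs one packet-out (controller to sending switch)
    # and one packet-in (receiving switch back to controller).
    return probes_per_second, probes_per_second


# Assumed fabric: 100 switches, 8 fabric links each, 100 msec probes.
out_rate, in_rate = controller_oam_load(
    switches=100, fabric_links_per_switch=8, probe_interval_ms=100
)
print(f"packet-out/s: {out_rate:.0f}, packet-in/s: {in_rate:.0f}")
```

With these assumed numbers the controller has to send and receive 8,000 messages per second just to know the links are alive, before it does any useful work.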
For more details on OpenFlow capabilities, shortcomings and scalability challenges, view the OpenFlow Deep Dive webinar.
Big Switch Networks claims they have solved all three problems with OpenFlow extensions. They run ARP and LACP proxies in their OpenFlow agent, which also includes BFD-like functionality:
- ARP tables are downloaded into switches with OpenFlow (probably using a special table ID and a very particular flow matching format that specifies VLAN/segment, destination IP and MAC addresses instead of the usual matching entries), and the switch runs a local ARP agent that uses those tables to reply to the incoming ARP requests.
- LACP sessions are still run between the OpenFlow controller and external network devices, but once an LACP session is established, the LACP proxy in the physical switch takes over and talks to the external device until there’s a change in LACP status, at which time the OpenFlow controller takes over and figures out what needs to be done.
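The ARP part of the above could look something like the following minimal sketch. Everything in it is an assumption on my part: the class name, the `(segment, IP)` key, and in particular the "return nothing so the caller can punt to the controller" fallback are hypothetical illustrations, not Big Switch's actual implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup key: (VLAN/segment ID, target IP address)
ArpKey = Tuple[int, str]


class LocalArpAgent:
    """Toy model of a switch-local ARP agent fed by a controller."""

    def __init__(self) -> None:
        self._table: Dict[ArpKey, str] = {}

    def install(self, segment: int, ip: str, mac: str) -> None:
        """Entry downloaded by the controller (stand-in for the OpenFlow extension)."""
        self._table[(segment, ip)] = mac

    def handle_request(self, segment: int, target_ip: str) -> Optional[str]:
        """Return the MAC address to put in the ARP reply.

        Returns None for unknown targets; what happens then is an
        assumption (the caller might punt the request to the controller).
        """
        return self._table.get((segment, target_ip))


agent = LocalArpAgent()
agent.install(segment=10, ip="192.0.2.1", mac="00:00:5e:00:53:01")

print(agent.handle_request(10, "192.0.2.1"))  # answered locally
print(agent.handle_request(20, "192.0.2.1"))  # same IP, different segment: no entry
```

The point of the sketch: once the table is in the switch, answering an ARP request is a local lookup, and the controller stays out of the packet path.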
If the above description sounds like DLSw+ local-ack, you just dated yourself ;)
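For those who never had to deal with local-ack, the LACP handoff described above boils down to a tiny ownership state machine. The sketch below is my guess at the logic, with hypothetical names throughout; it models the partner state as an opaque value and is not based on the actual Switch Light code.

```python
from enum import Enum


class Owner(Enum):
    CONTROLLER = "controller"
    SWITCH_PROXY = "switch_proxy"


class LacpSession:
    """Toy model of who talks LACP to the external device on one port."""

    def __init__(self) -> None:
        # The controller runs the session until it is established.
        self.owner = Owner.CONTROLLER
        self.partner_state = None

    def established(self, partner_state) -> None:
        """Controller finished negotiation; the switch-local proxy takes over."""
        self.partner_state = partner_state
        self.owner = Owner.SWITCH_PROXY

    def on_lacpdu(self, partner_state) -> Owner:
        """Process a received LACPDU and return who has to handle it."""
        if self.owner is Owner.SWITCH_PROXY and partner_state != self.partner_state:
            # LACP status changed: hand the session back to the controller.
            self.owner = Owner.CONTROLLER
        return self.owner


session = LacpSession()
session.established(partner_state="in-sync")
steady = session.on_lacpdu("in-sync")       # proxy keeps answering locally
changed = session.on_lacpdu("out-of-sync")  # controller takes over again
print(steady, changed)
```

In steady state the controller never sees a single LACPDU; it gets involved only when something changes.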
Extending OpenFlow to get the functionality you need to engineer your product sounds like the way to go to get things done, but it also flushes OpenFlow-based vendor interoperability down the drain. At this moment, Big Cloud Fabric works with physical switches that are capable of running Switch Light OS. Numerous whitebox switches can do that, as can some switches made by Dell Force 10, but you cannot take just any OpenFlow switch and use it to build Big Cloud Fabric, which was the initial nirvana promised (and never delivered) by the Open Networking Foundation and the whole orthodox OpenFlow/SDN movement.
I see two ways this conundrum might evolve: either everyone else wakes up and realizes you need functionality similar to what Big Switch Networks implemented to scale OpenFlow-based fabrics (good luck with that), or we give up the whole controller-to-switch interoperability concept and focus on hardware/software separation (controller vendor software running on a standard hardware platform made by multiple ODMs/OEMs). Just keep in mind you might be locked into a single-vendor architecture one way or another, and tread carefully.
From theory to practice
Do you know enough about OpenFlow? If you want to move beyond industry press “wisdom”, you (RFC 2119) MUST watch my 6-hour deep dive into the intricacies of OpenFlow.