Big Cloud Fabric: Scaling OpenFlow Fabric
I’m still convinced that architectures with centralized control planes (and that includes solutions relying on OpenFlow controllers) cannot scale. On the other hand, Big Switch Networks is shipping Big Cloud Fabric, and they claim they solved the problem. Obviously I wanted to figure out what’s going on and Andy Shaw and Rob Sherwood were kind enough to explain the interesting details of their solution.
Long story short: Big Switch Networks significantly extended OpenFlow.
Every data center fabric solution trying to use a centralized control plane faces (at least) three significant showstoppers on the path to true scalability:
- Linecard protocols. Running STP and LACP on thousands of interfaces is hard when you have to do it in real time using the dismal CPUs in existing hardware devices;
- Fast failure detection. Relying on loss of light to detect link failures is overly simplistic. Eventually you'll hit a faulty transceiver that blackholes traffic until someone figures out what the problem is, particularly since the OpenFlow control-plane network usually doesn't share fate with the data plane. The only way to solve this one is to run an OAM protocol between adjacent switches, and doing that through the controller with packet-out and packet-in messages won't get you very far in terms of scalability: probing 1,000 inter-switch links every 100 msec means the controller has to churn through 10,000 packet-out and 10,000 packet-in messages per second just for liveness checks (see the sketch after this list);
- ARP. The OpenFlow protocol includes no mechanism that would allow the controlled switches to generate packets (or respond automatically); the controller has to deal with all control-plane protocols, including generating ARP responses.
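To get a feeling for what switch-local OAM looks like, here's how you'd enable BFD on an Open vSwitch interface; the switch runs the BFD state machine itself and only reports state changes. This is just an analogue, not Big Switch's implementation (their BFD-like code lives in the proprietary Switch Light agent), and the interface name is made up:

```
# Run BFD locally on the switch instead of probing through the controller.
# "eth1" is a hypothetical interface name; timers are in milliseconds.
ovs-vsctl set interface eth1 bfd:enable=true \
    bfd:min_rx=100 bfd:min_tx=100

# The session state (up/down, forwarding flag) is maintained by the switch:
ovs-vsctl get interface eth1 bfd_status
```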
For more details on OpenFlow capabilities, shortcomings and scalability challenges, view the OpenFlow Deep Dive webinar.
Big Switch Networks claims they have solved all three problems with OpenFlow extensions. They run ARP and LACP proxies in their OpenFlow agent, which also includes BFD-like functionality:
- ARP tables are downloaded into the switches with OpenFlow (probably using a special table ID and a flow format that specifies VLAN/segment, destination IP address, and MAC address instead of the usual match entries), and a local ARP agent in each switch uses those tables to reply to incoming ARP requests.
- LACP sessions are still established between the OpenFlow controller and the external network devices, but once a session is up, the LACP proxy in the physical switch takes over and talks to the external device until the LACP status changes, at which point the OpenFlow controller steps back in and figures out what needs to be done (see the sketch below).
If the above description sounds like DLSw+ local-ack, you just dated yourself ;)
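For comparison, Open vSwitch runs the entire LACP state machine locally on the switch, which gives you a rough feel for what the proxy offloads from the controller. Again, this is an analogue rather than BSN's implementation, and the bridge/interface names are made up:

```
# Create a bond whose LACP negotiation is handled entirely by the local
# switch; "br0", "bond0", "eth0" and "eth1" are hypothetical names.
ovs-vsctl add-bond br0 bond0 eth0 eth1 lacp=active

# Inspect the LACP state maintained by the local agent:
ovs-appctl lacp/show bond0
```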
Extending OpenFlow to get the functionality you need to engineer your product sounds like the way to get things done, but it also flushes OpenFlow-based vendor interoperability down the drain. At the moment, Big Cloud Fabric works only with physical switches that can run Switch Light OS. Numerous whitebox switches can do that, as can some switches made by Dell Force10, but you cannot take just any OpenFlow switch and use it to build Big Cloud Fabric, which was the initial nirvana promised (and never delivered) by the Open Networking Foundation and the whole orthodox OpenFlow/SDN movement.
I see two ways this conundrum might evolve: either everyone else wakes up and realizes they need functionality similar to what Big Switch Networks implemented to scale OpenFlow-based fabrics (good luck with that), or we give up on the whole controller-to-switch interoperability concept and focus on hardware/software separation (controller-vendor software running on standard hardware platforms made by multiple ODMs/OEMs). Just keep in mind that you might be locked into a single-vendor architecture one way or another, and tread carefully.
With new OpenFlow switches coming out that can support a million flows, I think adding an ARP rule shouldn't be a problem.
I recently came across this article about building an OpenFlow router with OVS:
http://dtucker.co.uk/hack/building-a-router-with-openvswitch.html
They used flow table 105 ("ARP responder") to send out ARP replies for their virtual default gateway address.
Couldn't this be used to send any arbitrary ARP reply?
The controller could install a flow entry into the "ARP responder" flow table every time the switch receives an unknown ARP request ...
They used OF 1.3, but of course the switch needs to implement the match fields and set-field/copy-field actions for the ARP fields.
KR
Also, look at the sample flow tables (at the bottom of the post). Table 105 uses NXM flows (Nicira extensions).
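For reference, here's roughly what such an entry looks like; it's the same trick OpenStack Neutron uses for its local ARP responder. The bridge name and addresses are illustrative only:

```
# Answer ARP requests for 10.0.0.1 locally: rewrite the request into a
# reply and bounce it back out the ingress port. Bridge name, IP and MAC
# (02:00:0a:00:00:01) are examples only.
ovs-ofctl add-flow br0 "table=105,priority=100,arp,arp_op=1,arp_tpa=10.0.0.1,\
actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],\
mod_dl_src:02:00:0a:00:00:01,\
load:0x2->NXM_OF_ARP_OP[],\
move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],\
move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],\
load:0x02000a000001->NXM_NX_ARP_SHA[],\
load:0x0a000001->NXM_OF_ARP_SPA[],\
in_port"
```

The move/load actions rewrite the request into a reply in place and send it back out the ingress port; field-to-field copies like that aren't expressible with standard OpenFlow 1.3 actions, which is exactly the problem.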
I would love to see ARP handling within standard OpenFlow, but it's not there yet...
It is, however, easier to enforce consistent policy across the whole network, which _could_ make certain exploits harder and/or impossible. See, for example, http://blog.ipspace.net/2012/10/ipv6-first-hop-security-ideal-openflow.html
Note: I'm not saying BSN is doing anything along those lines, I'm just saying it can be done ;)