Big Cloud Fabric: Scaling OpenFlow Fabric

Monday, February 2, 2015 08:29 +0100


I’m still convinced that architectures with centralized control planes (and that includes solutions relying on OpenFlow controllers) cannot scale. On the other hand, Big Switch Networks is shipping Big Cloud Fabric, and they claim they solved the problem. Obviously I wanted to figure out what’s going on and Andy Shaw and Rob Sherwood were kind enough to explain the interesting details of their solution.

Long story short: Big Switch Networks significantly extended OpenFlow.

Every data center fabric solution trying to use a centralized control plane faces (at least) three significant showstoppers on the path to true scalability:

Linecard protocols. Running STP and LACP on thousands of interfaces is hard when you have to do it in real time using the dismal CPUs in existing hardware devices;
Fast failure detection. Relying on light loss to detect link failure is overly simplistic. Eventually you’ll hit a faulty transceiver that will blackhole the traffic until someone figures out what the problem is, particularly since the OpenFlow control plane network usually doesn’t share fate with the data plane. The only way to solve this one is to run some OAM protocol between adjacent switches, and doing that through the controller every 100 msec with packet-out and packet-in messages won’t get you very far in terms of scalability.
ARP. The OpenFlow protocol includes no mechanism that would allow packet generation (or automatic responses) in the controlled switches – the controller has to deal with all control-plane protocols, including generating the ARP responses.
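
To put a rough number on the fast failure detection problem, here is a back-of-the-envelope sketch. The fabric size (1,000 inter-switch links) is a made-up assumption, as is the function name; the only inputs taken from the text above are the 100 msec probe interval and the fact that every probe sent through the controller costs one packet-out and one packet-in message.

```python
def controller_oam_messages_per_second(links: int, interval_ms: float = 100) -> float:
    """Estimate controller message load when OAM probes are relayed by the controller.

    Assumptions (illustrative only):
      * every link is probed in both directions,
      * every probe costs one packet-out (controller -> sending switch)
        and one packet-in (receiving switch -> controller).
    """
    directions = 2
    messages_per_probe = 2
    probes_per_second = 1000 / interval_ms
    return links * directions * messages_per_probe * probes_per_second


# Hypothetical fabric with 1,000 inter-switch links probed every 100 msec.
print(controller_oam_messages_per_second(1000, 100))
```

Under these assumptions the controller has to process about 40,000 messages per second just to keep track of link liveness, before it handles a single ARP request or topology change.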

For more details on OpenFlow capabilities, shortcomings and scalability challenges, view the OpenFlow Deep Dive webinar.

Big Switch Networks claims they have solved all three problems with OpenFlow extensions. They run ARP and LACP proxies in their OpenFlow agent, which also includes BFD-like functionality:

ARP tables are downloaded into switches with OpenFlow (probably using a special table ID and very particular flow matching format that specifies VLAN/segment, destination IP and MAC addresses instead of the matching entries), and the switch runs a local ARP agent that uses those tables to reply to the incoming ARP requests.
LACP sessions are still run between the OpenFlow controller and external network devices, but once an LACP session is established, the LACP proxy in the physical switch takes over and talks to the external device until there’s a change in LACP status, at which time the OpenFlow controller takes over and figures out what needs to be done.
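
The ARP part of that division of labor could look roughly like the sketch below. This is not Big Switch's code or their OpenFlow extension format; the class names, the table key (segment plus IP address) and the "punt to controller" counter are all assumptions made for illustration. The point is only that the controller populates a table and the switch answers requests from it without a controller round trip.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ArpRequest:
    segment: int       # VLAN or segment ID the request arrived on
    sender_ip: str
    sender_mac: str
    target_ip: str     # address the host is trying to resolve


@dataclass(frozen=True)
class ArpReply:
    segment: int
    sender_ip: str     # the resolved address
    sender_mac: str    # the MAC address taken from the downloaded table
    target_ip: str     # the original requester
    target_mac: str


class LocalArpAgent:
    """Hypothetical switch-local agent answering ARP from a controller-fed table."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], str] = {}
        self.punted_to_controller = 0

    def install(self, segment: int, ip: str, mac: str) -> None:
        # Called when the controller downloads (or updates) an entry.
        self._table[(segment, ip)] = mac

    def remove(self, segment: int, ip: str) -> None:
        # Called when the controller withdraws an entry, e.g. after a host move.
        self._table.pop((segment, ip), None)

    def handle(self, request: ArpRequest) -> Optional[ArpReply]:
        mac = self._table.get((request.segment, request.target_ip))
        if mac is None:
            # Unknown target: only this case still needs the controller.
            self.punted_to_controller += 1
            return None
        return ArpReply(
            segment=request.segment,
            sender_ip=request.target_ip,
            sender_mac=mac,
            target_ip=request.sender_ip,
            target_mac=request.sender_mac,
        )


agent = LocalArpAgent()
agent.install(segment=10, ip="10.0.0.1", mac="00:00:5e:00:53:01")
request = ArpRequest(10, "10.0.0.7", "00:00:5e:00:53:07", "10.0.0.1")
reply = agent.handle(request)
```

In this sketch the controller is involved only when entries change or a lookup misses, which is the property that matters for scalability.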

If the above description sounds like DLSw+ local-ack, you just dated yourself ;)

Extending OpenFlow to get the functionality you need to engineer your product sounds like the way to go to get things done, but it also flushes the OpenFlow-based vendor interoperability down the drain. At this moment, Big Cloud Fabric works with physical switches that are capable of running Switch Light OS. Numerous whitebox switches can do that, as can some switches made by Dell Force 10, but you cannot take just any OpenFlow switch and use it to build Big Cloud Fabric, which was the initial nirvana promised (and never delivered) by Open Networking Foundation and the whole orthodox OpenFlow/SDN movement.

I see two ways this conundrum might evolve: either everyone else wakes up and realizes you need functionality similar to what Big Switch Networks implemented to scale OpenFlow-based fabrics (good luck with that), or we give up the whole controller-to-switch interoperability concept and focus on hardware/software separation (controller vendor software running on standard hardware platforms made by multiple ODMs/OEMs). Just keep in mind you might be locked into a single-vendor architecture one way or another, and tread carefully.

9 comments:

Unknown 02 February 2015 10:20

Thanks for taking the time to talk with us Ivan! If your readers are interested, all of our OpenFlow extensions are actually available in open source. Just check out the descriptions in our "Loxi" OpenFlow library project here: https://github.com/floodlight/loxigen/tree/master/openflow_input

jsicuran 02 February 2015 17:29

Good post, Ivan. It just seems that, with all the "this and that" for control and data plane enhancing and scaling, it comes across as just shoveling the slush after a winter mix of snow/rain from one side of the driveway to another. Let's move our old friends the protocols and their FSMs (ARP, LACP, insert here bla bla bla) over here now to do the same thing. Yes, I was shoveling some slush today after a Northeast winter blast.

Suresh 01 April 2015 16:51

Thanks Ivan... I read your posts closely and I think they are very helpful. However, I realized that Big Switch's idea of adding a local ARP table + agent to the switch is probably just one of several ways this issue could be resolved.
With new OF switches coming out that can support a million flows, I think adding an ARP rule shouldn't be a problem.

Ivan Pepelnjak 01 April 2015 16:53

You cannot generate ARP replies with existing OpenFlow actions (if that's what you had in mind). ARP request has to be sent to the controller, and that kills scalability.

Suresh 01 April 2015 21:06

I agree with you that the ARP request has to go to the controller the first time, but then the controller can install an ARP rule in the OVS switch to handle future ARP messages. If a host moves, the controller can purge the old ARP rule and insert a new rule in the new switch.

Unknown 10 April 2015 10:34

Hello.

I recently came across this article about building an OpenFlow router with OVS:
http://dtucker.co.uk/hack/building-a-router-with-openvswitch.html

They used a flow table 105 "ARP responder" to send out ARP replies for their virtual default gateway address.
Couldn't this be used to send any arbitrary ARP reply?
The controller could install a flow entry into the "ARP responder" flow table every time the switch receives an unknown ARP request ...

They used OF 1.3. But of course, the switch needs to have match fields and set-field/copy-field actions implemented for ARP fields.

KR

Ivan Pepelnjak 10 April 2015 10:40

From their description: "In this table we use some OVS-Jitsu to take an incoming ARP Request and turn it into an ARP reply" - seems to be an OVS-specific extension.

Also, look at the sample flow tables (at the bottom of the post). Table 105 uses NXM flows (Nicira extensions).

I would love to see ARP handling within standard OpenFlow, but it's not there yet...

jsicuran 13 April 2015 18:38

Just made me wonder about the security exploits possible. We had ARP poisoning in the past; can similar "hi-jinx" be done via a deadly ARP rule deployed by OVS or any controller?

Ivan Pepelnjak 13 April 2015 19:33

All exploits that were possible in the past are still possible in the brave new world. Just because something is centralized doesn't mean it's secure.

It is, however, easier to enforce consistent policy across the whole network, which _could_ make certain exploits harder and/or impossible. See, for example, http://blog.ipspace.net/2012/10/ipv6-first-hop-security-ideal-openflow.html

Note: I'm not saying BSN is doing anything along those lines, I'm just saying it can be done ;)
