Could We Use OpenFlow for Load Balancing?

It all started with a tweet Kristian Larsson sent me after I published my flow-based forwarding blog post:

My reply was obviously along the lines “just because you could doesn’t mean that you should”:

… which Colin Dixon quickly improved to:

Further into the conversation I told Kristian that OpenFlow is the wrong tool for the job, and he replied with:

It was time for an “I know this great proof but I can’t write it in 140 characters” moment, so here’s the blog post.

BTW, if you want to argue about capabilities of OpenFlow, you RFC 6919 MUST go through a facts-based deep-dive training like my OpenFlow webinar or read the specs. Gospels delivered at conferences or Open Something events will never tell you the real story.

What Could OpenFlow Do?

Let’s start with the theory.

You can use OpenFlow to match on any part of the packet header and select an output port (or perform a number of other actions) based on that match, which is good enough to implement load balancing toward an anycast server farm (see Direct Server Return blog post and video for more details).
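To make the match-then-output idea concrete, here is a minimal plain-Python sketch of a flow table that splits traffic toward one virtual IP across two output ports based on the source address. It is a toy model, not an OpenFlow library: the table layout, the `lookup` helper, the addresses and the port numbers are all made up for illustration, and the field names merely imitate OpenFlow match-field naming.

```python
import ipaddress

# Hypothetical flow table: (priority, match fields, output port).
# Two entries split clients of one virtual IP in half by source prefix.
FLOW_TABLE = [
    (100, {"ipv4_dst": "192.0.2.10/32", "ipv4_src": "0.0.0.0/1"}, 1),
    (100, {"ipv4_dst": "192.0.2.10/32", "ipv4_src": "128.0.0.0/1"}, 2),
]


def lookup(packet, table=FLOW_TABLE):
    """Return the output port of the highest-priority matching entry.

    Returns None on a table miss (no entry matches the packet).
    """
    for _priority, match, out_port in sorted(table, key=lambda e: -e[0]):
        if all(
            ipaddress.ip_address(packet[field]) in ipaddress.ip_network(prefix)
            for field, prefix in match.items()
        ):
            return out_port
    return None


low_half = lookup({"ipv4_src": "10.1.1.1", "ipv4_dst": "192.0.2.10"})
high_half = lookup({"ipv4_src": "203.0.113.7", "ipv4_dst": "192.0.2.10"})
miss = lookup({"ipv4_src": "10.1.1.1", "ipv4_dst": "198.51.100.1"})
```

Each server behind ports 1 and 2 would own the same virtual IP, so no header rewrite is needed; the switch only picks the port.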

OpenFlow has actions that rewrite IP addresses and port numbers, so you could implement NAT or PAT, including L4 load balancing and SNAT.
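The rewrite actions can be pictured as a list of field assignments followed by an output action. The sketch below models that in plain Python; the `apply_actions` helper, the action tuples, the addresses and the ports are assumptions for illustration and do not correspond to any controller API.

```python
def apply_actions(packet, actions):
    """Apply ("set_field", field, value) and ("output", port) actions.

    Works on a copy of the packet and returns (rewritten packet, output port).
    """
    pkt = dict(packet)
    out_port = None
    for action in actions:
        if action[0] == "set_field":
            _, field, value = action
            pkt[field] = value
        elif action[0] == "output":
            out_port = action[1]
    return pkt, out_port


# Client-to-server direction: virtual IP and port rewritten to a real server.
FORWARD = [
    ("set_field", "ipv4_dst", "10.0.0.11"),
    ("set_field", "tcp_dst", 8080),
    ("output", 3),
]
# Server-to-client direction: real server address rewritten back to the virtual IP.
RETURN = [
    ("set_field", "ipv4_src", "192.0.2.10"),
    ("set_field", "tcp_src", 80),
    ("output", 1),
]

request = {"ipv4_src": "198.51.100.7", "ipv4_dst": "192.0.2.10",
           "tcp_src": 40000, "tcp_dst": 80}
to_server, server_port = apply_actions(request, FORWARD)

reply = {"ipv4_src": "10.0.0.11", "ipv4_dst": "198.51.100.7",
         "tcp_src": 8080, "tcp_dst": 40000}
to_client, client_port = apply_actions(reply, RETURN)
```

Note that both directions need their own flow entry, which doubles the number of flows per load-balanced session.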

OpenFlow has no actions that would work beyond TCP/UDP port numbers. It’s thus impossible to implement any functionality that load balancing vendors love to call application delivery controller in an OpenFlow-controlled switch. You can’t even insert payload into a TCP session with OpenFlow, because you can’t touch TCP sequence numbers.
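To see why sequence numbers matter, consider what a device would have to do after inserting some bytes into the client-to-server stream: every later client segment needs its sequence number shifted up, and every server acknowledgment needs to be shifted back down, both modulo 2^32. The helper names below are hypothetical; the sketch only shows the per-session arithmetic that a constant field rewrite cannot express.

```python
SEQ_MOD = 2**32  # TCP sequence and acknowledgment numbers are 32-bit counters


def shift_seq(seq, inserted_bytes):
    """Sequence number of a later client segment as the server must see it."""
    return (seq + inserted_bytes) % SEQ_MOD


def shift_ack(ack, inserted_bytes):
    """Server acknowledgment number as the client must see it."""
    return (ack - inserted_bytes) % SEQ_MOD


# 10 bytes inserted shortly before the counter wraps around.
toward_server = shift_seq(4294967290, 10)
toward_client = shift_ack(toward_server, 10)
```

The offset differs per session and changes with every insertion, so it would have to be tracked and applied per packet.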

The theoretical limit of OpenFlow 1.5 is thus what F5 calls Fast L4 profile.

Reactive or Proactive?

Keep in mind that someone (= controller) has to set up the load balancing flows. The flows could be preset based on known network topology (= proactive flow setup) or created dynamically based on actual traffic (= reactive flow setup).

In proactive flow setup, the controller creates the flows and stays away from the data plane. In reactive mode, the controller gets involved whenever a new flow needs to be created based on actual traffic load.

Obviously, you don’t have to use session-based flows (that doesn’t work well even in virtual switches). You could use crude IP source+destination-based load balancing, and get the controller involved only when a different source IP address appears in the network.
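The reactive variant with source-IP-based flows can be sketched as a toy switch that asks the controller only on a table miss. Everything here is an assumption for illustration: the `ToySwitch` class, the round-robin controller and the addresses are made up, and the sketch keys flows on the source address alone, assuming all traffic goes to a single virtual IP.

```python
import itertools


def make_round_robin_controller(ports):
    """Return a hypothetical controller callback that assigns ports in turn."""
    counter = itertools.count()

    def pick(src_ip):
        return ports[next(counter) % len(ports)]

    return pick


class ToySwitch:
    def __init__(self, controller=None, preset_flows=None):
        # Proactive setup: flows are preset and the controller stays away.
        self.flows = dict(preset_flows or {})  # source IP -> output port
        self.controller = controller
        self.packet_ins = 0  # how often the controller got involved

    def forward(self, src_ip):
        if src_ip not in self.flows:
            if self.controller is None:
                return None  # table miss with no controller: drop
            # Reactive setup: punt to the controller, then cache the flow.
            self.packet_ins += 1
            self.flows[src_ip] = self.controller(src_ip)
        return self.flows[src_ip]


traffic = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2"]

reactive = ToySwitch(controller=make_round_robin_controller([1, 2]))
reactive_ports = [reactive.forward(ip) for ip in traffic]

proactive = ToySwitch(preset_flows={"10.0.0.1": 1, "10.0.0.2": 2})
proactive_ports = [proactive.forward(ip) for ip in traffic]
```

In this run the controller is consulted three times, once per new source address, while the proactive switch never consults it and simply drops the source it has no flow for.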

How realistic is all this?

As always, there’s a huge gap between theory and practice. More about that tomorrow… and of course you’re invited to write your observations in the comments.

Even More OpenFlow

If you need more red-pill exposure to OpenFlow, watch the OpenFlow Deep Dive webinar, attend one of my SDN workshops (which include all my SDN digital content), or register for the OpenFlow Deep Dive session I’ll run at Interop Las Vegas 2016.

5 comments:

  1. Even with simple L3/L4, a pure openflow load-balancer would require the controller be involved in every connection setup. It might be able to outperform a 1997-era LocalDirector, but not much else.

    More realistic is a slow/fast-path approach. E.g., a standalone load-balancer directs the initial connection setup and then uses openflow to fast-path the remainder of the connection. I'm guessing this is how NSX's distributed load-balancer will work.

    HP's SDN App Store does have a Kemp load-balancer adapted for OpenFlow. The solution brief indicates that openflow is used for load and health measurements only, not for connection setup or for fast-path mode.
    Replies
    1. According to the Kemp whitepaper, they use information from the SDN controller to identify bottlenecks between the load balancer and individual servers in the server farm.

      Looks like a solution in search of a problem to me. Throwing more bandwidth at the problem is probably cheaper and definitely less complex.
  2. You can use OpenFlow for ECMP routing, which can be used as a crude form of load distribution (with multiple static routes to the same virtual IP via different real server IPs). It is only "balanced" in a statistical sense, there is no mechanism to detect liveness of servers, and the return path may be asymmetrical, which limits the applicability to a select set of use cases.
  3. Jen Rexford and team had a fun paper about this idea at HotNets a few years ago -- it's a quick and insightful read:

    Ivan's comment about the possibility of doing this in a storage context is exactly right -- Coho Data (my company) does this in an enterprise storage product today. Using OpenFlow on Arista ToRs, we are able to present a single IP address for NFS v3, but then dynamically steer traffic over large numbers of 10Gb storage nodes. This turns out to be a pretty big win in enterprise contexts where you can't change the client storage protocol (e.g. to deploy pNFS), and where you want to scale traffic over large numbers of very fast flash devices.

    If you're interested, I recently did a talk on this aspect of our system, as well as some related challenges facing network connectivity/perf in storage systems.

    @andywarfield
    Replies
    1. Andy, any links to this article or this talk? A link to the summarised and technical content would be much appreciated.
      Thanks