Should We Use OpenFlow for Load Balancing?

Yesterday I described the theoretical limitations of using OpenFlow for load balancing purposes. Today let’s focus on the practical part and answer another question: should we use OpenFlow for load balancing?

I wrote about the same topic in two blog posts years ago. I know it’s hard to dig through old blog posts, so I collected them in a book.

From the theoretical perspective, you could use an OpenFlow-enabled switch to implement L4 load balancing (up to rewrites of IP addresses and TCP/UDP port numbers), but not an application delivery controller that has to touch either application payload or TCP sequence numbers.
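
To make that boundary concrete, here is a minimal sketch of a flow entry that matches on the 5-tuple and rewrites the destination address and port. `Match` and `Rewrite` are made-up classes for illustration (not a real OpenFlow library), and the addresses and ports are example values. Note that nothing in the model can reach the TCP sequence number or the payload.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified model of a flow entry; not a real OpenFlow API.
@dataclass(frozen=True)
class Match:
    ip_src: Optional[str] = None
    ip_dst: Optional[str] = None
    ip_proto: Optional[int] = None
    l4_src: Optional[int] = None
    l4_dst: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        # A field left as None acts as a wildcard.
        return all(
            want is None or pkt.get(name) == want
            for name, want in vars(self).items()
        )


@dataclass(frozen=True)
class Rewrite:
    # Only L3/L4 header fields can be rewritten in this model.
    ip_dst: Optional[str] = None
    l4_dst: Optional[int] = None

    def apply(self, pkt: dict) -> dict:
        out = dict(pkt)
        if self.ip_dst is not None:
            out["ip_dst"] = self.ip_dst
        if self.l4_dst is not None:
            out["l4_dst"] = self.l4_dst
        return out


# Example values only: one client session hitting a virtual IP.
vip_flow = Match(ip_src="192.0.2.10", ip_dst="203.0.113.80",
                 ip_proto=6, l4_src=40000, l4_dst=80)
to_server = Rewrite(ip_dst="10.0.0.11", l4_dst=8080)

packet = {
    "ip_src": "192.0.2.10", "ip_dst": "203.0.113.80", "ip_proto": 6,
    "l4_src": 40000, "l4_dst": 80,
    "tcp_seq": 1000, "payload": b"GET / HTTP/1.1\r\n",
}

rewritten = to_server.apply(packet) if vip_flow.matches(packet) else packet
print(rewritten["ip_dst"], rewritten["l4_dst"])
```

The sequence number and payload pass through untouched, which is exactly why this model stops at L4 load balancing and cannot act as an ADC.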

Can’t We Just Use the Controller to Implement ADC?

Sure you could – if you’re willing to send all the user traffic through the controller. I just fail to see what advantage OpenFlow would give you in this particular scenario – it would be easier and faster to put the controller code in the forwarding path without bothering with another layer of indirection (however, do consider RFC 1925 section 2.6 and the bonus points you'd get when using OpenFlow and referencing the great theoretical ideas others had in the past in your 2-column article).

Even the idea of using a hypothetical OpenFlow device with an unlimited amount of memory (reeks of Turing machine to me) to implement ADC won’t work (not even in theory) – OpenFlow cannot reach beyond L4 ports.

Meanwhile on Planet Earth

There are at least four limiting factors that quickly kill most load balancing with OpenFlow ideas:

Amount of flow forwarding hardware in OpenFlow switches. Any load balancing functionality is by definition not destination-only forwarding (or we wouldn’t be talking about it) but something akin to PBR, which is implemented in specialized hardware (TCAM or equivalent) in high-speed network devices.

The size of that table is usually ridiculously small (for our current discussion) – maybe you’ll get 2000 entries on a multi-terabit switch.

You’ll find more details in the OpenFlow Deep Dive webinar and regularly updated figures in the Data Center Fabrics webinar.

TCAM update speed. Most switches can install around 1000 flows per second.

Hardware and OpenFlow implementation limits. Most high-speed forwarding hardware doesn’t support port rewrites. While many chipsets support IP address rewrite, no major vendor that published their OpenFlow implementation details implemented that functionality in their OpenFlow agents.

Yet again, the Data Center Fabrics webinar has the details I could collect from published documentation.

Increased latency. Unless the controller uses proactive flow setup (unlikely in load balancing scenarios), the reactive flow setup creates additional latency (at least the first packet has to go through the controller).
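
A back-of-the-envelope check of how the first two limits interact, assuming the round figures quoted above (2000 entries, 1000 flow installs per second) and Little's law (entries in use = arrival rate × average flow lifetime). The function names and the 500 sessions per second, 30-second workload are made up for illustration.

```python
def entries_needed(new_flows_per_sec: float, avg_lifetime_sec: float) -> float:
    """Little's law: average number of flow entries in use."""
    return new_flows_per_sec * avg_lifetime_sec


def max_avg_lifetime(table_entries: int, new_flows_per_sec: float) -> float:
    """Longest average flow lifetime the table can absorb at a given arrival rate."""
    return table_entries / new_flows_per_sec


TABLE_ENTRIES = 2000      # round figure from the text
INSTALLS_PER_SEC = 1000   # round figure from the text

# If sessions arrive as fast as the switch can install them,
# the average session must not live longer than this many seconds.
lifetime_at_full_rate = max_avg_lifetime(TABLE_ENTRIES, INSTALLS_PER_SEC)

# Hypothetical workload: 500 new sessions per second lasting 30 seconds each.
demand = entries_needed(500, 30)
fits = demand <= TABLE_ENTRIES

print(lifetime_at_full_rate, demand, fits)
```

Under these assumptions a switch running at its full install rate can only hold sessions that last about two seconds on average, and even a modest hypothetical web workload needs several times more entries than the table offers.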

Considering all these limitations, OpenFlow-based load balancing makes sense only when you’re balancing a low number of high-bandwidth sessions toward an anycast server farm (so the load balancing switch doesn’t have to do address or port rewrites). QED.
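
The one scenario that survives can be sketched as a toy model: a handful of long-lived sessions, reactive flow setup, and forward-only entries toward an anycast farm. `ToySwitch`, its port numbers, and the least-sessions placement policy are assumptions made for illustration; this is not how any particular controller or switch works.

```python
class ToySwitch:
    """Toy switch plus controller: reactive setup, output-only flow entries."""

    def __init__(self, server_ports, table_size):
        self.server_ports = list(server_ports)
        self.table_size = table_size
        self.flow_table = {}  # 5-tuple -> output port (no rewrites needed)
        self.sessions_per_port = {p: 0 for p in self.server_ports}
        self.punted = 0       # packets that had to visit the controller

    def _controller_pick_port(self):
        # Assumed policy: send the new session to the least-loaded server port.
        return min(self.server_ports, key=lambda p: self.sessions_per_port[p])

    def forward(self, five_tuple):
        port = self.flow_table.get(five_tuple)
        if port is not None:
            return port  # fast path: forwarded in hardware
        # Table miss: the packet takes the slow path through the controller.
        self.punted += 1
        port = self._controller_pick_port()
        if len(self.flow_table) < self.table_size:
            self.flow_table[five_tuple] = port
            self.sessions_per_port[port] += 1
        # If the table is full, nothing is installed and every later packet
        # of this session keeps hitting the controller.
        return port


switch = ToySwitch(server_ports=[1, 2], table_size=2000)

# Four long-lived sessions (example 5-tuples), 1000 packets each.
sessions = [("192.0.2.%d" % i, "203.0.113.80", 6, 40000 + i, 80) for i in range(4)]
for _ in range(1000):
    for s in sessions:
        switch.forward(s)

print(switch.punted, switch.sessions_per_port)
```

With only four sessions, four packets out of four thousand touch the controller and the table stays nearly empty. Scale the same model to thousands of short sessions and both the table size and the punt rate become the bottleneck.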

Even More OpenFlow

You’ll find even more implementation details and real-life limitations in the OpenFlow Deep Dive webinar, my SDN workshops, or in the OpenFlow Deep Dive session I’ll run at Interop Las Vegas 2016.

8 comments:

  1. If you want an L4 LB you are probably better off playing with ECMP, the hashing algorithm (enabling the resilient feature to avoid rehashing), next-hop groups and a script to test your services and enable/disable available next-hops. At least compared to using OpenFlow :)
  2. Perhaps forward looking this is a discussion of P4 rather than OpenFlow, possibly in concert with whatever Barefoot Networks is doing.
    Replies
    1. The real question is "what do you want your LB to do", and if you want to do anything beyond baseline 5-tuple load balancing, hardware is not the way to go... at which point all the discussions about OpenFlow/P4/whatever become just a nice excuse to generate 2-column articles.

      L7 load balancing is a solved problem both at small scale and at Google/Facebook scale. An architecture using centralized control plane is the wrong tool for this job.

      Long-lived high-band sessions are an obvious exception.
  3. Really good post.

    I agree the real question is what you want to accomplish with load balancing. Would you point me to articles that describe the main use cases for LB?

    OpenFlow doesn't really do L4 forwarding efficiently, as you explained. But it MAY (or may not) be a better programming interface than scripts. I have the feeling that there's still value in OpenFlow-based LB because it's cheap. One could use ECMP groups to spread load to anycast server farms as you mentioned, and then use a software load balancer to do the rest of the job. I'd assume that'd be much cheaper than 100K vendor hardware.
    Replies
    1. You might find something relevant in this slide set:

      http://content.ipspace.net/bin/list?id=Scalable

      I also covered L4-7 load balancing in more detail here:

      http://my.ipspace.net/bin/list?id=DC30

      ... or you could start with Wikipedia ;)
  4. Load balancers also monitor the service health of the destination server, so unless you integrate monitors into the controller this isn't going to be easy at all. Maybe find a programmer to integrate OpenFlow into Linux HAProxy...

    Amazon must be doing something SDN-like for their ELB services, so it's an interesting question that someone out there has probably managed to do.

    In the real world it will be much easier (maybe cheaper too) to just install an F5/A10/NetScaler/HAProxy solution.
  5. All this discussion took place 2 years ago! So, do we now have any solutions for minimizing the latency? Have the theoretical limitations been reduced?
    Replies
    1. Nothing has changed in the last 2 years... apart from everyone moving from OpenFlow to the next shiny new thing.