Should We Use OpenFlow for Load Balancing?
Yesterday I described the theoretical limitations of using OpenFlow for load balancing purposes. Today let’s focus on the practical part and answer another question:
@colin_dixon @ioshints and for a fair comparison: Would a $100k OF switch be able to act as proper LB?
— Kristian Larsson (@plajjan) December 3, 2015
I wrote about the same topic years ago here and here. I know it’s hard to dig through old blog posts, so I collected them in a book.
From the theoretical perspective, you could use an OpenFlow-enabled switch to implement L4 load balancing (including rewrites of IP addresses and TCP/UDP port numbers), but not an application delivery controller (ADC) that has to touch the application payload or TCP sequence numbers.
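To make that concrete, here's a minimal sketch of such an L4 flow entry expressed with the Ryu OpenFlow 1.3 parser. The datapath handle, virtual IP, server address, ports, and priority are hypothetical placeholders – and, as discussed below, most hardware won't actually execute the port rewrite:

```python
# Minimal sketch of the L4 rewrite OpenFlow can express in theory,
# using the Ryu OpenFlow 1.3 parser. All addresses, ports, and the
# priority are hypothetical placeholders.

def install_l4_rewrite(datapath):
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser

    # Match TCP traffic toward the virtual server IP. Per-session
    # (five-tuple) granularity would need one entry per client session.
    match = parser.OFPMatch(
        eth_type=0x0800,            # IPv4
        ip_proto=6,                 # TCP
        ipv4_dst='192.0.2.10',      # virtual server IP (placeholder)
        tcp_dst=80)

    # Rewrite destination IP and TCP port, then forward. This is as far
    # as OpenFlow gets -- it cannot touch payload or TCP sequence numbers.
    actions = [
        parser.OFPActionSetField(ipv4_dst='10.0.0.11'),   # real server
        parser.OFPActionSetField(tcp_dst=8080),
        parser.OFPActionOutput(2)]                        # server-facing port

    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    datapath.send_msg(parser.OFPFlowMod(
        datapath=datapath, priority=100, match=match, instructions=inst))
```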
Can’t We Just Use the Controller to Implement ADC?
Sure you could – if you’re willing to send all the user traffic through the controller. I just fail to see what advantage OpenFlow would give you in this particular scenario – it would be easier and faster to put the controller code in the forwarding path without bothering with another layer of indirection (however, do consider RFC 1925 section 2.6 and the bonus points you'd get when using OpenFlow and referencing the great theoretical ideas others had in the past in your 2-column article).
Even the idea of using a hypothetical OpenFlow device with an unlimited amount of memory (reeks of a Turing machine to me) to implement an ADC won’t work (not even in theory) – OpenFlow cannot reach beyond L4 ports.
Meanwhile on Planet Earth
There are at least four limiting factors that quickly kill most load-balancing-with-OpenFlow ideas:
Amount of flow forwarding hardware in OpenFlow switches. Any load balancing functionality is by definition not destination-only forwarding (or we wouldn’t be talking about it) but something akin to PBR, which is implemented in specialized hardware (TCAM or equivalent) in high-speed network devices.
The size of that table is usually ridiculously small (for our current discussion) – maybe you’ll get 2000 entries on a multi-terabit switch.
You’ll find more details in OpenFlow Deep Dive webinar and regularly updated figures in Data Center Fabrics webinar.
TCAM update speed. Most switches can install around 1000 flows per second (see the back-of-envelope sketch after this list).
Hardware and OpenFlow implementation limits. Most high-speed forwarding hardware doesn’t support port rewrites. While many chipsets support IP address rewrites, no major vendor that has published its OpenFlow implementation details has implemented that functionality in its OpenFlow agent.
Yet again, the Data Center Fabrics webinar has the details I could collect from published documentation.
Increased latency. Unless the controller uses proactive flow setup (unlikely in load balancing scenarios), the reactive flow setup adds latency – at least the first packet of each new flow has to go through the controller.
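Here’s the back-of-envelope sketch showing how quickly the first two limits bite. The table size and setup rate come from the points above; the session duration and flows-per-session figures are assumptions for illustration only:

```python
# Back-of-envelope: combine the flow-table size and TCAM update limits.
# tcam_entries and flow_setup_rate come from the text; the other two
# figures are assumptions for illustration only.

tcam_entries = 2000        # usable PBR-style entries on a multi-terabit switch
flow_setup_rate = 1000     # flow installs the switch can absorb per second
flows_per_session = 2      # one entry per direction (assumption)
avg_session_time = 10      # average session duration in seconds (assumption)

# Sessions the flow table can hold at any moment:
concurrent_capacity = tcam_entries // flows_per_session        # 1000

# Session arrival rate the TCAM update path can sustain:
max_new_sessions = flow_setup_rate / flows_per_session         # 500/s

# Little's law: sessions in flight at that arrival rate:
sessions_in_flight = max_new_sessions * avg_session_time       # 5000

print(concurrent_capacity, max_new_sessions, sessions_in_flight)
# 1000 500.0 5000.0 -- even at a session rate a small software load
# balancer handles easily, the flow table is oversubscribed five-fold.
```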
Considering all these limitations, OpenFlow-based load balancing makes sense only when you’re balancing a low number of high-bandwidth sessions toward an anycast server farm (so the load balancing switch doesn’t have to do address or port rewrites). QED.
. @plajjan It _could_ be used for a small number of high-bandwidth connections (iSCSI/NFS sessions). Useless for regular LB.
— Ivan Pepelnjak (@ioshints) December 3, 2015
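For completeness, here’s a sketch of that one workable scenario: a SELECT group spraying flows toward an anycast server farm with plain destination-based forwarding and no rewrites. Again Ryu with OpenFlow 1.3; the group ID, anycast VIP, and port numbers are hypothetical placeholders:

```python
# Sketch of the one scenario that does work: a destination-only SELECT
# group spreading traffic across an anycast server farm -- no address
# or port rewrites involved. Group ID, VIP, and ports are placeholders.

def install_anycast_group(datapath, group_id=1, server_ports=(1, 2, 3, 4)):
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser

    # One equal-weight bucket per server-facing port; the switch picks
    # a bucket per flow (usually by hashing, ECMP-style spreading).
    buckets = [
        parser.OFPBucket(weight=1, actions=[parser.OFPActionOutput(port)])
        for port in server_ports]

    datapath.send_msg(parser.OFPGroupMod(
        datapath=datapath, command=ofp.OFPGC_ADD,
        type_=ofp.OFPGT_SELECT, group_id=group_id, buckets=buckets))

    # A single destination-only entry sends the anycast VIP to the group,
    # so the flow table stays tiny regardless of the session count.
    match = parser.OFPMatch(eth_type=0x0800, ipv4_dst='192.0.2.10')
    inst = [parser.OFPInstructionActions(
        ofp.OFPIT_APPLY_ACTIONS, [parser.OFPActionGroup(group_id)])]
    datapath.send_msg(parser.OFPFlowMod(
        datapath=datapath, priority=100, match=match, instructions=inst))
```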
Even More OpenFlow
You’ll find even more implementation details and real-life limitations in the OpenFlow Deep Dive webinar, my SDN workshops, or in the OpenFlow Deep Dive session I’ll run at Interop Las Vegas 2016.
L7 load balancing is a solved problem both at small scale and at Google/Facebook scale. An architecture using a centralized control plane is the wrong tool for this job.
Long-lived high-bandwidth sessions are an obvious exception.
I agree the real question is what you want to accomplish with load balancing. Could you point me to articles that describe the main use cases for LB?
OpenFlow doesn't really do L4 forwarding efficiently, as you explained, but it may (or may not) be a better programming interface than scripts. I have the feeling there's still value in OpenFlow-based LB because it's cheap. One could use ECMP groups to spread the load to anycast server farms as you mentioned, and then use a software load balancer to do the rest of the job. I'd assume that would be much cheaper than $100K vendor hardware.
http://content.ipspace.net/bin/list?id=Scalable
I also covered L4-7 load balancing in more detail here:
http://my.ipspace.net/bin/list?id=DC30
... or you could start with Wikipedia ;)
Amazon must be doing something SDN-like for their ELB service, so it's an interesting question – someone out there has probably pulled it off.
In the real world it will be much easier (and maybe cheaper too) to just install an F5/A10/NetScaler/HAProxy solution.