OpenFlow and the State Explosion
While everyone deeply involved with OpenFlow agrees it’s just a low-level tool that can’t solve problems we couldn’t solve in the past (just like replacing Tcl with C++ won’t help you prove P = NP), occasionally you stumble across mind-boggling ideas that are so simple you have to ask yourself: “were we really that stupid?” One of them that obviously impressed James Hamilton is the solution to load balancing that requires no load balancers.
Before clicking Read more, watch this video and try to figure out what the solution is and why we’re not using it in large-scale networks.
The proposal is truly simple: it uses anycast with per-flow forwarding. All servers have the same IP address, and the OpenFlow controller establishes a path from each client to one of the servers. In its simplest implementation, a flow entry is installed in every device along the path each time a client establishes a session with a server (you could easily improve it by using MPLS LSPs or any other virtual circuit/tunneling mechanism in the core).
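Here’s a minimal sketch of what that per-flow state installation means (the names and data structures are illustrative, not taken from the demo):

```python
# Illustrative sketch of per-flow anycast load balancing (hypothetical names,
# not the actual demo code): every new client session makes the controller
# pick a server and install a flow entry in every switch along the path.
import itertools

ANYCAST_VIP = "192.0.2.1"           # all servers advertise the same address
SERVERS = ["srv-a", "srv-b", "srv-c"]
server_cycle = itertools.cycle(SERVERS)

flow_tables = {}                    # switch name -> list of flow entries

def handle_new_session(client_ip, client_port, path):
    """Called once per new client TCP session (this is the scaling problem)."""
    server = next(server_cycle)     # trivial round-robin server selection
    match = (client_ip, client_port, ANYCAST_VIP)
    for switch in path:             # one entry per switch on the path
        flow_tables.setdefault(switch, []).append((match, server))
    return server

# One client opening ten sessions already leaves 30 entries on a 3-hop path.
path = ["edge-1", "core-1", "tor-1"]
for port in range(50000, 50010):
    handle_new_session("198.51.100.7", port, path)
print({switch: len(entries) for switch, entries in flow_tables.items()})
```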
Now ask yourself: will this ever scale? Of course it won’t. It might be a good solution for long-lived sessions (after all, that’s how voice networks handle 800-numbers), but not for the data world where a single client could establish tens of TCP sessions per second.
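A back-of-envelope estimate (all the numbers below are illustrative assumptions) shows how quickly the per-switch state adds up:

```python
# Back-of-envelope estimate of per-switch flow table state (all numbers are
# illustrative assumptions). Little's law: concurrent entries = rate * lifetime.
clients = 10_000                 # clients behind one edge switch (assumed)
sessions_per_client_per_s = 10   # "tens of TCP sessions per second" per client
mean_session_lifetime_s = 5      # short-lived web-style sessions (assumed)

flow_setup_rate = clients * sessions_per_client_per_s            # 100,000 flows/s
concurrent_entries = flow_setup_rate * mean_session_lifetime_s   # 500,000 entries

print(f"{flow_setup_rate:,} flow setups/s, ~{concurrent_entries:,} concurrent entries")
# Both figures are far beyond the flow-table sizes and flow-setup rates
# typical hardware switches (and their controller channels) can sustain.
```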
A quick look back confirms that hunch: all technologies that required per-session state in every network device have failed. IntServ (with RSVP) never really took off on a global scale, and ATM-to-the-desktop failed miserably. The only two exceptions are global X.25 networks (they were so expensive that nobody ever established more than a few sessions) and voice networks (where sessions usually last for minutes ... or hours if teenagers get involved).
Load balancers work as well as they do because a single device in the whole path (load balancer) keeps the per-session state, and because you can scale them out – if they become overloaded, you just add another pair of redundant devices with new IP addresses to the load balancing pool (and use DNS-based load balancing on top of them).
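A rough sketch of that scale-out model (the hostnames and addresses are made up):

```python
# Rough sketch of DNS-based scale-out across stateful load balancer pairs
# (hostnames and addresses are made up). Adding capacity = adding a VIP to
# the record set; each LB pair keeps state only for its own sessions.
import itertools

lb_vips = ["203.0.113.10", "203.0.113.20"]    # one VIP per redundant LB pair
rotation = itertools.count()

def resolve(name="www.example.com"):
    """Crude stand-in for round-robin DNS: rotate the record set per query."""
    shift = next(rotation) % len(lb_vips)
    return lb_vips[shift:] + lb_vips[:shift]

def add_lb_pair(vip):
    """Scale out: another LB pair is just one more address in the record set."""
    lb_vips.append(vip)

add_lb_pair("203.0.113.30")        # pool overloaded? add another redundant pair
print(resolve()[0], resolve()[0])  # successive clients land on different VIPs
```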
Some researchers have quickly figured out the scaling problem and there’s work being done to make the OpenFlow-based load balancing scale better, but one has to wonder: after they’re done and their solution scales, will it be any better than what we have today, or will it just be different?
Moral of the story – every time you hear about an incredible solution to a well-known problem, ask yourself: why weren’t we using it in the past? Were we really that stupid, or are there some inherent limitations that are not immediately visible? Will it scale? Is it resilient? Will it survive device or link failures? And don’t forget: history is a great teacher.
Using anycast with layer-3 ECMP and source IP addresses as the hash key would probably be adequate for most services (although you have to watch out for clients that change addresses and make sure your backend application can handle them properly). Again, keeping as much state as possible (or at least a session identifier) on the client at Layer 7 can make this scale. No OpenFlow needed. But using anycast seems unnecessarily complicated – multiple stateless scale-out load balancers with DNS round-robin is a simpler and time-proven architecture.
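A minimal illustration of the source-IP hashing idea (the hash function and server pool are illustrative; real ECMP implementations hash in hardware):

```python
# Minimal illustration of stateless source-IP hashing (the hash function and
# server pool are illustrative; real ECMP hashing happens in the forwarding ASIC).
import hashlib
import ipaddress

SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def pick_server(src_ip: str) -> str:
    """Deterministic, stateless mapping: the same client always lands on the
    same server, as long as its source address and the pool stay stable."""
    digest = hashlib.sha256(ipaddress.ip_address(src_ip).packed).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]

print(pick_server("198.51.100.7"))   # same answer every time for this client
print(pick_server("198.51.100.7"))
# The caveat from the text applies: if the client's address changes (NAT,
# mobile clients), it lands on a different server, so the backend must cope.
```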
Cookies can't help you with session state; the only way to get away from it is to load balance based on a source IP address hash, but that does seem a bit risky. Are you aware of any load balancers that are truly stateless (i.e. no active TCP session state)?
As for anycast, it's one of those "you love them or you hate them" architectures. Some people have got it working, love it and deride any other (DNS-based) solution.
Besides, I am still in the "intelligence in the packet/PHB" camp.
You may be getting the folks at Big Switch Networks nervous ;)
Regards,
You can see it now: "FooApp 2.24 requires FooFlow 1.0 to be installed on the enterprise's routers". FooFlow will do something trivial, like compensate for FooApp's poor content caching, and just like the rest of FooApp, the OpenFlow component will be so badly written that it runs wild every now and again.
I do look forward to good OpenFlow applications. For example, integrating a SIP Session Border Controller into an ISP network is a bit of a nightmare at the moment. An architecture with an SBC making the policy choices and tweaking flow admission control at the network edge seems obvious.
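A purely hypothetical sketch of that division of labour (none of these classes or calls correspond to a real SBC or controller API):

```python
# Purely hypothetical sketch: the SBC makes the policy decision and asks an
# edge controller to admit (or remove) the corresponding RTP flow. All names
# are invented for illustration; no real SBC or controller exposes this API.
class EdgeController:
    def __init__(self):
        self.admitted = set()

    def admit_flow(self, five_tuple):
        self.admitted.add(five_tuple)      # would become an edge flow entry/ACL

    def drop_flow(self, five_tuple):
        self.admitted.discard(five_tuple)

class SessionBorderController:
    def __init__(self, controller, max_calls=100):
        self.controller = controller
        self.max_calls = max_calls
        self.active_calls = 0

    def on_sip_invite(self, rtp_five_tuple):
        if self.active_calls >= self.max_calls:
            return False                   # policy says no: nothing gets programmed
        self.controller.admit_flow(rtp_five_tuple)
        self.active_calls += 1
        return True

sbc = SessionBorderController(EdgeController(), max_calls=2)
print(sbc.on_sip_invite(("198.51.100.7", 40000, "192.0.2.5", 30000, "udp")))
```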
Insightful, thought provoking post, as always.
Cheers,
Brad
One of my observations from attending the Open Networking Summit last week was along similar lines: a lot of the use cases, and even the academic studies, around SDN/OpenFlow are less about a 'killer app' (or even heading that way) than about re-implementing, and hopefully improving, existing ways of solving/configuring network problems. Several companies are claiming they have managed to get their hands around the state 'explosion' problem, but I suspect their focus on narrow use cases makes this a much simpler problem to solve than the general case. Hoping I'm wrong.
I do see the promise of SDN, especially in the movement from 'configuring' your network to 'programming' your network. One of the most oft-repeated questions I hear is whether we are ready for, or need, a Dev-Ops movement in networking. A potentially exciting and simultaneously scary thought – we can count on some number of lazy/inept OpenFlow apps making it into production.
Anyway, I do believe there are some truly "stateless" load balancers – that is, they maintain no per-flow map of source to destination. The widely used, open-source HAProxy has a stateless layer-3 source-hashing mode. I believe multiple commercial load-balancing appliances have HAProxy at their core. There are also, I believe, other commercial solutions with "direct server return" that operate at layer 3 in a hash-based mode.
I agree that load balancers operating at layer 4 or higher must maintain TCP session state for at least the sessions they are currently handling. But that is fine in a scale-out scenario, as the state isn't shared (and likely remains in CPU cache).
Replicating TCP session state to another HA peer device so you can "heal" TCP sessions on failure simply will never scale, as you mentioned. That's why sane application-layer protocols and user agents have sensible disconnect/retry/backoff behavior available. HTTP and browsers make this easy with round-robin DNS, but even Microsoft SQL Server's TDS protocol has automatic disconnect/retry available against a pool of IPs.
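A sketch of that kind of client-side disconnect/retry/backoff behavior against a pool of addresses returned by round-robin DNS (the hostname, port and timings are placeholders):

```python
# Sketch of application-layer retry/backoff across a pool of addresses
# returned by round-robin DNS (hostname, port and timings are placeholders).
import socket
import time

def connect_with_retry(hostname, port, attempts=3, base_delay=0.5):
    """Try every resolved address in turn, backing off between rounds."""
    for attempt in range(attempts):
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                hostname, port, type=socket.SOCK_STREAM):
            try:
                return socket.create_connection(sockaddr[:2], timeout=2)
            except OSError:
                continue                       # dead VIP/LB: try the next record
        time.sleep(base_delay * 2 ** attempt)  # exponential backoff between rounds
    raise ConnectionError(f"could not reach {hostname}:{port}")

# Example usage (commented out so the sketch has no network dependency):
# conn = connect_with_retry("www.example.com", 443)
```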
If you just can't deal with failover at the application layer, F5 is glad to sell you a pair of $3k servers for $100k so you can do layer-4 load balancing with HA. Just make sure you test all the failure modes thoroughly, in concert with your applications and hardware, first.
I’m having a hard time believing in the OpenFlow deployment model where the controller programs the network per flow. It must introduce pretty bad setup latencies, not suitable for multi-gigabit data center networks (controller-to-switch RTT, switching ASIC update time, controller performance, etc.). I’ve yet to be convinced that it can be done in real time.
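A back-of-envelope latency budget makes the concern concrete (every number below is an illustrative assumption, not a measurement of any particular controller or switch):

```python
# Back-of-envelope flow setup latency budget (every number is an illustrative
# assumption, not a measurement of any specific product).
controller_rtt_ms = 1.0      # packet-in to flow-mod round trip, assumed
controller_proc_ms = 0.5     # controller decision time, assumed
asic_update_ms = 5.0         # TCAM/flow table update on the switch, assumed

first_packet_delay_ms = controller_rtt_ms + controller_proc_ms + asic_update_ms
print(f"~{first_packet_delay_ms} ms added to the first packet of every new flow")

link_gbps = 10
bytes_idle = link_gbps * 1e9 / 8 * first_packet_delay_ms / 1e3
print(f"~{bytes_idle / 1e6:.1f} MB could have crossed a {link_gbps} Gbps link in that window")
```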
The demo itself is rather unconvincing to anyone serious about load balancing: not only did they run a test with a very low flow setup rate (~1 flow/s compared to thousands), they also reduced the problem to layer-4 load balancing, while in reality you often need L7 with session tracking, SSL offload, etc. A non-naive implementation of that with OF (whatever future revision) is close to impossible, although it could be an interesting (& theoretical) brain exercise.