While everyone deeply involved with OpenFlow agrees it’s just a low-level tool that can’t solve problems we couldn’t solve in the past (just like replacing Tcl with C++ won’t help you prove P = NP), occasionally you stumble across mindboggling ideas that are so simple you have to ask yourself: “were we really that stupid?” One of them that obviously impressed James Hamilton is the solution to load balancing that requires no load balancers.
Before clicking Read more, watch this video and try to figure out what the solution is and why we’re not using it in large-scale networks.
The proposal is truly simple: it uses anycast with per-flow forwarding. All servers have the same IP address, and the OpenFlow controller establishes a path from each client to one of the servers. In its most simplistic implementation, a flow entry is installed in all devices in the path every time a client establishes a session with a server (you could easily improve it by using MPLS LSPs or any other virtual circuit/tunneling mechanism in the core).
Now ask yourself: will this ever scale? Of course it won’t. It might be a good solution for long-lived sessions (after all, that’s how voice networks handle 800-numbers), but not for the data world where a single client could establish tens of TCP sessions per second.
A quick look back confirms that hunch: all technologies that required per-session state in every network device have failed. IntServ (with RSVP) never really took off on a global scale, and ATM-to-the-desktop failed miserably. The only two exceptions are global X.25 networks (they were so expensive that nobody ever established more than a few sessions) and voice networks (where sessions usually last for minutes ... or hours if teenagers get involved).
Load balancers work as well as they do because a single device in the whole path (load balancer) keeps the per-session state, and because you can scale them out – if they become overloaded, you just add another pair of redundant devices with new IP addresses to the load balancing pool (and use DNS-based load balancing on top of them).
Some researchers have quickly figured out the scaling problem and there’s work being done to make the OpenFlow-based load balancing scale better, but one has to wonder: after they’re done and their solution scales, will it be any better than what we have today, or will it just be different?
Moral of the story – every time you hear about an incredible solution to a well-known problem ask yourself: why weren’t we using it in the past? Were we really that stupid or are there some inherent limitations that are not immediately visible? Will it scale? Is it resilient? Will it survive device or link failures? And don’t forget: history is a great teacher.