Follow-up: Load Balancers and Session Stickiness

My Why Do We Need Session Stickiness in Load Balancing blog post generated numerous interesting comments and questions, so I decided to repost them and provide slightly longer answers to some of the questions.

Warning: long wall of text ahead.

Cool Stuff First

Lukas Tribus introduced me to PROXY, a protocol proposed by the HAProxy team and now implemented by several major web servers. With PROXY, a load balancer signals the original client’s IP address to the application server, and because that information is sent before the original client payload, the solution works even when you terminate SSL sessions on the web server, or when IPv6 clients are accessing IPv4 servers.
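To make the "sent before the original client payload" part more concrete, here is a minimal Python sketch of the human-readable version 1 header of the PROXY protocol. The function names and the sample addresses are my own illustration, and the sketch handles only the TCP4 and TCP6 cases; treat it as a rough outline, not as a replacement for the protocol specification.

```python
import ipaddress


def build_proxy_v1_header(src_ip, dst_ip, src_port, dst_port):
    """Build the one-line text header a load balancer sends before the payload."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same address family")
    family = "TCP4" if src.version == 4 else "TCP6"
    return f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")


def parse_proxy_v1_header(data):
    """Split received bytes into (original connection info, remaining payload)."""
    line, sep, payload = data.partition(b"\r\n")
    if not sep:
        raise ValueError("incomplete PROXY header")
    fields = line.decode("ascii").split(" ")
    if len(fields) != 6 or fields[0] != "PROXY":
        raise ValueError("not a TCP4/TCP6 PROXY v1 header")
    _, family, src_ip, dst_ip, src_port, dst_port = fields
    info = {
        "family": family,
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "src_port": int(src_port),
        "dst_port": int(dst_port),
    }
    return info, payload


# Hypothetical example: the header travels in front of whatever the client sent
# (an HTTP request here, but it could just as well be a TLS handshake).
stream = build_proxy_v1_header("192.0.2.10", "198.51.100.1", 51000, 443)
stream += b"GET / HTTP/1.1\r\n"
info, payload = parse_proxy_v1_header(stream)
```

The server strips the header and hands the untouched payload to its usual processing, which is why the trick does not depend on the load balancer being able to read or modify the client data.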

Thank you!

Know Thy Problem

Lukas and David Barroso pointed out that you can enormously reduce the complexity of load balancing if you use source address hash to select the destination server. This approach works very well in practice assuming that:

  • You’re OK with round robin load balancing (potentially using static weights). If you feel you should adapt server load based on its responsiveness, you need a more complex solution;
  • You’re OK with occasional session loss, either user session loss – for example losing shopping cart data – or HTTP session disconnect, following a topology change (and rehashing).
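A rough Python sketch of the source address hash idea, and of the rehashing problem from the second bullet, might look like this. The server names, the client address range and the choice of SHA-256 are assumptions made for the example; real load balancers use their own hash functions.

```python
import hashlib
import ipaddress


def pick_server(client_ip, servers):
    """Map a client IP address to a server without keeping any per-session state."""
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def count_moved(clients, old_servers, new_servers):
    """Count clients that land on a different server after a topology change."""
    return sum(
        1
        for client in clients
        if pick_server(client, old_servers) != pick_server(client, new_servers)
    )


# Hypothetical pool and clients.
servers = ["web1", "web2", "web3", "web4"]
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

# Removing one server changes the modulus, so many clients get rehashed,
# not just the ones that were using the removed server.
moved = count_moved(clients, servers, servers[:-1])
print(f"{moved} of {len(clients)} clients moved to a different server")
```

Static weights could be approximated by listing a server more than once in the pool. The load balancer keeps no session table at all, which is where the reduced complexity comes from; the price is the session loss described above whenever the pool changes.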

There are solutions that minimize the impact of rehashing (like Microsoft’s Ananta that I described in the SDN Use Cases webinar, or what Fastly did). Yet again, these solutions do introduce complexity. There is no free lunch.
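To illustrate the trade-off (this is not how Ananta or Fastly work; I'm swapping in rendezvous hashing, a well-known stateless technique, purely as an example), here is a small Python sketch in which removing a server moves only the clients that were using it. The names and the hash function are assumptions made for the example.

```python
import hashlib


def _score(client_ip, server):
    """Pseudo-random but repeatable score for a (client, server) pair."""
    data = f"{client_ip}|{server}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def pick_server_hrw(client_ip, servers):
    """Rendezvous (highest random weight) hashing: the best-scoring server wins."""
    return max(servers, key=lambda server: _score(client_ip, server))


# Hypothetical pool and clients.
servers = ["web1", "web2", "web3", "web4"]
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

before = {client: pick_server_hrw(client, servers) for client in clients}
after = {client: pick_server_hrw(client, servers[:-1]) for client in clients}

# Only clients that were on the removed server should have moved.
moved = [client for client in clients if before[client] != after[client]]
```

Even this simple variant costs one hash computation per server for every lookup, and it does nothing for the sessions that were on the failed server, so the "no free lunch" remark still applies.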

The Enterprise View

Many enterprise application owners are not willing to accept the limitations of a scalable load balancing solution, so we can’t use the simple approaches that introduce little or no state in the load balancing components. This is where the excellent set of questions asked by Steven Iveson comes in:

As most browsers open multiple 'parallel' connections to a single (V)IP/site, I wonder if the complexity introduced from a troubleshooting, tracing and monitoring POV are worthwhile. This might inform your thoughts around TCP and HTTP session equality btw.

More traditional web pages use a simple approach:

  • Browser HTTP request triggers a script that creates the HTML page;
  • HTML page references tons of additional resources (scripts, images, CSS stylesheets).

Dynamic web pages might start with a simple loader page (or a page showing some initial information) and download additional information in the background (see: AJAX).

In any case, based on information collected by HTTP Archive, a very small percentage of the payload needed to display a web page is HTML, and the user session state is relevant only to scripts executed on the web server, not to static resources (images…) downloaded from the web server.

Furthermore, web pages commonly download components from multiple domains anyway, which makes troubleshooting both easier (from load balancing perspective) and more interesting at the same time.

Ditto for the overhead of multiple servers being required to perform lookups against the shared state 'database'. Presumably they would have to do so for every request in case the state had changed (via a connection on another server). This obviously won't be an issue with HTTP/2.0 use.

HTTP requests are stateless, and HTTP/2.0 does not change that. Every single web server scripting environment uses the same basic approach to implement session state for the server-side scripts:

  • Identify user session ID using a browser cookie or some other mechanism (URL parameter);
  • Read saved session state (default: from a disk file, recommended: from a shared in-memory database);
  • Execute the script;
  • Save the modified session state.
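The four steps above can be sketched in a few lines of Python. Everything here is a simplified assumption: `SessionStore` is a plain dictionary standing in for a shared key/value store such as memcached, `handle_request` is a hypothetical request handler, and `add_to_cart` plays the role of the server-side script.

```python
import json
import uuid


class SessionStore:
    """In-memory stand-in for a shared key/value store (memcached or similar)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def handle_request(store, cookies, script):
    # 1. Identify the user session from a cookie (or start a new session).
    session_id = cookies.get("session_id") or uuid.uuid4().hex
    key = f"session:{session_id}"
    # 2. Read the saved session state.
    raw = store.get(key)
    state = json.loads(raw) if raw is not None else {}
    # 3. Execute the script, which may modify the state.
    body = script(state)
    # 4. Save the modified session state.
    store.set(key, json.dumps(state))
    return body, {"session_id": session_id}


def add_to_cart(state):
    """Example server-side script: put one item into the shopping cart."""
    cart = state.setdefault("cart", [])
    cart.append("book")
    return f"{len(cart)} item(s) in cart"


store = SessionStore()
body1, cookies = handle_request(store, {}, add_to_cart)       # first visit
body2, cookies = handle_request(store, cookies, add_to_cart)  # follow-up request
```

Nothing in `handle_request` depends on which web server runs it. As long as all servers talk to the same store, the second request could land on a different server and still see the cart created by the first one.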

Typical shared state database solutions use memcached or some other high performance in-memory key/value store. These solutions are usually fast enough – a single Memcached instance can handle 200K requests/second, and it’s very easy to set up a share-nothing scale-out cluster.

HTTP/2.0 won’t help much if you’re using process- or thread-based web servers like Apache. These servers consume a fixed amount of resources per connection and thus try to close client sessions as soon as the client stops communicating with the server.

The availability of the shared state 'database' itself needs to be addressed. So more HA considerations, more complexity, etc.

True. However, these tools tend to be surprisingly robust and easy to deploy. Also keep in mind that you’re looking for best-effort high performance and not transactional consistency anyway.

The load balancer is required (for just the load balancing) anyway (although there are other ways) so why not avoid all the above issues? Perhaps the ability to not lose state when a server fails is enough of a benefit?

I firmly believe you should design your solution in a way that (A) minimizes complexity, (B) minimizes state and (C) handles whatever state is necessary in the right place and in an optimal way.

Keeping application-level state in a network component is (IMHO) the wrong architecture, more so when it’s relatively easy to get the right solution up and running.

As for not losing state, if you’re looking for transactionally consistent user session state, then you need to store the session data (or the parts that matter, like shopping cart content) into a real transactional database.

The ridiculously expensive load balancer option is taken because that's what the CIO/org demand and allow themselves to be sold, not because it's what's needed. That's a whole different subject and problem.

In some cases, you're forced to buy the ridiculously expensive load balancer because that's the only way to make stuff handed over from development work. I just wanted to point out there are other options, so you might want to go and have a chat with app developers (or web server admins) and save your company some money and yourself some extra work (and wonderful troubleshooting experience).

OTOH, I know organizations that bought an expensive load balancer instead of using open source software so they'd have someone to sue when it fails. At least some of those same organizations use open source software to run the applications. Go figure...

View from the Other Side

Dmitri Kalintsev (a long-time friend now working for a load balancing vendor) added the perspective from the other side:

Deviating somewhat from the topic of session stickiness and swinging to the question "why would you buy an expensive LB". I'd say there are reasons beyond just distributing traffic to backend nodes that prompt people to spill $$$. LB, or rather "ADC" can, and often does, provide a bunch of additional functionality that either doesn't belong in the app, or is shared. For example, generating server response performance metrics feed for your ops, URI request routing, WAF/DLP, SSO, etc.

As always, I agree with some of his arguments, and disagree with others:

  • Generating performance metrics from a load balancer perspective is obviously a very good thing, as it’s the device in the application stack closest to the user.
  • I wrote about WAFs almost seven years ago and my perspective hasn’t changed.

Implementing request routing in front of the application stack might make sense, but then you have to ask yourself:

  • Where is your single source of truth?
  • Where will you look when you need to figure out how the application stack works?
  • Who configures the load balancer (and thus request routing) and who’s responsible for what?
  • How will you troubleshoot the application errors and how many teams will that involve?

Embedding load balancing and request routing functionality into the application stack makes perfect sense. The best way to implement it is to use a dedicated virtual load balancing instance per application.

Splitting application functionality across centralized devices managed by different teams is what we’ve been doing for years. We’ve seen the results, and they are not exactly encouraging.

On apps using open source: you're supposedly setting up an LB to protect your app users from backend failures, which is what your LB is there for. In my book this means that LB's availability is more important than that of an individual scale-out backend server. Which means that you may see risk profiles differently, and be more willing to pay for one and not the other.

That’s a perfectly valid argument, but so far I have never heard it from end-customers when I asked them that same question. It’s also interesting that most people running large-scale web properties use simple open source load balancers.

It seems “it depends” is still the correct answer ;)

More information

You might find the load balancing section in the Data Center 3.0 webinar useful, and the slides for my university course are still online.

For a wider view focused on architecture and design, explore the Building Next-Generation Data Center course.


  1. > Embedding load balancing and request routing functionality into the application stack makes perfect sense.

    We're in full and complete agreement here. :)

    Goes for the WAF, too. The patching argument you made in your linked WAF post is probably less relevant today with software appliances and deployment models that create complete application stack and do blue/green handover.
  2. Just to clarify my comment about HTTP/2.0 - I was referring to the fact that it only opens a single tcp connection to a site/fqdn and thus there's no need to accommodate for the likelihood of a single client having active connections to multiple different servers (and possibly changing state via those connections).
    1. While I completely agree with that, do keep in mind that Apache-based web servers tend to close connections quickly (because idle connections burn threads), so the next request from the same user might open a new HTTP/2.0 session
  3. Yes indeed, although just using the same settings and not adjusting an Apache configuration to accommodate the differences between how 1.1 and 2.0 work wouldn't be wise. That won't stop many though. Cheers