So You’re an Open Source Shop? Really?

Thursday, November 6, 2014 10:23 +0100

I ran an interesting quiz during one of my Interop workshops:

How many use Linux-based servers? Almost everyone raised their hands.
How many use Apache or Tomcat web servers? Yet again, almost everyone.
How many run applications written in PHP, Python, Ruby…? Same crowd (probably even a bit more).
How many use Nginx, Squid or HAProxy for load balancing? Very few.

Is there a rational explanation for this seemingly nonsensical result?

The open-source load balancing products are as mature as their operating system and web application counterparts, they’re used by some of the largest web properties out there, and yet most enterprises prefer to use centralized hardware appliances. It definitely seems like at least some parts of networking remain stuck in the mainframe world, or maybe I’m missing something fundamental.

Have you tried using open-source load balancers? Do you run them in production? Did they work or did they fail badly? Did they lack features you badly need? Which features? Please share your experiences in the comments.

9 comments:

Steven Iveson 06 November 2014 10:59

Ah, a subject close to my heart. Load balancing isn't for the server guys now is it? And we all know how conservative those network guys are.

In all seriousness, this really is a matter of fear of the unknown and an unwillingness to learn about the possibilities and then trust them in production. I hope this changes as network engineers are further exposed to both Linux and virtual appliances in general.

Other products you could add to your list are:

* Zen Load Balancer
* Balance
* Pound

Unknown 06 November 2014 11:02

Hi Ivan.

I don't agree 100%... Here in Denmark, nginx and haproxy are used A LOT.

At the company I work for, we do shared hosting (Apache, IIS, etc.). All of our competitors also use some sort of open-source load balancing (mostly nginx).

We try not to use NetScaler, F5, etc. because of price, but also because creating a dedicated hardware/VM load balancer moves configuration away from the APP people to the network people :)
Trust me, the APP people are way better at load balancing their own APP.

Replies

Ivan Pepelnjak 06 November 2014 12:47

Yeah, I would expect to see plenty of open-source load balancing solutions in ISP/hosting environments. Commercial load balancers are just too expensive to be justified there.

Anonymous 06 November 2014 12:52

Last time I looked, VMware vShield Edge (got to love that capitalisation!) is in fact HAProxy under the hood. I learnt this when raising a support case and watching VMware troubleshoot it. So, more people are using it than may realise they are!

Anders 06 November 2014 18:23

I personally do favor both IPVS and nginx, but both in their respective fields and taking their respective strengths and weaknesses into account.

Commercial load balancers also often just don't offer exactly what's needed. For example, we're running load balancers that perform high-level backend availability checks and accordingly announce (or withdraw) a specific route for internal anycast IP addresses. Installing those systems in multiple data centers results in very high availability and low latency. At worst, a failed service in one DC is automatically replaced by the same service from a different DC, at some additional latency, but still accessible.
Most commercial load balancers in such a situation would simply pretend to be a DNS server and reply with low-TTL DNS RRs, which consistently fails with Java applications (which have ignored DNS TTLs for years now), and offers no solution at all if the service you're trying to load balance is your DNS service.
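The announce-or-withdraw decision described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual setup: the TCP connect check stands in for a real high-level application check, the prefix is a documentation address, and the command strings are only loosely modeled on the text commands that BGP speakers such as ExaBGP accept from helper processes (the exact syntax is an assumption).

```python
import socket


def backend_is_healthy(host, port, timeout=1.0):
    """Placeholder check: True if a TCP connection to the backend succeeds.

    A production check would test the application itself (for example,
    fetch a status page), not just the TCP port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def route_command(prefix, healthy_count, min_healthy=1):
    """Build the command for the routing daemon (syntax is an assumption)."""
    action = "announce" if healthy_count >= min_healthy else "withdraw"
    return f"{action} route {prefix} next-hop self"


def evaluate(prefix, backends, check=backend_is_healthy):
    """Count healthy backends and decide whether to keep the anycast route."""
    healthy = sum(1 for host, port in backends if check(host, port))
    return route_command(prefix, healthy)
```

Run in a loop in every data center, something like `evaluate("192.0.2.10/32", backends)` would keep the anycast prefix announced only where at least one backend still answers, so the routing protocol steers clients to the surviving sites.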

Please also don't forget about IPVS: Linux IP Virtual Server does look more like "typical" L2/L3-level load balancers and uses techniques like direct server return ("gatewaying" or "direct routing" in IPVS lingo). A smallish box with a 100 Mbit network connection can easily handle Gbits of traffic with dozens of backend servers - as any outgoing traffic (replies) doesn't pass the loadbalancer at all.
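To see why a 100 Mbit box can front Gbits of traffic with direct server return, here is a back-of-the-envelope calculation. The request and response sizes are made-up illustrative numbers, not measurements, and the model ignores packet-rate limits and protocol overhead.

```python
def dsr_reply_capacity_mbit(lb_inbound_mbit, request_bytes, response_bytes):
    """Reply bandwidth the server farm can deliver when only requests
    cross the load balancer (replies go straight to the clients)."""
    return lb_inbound_mbit * response_bytes / request_bytes


# Assumption: 500-byte requests and 10,000-byte replies on average.
print(dsr_reply_capacity_mbit(100, 500, 10_000))  # 2000.0 (Mbit/s of replies)
```

With a 20:1 response-to-request ratio, saturating the balancer's 100 Mbit/s inbound link corresponds to about 2 Gbit/s of replies leaving the backend servers directly.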

Proxy-based systems like nginx and haproxy are also very capable of accepting thousands of connections in parallel, which may give you some relief if you're accustomed to web servers that are easily taken down by slowloris attacks. nginx/haproxy also terminate client connections and create new backend connections, making them extremely flexible on your network: your backend servers could be anywhere, there are no restrictions. Have your balancer on your network, some backend nodes on AWS, some backend nodes in a colo somewhere around your corner, it doesn't matter (ignoring latency). Nginx/haproxy also offer tons of L7 features, which may result in a more complex configuration but a very high benefit for the actual application. For example, static content could be served from a specifically tuned backend farm, while dynamic content is served by a different farm.
However, doing so does have an impact on deployment and on where and how to debug errors. If operations and development are a close team and share their knowledge, such setups do work out fine - otherwise, it may be hard to draw a line where and how some issue is going to be addressed. And some implementation and design issues do open up new questions.
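The static-versus-dynamic split mentioned above boils down to a routing decision on the request path. Here is a minimal sketch of that decision. It is not how nginx or haproxy implement it internally, and the suffix list, the `/static/` prefix, and the `Farm` class are hypothetical names chosen for illustration.

```python
import itertools

# Assumption: these suffixes identify static content in this example.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


class Farm:
    """A named pool of backend servers handed out round-robin."""

    def __init__(self, name, servers):
        self.name = name
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)


def choose_farm(path, static_farm, dynamic_farm):
    """Pick the farm for a request based on its URL path (L7 decision)."""
    path_only = path.split("?", 1)[0].lower()  # ignore the query string
    if path_only.startswith("/static/") or path_only.endswith(STATIC_SUFFIXES):
        return static_farm
    return dynamic_farm
```

Because the decision lives in the balancer rather than in the application, a misrouted request can look like an application bug, which is exactly the debugging overlap between operations and development described above.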

The most interesting projects are those where multiple balancing solutions need to be combined: proxy-based systems can't handle the bandwidth, packet-based systems can't deliver the needed features. Trying to do that with a commercial load balancer can be a task of its own.

Anonymous 06 December 2014 16:52

(At the risk of necroposting...)

The place I currently work in would likely count as an "open source shop" but sends the bulk of its traffic through a commercial load balancer. However, most of the *new* traffic on the network is going through HAProxy. New *clients* get HAProxy installed on them - the servers run health checks locally that write status to ZooKeeper, the clients discover what services are available by reading from ZooKeeper and configure HAProxy; they then connect to HAProxy on localhost to reach the servers.
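The client-side step of that design (turn the discovered list of healthy servers into a local HAProxy configuration) could look something like the sketch below. The ZooKeeper read is deliberately left out: the `services` dictionary stands in for whatever the client reads from ZooKeeper, and the service names, ports, and timeouts are hypothetical. Only basic HAProxy directives (`defaults`, `listen`, `bind`, `server ... check`) are used.

```python
def render_haproxy_config(services):
    """Render a localhost-only HAProxy config from discovered backends.

    `services` maps a service name to a dict with:
      - "local_port": port HAProxy binds on 127.0.0.1
      - "backends":   list of (host, port) tuples currently marked healthy
    """
    lines = [
        "defaults",
        "    mode tcp",
        "    timeout connect 5s",
        "    timeout client 1m",
        "    timeout server 1m",
        "",
    ]
    for name in sorted(services):
        svc = services[name]
        lines.append(f"listen {name}")
        lines.append(f"    bind 127.0.0.1:{svc['local_port']}")
        # Sort so the output is stable and reloads happen only on real changes.
        for i, (host, port) in enumerate(sorted(svc["backends"])):
            lines.append(f"    server {name}-{i} {host}:{port} check")
        lines.append("")
    return "\n".join(lines)
```

A client would regenerate the file whenever the discovered list changes, reload HAProxy, and keep connecting to the same localhost port regardless of which servers are behind it.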

HAProxy also has the advantage of better health checks for PostgreSQL than our commercial load balancer, but that's an aside.

Replies

Ivan Pepelnjak 06 December 2014 17:06

Fantastic - just the right way to go ;)) Thanks for sharing!

Anonymous 18 January 2015 13:42

I'm just curious. What kind of firewalls are being used in your data centers? Is it the usual pair of Cisco/Fortigate/PaloAlto/Checkpoint/whatever devices, or does anyone use something like pfSense?

Anonymous 06 February 2015 10:55

Typically from what I have seen:

- ASA as internal FW, SRX for the external network... sometimes the DMZ is protected by Check Point or SRX.
- Fortigate for branch or regional offices
- PA: haven't seen any in my 6 years of enterprise network experience ^_^
