Designing any scalable web architecture would be incomplete without investigating “load balancers”. There used to be a time when selecting and installing load balancers was an art in itself. Not anymore.
Many organizations today use Apache web servers as proxy servers (and also as load balancers) in front of their backend application clusters. Though Apache is the most popular web server in the world, it is also considered overweight if all you want to do is proxy a web application. Its huge codebase, and the separate modules that must be compiled and configured with it, can soon become a liability.
HAProxy is a tiny proxying engine which doesn’t have all the bells and whistles of Apache, but is highly qualified to act as an HTTP/TCP proxy server. Here are some of the other wonderful things I liked about it:
- Extremely tiny codebase. Just two runtime files to worry about, the binary and the configuration file.
- Compiles in seconds: about 10 seconds the last time I built it.
- Logs to syslog by default.
- Can load balance HTTP as well as plain TCP connections, so it can easily load balance most non-HTTP applications.
- Can do extremely detailed performance (and cookie capture) logging. It can differentiate backend processing time from end-user request completion time, which is extremely helpful when monitoring the performance of backend services.
- It can do sticky load balancing out of the box.
- It can use application-generated cookies instead of self-assigned cookies.
- It can monitor the health of the nodes and automatically remove them when health checks fail.
- And it has a beautiful web interface for application admins who care about numbers.
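The sticky load balancing and health monitoring described above can be illustrated with a small model. This is a minimal Python sketch under stated assumptions, not HAProxy's implementation: the `Server` and `StickyBalancer` classes are hypothetical, only the inserted-cookie style of persistence is modelled (a `SERVERID` cookie identifying the chosen server, as in the `cookie SERVERID insert` line of the sample configuration), and the rise/fall thresholds simply mirror HAProxy's defaults of 2 and 3 consecutive checks.

```python
# Hypothetical toy model of cookie persistence plus health checks;
# it is NOT how HAProxy is implemented internally.
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    address: str
    cookie: str            # value stored in the SERVERID cookie for this server
    healthy: bool = True
    _streak: int = 0       # consecutive check results that disagree with `healthy`

    def record_check(self, ok: bool, rise: int = 2, fall: int = 3) -> None:
        """Change state only after `fall` failures or `rise` successes in a row."""
        if ok == self.healthy:
            self._streak = 0
            return
        self._streak += 1
        if self._streak >= (rise if ok else fall):
            self.healthy = ok
            self._streak = 0


class StickyBalancer:
    """Round-robin for new clients, cookie stickiness for returning ones."""

    def __init__(self, servers: list[Server]):
        self.servers = servers
        self.by_cookie = {s.cookie: s for s in servers}
        self._rr = itertools.cycle(servers)

    def _next_healthy(self) -> Server:
        # Plain round-robin over the pool, skipping servers marked down.
        for _ in range(len(self.servers)):
            server = next(self._rr)
            if server.healthy:
                return server
        raise RuntimeError("no healthy servers")

    def route(self, cookies: dict[str, str]) -> tuple[Server, tuple[str, str] | None]:
        """Pick a server; also return a (name, value) cookie to set, or None."""
        sticky = self.by_cookie.get(cookies.get("SERVERID", ""))
        if sticky is not None and sticky.healthy:
            return sticky, None          # returning client stays on its server
        server = self._next_healthy()    # new client, or its server went down
        return server, ("SERVERID", server.cookie)
```

A returning client carrying a valid `SERVERID` cookie keeps its server; once that server fails three checks in a row, the next request is re-balanced and handed a fresh cookie. Application-generated cookies (HAProxy's `appsession`) follow the same principle, except that the key is learned from a cookie such as `JSESSIONID` set by the backend itself.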
A few other notes:
- HAProxy doesn’t serve any files locally, so it’s definitely not a replacement for your Apache instance if you use it to serve local files.
- It doesn’t do SSL, so you still need an SSL engine in front of it if you need secure HTTP.
- HAProxy is not the only Apache replacement. Varnish is a strong candidate that can also do caching (with ESI). And while you are at it, do take a look at Perlbal, which looked interesting.
Useful links:
- Live HAProxy stats page
- HAProxy Manual
- HAProxy Architecture guide
- Other HAProxy docs
- HA cluster using HAProxy
Finally, here is a sample configuration file with most of the features mentioned above configured. This is the entire file and, with minor changes, it should be good enough for a production deployment.
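global
    # loghost and logfac are placeholders: your syslog host and a facility such as local0
    log loghost logfac info
listen http_proxy 0.0.0.0:8000
    # cookies, header captures and detailed HTTP logs all require HTTP mode
    mode http
    log global
    option httplog
    # illustrative timeouts; tune them for your backends
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    stats enable
    option httpchk HEAD /app/health.jsp HTTP/1.0
    cookie SERVERID insert
    capture cookie JSESSIONID len 50
    capture request header Cookie len 200
    capture request header Host len 50
    capture request header Referer len 200
    capture request header User-Agent len 150
    capture request header Custom-Cookie len 15
    # stick to the server that issued the application's JSESSIONID (timeout in ms)
    appsession JSESSIONID len 32 timeout 3600000
    server server1_name server1:8080 weight 1 cookie server1_name_cookie check inter 60000
    server server2_name server2:8080 weight 1 cookie server2_name_cookie check inter 60000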
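The per-request timing mentioned in the feature list shows up, with HTTP logging enabled (`option httplog`), as a slash-separated timer field in each log line; the HAProxy 1.3/1.4-era manuals describe it as Tq/Tw/Tc/Tr/Tt, for example `10/0/30/69/109`. The Python sketch below splits that field so backend time (Tr) can be compared with total time (Tt). It assumes you have already cut the timer field out of the log line, and the names `Timers`, `parse_timers` and `backend_share` are hypothetical. Newer HAProxy releases redefine some of these timers, so check the logging section of your version's manual.

```python
# Assumes the 1.3/1.4-era Tq/Tw/Tc/Tr/Tt timer layout; helper names are hypothetical.
from typing import NamedTuple


class Timers(NamedTuple):
    tq: int  # ms to receive the full HTTP request from the client
    tw: int  # ms spent waiting in HAProxy's queues
    tc: int  # ms to establish the TCP connection to the server
    tr: int  # ms for the server to return its full response headers
    tt: int  # total ms from connection accept to close


def parse_timers(field: str) -> Timers:
    """Parse a Tq/Tw/Tc/Tr/Tt field; a phase that never completed is logged as -1."""
    parts = field.split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 timers, got {field!r}")
    return Timers(*(int(p) for p in parts))


def backend_share(t: Timers) -> float:
    """Fraction of the total request time spent waiting on the backend (Tr / Tt)."""
    if t.tr < 0 or t.tt <= 0:
        raise ValueError("timers incomplete")
    return t.tr / t.tt
```

For the field `10/0/30/69/109`, the backend accounts for 69 ms of a 109 ms total, which is exactly the backend-versus-end-user split the feature list refers to.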