August 27, 2007

Loadbalancer for horizontal web scaling: What questions to ask before implementing one.

A single server, today, can handle an amazing amount of traffic. But sooner or later most organizations figure out that they need more and talk about choosing between horizontal and vertical scaling. If you work for such an organization and also happen to manage networking devices, you might find a couple of loadbalancers on your desk one day along with a yellow sticky note with a deadline on it.

Loadbalancers, by definition, are supposed to solve performance bottlenecks by distributing or balancing load between the different components they manage. Though you would normally find loadbalancers in front of webservers, people have found other interesting ways of using them. For example, I know of some organizations with so much network traffic that a single sniffer or firewall can't do the job. They ended up using high-end loadbalancers to intelligently balance network traffic across multiple sniffers/firewalls attached to them. These setups are rare, so don't worry about them.

But in general, if you are having a web scalability issue, you would probably start looking at hardware solutions first. If you do, here are the questions I recommend asking before you investigate how to set one up.

  • Is the loadbalancer really needed ?

    • Identify the bottlenecks in the system. Adding webservers/appservers will not solve the problem if the database is the bottleneck.

      • BTW, if you can replicate a read-only database to multiple servers, you might be able to use a loadbalancer to balance that traffic...

    • If most of the traffic is due to static content, investigate Apache's ability to set better caching headers. Or use a CDN (content distribution network) like Akamai/Limelight to offload cacheable objects.

    • DNS and SMTP are examples of protocols designed with load distribution built in. If an SMTP server fails to respond, the MX records published in DNS let the SMTP client find alternative SMTP servers for the destination. If your organization has control over the protocol it is using, it should investigate the possibility of using DNS as the loadbalancing mechanism.
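
As a rough illustration of the DNS/SMTP point above: a sending mail server looks up the destination's MX records, tries the lowest preference number first, and moves on to the next host if one doesn't answer. The sketch below only mimics that ordering logic and does no real DNS lookups or SMTP. The hostnames and the names `order_mx`, `deliver` and `try_host` are made up for the example.

```python
import random

def order_mx(records, rng=random):
    """Return hostnames sorted by MX preference (lowest number first).

    Hosts that share a preference are shuffled, which spreads load across them.
    """
    by_pref = {}
    for preference, host in records:
        by_pref.setdefault(preference, []).append(host)
    ordered = []
    for preference in sorted(by_pref):
        hosts = by_pref[preference][:]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered

def deliver(records, try_host, rng=random):
    """Try each host in order; return the first one that accepts, else None."""
    for host in order_mx(records, rng):
        if try_host(host):
            return host
    return None

# Hypothetical MX records as (preference, hostname) pairs.
MX_RECORDS = [
    (10, "mx1.example.com"),
    (10, "mx2.example.com"),
    (20, "backup.example.com"),
]

# Pretend both primary servers are down; delivery falls through to the backup.
down = {"mx1.example.com", "mx2.example.com"}
chosen = deliver(MX_RECORDS, lambda host: host not in down)
```

The two hosts at preference 10 share the load between them, and the one at preference 20 is only used when both are unreachable, all without any loadbalancer in the path.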

  • Is it a software or a hardware loadbalancer?

    • Most people can't think of a loadbalancer as anything but dedicated hardware. Interestingly, chances are you are already using software loadbalancers in your network without knowing it (like the DNS loadbalancing I mentioned above). I've seen a decline in commercial web software loadbalancers in recent years; they have been replaced by open source components like the mod_jk loadbalancer, perlbal, etc.

    • In this particular writeup, I'm going to focus on hardware loadbalancers to keep it short.

  • Do you need failover ?

    • If you need loadbalancer failover, you should be buying in pairs.

    • Some loadbalancers work in Active-Active mode, where both loadbalancers can be functional, while others allow only Active-Passive mode.

    • The term "failover" can mean many things. Make sure you ask the right questions to understand what a loadbalancer really offers.

      • It could mean TCP session failover where every single TCP connection will be failed over.

      • It could also mean HTTP session failover (where one session is defined by one unique cookie). If a loadbalancer supports only this mode, every single user connected to the loadbalancer will notice a blip when the primary loadbalancer dies. More often than not, this is what a loadbalancer vendor provides. And unfortunately not everyone in pre-sales is aware of this "minor" detail.

  • How do you want to do configuration management ?

    • Not all brands are made equal. Some are easy to manage and others aren't. I personally prefer CLI over GUI any day.

    • But more than the CLI/GUI, I want the ability to put configuration files under revision control and compare different versions. The current brand we use at work, unfortunately, doesn't provide an easy way to do this. If you are supporting a big operation with multiple loadbalancers and have to replicate the same or a similar setup in multiple places, then this is something you shouldn't compromise on.
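
To show what I mean by comparing versions: if the device can dump its configuration as plain text, a few lines of Python are enough to diff two dumps. This is only a sketch. The config syntax, the file names and the `config_diff` helper are invented for the example and don't correspond to any particular vendor.

```python
import difflib

def config_diff(old_text, new_text, old_name="running.cfg", new_name="candidate.cfg"):
    """Return a unified diff between two configuration dumps as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    ))

# Made-up config syntax, for illustration only.
OLD = """\
farm web
  server 10.0.0.11 weight 1
  server 10.0.0.12 weight 1
"""
NEW = """\
farm web
  server 10.0.0.11 weight 1
  server 10.0.0.12 weight 2
  server 10.0.0.13 weight 1
"""

for line in config_diff(OLD, NEW):
    print(line)
```

Once the dumps are text files, they can also go straight into whatever revision control system you already use.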

  • How many servers are we talking about ?

    • If your loadbalancer device has ports on it to attach the servers physically to it, you should make sure you don't have more servers than the number of ports.

    • In most cases, however, you can add a 100BaseT switch and add more servers to it. But regardless, having an approximate number of servers will help you decide some of the other questions later.

  • Will all of these servers be part of the same farm/cluster ?

    • Some organizations set up different websites on the same loadbalancer and configure it so that the loadbalancer distributes load to a different farm/cluster of servers depending on which domain name the client requested.

    • Some loadbalancers can also inspect the URL and send requests with different paths (in the same domain) to different server farms. For example "/images" could go to one server and "/signup" could go to another one.

    • There are still others who might keep a set of servers in standby mode (not active), waiting to be brought up automatically in the event of problems with the primary clusters.
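
The domain-based and path-based routing described above boils down to a small rule table. Here is a minimal sketch of that decision, assuming plain prefix matching where the first matching rule wins. The farm names, addresses and the `pick_farm` function are all hypothetical, and a real device will have its own rule syntax.

```python
# Hypothetical farms; names and addresses are made up.
FARMS = {
    "images": ["10.0.1.1", "10.0.1.2"],
    "signup": ["10.0.2.1"],
    "www": ["10.0.3.1", "10.0.3.2"],
    "blog": ["10.0.4.1"],
}

# (host, path prefix, farm). First match wins, so specific prefixes come first.
RULES = [
    ("www.example.com", "/images", "images"),
    ("www.example.com", "/signup", "signup"),
    ("www.example.com", "/", "www"),
    ("blog.example.com", "/", "blog"),
]

def pick_farm(host, path):
    """Return the farm name for a request, or None if no rule matches."""
    host = host.lower().split(":")[0]  # ignore case and any port in the Host header
    for rule_host, prefix, farm in RULES:
        if host == rule_host and path.startswith(prefix):
            return farm
    return None

farm = pick_farm("www.example.com", "/images/logo.png")
servers = FARMS[farm]
```

Note that this kind of routing needs the loadbalancer to see the Host header and the URL, so it only works on traffic the device can read in the clear.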

  • Will all of the servers have the same weights when loadbalancing ?

    • For example are there some servers which should get more traffic than others because they are faster ?
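
For a feel of what weights do, here is a deliberately naive weighted round-robin sketch: each server is repeated in the rotation as many times as its weight. The server names and weights are invented, and real loadbalancers generally interleave the picks more smoothly than this.

```python
import itertools

def weighted_cycle(weights):
    """Return an endless iterator where each server appears in proportion to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)

# Hypothetical farm: the faster box gets three requests for every one sent to the slower box.
WEIGHTS = {"fast-1": 3, "slow-1": 1}

picker = weighted_cycle(WEIGHTS)
first_eight = [next(picker) for _ in range(8)]
```

Over any eight consecutive requests here, six go to `fast-1` and two go to `slow-1`.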

  • Is session stickiness important ?

    • If a user ends up on a particular webserver, is there any reason why that user should continue to stay on that webserver ? There are different kinds of session stickiness which I can think of.

      • The most popular kind is probably IP based stickiness, where traffic from the same IP always goes to the same webserver. This was a great way of loadbalancing until companies like AOL decided to loadbalance outgoing traffic across different proxy servers, effectively changing the source IP address of traffic coming from the same client.

      • My favorite session stickiness mechanism is cookies. Cookies can be used as session tracking IDs to associate a user session with a particular webserver. There are many different ways of implementing this; these are a few interesting ones I've used:

        • Let the loadbalancer set its own tracking cookie on any session that arrives without one.

        • Most web application platforms like PHP and Java already set a session cookie with a name like "PHPSESSID" or "JSESSIONID", which makes an excellent session identifier for a loadbalancer to track.

        • There are a few other interesting cookie options... but I'd rather not discuss them here at this moment.

    • If you really need session stickiness, you should investigate further whether your application is really horizontally scalable. Most often, the sticky session feature is used as a band-aid to temporarily give the impression of a scalable web app design, which could prove disastrous in the future.

    • Session stickiness comes with some other configuration baggage as well.

      • You need to decide how long an idle session should be considered active before it's shut down. If you have a lot of traffic and you set this session timeout too high, it can quickly fill up the loadbalancer's memory.

      • If possible, set the timeout as close as possible to the application timeout on your app server.
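
To make the memory concern above concrete, here is a minimal sketch of the bookkeeping a loadbalancer has to do for cookie-based stickiness: one table entry per session, kept until the session has been idle for longer than the timeout. The `StickyTable` class, the backend names and the timeout value are my own invention for the example, not any vendor's implementation.

```python
import time

class StickyTable:
    """Maps a session cookie value to a backend, forgetting idle sessions."""

    def __init__(self, backends, idle_timeout, clock=time.monotonic):
        self.backends = list(backends)
        self.idle_timeout = idle_timeout  # seconds a session may sit idle
        self.clock = clock                # injectable so it can be tested
        self.sessions = {}                # session id -> (backend, last seen)
        self.next_index = 0

    def expire(self):
        """Drop sessions idle for longer than idle_timeout to free memory."""
        now = self.clock()
        stale = [sid for sid, (_, seen) in self.sessions.items()
                 if now - seen > self.idle_timeout]
        for sid in stale:
            del self.sessions[sid]

    def route(self, session_id):
        """Return the backend for a session, assigning one round-robin if new."""
        self.expire()
        entry = self.sessions.get(session_id)
        if entry is None:
            backend = self.backends[self.next_index % len(self.backends)]
            self.next_index += 1
        else:
            backend = entry[0]
        self.sessions[session_id] = (backend, self.clock())
        return backend

# Hypothetical usage: the session id would come from a cookie such as JSESSIONID.
table = StickyTable(["web1", "web2"], idle_timeout=1800)
first = table.route("abc123")
again = table.route("abc123")  # same backend as `first`
```

Every session that hasn't timed out yet occupies an entry, so the table size is roughly your new-session rate multiplied by the timeout. That is why a generous timeout on a busy site eats memory quickly.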

  • Does the traffic have to be encrypted using SSL ?

    • Some loadbalancers have built in SSL engines.

    • Others have capabilities to offload them to SSL accelerators.

    • One could also set it up in such a way that traffic is decrypted on the Apache server after the loadbalancing is done. Please be cautious here. If you decide to do SSL decryption on the webserver, you effectively prevent the loadbalancer from inspecting the HTTP requests, which it could otherwise use to make intelligent routing decisions (cookie based stickiness and URL based routing included).

  • Do you need compression ?

    • Some loadbalancers which come with SSL engines support on-the-fly compression, which can significantly speed up the user experience if you have a lot of compressible objects.
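
Whether compression helps depends entirely on what you serve. A quick way to get a feel for it is to gzip a sample of your own content. The sketch below uses repetitive markup as a stand-in for HTML and random bytes as a stand-in for already-compressed objects like JPEGs. Both payloads and the `compression_ratio` helper are made up for the example.

```python
import gzip
import os

def compression_ratio(payload):
    """Return compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload)) / len(payload)

html = b"<li class='item'>hello world</li>\n" * 500  # repetitive markup
noise = os.urandom(len(html))                         # stands in for JPEG/ZIP data

html_ratio = compression_ratio(html)
noise_ratio = compression_ratio(noise)
```

Text-like content shrinks to a small fraction of its size, while the random payload barely changes, so compressing images or archives on the loadbalancer only burns CPU.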

  • Debugging

    • Whether you like it or not, one day you will have some problem with your loadbalancer and the vendor's support team will ask you for packet captures. This is usually a painful process, especially if the equipment is in a production network. One feature which can simplify this is the ability of the device to sniff its own traffic. This is not a must-have, but probably a like-to-have feature.

  • Other services

    • Checklist for other services... if you want

      • NTP - Have its time always be in sync with rest of your network

      • Syslog - Ability to send syslog messages.

      • Mail - Ability to send mails when problems happen

      • SNMP - Monitoring and traps

  • External factors

    • A standard sized company investing in loadbalancers usually doesn't invest in a single loadbalancer. It usually buys a pair for the production network and another pair for the QA or staging networks. By the time you add the support costs, it gets very expensive. If you make an investment like that, make sure the company selling it to you will still be alive a couple of years down the line to support you.

    • Loadbalancers are not as reliable as phones. Finding bugs in a loadbalancer is much easier than you think. If you don't have a reliable support team that is willing to help you patch the code fast enough, you might see some downtime or performance issues.

    • At the same time, if the company does release a lot of patches on a weekly or monthly basis, you should find out how stable its code is. I usually ask how long it has been since the last stable release.


Israel LHeureux said...

Great post! Just want to add my $0.02.

For even larger scaling, at least one loadbalancer (Juniper DX series) supports a notion of Active-N, instead of merely active-active or active-standby.

With just Active-Active, you have to keep each SLB's load below 50%, or you won't have failover. (51%+51%=crash)

ActiveN splits the SLB load among up to 64 individual SLBs, with cascading failover, and it can even work with just one public VIP:Port.

(Disclaimer: I used to work for the SLB company before Juniper acquired them, but the product is still worth checking out.)

Advanced users might want to investigate other features as well, such as SSL, HTTP compression, multiplexing of back-end connections, flexible rewriting of request or response data (headers and content, both in and out), and on-board caching. All of these can really improve performance, but different vendors support them differently.

Always test an SLB in your environment, with your architecture and your data. And do external, real performance analysis, not simulated "tests" on a LAN.


Malcolm said...

Nice post, one of the best I've read. Logical and to the point.
I always stress to customers that if the application is designed to be horizontally scaled then growth and performance is easy to handle.
Chucking in a load balancer without any forward planning is daft.
I also think cookies should be in the application persistence layer not on the load balancer which is a bit of a sticky plaster approach.

John Chen said...

Good insights that could save a bit of grief and time.