Problems in Loadbalancing
With the expansion of the internet, the userbase of most sites is growing exponentially. However, the speed of the servers themselves is not growing fast enough. It is hence logical to conclude that these services have to be set up on multiple servers. Depending on what kind of service you are providing, this could be a trivial task.
Problem 1
However, some applications are very touchy about which server the client connects to after the first hit. It is possible that the service itself is not scalable enough to let a user switch between servers without interrupting the service. This requires some sort of session management that lets users stick to one server after they log in.
Solution 1
There are three primary ways of getting this done. The first and foremost is to set up the loadbalancer to loadbalance using the Source-IP of the client. This makes sure that the client browser always goes to the same server for that particular session, as long as the client's IP address does not change. This solution will work for most service providers.
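To make the Source-IP idea concrete, here is a minimal sketch of how a loadbalancer could map a client IP to a server. It is only an illustration and not how any particular product does it; the server names in BACKENDS and the function name pick_backend_by_ip are made up for this example.

```python
import hashlib

# Hypothetical pool of backend servers (placeholder names).
BACKENDS = ["web1", "web2", "web3"]

def pick_backend_by_ip(client_ip: str, backends=BACKENDS) -> str:
    """Map a client IP to one backend deterministically.

    The same IP always hashes to the same index, so a client keeps
    hitting the same server as long as its IP and the pool stay the same.
    """
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Calling pick_backend_by_ip("203.0.113.7") twice returns the same server both times. Note that with this simple modulo scheme, adding or removing a server changes the mapping for many clients.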
Problem 2
However, if the client IP address is the IP of a proxy server, all the clients behind that IP might end up on the same server, which has the undesirable effect of overloading that particular server. An even worse scenario is when different URLs are fetched through distributed cache engines, so that the same client reaches the site through different cache engines. Since each cache engine has its own IP, the requests could end up on different servers, which could still break the application.
Solution 2
The second solution, which is more sane on this front, is the use of cookies. Most loadbalancers today are capable of understanding cookies set by the servers and directing the client to the right one. Most of them can also issue their own cookies to do the job.
Some good places to look for more information on cookies are http://www.cookiecentral.com/ , http://www.epic.org/privacy/internet/cookies/ , http://www.cis.ohio-state.edu/rfc/rfc2109.txt , http://developer.netscape.com/docs/manuals/communicator/jsguide4/cookies.htm
What is a cookie?
A cookie is nothing but a small piece of text set on your browser by the server, which is sent back to the server every time your browser connects to it. I won't be telling you anything new when I say that this is a security/privacy problem, since the server can literally track you every time you log in and log out.
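As a small illustration of that round trip, the sketch below uses Python's standard http.cookies module to build the header a server would send and to parse what the browser would send back. The cookie name SERVERID and the value web2 are assumptions made up for this example.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header that goes out with a response.
response_cookie = SimpleCookie()
response_cookie["SERVERID"] = "web2"       # hypothetical name and value
response_cookie["SERVERID"]["path"] = "/"  # send it back for every URL on the site
set_cookie_header = response_cookie.output(header="Set-Cookie:")

# Browser side: on later requests the browser sends "Cookie: SERVERID=web2".
# Whoever receives the request parses that header value to read the text back.
request_cookie = SimpleCookie()
request_cookie.load("SERVERID=web2")
server_id = request_cookie["SERVERID"].value
```

Here server_id ends up holding the same text the server originally set, which is all a loadbalancer needs in order to recognise a returning client.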
Problem
There are a few problems with this implementation, however. I still can't get everything working. I don't know why, but here is the gist of the problems I've noticed so far.
The first and probably the biggest problem in implementing this solution is that many security-aware organizations/users are switching off cookies in their browsers. This will almost always break applications which are cookie dependent. Sites like doubleclick have a lot to do with this problem, about which you can read more at http://www.epic.org/privacy/internet/cookies/
Make sure your webserver is configured to expire your dynamic pages, for example by sending an Expires header. See http://www.mnot.net/cache_docs/#IMP-SERVER
Even if I ignore the first problem, I still couldn't get the cookie working properly with some of the cache engines on the net. Some of the cache engines don't honor the Expires header at all when caching, making it difficult for the server to force an expiry.
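For reference, here is a rough sketch of the response headers a dynamic page could send to ask caches not to reuse it. The function name no_cache_headers is made up for this example, and the exact Cache-Control values are one reasonable choice rather than the only correct one. As noted above, a cache that ignores these headers will still misbehave.

```python
from email.utils import formatdate

def no_cache_headers(now: float) -> dict:
    """Build headers that mark a response as already expired.

    `now` is a Unix timestamp. Setting Expires equal to Date tells
    well-behaved caches that the page is stale as soon as it is sent.
    """
    http_date = formatdate(now, usegmt=True)
    return {
        "Date": http_date,
        "Expires": http_date,
        # Shared caches must not store it; browsers must revalidate.
        "Cache-Control": "private, no-cache",
    }
```

In a real server you would pass time.time() as `now` and attach these headers to every dynamically generated page.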
The third problem depends on which kind of loadbalancer you are running. Some loadbalancers, like Resonate, F5 and I think Arrowpoint too, work better when they themselves give out the cookies. Most of the proxy implementations check for cookies when the browser sends back a request with a cookie attached. However, the actual issuing of a cookie happens in the previous GET/POST request, when the server replies with a Set-Cookie header. Resonate has a design issue due to which it can't handle this cookie (they call this problem "the first hit bug"). However, I've noticed similar problems with other loadbalancers too. The solution, of course, is to ignore the server-set cookies and use cookies set by the loadbalancer for loadbalancing.
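The idea of letting the loadbalancer hand out its own cookie can be sketched roughly as follows. This is not how Resonate, F5 or Arrowpoint actually implement it; the cookie name LB_SERVER, the server names and the route function are all assumptions made up for this illustration. Because the loadbalancer picks the server and issues the cookie in the same step, it knows the mapping from the very first hit.

```python
import itertools
from http.cookies import SimpleCookie
from typing import Optional, Tuple

BACKENDS = ["web1", "web2", "web3"]   # hypothetical server names
COOKIE_NAME = "LB_SERVER"             # hypothetical cookie name
_round_robin = itertools.cycle(BACKENDS)

def route(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (backend, Set-Cookie value or None) for one request.

    If the request carries a valid loadbalancer cookie, stick to that
    backend. Otherwise pick the next backend in round-robin order and
    return a cookie value to attach to the response.
    """
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)
        morsel = jar.get(COOKIE_NAME)
        if morsel is not None and morsel.value in BACKENDS:
            return morsel.value, None
    backend = next(_round_robin)
    return backend, f"{COOKIE_NAME}={backend}; Path=/"
```

On the first hit route(None) picks a server and returns the cookie to set; on later hits route("LB_SERVER=web1") returns the same server and no new cookie.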
Solution 3
The third solution to this entire problem is to tag the URL itself with an ID which changes with each session. Take this URL for example:
http://security.royans.net/test.html?THISISASESSIONID=1234
As you notice, even though I have a static HTML page, the URL has an ID attached to it which can be used when the client connects back. The "Referer" header normally lists the last URL which sent the client browser to the new link. This value can be effectively used by a loadbalancer to track a user and keep him/her on the same server.
Though the problem is simple, there are lots of hurdles in implementing the right solution. The info I gathered is based on my experience. I'll be pleased to correct any factual errors in this document. Contributions of more info about loadbalancer implementations are always welcome.
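Here is a rough sketch of how a loadbalancer could use such an ID. The parameter name THISISASESSIONID comes from the example URL above; the server names and the helper functions are assumptions made up for this illustration, and the hashing step is just one simple way to turn an ID into a server choice.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qs, urlsplit

BACKENDS = ["web1", "web2", "web3"]   # hypothetical server names
SESSION_PARAM = "THISISASESSIONID"    # parameter name from the example URL

def session_id_from_url(url: str) -> Optional[str]:
    """Extract the session ID from a URL's query string, if present."""
    values = parse_qs(urlsplit(url).query).get(SESSION_PARAM)
    return values[0] if values else None

def pick_backend(request_url: str, referer: Optional[str] = None) -> Optional[str]:
    """Choose a backend from the session ID in the URL or, failing that,
    from the Referer. Returns None when no session ID can be found."""
    session_id = session_id_from_url(request_url)
    if session_id is None and referer:
        session_id = session_id_from_url(referer)
    if session_id is None:
        return None
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]
```

A request for a link without the ID, but with the tagged page as its Referer, lands on the same server as the tagged page itself. Keep in mind that browsers and proxies do not always send the Referer header, so this fallback cannot be relied upon completely.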
Comments
You won't catch Google/MSN/Slashdot or any other high performance web site using something as primitive as a load balancer for preserving state. Persistence will be built into the application from the ground up, taking into account all the performance issues surrounding that and dealing with them.
It is a trivial task to implement persistence for web applications on any platform, Apache or IIS (using a combination of URL/Cookie/Database).
When you need to scale you just stick a fast pair of layer 4 load balancers on the front of the farm and grow. It really shouldn't matter what IP address people are coming from and on a properly designed site it never will.