Scalability Stories (15th Sept) Mysql Proxy, Cluster Fire System, Facebook apps and Twitter

September 15th, 2007

There have been a lot of interesting stories from last week for me to share. If you have interesting links you want to add to this post please forward them to me or post a comment to this post.

  • Sun is planning to acquire majority stake of “Cluster File Systems, Inc“. [ Talk on Lustre File System ]
    • Sun intends to add support for Solaris Operating System (Solaris OS) on Lustre and plans to continue enhancing Lustre on Linux and Solaris OS across multi vendor hardware platforms. As previously announced in July 2007, Sun also plans to deliver Lustre servers on top of Sun’s industry-leading open source Solaris ZFS solution, which is one of the fastest growing storage virtualization technology in the marketplace.
  • Making Facebook Apps scale on cheap : An interesting writeup By Surj Patel about Scalability issues Facebook itself and the 3rd Party apps on it have. Also discusses EC2 and S3 as an alternative solution to scale in a cost effective way.
    • Welcome to Amazon and S3 and EC2 — processing power (EC2) and storage (S3) on demand. These services let you access computational power and storage only when you need it and, better yet, pay only for what you use. The last time I checked, it was 10 cents an hour for the server, 10 cents for every gigabyte of data written and 18 cents per gigabyte read out – all for a virtual box with 1.7Ghz x86 processor/1.75Gbytes of RAM/250Mbs of bandwidth. Nor are you limited to one usage; use as many as you need or want and can afford.
  • Interesting blog post on Google Reader Numbers. They have made significant progress lately and thanks to the scalable architecture they now store 10 terabytes of raw feed data from 8 million feeds in their index.
  • Todd Hoff has an interesting writeup on Scaling Twitter: Making Twitter 10000 Percent Faster. And an interview with Biz Stone (Co-Founder of Twitter) here.
  • If you use Mysql and your app is not yet designed to handle federated database architecture, you should take a look at a new product in development called “Mysql Proxy
      The most powerful feature is Read/Write Splitting which allows you to scale a application which is unaware of replication automatically cross several slaves without changes to your application. Instance Scale Out we say. The Proxy also became a 1st class citizen in the MySQL world with full docs, win32 support and easy to install.

Popularity: 31%

Scaling Powerset using Amazon’s EC2 and S3

September 13th, 2007

The first thing most doc-com companies do before going public is setup an infrastructure to provide the service. And though it might sound straight forward to most of you, it can be a very expensive affair. To come up with the right kind of infrastructure for any new service a few key architectural decisions have to be made.

  1. Design the infrastructure and architecture to handle the traffic peaks (not average)
  2. Design with long term scaling in mind
  3. Design power and cooling infrastructure to support the servers
  4. Hire Hardware+Systems+network support staff ( more if 24/7 operations are required)
  5. Add buffers to support failures and short term grow requirements
  6. Take into account lead times of ordering and procuring hardware (which can be weeks if not months)..
  7. And a few others.. which I won’t bore you with here.

The point is that the initial capital investment can be in millions even before the first customer starts using the service. And once the capital investment is made, it is very difficult to scale down the operations if the plans change.
Powerset Inc is a secretive search startup with ambitions of out-smarting Google in its own turf. Based in San Francisco, this search startup is working on building a better search engine using natural language processing capabilities to understand the users question a little better before answering it. And just like any other search company its technology is a CPU hungry beast just waiting to be unleashed. Powerset could have gone the way most dot-com companies have gone, but instead they decided to try out Amazon’s EC2 (Elastic Cloud Computing) and S3(Simple Storage Service) to augment their computational needs.

Powerset has been repoted to be testing a 400 server instance EC2 cluster with Hadoop running Map/Reduce and HDFS (Hadoop Distributed File system). This does get a little tricky on EC2 because of absence of persistent storage (OS is re-initialized after every reboot). So they use a combination of HDFS and remote copying process to sync the data to their local network. Since there is no charge to move data from EC2 to S3, they have been thinking about implementing a native HDFS and S3 interface to move data around within Amazon’s network itself.

EC2 is charged on a per instance per hour usage basis, which means Powerset can bring new nodes online during heavy demand and shut off unused nodes at a flick of a switch at night. Powerset guys also built their own EC2 image configured to automatically join the HDFS cluster after every boot up. In an event of a node failure, Hadoop can take care of data replication, and EC2 takes care of replacing the failed node with a new one.

Amazon EC2 costs 10 cents an hour per instance. If you have to run a 400 node cluster for 1 month thats only about 30000. Based on the performance benchmarks, it looks like the actual CPU throughput from each of the EC2 instance is roughly equivalent of 1Ghz PIII. 72 dollars a month for that kind of server is not too cheap, but just like car leasing, atleast u don’t have to pay upfront and manage it.

So lets do the math. A regular AMD 64bit dual core, 2 cpu server with about 8GB of ram costs about 10000 USD which excluds the cost of hosting, power, cooling and maintainance. Based on some comments on Amazon forum this is about 2 to 3 times faster than the EC2’s infrastructure.If you had to replace the CPU computation power of this new hardware with 8 to 12 server instances on EC2, you would have spent about 700 to 800 dollars a month. It will take a company using EC2 infrastructure close to 12 months before they would have to pay 10000 towards EC2 computational services for the same amount of computation power. And remember that 10000 amount didn’t take into account colocation, power, cooling and general server administration which can be significant as well. Also remember that 12 months is actually 12 months of actual computational usage.. which could over a period of 2 to 4 years depending on how often the instances are used.

However, I also have to point out that there are a few things to look out for. The maximum physical memory available is about 1.7GB which is relatively tiny if you are used to 8 to 16 GB of ram on a 64 bit hardware. And though CPU/Memory might scale horizontaly for some applications, cross-server communication can be extreemly expensive for some. Unless your application is designed to scale horizontally with under 1.7 GB of ram, I would seriously advice you against using EC2 until you figure out how to change that.

I’ve blogged about both S3 and EC2 before and it continues to facinate me. Success of companies like Slideshare.net, and the decision of companies like Powerset to use AWS is something which I’ll watch closely over time.

References
Links about Hadoop, and how to use it on Amazon EC2/S3

Other References about Powerset and Amazon

Popularity: 55%

P2P network scalability

September 11th, 2007

Youtube is said to be pushing about 25 petabytes per month which is about 77 Gbps sustained data rate on an average. The bandwidth usage at the peaks would be even higher. Thanks to Limelight networks, Youtube doesn’t really need to scale or provision for that kind of bandwidth and based on the some reports from 2006 it had cost them close to 4 million a month back then. Youtube and services like that have to invest a lot in their infrastructure before they can really launch their service and though using shared Content delivery networks is not ideal, its probably not a bad deal. In Youtube’s case, it helped them survive until Google bought it out.

Newer Internet television service providers, however need not build their services around the traditional CDN model. Joost Network architecture presentation from Colm MacCarthaigh is an interesting example to discuss to prove my point. Joost was founded by the same guys who founded Kazaa and Skype . Kazaa was one of notorious P2P file sharing application (used the FastTrack protocol) which died after RIAA revolt. Skype, as it happens, also has its roots in P2P network [ Skype protocol , Skype scalability problems ] and has been doing pretty good over the years. So its no surprise that Joost chose P2P model again to distribute part of the content to its users. Joost has a cluster of servers which serve as “original seeders” or all content, and rely on the P2P network to distribute the popular content. The number of Joost servers, however, is not small because it still also has to address the “long tail” of requests which are not among the popular content.

Two of the most important network optimization ground rules, which I noticed from the talks, was that they decided against using firewalls or loadbalancers in its network. Thats good, because the firewalls and loadbalancers wouldn’t have kept up with the bandwidth anyway. But even more impressive was that they designed the entire P2P application/network-algorithm to intelligently find and peer with nodes and supernodes closest to them. Joost tries to do this this in two different ways. The first one is using IP address (prefix aware) as proximity sensors (two IPs which start with similar set of numbers/octets will probably be in the same network). The second way to detect proximity is using Network AS Numbers which can work irrespective of what the IP addresses start with. [ Colm also mentioned about AS proximity detection below ]

A comment to blog @ ipdev.net by Colm himself
We have many gigs of transit, and are adding more. I’m not sure who claimed it’s near HD quality, I like to think it’s about NTSC, sometimes better, never quite PAL.We have some efforts in the code to save transit costs, there is very very basic prefix awareness, and we’re adding AS-level awareness using live BGP data. I have looked at adding AS adjacency information, ie prefer AS-adjacent peers, but it’s a lot of work and the US internet is relatively poorly mapped, so I don’t think this will come soon.

Its possible that Joost might still require CDNs to serve the long-tail content, but the work they have done to build the P2P infrastructure would not only save them an a lot of mulah in the long run but would also allow them to easily scale to be larger than any of the current CDNs if they do get that big.

Interestingly companies like Microsoft are not sitting idle watching the world go by. Microsoft has been working on something called Avalanche and I think they already have a prototype client out which you can download and try it out yourself.

Microsoft Secure Content Downloader

Some MSCD clients may be connected to each other via peer connections, forming a ‘cloud’ of clients. Pieces of the file you are downloading are sent through these peer connections between clients, as well as through connections with the file server. As a member of the cloud, your computer both serves as a client and server to other members of the cloud. Data destined for the cloud may be routed through your computer and sent to other cloud members. The other cloud members connected to you will be able to access only pieces of the file you are downloading via MSCD – they have no access to any other data on your computer.

You are only connected to other clients while you are downloading a file via MSCD. When the file has finished downloading – or when you pause or cancel the download, or exit the application – you disconnect from the cloud. Once you disconnect from the cloud, you will no longer have any connections to any other members in the cloud and no data will be routed through your computer.The Microsoft Secure Content Downloader (MSCD) is a peer-assisted download manager capable of securely downloading specific files. MSCD is intended for consumers who are downloading from a home PC, or business users whose computers are not behind a corporate firewall. If you use MSCD from behind a corporate firewall, you may be unable to download content, and may adversely affect other clients’ ability to download content.

Of course there are also other rumors that apple is trying this out… but you know how these things go.

Anyway, the point is that in spite of occasional gliches P2P is probably the way to go if you want to cut long term costs of CDN. Personally, I believe that Skype had no other way out. I mean can you think off all the phone calls in the world going through the same first phone exchange in New Haven, Connecticut where it all started ? P2P models are still evolving and its hard to imagine there will be a one-solution-fits-all. But if you know one, please let me know.

Popularity: 38%

Sharding: Different from Partitioning and Federation ?

September 9th, 2007

Ive been hearing this word “sharding” more and more often, and its spreading like fire. Theo Schlossnagle, the author of “Scalable internet architecutres” argues that federation is form of partitioning, and that sharding is nothing but a form of partitioning and federation. Infact, according to him, Sharding has already been in use use for a long time.

I’m not a dba, and I don’t pretend to be one in my free time either, so to understand the differences I did some research and found some interesting posts.

The first time I heard about “Sharding” was on Been Admininig’s blog about Unorthodox approach to database design (Part I and Part II). Here is the exact reference…

Splitting up the user data so that User A exists on one serverwhile User B exists on another server, each server now holds a shard ofthe data in this federated model.

A couple of months ago Highscalability.com picked it up and made it sound (probably unintentionally) that sharding is actually different from Federation and Partitioning. Todd’s post also points at Flickr using sharding.The search for Flickr architecture lead me to Colin Charles’ post about Federation at Flickr: A tour of the Flickr architecture where he does mention shards as a component of Federation key. Again no mention of Sharding being anything new.

Federation Key Components:

  • Shards: My data gets stored on my shard, but the record ofperforming action on your comment, is on your shard. When making acomment on someone elses’ blog
  • Global Ring: Its like DNS, you need to know where to go and whocontrols where you go. Every page view, calculate where your data is,at that moment of time.
  • PHP logic to connect to the shards and keep the data consistent (10 lines of code with comments!)

Based on the discussions on these and other blogs, “Shards” sounds more like a terminology used to describe fragments of data which is federated across multiple databases instead of an architecture by itself. I think Theo Schlossnagle has a valid argument. If any of you disagree I’m interested to hear what you have to say. A clearer definition between sharding and federation would be very helpful as well.

Here are more references to Shard/Sharding.

      Popularity: 49%

      Adventures of scaling eins.de

      September 7th, 2007

      Patrick Lenz, founder and lead developer of freshmeat.net was also responsible for the relaunch of another website eins.de which recently moved from php to ruby. Eins.de site serves about 1.2 million dynamic pages a day. He wrote a series of articles describing how they redesigned the site to scale for growth. I found these articles very informative with a extreemly mature discussion of the colorful world of scalability.

      Here is the final summary of what they ended up after 4 months of optimizations

      Systems optimization

      Code optimization

      Popularity: 22%

      Scalability stories from Sept 6th 2007

      September 6th, 2007
      1. A 55 minute talk by ‘ Stewart Smith’ from MySQL AB, about Mysql Clusters. He talks about NDB storage engine and synchronous replication between storage nodes. Also talks about new features in 5.1 including cluster to cluster replication, disk based data and a bunch of other things. And another Mysql talk on Google about Performance Tuning Best practices for Mysql.
      2. An interesting talk with Leah Culver about how Pownce was created. They use LAMP(Python) stack with Perlbal, Memcached, Django, AIR with Amazon S3 as the backend storage.
      3. Discussion about  High-Availability Mongrel Packs using Seesaw
      4. A blog about “Future of Data Center Computing” mentions Terracotta Sessions. If you read my previous post about “Session, state and scalabililty” and understood the problem, do look at this as a solution as well
      5. EC2 and S3 are being used more than before. Unfortunately because storage on EC2 doesn’t persist across reboots creative ways of keeping the data alive has to be found. Here is a talk about Redundant Mysql Replication using EC2 and S3.
      6. Found an interesting post by IBM engineers on how to setup a web cluster in 5 easy steps

      Blogged with Flock

      Popularity: 22%

      Session, state and scalability

      September 2nd, 2007

      In my other life I work with a medium scale web application which has had many different kinds of growing problems over time. One of the most painful one is the issue about “statelessness”. If I could only give one recommendation to anyone building a brand new web application, I’d say “go stateless“. But going stateless is not the same as going session-less. One could implement a perfectly stateless web architecture which still uses sessions to authenticate, authorize and track user activity. And to complicate matters further, when I say stateless, I really mean that the server should be stateless, not the client.

      Basic authentication
      Most interactive web applications today which allow you to manipulate data in some form require authentication and authorization mechanisms to control access. In the good old days “Basic Authentication” was the most commonly used authentication mechanism. Once authenticated, the browser would send the credentials to the server with every subsequent HTTP request. Over time, cookie and URL based session tracking mechanisms took over, which is what we are using in most of our web based applications today. In a one-server farm, you probably don’t need to pay attention to how these authentication, authorization or tracking mechanisms works. But once you have multiple servers in the farm, you do need to verify that a clients authenticated to one of the server is considered authenticated by all of the servers. Since, in “Basic Authentication”, the browser sends the authentication credentials with every request it doesn’t matter which server your session requests goes to. Each request is treated as a new session and credentials are verified every single time.

      Transition to sessions
      As traffic and security concerns grew, some of the smarter web applications switched to using “sessionids” in cookies and URLs. In this design, servers issue a sessionid (like a train/plane ticket) for every successful authentication and requests the client to send the sessionid instead of the authentication credentials for every subsequent request. This not only reduces the chances of someone sniffing your credentials, it also reduces the work done on the server because all it has to do now is check if the sessionid was issued by it in the last few minutes/hours (and is active), instead of validating the user against the all the users in the database for every single request.

      Session replication
      To understand which HTTP requests are part of which session, the server has to issue a unique HTTP session identifier to each of the user sessions. As it happens most web application servers also maintain an internal “sessionid” for each session, so inserting this sessionid in the cookie/url seemed to be the most natural thing to do. Unfortunately, if you have done any web programming, you will notice that unless you do something special, these default sessionids are local to each application server. Without doing special magic, like setting up application server in clustering mode, or persisting session information on shared resource (database/NFS) it would be difficult for one server to know which sessions are active on other application servers. This is where the scalability problem comes in. Replicating a lot of session information between 2 nodes might be simple, and 10 nodes might be possible.. but this architecture is not very scalable without a little extra work.

      Anatomy of a session
      “Sessions”, have three primary purposes.

      1. It helps group together HTTP requests coming from same browser.
      2. It can cache authentication credentials and key user info after user authenticates.
      3. It allows caching of other information in session which might be temporary in nature.

      If you have ever tried to create a new user account on a website, chances are that you would have gone through a couple of pages before the registration was completed. Most web applications keep the information from intermediate pages in temporary session variables within the application server. Developers love to use temporary session variables for such information and this approach of storing temporary information can grow into a beast which cannot be replicated or shared using shared resource. To design a scalable website now, not only would one have to persist the sessionid and authentication credentials onto some kind of shared resource, one would also have to figure out what to do with these temporary session variables.

      Since sessions are created only once per session, and since authentication happens only once too, it should be possible to keep that kind of information in a central database or shared network cache like memcached. The temporary session variables on the other hand can be very noisy (changes frequently) and may need not be critical enough to be available to the other servers. One needs to find a balance between good performance and reliability. I know organizations which have decided to accept some level of data loss with temporary session variables to improve over all performance.

      But the key thing to remember here is that every time a session is created or authenticated it should be copied over to some shared scalable resource. It could be something like a cluster of memcached servers, or a cluster of replicated database servers. This will guarantee that if the user does switch application servers the requests can be authenticated by checking this shared resource before continuing with what they were doing.

      Session vs State
      The information about where the user is or was within an application at any given moment is called the “state”. If that information is kept only on the server and is not replicated you will probably see errors of some kind when servers fail or switches user to another server. If the application server does persist the sessionid and user credentials in a central database, then, as far as the session is considered it wouldn’t matter which server the user goes to. Most of the new web applications today try to maintain the state information within the browser itself which frees up the server from the responsibility of storing, maintaining and replicating state information across all the servers.

      Final thoughts

      Loadbalancers: If you are doing what I did, you’d be calculating the probability of a user switching servers mid-session. In our environment we noticed about one in 50 servers fail every month. Such disasters will force all sessions created on that server to another server. Occasionally we also see issues with loadbalancer, which can reset all sticky sessions across all servers. This isn’t that bad. But, if you are running a CPU intensive web application, session stickiness can at times make some servers more loaded than others and if your traffic grows enough to need multiple loadbalancers, or have multiple global sites running in active-active mode, you will eventually need the ability handle server switching/failovers.

      REST (Representational State Transfer): The REST architecture does recommend some of the ideas I addressed above, but in its strictest interpretation one is not allowed to create Cookies. In my personal opinion a Cookieless world, though ideal, is not pragmatic.

      Popularity: 33%

      Scalability Stories for Aug 30

      August 30th, 2007
      1. I found a very interesting story on how memcached was created. Its an old story titled “Distributed caching with memcached“. I also found an interesting FAQ on memcached which some of you might like.
      2. Inside Myspace is another old story which follows Myspace’s growth over time. Its a very long and interesting read which shouldn’t be ignored.
      3. Measuring Scalability tries to put numbers to the problem of scalability. If you have to justify the cost of scalability to anyone in your organization, you should atleast skim through this page
      4. I found a wonderful story on the humble architecture of Mailinator and how it grew over time on just one webserver. It receives approx 5 million emails a day and runs the whole operation pretty much in memory with no logs or database to leave traces behind. And here is another page from the creator of Mailinator abouts its stats from Feb.
      5. Finally another very interesting presentation/slide on the topic of “Scalable web architecture” which focuses primary on LAMP architecture.

      Popularity: 19%

      Thoughts on scalability

      August 28th, 2007

      Here is an interesting contribution on the topic from Preston Elder 

      I’ve worked in multiple extremely super-scaled applications (including ones sustaining 70,000 connections at any one time, 10,000 new connections each minute, and 15,000 concurrent throttled file transfers at any one time - all in one application instance on one machine).

      The biggest problem I have seen is people don’t know how to properly define their thread’s purpose and requirements, and don’t know how to decouple tasks that have in-built latency or avoid thread blocking (and locking).

      For example, often in a high-performance network app, you will have some kind of multiplexor (or more than one) for your connections, so you don’t have a thread per connection. But people often make the mistake of doing too much in the multiplexor’s thread. The multiplexor should ideally only exist to be able to pull data off the socket, chop it up into packets that make sense, and hand it off to some kind of thread pool to do actual processing. Anything more and your multiplexor can’t get back to retrieving the next bit of data fast enough.

      Similarly, when moving data from a multiplexor to a thread pool, you should be a) moving in bulk (lock the queue once, not once per message), AND you should be using the Least Loaded pattern - where each thread in the pool has its OWN queue, and you move the entire batch of messages to the thread that is least loaded, and next time the multiplexor has another batch, it will move it to a different thread because IT is least loaded. Assuming your processing takes longer than the data takes to be split into packets (IT SHOULD!), then all your threads will still be busy, but there will be no lock contention between them, and occasional lock contention ONCE when they get a new batch of messages to process.

      Finally, decouple your I/O-bound processes. Make your I/O bound things (eg. reporting via. socket back to some kind of stats/reporting system) happen in their own thread if they are allowed to block. And make sure your worker threads aren’t waiting to give the I/O bound thread data - in this case, a similar pattern to the above in reverse works well - where each thread PUSHING to the I/O bound thread has its own queue, and your I/O bound thread has its own queue, and when it is empty, it just collects the swaps from all the worker queues (or just the next one in a round-robin fashion), so the workers can put data onto those queues at its leisure again, without lock contention with each other.

      Never underestimate the value of your memory - if you are doing something like reporting to a stats/reporting server via. socket, you should implement some kind of Store and Forward system. This is both for integrity (if your app crashes, you still have the data to send), and so you don’t blow your memory. This is also true if you are doing SQL inserts to an off-system database server - spool it out to local disk (local solid-state is even better!) and then just have a thread continually reading from disk and doing the inserts - in a thread not touched by anything else. And make sure your SAF uses *CYCLING FILES* that cycle on max size AND time - you don’t want to keep appending to a file that can never be erased - and preferably, make that file a memory mapped file. Similarly, when sending data to your end-users, make sure you can overflow the data to disk so you don’t have 3mb data sitting in memory for a single client, who happens to be too slow to take it fast enough.

      And last thing, make sure you have architected things in a way that you can simply start up a new instance on another machine, and both machines can work IN TANDEM, allowing you to just throw hardware at the problem once you reach your hardware’s limit. I’ve personally scaled up an app from about 20 machines to over 650 by ensuring the collector could handle multiple collections - and even making sure I could run multiple collectors side-by-side for when the data is too much for one collector to crunch.

      I don’t know of any papers on this, but this is my experience writing extremely high performance network apps

      :)

      [ This post was originally written by Preston on slashdot. Reproduced here with his permission. If you have an interesting comment or a writeup you want to share on this blog, please forward it to me or submit a comment to this post ]

      Tags:

      Popularity: 29%

      Loadbalancer for horizontal web scaling: What questions to ask before implementing one.

      August 28th, 2007

      A single server, today, can handle an amazing amount of traffic. But sooner or later most organizations figure out that they need more and talk about choosing between horizontal and vertical scaling. If you work for such an organization and also happen to manage networking devices, you might find a couple of loadbalancers on your desk one day along with a yellow sticky note with a deadline on it.

      Loadbalancers, by definition, are supposed to solve performance bottlenecks by distributing or balancing load between different components its managing. Though you would normally find loadbalancers in front of a webserver, a lot of different individuals have found other interesting ways of using it. For example I know some organizations have so much network traffic that they can’t use just one sniffer or firewall to do their job. They ended up using some high end loadbalancers to intelligently loadbalance network traffic through multiple sniffers/firewalls attached to them. These are rare, so don’t worry about it.

      But in general, if you are having a web scalability issue, you would probably start looking at hardware solutions first. And if you do, here are my recommendation on questions you need to ask before you investigate how to set it up.

      • Is the loadbalancer really needed ?
        • Identify the bottlenecks in the system. Adding webservers/appservers will not solve the problem if database is the bottleneck.
          • BTW, If you can replicate read only database to multiple servers, you might be able to use a loadbalancer to balance that traffic…
        • If most of the traffic is due to static content, investigate apache’s ability to set better cachable headers. Or use a CDN(Content distribution network) like akamai/limelight to offload cachable objects.
        • DNS and SMTP are examples of protocols which are designed to be automatically loadbalanced. If an SMTP server fails to respond, the DNS protocol will help an SMTP client to find alternative SMTP servers for a destination. If your organization has controls over the protocol they are using, they should investigate the possibility of using DNS as the loadbalancing mechanism.
      • Is it a software or a hardware loadbalancer?
        • Most people can’t think of loadbalancer as anything else but a dedicated hardware. Interestingly, chances are you are already using software loadbalancers in your network without knowing it (like the DNS loadbalancer I mentioned above). I’ve seen a decline in commercial web software loadbalancers in the recent years, which has been replaced by open source components like mod_jk loadbalancer, perlbal,etc.
        • In this particular writeup, I’m going to focus on hardware loadbalancer to keep it short.
      • Do you need failover ?
        • If you need loadbalancer failover, you should be buying in pairs
        • Some loadbalancer work in Active-Active mode, where both loadbalancers can be functional, while others allow only Active-Passive mode.
        • The term “failover” can mean many things. Make you you ask the right questions to understand what a loadbalancer really offers.
          • It could mean TCP session failover where every single TCP connection will be failed over.
          • It could also mean HTTP session failover (where one session is defined by one unique Cookie). If a loadbalancer supports only this mode, every single user connected to the loadbalancer will notice a blip when the primary loadbalancer dies. More often than not, this is what a loadbalancer vendor usually provides. And unfortunately not everyone in pre-sales is aware of this “minor” detail.
      • How do you want to do configuration management ?
        • Not all brands are made equal. Some are easy to manage and others aren’t. I personally prefer CLI over GUI on any day.
        • But more than the CLI/GUI, I want the ability to revision control and compare different versions of configuration files. The current brand we use at work, unfortunately, doesn’t provide an easy way to do this. If you are supporting a big operation with multiple loadbalancers and have to replicate same/similar setup in multiple places, then this is something you shouldn’t compromise on.
      • How many servers are we talking about ?
        • If your loadbalancer device has ports on it to attach the servers physically to it, you should make sure you don’t have more servers than the number of ports
        • In most cases, however, you can add a 100BaseT switch and add more servers to it. But regardless, having an approximate number of servers will help you decide some of the other questions later.
      • Will all of these server be part of the same farm/cluster ?
        • Some organizations setup different websites on the same loadbalancer and set it up in such a way that the loadbalancer distributes load to different farm/cluster of servers depending on which domain name the client requested for.
        • Some loadbalancers can also inspect the URL and send requests with different paths (in the same domain) to different server farms. For example “/images” could go to one server and “/signup” could go to another one.
        • There are still others who might keep a set of servers on standby mode (not active) waiting to be brought up automatically in an event of problems with the primary clusters.
      • Will all of the servers have the same weights when loadbalancing ?
        • For example are there some servers which should get more traffic than others because they are faster ?
      • Is session stickyness important ?
        • If a user ends up on a particular webserver, is there any reason why that user should continue to stay on that webserver ? There are different kinds of session stickiness which I can think off.
          • The most popular kind is probably IP based stickiness where traffic from same IP always goes to the same webserver. This was a great way of loadbalancing until companies like AOL decided that they will loadbalance outgoing traffic using different proxy servers, effectively changing source IP address of traffic coming from the same client.
          • My favorite session stickiness mechanism is Cookies. Cookies can be used as session tracking IDs to associate a user session with a particular webserver. There are many different ways of implementing this of which these are the few interesting ways I’ve used
            • Allow the loadbalancer to set a cookie for you in each session without the tracking cookie.
            • Most web application servers like PHP and java use cookie names like “PHPSESSIONID” or “JSESSIONID” which is an excellent session identifier which a loadbalancer can track.
            • There are a few other interesting cookie options… but I’d rather not discuss it here at this moment.
        • If you really need session stickyness, you should investigate further if your application is really horizontally scalable. Most often, sticky session feature is used as a bandaid to temporarily give an impression of scalable web app design, which could, in future, prove disastrous.
        • Session stickiness comes with some other configuration baggage as well.
          • You need to decide how long an idle session should be considered active before its shutdown. If you have a lot of traffic and if you set this session timeout to be too high, it can quickly fill up your memory.
          • If possible set the timeout to be as close to the application timeout on your app server.
      • Does the traffic have to be encrypted using SSL ?
        • Some loadbalancers have built in SSL engines.
        • Others have capabilities to offload them to SSL accelerators.
        • One could also set it up in such a way that traffic is decrypted on the apache server after the loadbalancing is done. Please be cautious here. If you decide to do SSL decryption on the webserver, you are effectively disallow the loadbalancer to inspect HTTP packets which can be otherwise used to make intelligent routing decisions
      • Do you need compression ?
        • Some Load balancers which come with SSL engines support on-the-fly compression which can significantly speed up user experience if you have a lot of compressible objects.
      • Debugging
        • Whether you like it or not, one day you will have some problem with your loadbalancer and you will be asked by their support team to get some sniffs for them. This usually is a painful process, especially if the equipment is in production network. One feature which can simplify this is allowing the device to sniff on itself. This is not a must-have, but probably a like-to-have feature.
      • Other services
        • Checklist for other services… if you want
          • NTP - Have its time always be in sync with rest of your network
          • Syslog - Ability to send syslog messages.
          • Mail - Ability to send mails when problems happen
          • SNMP - Monitoring and traps
      • External factors
        • A standard sized company investing in loadbalancers usually don’t invest on a single loadbalancer. They usually buy a pair for production network, and another pair for QA or staging networks. By the time you add the support costs, it gets very expensive. If you make an investment like that, make sure the company selling that to you is still alive a couple of years down the line to support you.
        • Loadbalancer’s are not as reliable as phones. Finding bugs in a loadbalancer is much easier than you think. If you don’t have a reliable support team who is willing to help you patch the code fast enough, you might see some downtime or performance issues
        • At the same time, if the company does release a lot of patches on a weekly or monthly basis, you should find out how stable its code is. I usually ask how long it has been since the last stable release.




      Popularity: 26%