August 29, 2007

Scalability Stories for Aug 30

  1. I found a very interesting story on how memcached was created. Its an old story titled "Distributed caching with memcached". I also found an interesting FAQ on memcached which some of you might like.

  2. Inside Myspace is another old story which follows Myspace's growth over time. Its a very long and interesting read which shouldn't be ignored.

  3. Measuring Scalability tries to put numbers to the problem of scalability. If you have to justify the cost of scalability to anyone in your organization, you should atleast skim through this page

  4. I found a wonderful story on the humble architecture of Mailinator and how it grew over time on just one webserver. It receives approx 5 million emails a day and runs the whole operation pretty much in memory with no logs or database to leave traces behind. And here is another page from the creator of Mailinator abouts its stats from Feb.

  5. Finally another very interesting presentation/slide on the topic of "Scalable web architecture" which focuses primary on LAMP architecture.

August 28, 2007

Thoughts on scalability

Here is an interesting contribution on the topic from Preston Elder 

I've worked in multiple extremely super-scaled applications (including ones sustaining 70,000 connections at any one time, 10,000 new connections each minute, and 15,000 concurrent throttled file transfers at any one time - all in one application instance on one machine).

The biggest problem I have seen is people don't know how to properly define their thread's purpose and requirements, and don't know how to decouple tasks that have in-built latency or avoid thread blocking (and locking).

For example, often in a high-performance network app, you will have some kind of multiplexor (or more than one) for your connections, so you don't have a thread per connection. But people often make the mistake of doing too much in the multiplexor's thread. The multiplexor should ideally only exist to be able to pull data off the socket, chop it up into packets that make sense, and hand it off to some kind of thread pool to do actual processing. Anything more and your multiplexor can't get back to retrieving the next bit of data fast enough.

Similarly, when moving data from a multiplexor to a thread pool, you should be a) moving in bulk (lock the queue once, not once per message), AND you should be using the Least Loaded pattern - where each thread in the pool has its OWN queue, and you move the entire batch of messages to the thread that is least loaded, and next time the multiplexor has another batch, it will move it to a different thread because IT is least loaded. Assuming your processing takes longer than the data takes to be split into packets (IT SHOULD!), then all your threads will still be busy, but there will be no lock contention between them, and occasional lock contention ONCE when they get a new batch of messages to process.

Finally, decouple your I/O-bound processes. Make your I/O bound things (eg. reporting via. socket back to some kind of stats/reporting system) happen in their own thread if they are allowed to block. And make sure your worker threads aren't waiting to give the I/O bound thread data - in this case, a similar pattern to the above in reverse works well - where each thread PUSHING to the I/O bound thread has its own queue, and your I/O bound thread has its own queue, and when it is empty, it just collects the swaps from all the worker queues (or just the next one in a round-robin fashion), so the workers can put data onto those queues at its leisure again, without lock contention with each other.

Never underestimate the value of your memory - if you are doing something like reporting to a stats/reporting server via. socket, you should implement some kind of Store and Forward system. This is both for integrity (if your app crashes, you still have the data to send), and so you don't blow your memory. This is also true if you are doing SQL inserts to an off-system database server - spool it out to local disk (local solid-state is even better!) and then just have a thread continually reading from disk and doing the inserts - in a thread not touched by anything else. And make sure your SAF uses *CYCLING FILES* that cycle on max size AND time - you don't want to keep appending to a file that can never be erased - and preferably, make that file a memory mapped file. Similarly, when sending data to your end-users, make sure you can overflow the data to disk so you don't have 3mb data sitting in memory for a single client, who happens to be too slow to take it fast enough.

And last thing, make sure you have architected things in a way that you can simply start up a new instance on another machine, and both machines can work IN TANDEM, allowing you to just throw hardware at the problem once you reach your hardware's limit. I've personally scaled up an app from about 20 machines to over 650 by ensuring the collector could handle multiple collections - and even making sure I could run multiple collectors side-by-side for when the data is too much for one collector to crunch.

I don't know of any papers on this, but this is my experience writing extremely high performance network apps


[ This post was originally written by Preston on slashdot. Reproduced here with his permission. If you have an interesting comment or a writeup you want to share on this blog, please forward it to me or submit a comment to this post ]


August 27, 2007

Loadbalancer for horizontal web scaling: What questions to ask before implementing one.

A single server, today, can handle an amazing amount of traffic. But sooner or later most organizations figure out that they need more and talk about choosing between horizontal and vertical scaling. If you work for such an organization and also happen to manage networking devices, you might find a couple of loadbalancers on your desk one day along with a yellow sticky note with a deadline on it.

Loadbalancers, by definition, are supposed to solve performance bottlenecks by distributing or balancing load between different components its managing. Though you would normally find loadbalancers in front of a webserver, a lot of different individuals have found other interesting ways of using it. For example I know some organizations have so much network traffic that they can't use just one sniffer or firewall to do their job. They ended up using some high end loadbalancers to intelligently loadbalance network traffic through multiple sniffers/firewalls attached to them. These are rare, so don't worry about it.

But in general, if you are having a web scalability issue, you would probably start looking at hardware solutions first. And if you do, here are my recommendation on questions you need to ask before you investigate how to set it up.

  • Is the loadbalancer really needed ?

    • Identify the bottlenecks in the system. Adding webservers/appservers will not solve the problem if database is the bottleneck.

      • BTW, If you can replicate read only database to multiple servers, you might be able to use a loadbalancer to balance that traffic...

    • If most of the traffic is due to static content, investigate apache's ability to set better cachable headers. Or use a CDN(Content distribution network) like akamai/limelight to offload cachable objects.

    • DNS and SMTP are examples of protocols which are designed to be automatically loadbalanced. If an SMTP server fails to respond, the DNS protocol will help an SMTP client to find alternative SMTP servers for a destination. If your organization has controls over the protocol they are using, they should investigate the possibility of using DNS as the loadbalancing mechanism.

  • Is it a software or a hardware loadbalancer?

    • Most people can't think of loadbalancer as anything else but a dedicated hardware. Interestingly, chances are you are already using software loadbalancers in your network without knowing it (like the DNS loadbalancer I mentioned above). I've seen a decline in commercial web software loadbalancers in the recent years, which has been replaced by open source components like mod_jk loadbalancer, perlbal,etc.

    • In this particular writeup, I'm going to focus on hardware loadbalancer to keep it short.

  • Do you need failover ?

    • If you need loadbalancer failover, you should be buying in pairs

    • Some loadbalancer work in Active-Active mode, where both loadbalancers can be functional, while others allow only Active-Passive mode.

    • The term "failover" can mean many things. Make you you ask the right questions to understand what a loadbalancer really offers.

      • It could mean TCP session failover where every single TCP connection will be failed over.

      • It could also mean HTTP session failover (where one session is defined by one unique Cookie). If a loadbalancer supports only this mode, every single user connected to the loadbalancer will notice a blip when the primary loadbalancer dies. More often than not, this is what a loadbalancer vendor usually provides. And unfortunately not everyone in pre-sales is aware of this "minor" detail.

  • How do you want to do configuration management ?

    • Not all brands are made equal. Some are easy to manage and others aren't. I personally prefer CLI over GUI on any day.

    • But more than the CLI/GUI, I want the ability to revision control and compare different versions of configuration files. The current brand we use at work, unfortunately, doesn't provide an easy way to do this. If you are supporting a big operation with multiple loadbalancers and have to replicate same/similar setup in multiple places, then this is something you shouldn't compromise on.

  • How many servers are we talking about ?

    • If your loadbalancer device has ports on it to attach the servers physically to it, you should make sure you don't have more servers than the number of ports

    • In most cases, however, you can add a 100BaseT switch and add more servers to it. But regardless, having an approximate number of servers will help you decide some of the other questions later.

  • Will all of these server be part of the same farm/cluster ?

    • Some organizations setup different websites on the same loadbalancer and set it up in such a way that the loadbalancer distributes load to different farm/cluster of servers depending on which domain name the client requested for.

    • Some loadbalancers can also inspect the URL and send requests with different paths (in the same domain) to different server farms. For example "/images" could go to one server and "/signup" could go to another one.

    • There are still others who might keep a set of servers on standby mode (not active) waiting to be brought up automatically in an event of problems with the primary clusters.

  • Will all of the servers have the same weights when loadbalancing ?

    • For example are there some servers which should get more traffic than others because they are faster ?

  • Is session stickyness important ?

    • If a user ends up on a particular webserver, is there any reason why that user should continue to stay on that webserver ? There are different kinds of session stickiness which I can think off.

      • The most popular kind is probably IP based stickiness where traffic from same IP always goes to the same webserver. This was a great way of loadbalancing until companies like AOL decided that they will loadbalance outgoing traffic using different proxy servers, effectively changing source IP address of traffic coming from the same client.

      • My favorite session stickiness mechanism is Cookies. Cookies can be used as session tracking IDs to associate a user session with a particular webserver. There are many different ways of implementing this of which these are the few interesting ways I've used

        • Allow the loadbalancer to set a cookie for you in each session without the tracking cookie.

        • Most web application servers like PHP and java use cookie names like "PHPSESSIONID" or "JSESSIONID" which is an excellent session identifier which a loadbalancer can track.

        • There are a few other interesting cookie options... but I'd rather not discuss it here at this moment.

    • If you really need session stickyness, you should investigate further if your application is really horizontally scalable. Most often, sticky session feature is used as a bandaid to temporarily give an impression of scalable web app design, which could, in future, prove disastrous.

    • Session stickiness comes with some other configuration baggage as well.

      • You need to decide how long an idle session should be considered active before its shutdown. If you have a lot of traffic and if you set this session timeout to be too high, it can quickly fill up your memory.

      • If possible set the timeout to be as close to the application timeout on your app server.

  • Does the traffic have to be encrypted using SSL ?

    • Some loadbalancers have built in SSL engines.

    • Others have capabilities to offload them to SSL accelerators.

    • One could also set it up in such a way that traffic is decrypted on the apache server after the loadbalancing is done. Please be cautious here. If you decide to do SSL decryption on the webserver, you are effectively disallow the loadbalancer to inspect HTTP packets which can be otherwise used to make intelligent routing decisions

  • Do you need compression ?

    • Some Load balancers which come with SSL engines support on-the-fly compression which can significantly speed up user experience if you have a lot of compressible objects.

  • Debugging

    • Whether you like it or not, one day you will have some problem with your loadbalancer and you will be asked by their support team to get some sniffs for them. This usually is a painful process, especially if the equipment is in production network. One feature which can simplify this is allowing the device to sniff on itself. This is not a must-have, but probably a like-to-have feature.

  • Other services

    • Checklist for other services... if you want

      • NTP - Have its time always be in sync with rest of your network

      • Syslog - Ability to send syslog messages.

      • Mail - Ability to send mails when problems happen

      • SNMP - Monitoring and traps

  • External factors

    • A standard sized company investing in loadbalancers usually don't invest on a single loadbalancer. They usually buy a pair for production network, and another pair for QA or staging networks. By the time you add the support costs, it gets very expensive. If you make an investment like that, make sure the company selling that to you is still alive a couple of years down the line to support you.

    • Loadbalancer's are not as reliable as phones. Finding bugs in a loadbalancer is much easier than you think. If you don't have a reliable support team who is willing to help you patch the code fast enough, you might see some downtime or performance issues

    • At the same time, if the company does release a lot of patches on a weekly or monthly basis, you should find out how stable its code is. I usually ask how long it has been since the last stable release.

August 25, 2007

Feature or a bug ?

Dratz asks: Feature or a bug ?

TypePad architecture: Problems and solutions

TypePad was and probably is one of the first and largest paid blogging service in the world. In a presentation at OSCON 2007 , Lisa Phillips and Garth Webb spoke about TypePad's problems in 2005. Since this is a common problem with any successful company I found it interesting enough to research a little more.

TypePad was, like any other service, initially designed in the traditional way with Linux, Postgres, Apache, mod_perl, perl as the front end and NFS storage for images on a filer. At that time they were pushing close to 250mbps (4TB per day) through multiple pipes and with growing user base, activity and data they were growing at the rate of 10 to 20% per month.

Just before the planned move to newer better data center, sometime in Oct 2005, TypePad started experiencing all kinds of problems due to its unexpected growth. The unprecedented stress on the system caused multiple failures over the next two months which ranged from hardware, software, storage to networking issues. While at times it made reading or publishing services to be completely unavailable, it also caused sporadic performance issues with statistic calculations.

One of the most visible failures was in December of 2005 when during a routine maintenance, in the middle of the process of adding redundant storage, something caused the complete storage cluster to go offline which caused the entire bank of webservers serving the webpages went down . Because they had separate storage cluster for backend database, it wasn't affected by the outage directly.

Its at times like these that most companies fail to communicate with their users. Sixpart, fortunately, understood this early and did its job well.

Today Typepad's architecture is similar to the one of Livejournal with users distributed over multiple master-master mysql replication. They have partitioned the database by UserIDs and have a global database to map UserIDs to partitions. They use Mysql 5.0 with InnoDB and Linux Heartbeat for HA.

The images though they decided to switch from a NFS storage to Perlbal ( Perl-based reverse proxy load balancer and web server) +MogileFS (open source distributed file system) which can scale much better with lower overhead over commodity hardware. Look at the image on the right which how Typepad served images in the transition phase from NFS to MogileFS. Follow the arrows with numbers to see how the requests go through within the network. For an image stored on MogileFS (Mogstored), the app server talks to MogileDB through mod_perl2 first (Step 3,4). MogileDB/mod_perl2 sends a Perlbal internal redirect(Step 5,6,7) to the actual image resource which is located on Mogstored(step 8,9).

Since most of the activity on the blogs are read only operations, it made sense to add memcached early into the process to ease load on a lot of components.

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

In another interesting approach to scalable architecture they recognized the fact that one of the most write intensive operations was commenting system which made them experiment with "The Schwartz". This technology helped them use a queuing mechanism which could reliably delay write intensive operations to the database effectively allowing it to scale more.

The Schwartz is taglined "a reliable job queue system" and was originally developed as a generic job processing system for Six Apart's hosted services. It is used in production today on TypePad, Livejournal and Vox for managing tasks that can be performed by the system without user interaction.


August 20, 2007

Web storage for backups

I'm contemplating using S3 for backups. Paul Stamantiou has a script ready to go. The thing which convinced me was this chart Paul showed. For 10GB of space he paid under 3 dollars per month. Thats really cheap...

GMail, Microsoft and yahoo all provide extra storage as well. However none of them have stable company supported APIs to allow users to upload data in this form.

How Skype network handles scalability..

There was a major skype outage last week and though there is an "official explaination" and other discussions about it floating around, I found this comment from one of the GigaOm readers more interesting to think about. Now this particular description may not accurately describe the problem (which might be speculation as well) but it does describe , in a few words, how skype's p2p network scales out. You should also take a look at the detailed discussion of the skype protocol here.
Number of Skype Authentication servers:
Count == 50; // Clustered
Number of potential Skype clients:
Count = 220,000,000 // Mostly decentralized
Number of SuperNode clients to maintain network connectivity:
Count = N / 300 at any one time.

• If there are 3.0 million users online then the ratio is 3,000,000 / 300 = 10,000 == Supernodes available
• Supernodes are bootstraps into the network for normal first run clients ("and handle routing of children calls").
• Supernodes maintain the network overlay via a DHT("Distributed Has Table") "type" method. // This is normally very slow and done over UDP
• If a client cannot find a Supernode, regardless of authentication via central server then is NOT allowed on the Skype network.

Lack of Supernodes mean lack of network connectivity regardless of successful login via “central server”.
You CAN be a Supernode but not have full network connectivity because you have only a portion of the “Distributed Index Data aka DHT”.
MOST people that become Supernodes will bail out if they cannot keep a clear route (”aka calls bail out, client restarts and aborts Supernode status, thus booting it’s 300 - 500 Children and putting them into a “Connecting mode”.

Children that are trying to “Connect” are unable to do anything unless they have a “Supernode” as a parent. // No calls, No IM….

The overview of this is as follows:

Skype introduced a flaw into the network that dealt with “routing” and “fucked” the “decentralized data store aka DHT” this in turn ran clients on a RANDOM search of Supernodes which at this point were well booted off of the network.

In the End:
It is a huge cycle, no matter how many bugs they “fix” in the “central servers” it will take many days for N nodes to become Supernodes so they can route X data from peer A to peer B. This is NOT minor, a fix to the centralized server code base to relay data to N Supernodes there is lack there of, resulting of a very segregate network. Right now there are approximatly 10,000 sub Skype networks instead of 1 Single “in sync” network. When this “data store(see DHT) is in sync globally then the Skype network will be again STABLE.

I know this is very broad but, unless magically all of said nodes can recreate the “single overlay (DHT)” then nothing will be in sync. You will see delayed messaged, delayed or incorrect profiles and presence.

My take, in the end is give it 48 more hours and it may be semi-stable, but hey this is what you get with using end users as your own redundancy…


August 18, 2007

Links on scalability, performance and problems

8/19/2007Big Bad Postgres SQL
8/19/2007Scalable internet architectures
8/19/2007Production troubleshooting (not related to scalability)
8/19/2007Clustered Logging with mod_log_spread
8/19/2007Understanding and Building HA/LB clusters
8/12/2007Multi-Master Mysql Replication
8/12/2007Large-Scale Methodologies for the World Wide Web
8/12/2007Scaling gracefully
8/12/2007Implementing Tag cloud - The nasty way
8/12/2007Normalized Data is for sissies
8/12/2007APC at facebook
8/6/2007Plenty Of fish interview with its CEO
8/6/2007PHP scalability myth
8/6/2007High performance PHP
8/6/2007Digg: PHP's scalability and Performance td> : Find out what a website's frontend is built with

This is a very interesting website which allows you to understand the technology behind the websites you visit. Here is more from its about page
BuiltWith is a web site profiler tool. Upon looking up a page, BuiltWith returns all the technologies it can find on the page. BuiltWith’s goal is to help developers, researchers and designers find out what technologies pages are using which may help them to decide what technologies to implement themselves.

BuiltWith technology tracking includes widgets (snap preview), analytics (Google, Nielsen), frameworks (.NET, Java), publishing (WordPress, Blogger), advertising (DoubleClick, AdSense), CDNs (Amazon S3, Limelight), standards (XHTML,RSS), hosting software (Apache, IIS, CentOS, Debian) .

Getting ready for Social Network portability

  • Brad Fitzpatrick and David Recordon, kicked off another round of discussions on aggregating, decentralizing and Social network portability in a post called "Thoughts on the Social Graph". The post is long, but he summarized the problem statement into a few lines..

Users and developers alike are going crazy. There's too many social networks out there to keep track of. Developers want to make more, and users want to join more, but it's all too much work to re-enter your friends and data. We need to lower the amount of pain for both users and developers and let a thousand new social applications bloom.

I've mentioned this problem in the past as well and feel like this is long overdue. Sites like Plaxo and Facebook have taken a step in the right direction, but its not the solution. As I see it the real solution should be something similar to the XMPP standard which opened up the chat protocol to allow decentralized chat networks work with each other.

Also read

August 16, 2007

Microsoft Live ID out : Google going to support OpenID soon... I predict

The other day I briefly mentioned the pain point of the web2.0 world and how consolidation, aggregation and summarization will help reduce some of it. Microsoft today formally announced the availability of Microsoft Live ID as a contender for the providing SSO (single sign on) services in the web 2.0 world. Live ID, incase you didnt know,  is the repackaged version of Microsoft Passport Network, which had failed so badly that it forced Microsoft to pull it out of the market. Here are some examples of how to use other languages like php, perl, python, ruby etc to do authentication using Live ID. Microsoft is not the first one to openly come out with a SSO technology. Liberty Alliance and OpenID are other opensource competitors which have some foothold in this market already.

The move to SSO, in the web 2.0 world, (Single sign on) is bound to happen regardless of how scary some people might find it to be. If you can trust your online bank with 100000 dollars and trust 3 companies you don't really know with your entire credit history, then this shouldn't be that much of a concern. The real question is whether you trust the technology leaders Microsoft, Google, Yahoo  or others like Verisign enough to provide these critical services for you.

In my opinion the reason why OpenID and Liberty Alliance have failed is because of fragmentation of standards and lack of leadership. While Microsoft failed the commercial venture into Authentication services (Microsoft Passport network) it might actually do well as long as it doesn't screw up this time. Not because the they have done a great job in the past, but because the pain is now so unbearable that people are willing to give almost anything a try. But the real kicker is that almost everyone has a microsoft account anyway, so if I had an option to use my Microsoft account to login to a new web 2.0 product, I'll do that in a heart beat. Creating yet another account with a new password and doing the email confirmation thing is not an adventure anymore... ( or may be I'm getting old ).

I predict that Google or Yahoo will soon jump into this with its own suite of authentication services (probably using OpenID or Liberty Alliance) which will then become the next battleground in the web2.0 world. I also predict that in a couple of years after that many of the web services will move towards supporting these forms of authentication services so that users are not forced to create new user accounts with new passwords every single time.

And if my predictions don't really come true... hey, at least I know that I can dream.


August 13, 2007

DNS Rebinding what ?

Everyone who knows what a "DNS Rebinding attack" is please raise your hands. I'm so glad I can't see yours, because I'm ashamed of myself for not knowing this one. For those who are "pretending" not to know please read on.
Browsers use domain names to enforce same-domain policy for a lot of security features. Interestingly depending on which client you are using its possible to set a low DNS TTL and change the IP address such that without a change in domain name a script could interact with another website as long as browser can be made to believe that its still the same domain. To do this, all that the client needs to do is initially server contents from its own server and while the javascript is running, update the DNS such that the javascript can interact with a new domain from where it could steel information for the attacker.

There are some safe gaurds to stop these kinds of attacks, but for most part these kinds of attack can be done easily on the internet today. The browsers are getting smarter though. And the "DNS Rebinding attack" isn't new anyway... its been known for years at least. The way browsers try to defeat this is by limiting the minimum DNS TTL which can be set.

All was well and good until an attacker realized that the browser and plugins inside the browser each have different minimum DNS TTL set. So as long as the browser and plugin can talk to each other, there could be a point in time when the plugin could be talking to the attackers server and the browser could be connected to the real server streaming the information to the attacker through the plugin.

  1. Protecting browsers from rebinding attacks

  2. XSRF^2

  3. Anti-DNS pinning and DNS-rebinding attacks

  4. Defending network against DNS Reminding attacks

August 12, 2007

ETags and loadbalancers

A few weeks ago the company I work with noticed a weird problem with its CDN (Content Delivery Network) provider. They noticed that HEAD requests were being responded to by the CDN edge nodes using objects in the cache which had already expired. Whats worse is that even after an explicit content expiry notification was sent, the HEAD responses were still wrong. Long story short, the CDN provider had to setup bypass rules for the HEAD requests so that it always bypasses the cache. There was a slight performance overhead with this, but the workaround solved the problem.

Now while this was going on, one of the guys at the CDN support helping us mentioned something about Etags and why we should be using it. I didn't understand how Etags would solve the problem if the CDN itself had a bug which was ignoring expiry information, but I said I'll investigate.

Anyway, the traditional way of communicating object expiry is using the Last-Modified timestamp. ETags is another way of doing that, except that its more accurate.
A little more digging explained that ETags is not a hash of the contents of the file, but a combination of file's inode, file size and last-modified timestamp. This is definitely more accurate and I could see why this might be better than just having last-modified timestamp. But what the CDN support guy didn't mention is that if you are serving content from multiple webservers, even if you rsync the content between the servers, the Etags will always be different because rsync or any other standard copy commands don't have control over the inode number used.

A little more search on the net confirmed that this is a problem and that ETags should probably be shut off (or modified such that it doesn't use inodes) on servers behind loadbalancers.


There is a very interesting interview with Markus Frind, the one man army behind the website The site boasts of traffic higher than, about 30 million page views a day, and runs on a single webserver with a couple of database servers. Markus has found interesting ways of surviving different kinds of problems he had. Here is the direct link to interview in wmv format.

Facebook internals

The code leaked during a facebook bug was posted online by an anonymous user. Though the source itself didn't look very damaging, it did damage the brand "facebook". But I won't go into that in this post, and instead I would like to discus the facebook internals here which alex.moskalyuk touched upon.

Alex pointed out that this is not the only code from facebook we have seen. Infact we already know a lot more about how facebook works internally than what most of us would find from the source code to the index.php published yesterday.

  1. PHP - This is no surprise. Though PHP is not developed at faceboook, Alex points out that facebook developers are involved atleast at some level in the development of the php.

  2. Apache - Neither should this be

  3. Mysql - Same here..

  4. Valgrind - This is a suite of tools for debugging and profiling Linux programs. With the tools that come with Valgrind, you can automatically detect many memory management and threading bugs, avoiding hours of frustrating bug-hunting, making your programs more stable. You can also perform detailed profiling, to speed up and reduce memory use of your programs. Other tools related to this which they user are callgrind/Calltree , KCachegrind and OProfile.

  5. APC - Facebook developers have talked about using Alternative PHP Cache in some presentations they have given in the past.

  6. Facebook Thrift - Thrift is a software framework for scalable cross-language services development. It combines a powerful software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, and Ruby. Thrift was developed at Facebook, and its been released as open source. More information can be found in this whitepaper.

  7. Memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. The use of this shouldn't come as a surprise since most of the new web2.0 companies, especially the ones using php and python have experimented or implemented it at some level.

  8. phpsh is another interesting tool facebook developers use internally. It is an interactive shell for php that features readline history, tab completion, quick access to documentation. It is ironically written mostly in python.

  9. Facebook has released a lot of code to support the facebook platform and to get users to develop for it.

  10. Facebook firefox plugin is the last one I'd like to mention here. This again is open source (since you can see the code once you open up the plugin yourself).


August 11, 2007

Mysql Cluster

"Introduction to MySQL Cluster The NDB storage engine (MySQL Cluster) is a high-availability storage engine for MySQL. It provides synchronous replication between storage nodes and many mysql servers having a consistent view of the database. In 4.1 and 5.0 it's a main memory database, but in 5.1 non-indexed attributes can be stored on disk. NDB also provides a lot of determinism in system resource usage. I'll talk a bit about that."

Technorati Profile

Facebook code leaked.. but was it Hacked too ?

Everyone would be talking about this soon. Someone leaked the source of the index page of facebook on a website called facebook secrets.

Update: Brandee Barker from Facebook responded to Nic on Techcrunch.
Hi Nic-

I wanted to clarify a few things in your story. Some of Facebook’s source code was exposed to a small number of users due to a bug on a single server that was misconfigured and then fixed immediately. It was not a security breach and did not compromise user data in any way. The reprinting of this code violates several laws and we ask that people not distribute it further.

Thanks to you and the TC readers for helping us out on this one.

Brandee Barker

What is not clear is whether this was a hack or was someone inside involved. This is what Nik Cubrilovic from TechCrunch has to say...
"There are a number of clear ramifications here. The first is that the code can be used by outsiders to better understand how the Facebook application works, for the purposes of finding further security holes or bugs that could be exploited. Since Facebook is a closed source application, without access to the code security holes are usually found through a process of black-box testing, whereby an external party will probe the application in an attempt to work out how the application behaves and to try and find potential race conditions. In closed source applications it is common that developers rely on the closed nature of the application to obfuscate poor design elements and the structure of the application. An attacker getting access to the source code more often than not leads to further security holes being discovered. It is for these reasons that it is often claimed that open source software is more secure than closed source software, since there are many more eyes auditing the code and obfuscation can’t be used as a security measure.

The second implication with this leak is that the source code reveals a lot about the structure of the application, and the practices that Facebook developers follow. From just this single page of source code a lot can be said and extrapolated about the rest of the Facebook application and platform. For instance, the structure doesn’t follow any object oriented development practices, and it seems that the application is one large PHP file with a large number of custom functions living in the same namespace (they also seem to be using the Smarty templating engine). "

August 09, 2007

How To Design A Good API and Why it Matters

A very interesting Google talk about designing a good API. This may not seem like a scalability issue, but if you really want to host a horizontally scalable system you need to have a good scalable API design to go with it.
Every day around the world, software developers spend much of their time working with a ... all variety of Application Programming Interfaces (APIs). Some are integral to the core platform, some provide access to widely distributed frameworks, and some are written in-house for use by a few developers. Nearly all programmers occasionally function as API designers, whether they know it or not. A well-designed API can be a great asset to the organization that wrote it and to all who use it. Good APIs increase the pleasure and productivity of the developers who use them, the quality of the software they produce, and ultimately, the corporate bottom line. Conversely, poorly written APIs are a constant thorn in the developer's side, and have been known to harm the bottom line to the point of bankruptcy. Given the importance of good API design, surprisingly little has been written on the subject. In this talk, I'll attempt to help you recognize good and bad APIs and I'll offer specific suggestions for writing good ones.

August 07, 2007

Content Delivery network: Will Price war boost web performance ?

GigaOm has an interesting write up on the commoditization  of the CDN service  and the pricewar raging in the industry. Akamai itself saw a significant stock market drop in the last couple of weeks.
"That burp has come with the increase in the number of competitors, each one trying to cash in on the boom in online video and other digital content. Limelight Networks (LLNW), Level 3 (LVLT), Internap (INAP), CDNetworks, along with new entrants Panther Express and EdgeCast Networks are some of the CDN players currently involved in a catfight with Akamai.  "

CDN is an excellent way of boosting performance and providing PoP in different parts of the world which can benefit by faster content delivery.

August 05, 2007 going public...Why ?

Mashable mentioned that accoona is going public. It says...
Most of Accoona’s revenue comes from its e-commerce business, which operates in North America. It’s online lead generation and search engine services are used in the US, Europe and China. Its search technology was hailed as a viable competitor to other major search engines such as Google, when it launched its Internet service a few years ago. Accoona’s attempt at differentiation is that of its semantic search, incorporating the meaning of words into your queries, allow you to further filter your search results based on your highlighted keywords, and will revise information in real time, offering relevant data such as fax and phone numbers, addresses, etc. for particular information you look up.

My question is... why ? The site itself looks unpleasant to visit, slow to search and has at least a few implementation bugs at least. On top of that I found the advertisements annoying to look at and the search filtering idea, though great, wasn't really implemented in an intuitive way.
Now, all that doesn't really matter if the "AI" part of search was any good. I tried to search for two simple things and compared it with google.

  • How high is mount everest ?

  • Which is the second highest mountain ?

For both of these results, google was spot on... and Accoona's AI based search required Real Intelligence on my part to find the right answer. The other problem is that SuperTarget's 6 filter catagories are insufficient to cover various topics a user could be searching on.

But thats just me talking about it after using the site for 2 minutes.

Interestingly Accoona also runs this website which might be where it really makes money. But its not clear if this website uses any of the AI infrastructure Accoona is investing on.

Update: John Battlelle has an update on According to him this company does more than what meets the eye. But its still not clear why they have all the smoke and mirrors. Also checkout paidContent and the full S-1 filed with SEC is here.
"We are an Internet company engaged in three primary business lines — online-based lead generation, online search in the United States, Europe and China, and e-commerce consumer electronics retailing. Our services assist our users in finding the products, services and information they want, obtaining competitive pricing and making informed buying decisions. We use our expertise in technology, marketing and management to support and create efficiencies across our business lines, which are organized primarily into the following sectors:
• Online-based lead generation — We developed and operate ExchangePlaceTM, which we believe is one of the first U.S. online-based marketplaces that enables consumers to obtain offers from as many as four providers of services in which they are interested and allows providers to bid for the opportunity to contact qualified consumers, or leads, (i.e., those meeting the providers’ criteria), across a range of vertical markets. We believe that these leads are more valuable to providers because of the greater likelihood they will result in sales, thereby resulting in increased returns on investment, or ROI, for those providers.
• Search — We have developed and operate an artificial intelligence driven search engine in the United States, China and Europe. Our business plan contemplates the development of techniques to use our existing technologies to enable our users to better access certain specialized search markets. In addition, we operate a shopping comparison search engine,, that allows shoppers to search for and compare products and prices available at numerous online merchants.
• E-commerce — We operate six Internet retail websites offering primarily a wide selection of consumer electronics and home appliances, backed by customer service and support. According to a report in TWICE, in 2006, the combined revenues of our e-commerce sites made us one of the top 10 consumer-direct electronics retailers in North America by online revenue and one of the top 55 consumer electronics retailers overall."

New Talks and Slides links from Aug 5 2007

If you haven't seen these links before.. you should check this page first "Talks and slides from web architects". But if you have already seen that page... here are the updates from last week.

 PDFCase for Shared Nothing
 PDFThe Chubby Lock Service for Loosely-Coupled Distributed Systems
  Building Highly Scalable Web Applications
1/1/2006SlidesThe Ebay architecture
1/1/2007SlidesPHP & Performance
4/20/2007VideoBrad Fitzpatrick - Behind the Scenes at LiveJournal: Scaling Storytime
5/4/2006 Scalable computing with Hadoop
6/3/2007 Hadoop Map/Reduce
8/3/2007 Introduction to hadoop
6/1/2007SlidesHadoop distributed file system
  Yahoo experience with hadoop
7/25/2007 Meed Hadoop
8/3/2007webpageThe Hadoop Distributed File System: Architecture and Design
7/25/2007BlogYahoo's Hadoop Support
7/18/2007BlogRunning Hadoop MapReduce on Amazon EC2 and Amazon S3
8/3/2007 Interpreting the Data: Parallel Analysis with Sawzall
10/18/2005VideoBigTable: A Distributed Structured Storage System
1/1/2006PDFBigtable: A Distributed Storage System for Structured Data
1/1/2004PDFMapReduce: Simplified Data Processing on Large Clusters
1/1/2003PDFGoogle File System
8/3/2007PDFODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
8/3/2007PDFSEDA: An Architecture for well conditioned scalable internet services
8/3/2007PDFA scalable architecuture for Global web service hosting service
6/23/2007BlogGetting Started with Drupal
6/23/2007Blog4 Problems with Drupal

Crowdsourcing the google way

Remember googles innovative image labeler idea ? They seem to be doing it again with getting the masses to build maps for Google in india. India unlike US and many other western countries doesn't have well documented maps for its streets. Eicher is the only organization I know about which actively maps and provides printed maps in india.

Here is what Braddy Forrest has to say...

"Google has been sending GPS kits to India that enable locals to make more detailed maps of their area. After the data has been uploaded and then verified against other participant's data it becomes a part of the map. The process is very reminiscent of what Open Street Map, the community map-building project, has been doing. The biggest difference is that the data (to my knowledge) is owned by Google and is not freely available back to the community like it is with OSM."

August 04, 2007

The "me too" phenomenon and Identity theft

A very interesting article from Muhammad Saleem on the "me too" phenomenon. My problem with this phenomenon is that this might make stealing identity easier than before. In this new web 2.0 world, if I need your passwords or mother's maiden name, all I have to do is build an interesting application which you would like to try out at least once. Once I have your password or other key information (which most likely be the same across all your applications), I can shut the side down and do other interesting things. I'm an open advocate of OpenID which attacks some of the issues, but its no silver bullet.
More from Muhammad's blog..
"Everyday a new company announces a 'new' product which is nothing more than the old product with slight modifications or a few small additional features. This mentality is not only bad for users but also for marketers and even the startups.

A prime example of this phenomenon can be witnessed by comparing Dodgeball, Twitter, Jaiku, Tumblr, Pownce and a plethora of other microblogging tools. 90% of the services these different tools offer are the same, and the 10% that differentiates them is not significant enough to make most users switch."

Hadoop and HBase

This may not be a surprise for a lot of people but it was for me. Even though I have been using lucene and nutch for some time, I didn't really know enough about Hadoop and HBase until recently.


  • Scalable: Hadoop can reliably store and process petabytes.

  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.

  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.

  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.

Google's Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.

Just as Bigtable leverages the distributed data storage provided by the Google File System, Hbase will provide Bigtable-like capabilities on top of Hadoop.

Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key).

Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value or an Iterator can be set up to scan through the key-value pairs for that column given a specific row key.

August 02, 2007

Talks and slides from various web architects

For latest set of links go here.

This is a collection of various slides, pdfs and videos about designing scalable websites I collected time. If you have something interesting which might go in here, please let me know.

6/23/2007BlogGetting Started with Drupal
6/23/2007Blog4 Problems with Drupal
6/23/2007VideoSeattle Conference on Scalability: MapReduce Used on Large Data Sets
6/23/2007VideoSeattle Conference on Scalability: Scaling Google for Every User
6/23/2007VideoSeattle Conference on Scalability: VeriSign's Global DNS Infrastucture
6/23/2007VideoSeattle Conference on Scalability: YouTube Scalability
6/23/2007VideoSeattle Conference on Scalability: Abstractions for Handling Large Datasets
6/23/2007VideoSeattle Conference on Scalability: Building a Scalable Resource Management
6/23/2007VideoSeattle Conference on Scalability: SCTPs Reliability and Fault Tolerance
6/23/2007VideoSeattle Conference on Scalability: Lessons In Building Scalable Systems
6/23/2007VideoSeattle Conference on Scalability: Scalable Test Selection Using Source Code
6/23/2007VideoSeattle Conference on Scalability: Lustre File System
6/9/2007SlidesTechnology at
6/9/2007BlogExtreme Makeover: Database or MySQL@YouTube
4/26/2007BlogMysql at Google
4/1/2007SlidesScaling Twitter
4/1/2007SlidesHow we build Vox
4/1/2007SlidesHigh Performance websites
4/1/2007SlidesBeyond the file system design
4/1/2007SlidesScalable web architectures
3/1/2007SlidesScalability set Amazon's servers on fire not yours
3/1/2007SlidesHardware layouts for LAMP installations
3/1/2007VideoMysql scaling and high availability architectures
3/1/2007AudioLessons from Building world's largest social music platform
3/1/2007PDFLessons from Building world's largest social music platform
3/1/2007SlidesLessons from Building world's largest social music platform
11/1/2006PDFLivejournal's backend: history of scaling
11/1/2006SlidesLivejournal's backend: history of scaling
11/1/2006SlidesScalable Web Architectures (w/ Ruby and Amazon S3)
10/26/2006BlogYahoo! bookmarks uses symfony
7/26/2006SlidesGetting Rich with PHP 5
7/26/2006AudioGetting Rich with PHP 5
3/7/2006BlogScaling Fast and Cheap - How We Built Flickr
3/1/2005NewsOpen source helps Flickr share photos
 SlidesFlickr and PHP
 SlidesWikipedia: Cheap and explosive scaling with LAMP
 BlogYouTube Scalability Talk
  High Order Bit: Architecture for Humanity
 PDFMysql and Web2.0 companies
8/3/2007 Building Highly Scalable Web Applications
8/3/2007 Introduction to hadoop
8/3/2007webpageThe Hadoop Distributed File System: Architecture and Design
8/3/2007 Interpreting the Data: Parallel Analysis with Sawzall
8/3/2007PDFODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
8/3/2007PDFSEDA: An Architecture for well conditioned scalable internet services
8/3/2007PDFA scalable architecuture for Global web service hosting service
7/25/2007 Meed Hadoop
7/25/2007BlogYahoo's Hadoop Support
7/18/2007BlogRunning Hadoop MapReduce on Amazon EC2 and Amazon S3
6/22/2007 LH*RSP2P : A Scalable Distributed Data Structure for P2P Environment
6/12/2007 Scaling the Internet routing table with Locator/ID Separation Protocol (LISP)
6/3/2007 Hadoop Map/Reduce
6/1/2007SlidesHadoop distributed file system
4/20/2007VideoBrad Fitzpatrick - Behind the Scenes at LiveJournal: Scaling Storytime
2/1/2007SlidesInside LiveJournal's Backend (April 2004)
2/1/2007SlidesHow to scale
1/23/2007 Testing Oracle 10g RAC Scalability
1/1/2007SlidesPHP & Performance
12/22/2006 SQL Performance Optimization
10/13/2006 Building_a_Scalable_Software_Security_Practice
5/31/2006 Building Large Systems at Google
5/4/2006 Scalable computing with Hadoop
1/1/2006SlidesThe Ebay architecture
1/1/2006PDFBigtable: A Distributed Storage System for Structured Data
1/1/2006PDFFault-Tolerant and scalable TCP splice and web server architecture
10/18/2005VideoBigTable: A Distributed Structured Storage System
1/1/2004PDFMapReduce: Simplified Data Processing on Large Clusters
8/3/2003PDFGoogle Cluster architecture
1/1/2003PDFGoogle File System
11/1/2002DocImplementing a Scalable Architecture
10/30/2001NewsHow linux saved Millions for Amazon
  Yahoo experience with hadoop
 SlidesScalable web application using Mysql and Java
 SlidesFriendster: scalaing for 1 Billion Queries per day
 BlogLightweight web servers
 PDFMysql Scale out by application partitioning
 PDFReplication under scalable hashing: A family of algorithms for Scalable decentralized data distribution
 ProductClustered storage revolution
 BlogEarly Amazon Series
 WebWikimedia Server info
 SlidesWikimedia Architecture
 SlidesMySpace presentation
 PDFA scalable and fault-tolerant architecture for distributed web resource discovery
8/4/2007PDFThe Chubby Lock Service for Loosely-Coupled Distributed Systems
8/5/2007SlidesReal world Mysql tuning
8/5/2007SlidesReal world Mysql performance tuning
8/5/2007SlidesLearning MogileFS: Buliding scalable storage system
8/5/2007SlidesReverse Proxy and Webserver
8/5/2007PDFCase for Shared Nothing
7/1/2007SlidesA scalable stateless proxy for DBI
1/1/2006SlidesReal world scalability web builder 2006
8/5/2005SlidesReal world web scalability

Scalable web architectures

I've been reading a lot about scalable web architectures lately and made a big enough collection of links to see that this could be interesting to others. Instead of putting all those links here in this blog, I've started a separate blog here If you have an interesting link/links to share please send it over to me.

Youtube scalability

Scalable Internet Architectures

By Theo Schlossnagle


As a developer, you are aware of the increasing concern amongst developers and site architects that websites be able to handle the vast number of visitors that flood the Internet Scalable Internet Architectures (Developer's Library)on a daily basis. Scalable Internet Architecture addresses these concerns by teaching you both good and bad design methodologies for building new sites and how to scale existing websites to robust, high-availability websites. Primarily example-based, the book discusses major topics in web architectural design, presenting existing solutions and how they work. Technology budget tight? This book will work for you, too, as it introduces new and innovative concepts to solving traditionally expensive problems without a large technology budget. Using open source and proprietary examples, you will be engaged in best practice design methodologies for building new sites, as well as appropriately scaling both growing and shrinking sites. Website development help has arrived in the form of Scalable Internet Architecture.

Amazon Link


From the Back Cover

As a developer, you are aware of the increasing concern amongst developers and site architects that websites be able to handle the vast number of visitors that flood the Internet on a daily basis. Scalable Internet Architecture addresses these concerns by teaching you both good and bad design methodologies for building new sites and how to scale existing websites to robust, high-availability websites. Primarily example-based, the book discusses major topics in web architectural design, presenting existing solutions and how they work. Technology budget tight? This book will work for you, too, as it introduces new and innovative concepts to solving traditionally expensive problems without a large technology budget. Using open source and proprietary examples, you will be engaged in best practice design methodologies for building new sites, as well as appropriately scaling both growing and shrinking sites. Website development help has arrived in the form of Scalable Internet Architecture.

August 01, 2007

Book: Building Scalable Web Sites

Building, scaling, and optimizing the next generation of web applications

By Cal Henderson

Learn the tricks of the trade so you can build and architect applications that scale quickly--without all the high-priced headaches and service-level agreements associated with Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applicationsenterprise app servers and proprietary programming and database products. Culled from the experience of the lead developer, Building Scalable Web Sites offers techniques for creating fast sites that your visitors will find a pleasure to use.
Creating popular sites requires much more than fast hardware with lots of memory and hard drive space. It requires thinking about how to grow over time, how to make the same resources accessible to audiences with different expectations, and how to have a team of developers work on a site without creating new problems for visitors and for each other.
Presenting information to visitors from all over the world
* Integrating email with your web applications
* Planning hardware purchases and hosting options to have as much as you need without breaking your wallet
* Partitioning and distributing databases to support large datasets and simultaneous transactions
* Monitoring your applications to find and clear bottlenecks
* Providing services APIs and using services from other providers to increase your site's reach and capabilities
Whether you're starting a small web site with hopes of growing big or you already have a large system that needs maintenance, you'll find Building Scalable Web Sites to be a library of ideas for making things work.


Buy Now

Product Details

  • Amazon Sales Rank: #6159 in Books
  • Brand: O'Reilly Media
  • Published on: 2006-05-16
  • Format: Illustrated
  • Number of items: 1
  • Binding: Paperback
  • 352 pages