DealNews: Scaling for Traffic Spikes

Last year Dealnews.com unexpectedly got listed dealnews.comon the front page of yahoo.com for a couple of hours. No matter how optimistic one is, unexpected events like these can take down a regular website with almost no effort at all. What is your plan if you get slashdotted ? Are you ok with a short outage ? What is the acceptable level of service for your website anyway.

One way to handle such unexpected traffic is having multiple layers of cache. Database query cache is one, generating and caching dynamic content is another way (may be using a cronjob). Tools like memcached, varnish, squid can all help to reduce the load on application servers.

Proxy servers ( or webservers ) in front of application servers play a special role in dealnews. They understood the limitations of application servers they were using, and the fact that slow client connections means longer lasting tcp sessions to the application servers. Proxy servers, like varnish, could off-load that job and take care of content delivery without keeping application servers busy. In addition Varnish also acts as a content caching service which further reduces load on the application servers.

Dealnews’ content is extremely dynamic because of which the it uses a very low TTL of 5 minutes for most of its pages. It may not look a lot but at thousands of pages per second, such a cache can do miracles. While caching is great, the one thing every loaded website has to go through is figure out how to avoid the “cache stampede” when the TTL expires. “Cache stampede” is what happens when 100s of request requesting the same resource hit the server at the same time forcing the webserver to forward all 100 request to the app server and the database server because the caches were not good.

Dealnews solves this problem by separating content generation from content delivery. There is a process which they run which converts data from more than 300 tables of normalized data, into 30 tables with highly redundant de-normalized data. This data is kept in such a way that the application servers are required to make queries  using primary keys or unique keys only. With such a design a cluster of Mysql DB servers shouldn’t have any problem handling 1000s of queries per second from the front end application servers.

Twitter drives a lot of traffic and since a lot of that data is redundant, it heavily relies on caches. Its actually so much that the site could completely go down if a few memcached servers go down. Dealnews explicitly tested their application with the memcached servers disabled to see what the worst case scenario was for reinitializing cache. They then optimized their app to the point where the response time only doubled from about 0.75 seconds to 1.5 second per page without memcached servers.

Handling 3rd party content could be tricky. Dealnews treats 3rd party content as lower class citizens. They not only load 3rd party at the very end of the page, they also try to use iframes wherever possible to keep loading of those objects from loading of dealnews.com

If you are interested in the video recording or the slides from the talk, click on the following links.

References

One comment on “DealNews: Scaling for Traffic Spikes

Comments are closed.