Archive for the ‘databases’ Category

Scaling technorati - 100 million blogs indexed everyday

Thursday, October 25th, 2007

Indexing 100 million blogs with over 10 billion objects, and with a user base which is doubling every six months, technorati has an edge over most blog search engines. But they are much more than search, and any technorati user can explain you that. I recommend you read John Newton’s interview with David Sifry which I found fascinating. Here are the highlights from the interview if you don’t have time to read the whole thing

  • Current status of technorati
    • 1 terabyte a day added to its content storage
    • 100 million blogs
    • 10 billion objects
    • 0.5 billion photos and videos
    • Data doubling every six months
    • Users doubling every six months
  • The first version was supposed to be for tracking temporal information on low budget.
    • That version put everything in relational database which was fine since the index sizes were smaller then physical memory
    • It worked fine till about 20 million blogs
  • The next generation took advantage of parallelism.
    • Data was broken up into shards
    • Synced up frequently between servers
    • The database size reached largest known OLTP size.
      • Writing as much data as reading
      • Maintaining data integrity was important
        • This put a lot of pressure on the system
  • The third generation
    • Shards evolved
      • The shards were based on time instead of urls
      • They moved content to special purpose databases instead of relational database
    • Don’t delete anything
    • Just move shards around and use a new shard for latest stuff
  • Tools used
    • Green plum - enables enterprises to quickly access massive volumes of critical data for in-depth analysis. Purpose built for high performance, large scale BI, Greenplum’s family of database products comprises solutions suited to installations ranging from departmental data marts to multi-terabyte data warehouses.
  • Should have done sooner
    • Should have invested in click stream analysis software to analyze what clicks with the users
      • Can tell how much time users spend on a feature

Popularity: 78%

Sharding: Different from Partitioning and Federation ?

Sunday, September 9th, 2007

Ive been hearing this word “sharding” more and more often, and its spreading like fire. Theo Schlossnagle, the author of “Scalable internet architecutres” argues that federation is form of partitioning, and that sharding is nothing but a form of partitioning and federation. Infact, according to him, Sharding has already been in use use for a long time.

I’m not a dba, and I don’t pretend to be one in my free time either, so to understand the differences I did some research and found some interesting posts.

The first time I heard about “Sharding” was on Been Admininig’s blog about Unorthodox approach to database design (Part I and Part II). Here is the exact reference…

Splitting up the user data so that User A exists on one serverwhile User B exists on another server, each server now holds a shard ofthe data in this federated model.

A couple of months ago Highscalability.com picked it up and made it sound (probably unintentionally) that sharding is actually different from Federation and Partitioning. Todd’s post also points at Flickr using sharding.The search for Flickr architecture lead me to Colin Charles’ post about Federation at Flickr: A tour of the Flickr architecture where he does mention shards as a component of Federation key. Again no mention of Sharding being anything new.

Federation Key Components:

  • Shards: My data gets stored on my shard, but the record ofperforming action on your comment, is on your shard. When making acomment on someone elses’ blog
  • Global Ring: Its like DNS, you need to know where to go and whocontrols where you go. Every page view, calculate where your data is,at that moment of time.
  • PHP logic to connect to the shards and keep the data consistent (10 lines of code with comments!)

Based on the discussions on these and other blogs, “Shards” sounds more like a terminology used to describe fragments of data which is federated across multiple databases instead of an architecture by itself. I think Theo Schlossnagle has a valid argument. If any of you disagree I’m interested to hear what you have to say. A clearer definition between sharding and federation would be very helpful as well.

Here are more references to Shard/Sharding.

      Popularity: 51%

      TypePad architecture: Problems and solutions

      Saturday, August 25th, 2007

      TypePad was and probably is one of the first and largest paid blogging service in the world. In a presentation at OSCON 2007 , Lisa Phillips and Garth Webb spoke about TypePad’s problems in 2005. Since this is a common problem with any successful company I found it interesting enough to research a little more.

      TypePad was, like any other service, initially designed in the traditional way with Linux, Postgres, Apache, mod_perl, perl as the front end and NFS storage for images on a filer. At that time they were pushing close to 250mbps (4TB per day) through multiple pipes and with growing user base, activity and data they were growing at the rate of 10 to 20% per month.

      Just before the planned move to newer better data center, sometime in Oct 2005, TypePad started experiencing all kinds of problems due to its unexpected growth. The unprecedented stress on the system caused multiple failures over the next two months which ranged from hardware, software, storage to networking issues. While at times it made reading or publishing services to be completely unavailable, it also caused sporadic performance issues with statistic calculations.

      One of the most visible failures was in December of 2005 when during a routine maintenance, in the middle of the process of adding redundant storage, something caused the complete storage cluster to go offline which caused the entire bank of webservers serving the webpages went down . Because they had separate storage cluster for backend database, it wasn’t affected by the outage directly.

      Its at times like these that most companies fail to communicate with their users. Sixpart, fortunately, understood this early and did its job well.

      Today Typepad’s architecture is similar to the one of Livejournal with users distributed over multiple master-master mysql replication. They have partitioned the database by UserIDs and have a global database to map UserIDs to partitions. They use Mysql 5.0 with InnoDB and Linux Heartbeat for HA.

      The images though they decided to switch from a NFS storage to Perlbal ( Perl-based reverse proxy load balancer and web server) +MogileFS (open source distributed file system) which can scale much better with lower overhead over commodity hardware. Look at the image on the right which how Typepad served images in the transition phase from NFS to MogileFS. Follow the arrows with numbers to see how the requests go through within the network. For an image stored on MogileFS (Mogstored), the app server talks to MogileDB through mod_perl2 first (Step 3,4). MogileDB/mod_perl2 sends a Perlbal internal redirect(Step 5,6,7) to the actual image resource which is located on Mogstored(step 8,9).

      Since most of the activity on the blogs are read only operations, it made sense to add memcached early into the process to ease load on a lot of components.

      memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

      In another interesting approach to scalable architecture they recognized the fact that one of the most write intensive operations was commenting system which made them experiment with “The Schwartz“. This technology helped them use a queuing mechanism which could reliably delay write intensive operations to the database effectively allowing it to scale more.

      The Schwartz is taglined “a reliable job queue system” and was originally developed as a generic job processing system for Six Apart’s hosted services. It is used in production today on TypePad, Livejournal and Vox for managing tasks that can be performed by the system without user interaction.

      References

      http://www.sixapart.com/typepad/news/2005/10/to_our_customers.html

      http://www.niallkennedy.com/blog/archives/2005/12/typepad-outage-details.html

      http://www.movabletype.org/documentation/administrator/publishing/publish-queue.html

      Popularity: 27%