June 19, 2010

Pingdom: Software behind Facebook

Pingdom has an interesting post, “Exploring the software behind Facebook, the world’s largest site”, which lists the various components that run Facebook.

A few interesting statistics from the post:

    • Facebook serves 570 billion page views per month (according to Google Ad Planner).
    • There are more photos on Facebook than all other photo sites combined (including sites like Flickr).
    • More than 3 billion photos are uploaded every month.
    • Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
    • More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
    • Facebook has more than 30,000 servers (and this number is from last year!)
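The monthly totals are easier to compare with the per-second photo figure when converted to average rates. This is a back-of-the-envelope sketch that assumes a 30-day month and perfectly uniform traffic, so it ignores daily peaks, which will be considerably higher than these averages.

```java
public class FacebookRates {
    // Assumption: a 30-day month; real months have 28 to 31 days.
    static final long SECONDS_PER_MONTH = 30L * 24 * 60 * 60; // 2,592,000

    // Average events per second for a monthly total (integer division, rounds down).
    static long perSecond(long perMonth) {
        return perMonth / SECONDS_PER_MONTH;
    }

    public static void main(String[] args) {
        System.out.println("page views/s:    " + perSecond(570_000_000_000L)); // 219907
        System.out.println("photo uploads/s: " + perSecond(3_000_000_000L));   // 1157
        System.out.println("content items/s: " + perSecond(25_000_000_000L));  // 9645
    }
}
```

So 570 billion page views a month works out to roughly 220,000 page views per second on average.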

I’m not sure Facebook is really the “largest site” based on server count alone, but it’s definitely the largest based on unique users in the US.

Slides from a Cassandra talk in Mountain View

What’s not mentioned in the slides was Gary’s reference to a number of key changes in version 0.7 of Cassandra. He thinks a beta will be out in a month and that it will address many of the issues currently keeping users away from Cassandra. A few interesting points:

  • 0.5 and 0.6 use the same SSTable format (for storing data on disk), but 0.7 changes it. Some kind of migration will be required if 0.7 can’t read the older SSTable format.
  • Until now, you need 50% of disk space free to run a compaction. This might improve with 0.7.
  • 0.7 will probably have more support for Avro (as an alternative to Thrift). He wonders why Thrift hasn’t caught on.
  • Vector clocks are coming.
  • Altering keyspaces and column families is not possible on a live system today; this might change in a future version.
  • Compression is being considered.
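For readers unfamiliar with vector clocks, here is a minimal, generic sketch of the idea in Java. It is not Cassandra’s planned design: the class name, the string node ids and the `Order` enum are illustrative assumptions. Each node keeps a counter; a write bumps the writing node’s counter, and comparing two clocks tells you whether one version causally precedes the other or whether they are concurrent (a conflict that needs resolving).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical immutable vector clock: node id -> counter (missing entry = 0).
public class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters = new HashMap<>();

    // Returns a new clock recording one more write on the given node.
    public VectorClock increment(String node) {
        VectorClock next = new VectorClock();
        next.counters.putAll(counters);
        next.counters.merge(node, 1L, Long::sum);
        return next;
    }

    // Element-wise maximum: the clock of a version that has seen both histories.
    public VectorClock merge(VectorClock other) {
        VectorClock result = new VectorClock();
        result.counters.putAll(counters);
        other.counters.forEach((node, c) -> result.counters.merge(node, c, Long::max));
        return result;
    }

    // Causal order of this clock relative to another.
    public Order compare(VectorClock other) {
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        boolean less = false, greater = false;
        for (String n : nodes) {
            long a = counters.getOrDefault(n, 0L);
            long b = other.counters.getOrDefault(n, 0L);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock().increment("A"); // {A=1}
        VectorClock w1 = base.increment("A");                // {A=2}
        VectorClock w2 = base.increment("B");                // {A=1, B=1}
        System.out.println(base.compare(w1));           // BEFORE
        System.out.println(w1.compare(w2));             // CONCURRENT: a conflict
        System.out.println(w1.merge(w2).compare(w2));   // AFTER
    }
}
```

The `CONCURRENT` case is the interesting one: it is exactly the situation a plain timestamp comparison would silently resolve by picking one write and discarding the other.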

He strongly urged users to use client libraries that abstract away Cassandra’s internal workings. He was convincing enough that I’m investigating a move from Cassandra’s Java library to “Hector” for my Java application.
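One way to follow that advice is to keep application code behind a small interface of your own, so that swapping the low-level client for Hector (or anything else) touches one class. The sketch below is hypothetical: `ColumnStore`, `InMemoryColumnStore`, `UserProfiles` and the `Users` column family are illustrative names, not Hector’s or Cassandra’s API, and the in-memory store merely stands in for a real client-backed implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class StoreAbstraction {
    // Hypothetical application-facing interface; not Hector's API.
    interface ColumnStore {
        void put(String columnFamily, String rowKey, String column, String value);
        Optional<String> get(String columnFamily, String rowKey, String column);
    }

    // In-memory stand-in for tests; a production version would delegate to a client library.
    static class InMemoryColumnStore implements ColumnStore {
        // columnFamily -> rowKey -> column -> value
        private final Map<String, Map<String, Map<String, String>>> data = new HashMap<>();

        @Override
        public void put(String cf, String row, String column, String value) {
            data.computeIfAbsent(cf, k -> new HashMap<>())
                .computeIfAbsent(row, k -> new HashMap<>())
                .put(column, value);
        }

        @Override
        public Optional<String> get(String cf, String row, String column) {
            Map<String, Map<String, String>> rows = data.getOrDefault(cf, Collections.emptyMap());
            Map<String, String> columns = rows.getOrDefault(row, Collections.emptyMap());
            return Optional.ofNullable(columns.get(column));
        }
    }

    // Application code depends only on ColumnStore, never on client-library types.
    static class UserProfiles {
        private final ColumnStore store;

        UserProfiles(ColumnStore store) { this.store = store; }

        void setEmail(String userId, String email) { store.put("Users", userId, "email", email); }

        Optional<String> email(String userId) { return store.get("Users", userId, "email"); }
    }

    public static void main(String[] args) {
        UserProfiles profiles = new UserProfiles(new InMemoryColumnStore());
        profiles.setEmail("u42", "someone@example.com");
        System.out.println(profiles.email("u42").orElse("(none)")); // someone@example.com
        System.out.println(profiles.email("u99").orElse("(none)")); // (none)
    }
}
```

The design choice is the usual dependency-inversion trade-off: a little extra indirection in exchange for being able to change client libraries, or Cassandra versions with different wire protocols, without rewriting application logic.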