Showing posts from October 24, 2007

Scaling Technorati - 100 million blogs indexed every day

Indexing 100 million blogs with over 10 billion objects, and with a user base that is doubling every six months, Technorati has an edge over most blog search engines. But it is much more than search, and any Technorati user can tell you that. I recommend you read John Newton's interview with David Sifry, which I found fascinating. Here are the highlights from the interview if you don't have time to read the whole thing.

Current status of Technorati
- 1 terabyte a day added to its content storage
- 100 million blogs
- 10 billion objects
- 0.5 billion photos and videos
- Data doubling every six months
- Users doubling every six months

The first version was supposed to be for tracking temporal information on a low budget. That version put everything in a relational database, which was fine since the index sizes were smaller than physical memory. It worked fine until about 20 million blogs.

The next generation took advantage of parallelism. Data was broken up into shard