Archive for October, 2007

Scaling Technorati – 100 million blogs indexed every day

Indexing 100 million blogs with over 10 billion objects, and with a user base that is doubling every six months, Technorati has an edge over most blog search engines. But they are much more than search, and any Technorati user can tell you that. I recommend you read John Newton’s interview with David Sifry which [...]

Read the rest of this entry »

Scalability stories for Oct 22, 2007

Why most large-scale sites that scale are not written in Java? - (What nine of the world’s largest websites are running on) - A couple of very interesting blogs to read.
Slashdot’s setup Part 1 – Just in time for the 10-year anniversary.
Flexiscale – Looks like an Amazon competitor in the hosting [...]

Read the rest of this entry »

Crawling sucks.

About 10 years ago I wrote my first crawler, a few lines of Perl code that spidered a website recursively. Two years ago I wrote another crawler in a few thousand lines using Java, PHP and MySQL. But this time I wasn’t really interested in competing with Google, and instead crawled feeds (RSS/Atom). Google hadn’t [...]

Read the rest of this entry »

EC2 for everyone. And now it includes 64-bit with 15GB RAM too.

 
Finally, it happened. EC2 is available to everybody. More than that, they now provide servers with 7.5GB and 15GB of RAM per instance. Sweet.
For a lot of companies, EC2 was not viable due to the high memory requirements of some of their applications. Splitting up such tasks to use less memory on multiple servers [...]

Read the rest of this entry »

Web Scalability dashboard

[Blogofy: bringing feeds together]
I took a week’s break from blogging to work on one of my long-overdue personal projects. Even though I use Google Reader as my feed aggregator, I noticed a lot of folks still prefer a visual UI to track news and feeds. The result of my experimentation with designing such [...]

Read the rest of this entry »