Showing posts from August, 2010

Scalability updates for Aug 27th 2010

My updates have been slow recently because of other things I’m involved in. If you want more updates on what I’m reading, feel free to follow me on twitter or buzz. Here are some of the big ones I have mentioned on my twitter/buzz feeds.

Tools:

- Real-time Relationship Analytics from large scale graph processing using cassandra [code here]
- Hadoop 0.21 has been released. The one feature I think is really cool is the ability to “append” in HDFS. I found the lack of an append feature slightly limiting for what I was trying to do last month.
- Short intro to flume: Flume is a distributed log collection service which can collect and write logs to HDFS. If you have ever had problems aggregating logs, you should take a look at this.
- Topsy just upgraded their backend engine and wrote all about it on their blog. Indexing the twitter firehose is no small task, and these guys have done an amazing job.
- It’s hard to write incrementing or decrementing counters using an