Scalability updates for Aug 27th 2010
Here are some of the big ones I have mentioned on my Twitter/Buzz feeds.
- Tools: Real-time Relationship Analytics from large-scale graph processing using Cassandra [code here]
- Hadoop 0.21 has been released. The one feature I think is really cool is the ability to “append” in HDFS. I found the lack of an append feature slightly limiting for what I was trying to do last month.
- Short intro to Flume: Flume is a distributed log collection service that can collect logs and write them to HDFS. If you have ever had problems aggregating logs, you should take a look at this.
- Topsy just upgraded their backend engine and wrote all about it on their blog. Indexing the Twitter firehose is no small task, and these guys have done an amazing job.
- It's hard to write incrementing or decrementing counters using an eventually consistent distributed datastore like Cassandra, but it's not impossible, as shown in this example at git.
- Cloudkick has been building their business around Cassandra. They are now sharing their experience with the community here and here.
- Riptano has been posting interesting videos from their Cassandra summits online.
- Three papers on load balancing.
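
On the counters item above: one well-known way to get increment/decrement counters on an eventually consistent store is to have each replica write only its own slot and to merge slots by taking the maximum. The sketch below is a minimal in-memory illustration of that idea. It is not necessarily what the linked example does, and it does not use any Cassandra API; the class name `PNCounter` and the replica ids are made up for illustration.

```python
from collections import defaultdict


class PNCounter:
    """Hypothetical counter: per-replica increment and decrement totals."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.incs = defaultdict(int)  # replica id -> total increments seen
        self.decs = defaultdict(int)  # replica id -> total decrements seen

    def increment(self, n=1):
        # A replica only ever writes to its own slot, so writes never conflict.
        self.incs[self.replica_id] += n

    def decrement(self, n=1):
        self.decs[self.replica_id] += n

    def merge(self, other):
        # Per-slot totals only grow, so taking the max is safe to apply
        # in any order and any number of times.
        for rid, total in other.incs.items():
            self.incs[rid] = max(self.incs[rid], total)
        for rid, total in other.decs.items():
            self.decs[rid] = max(self.decs[rid], total)

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())


# Two replicas take writes independently, then exchange state.
a = PNCounter("a")
b = PNCounter("b")
a.increment(3)
b.increment(2)
b.decrement(1)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # both replicas converge to 3 + 2 - 1 = 4
```

The trade-off is that the counter stores one pair of totals per replica rather than a single number, which is what makes repeated or reordered merges harmless.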