Author Archive

Scalability links for March 13th 2010

For some reason there has been a disproportionately high number of news items on Cassandra lately. Some of those are included below, along with other interesting updates you might have missed.

Rackspace and Drizzle: It's time to rethink everything
HAProxy 1.4 now supports MySQL health checks. This is a big deal [...]


Automated, faster, repeatable, scalable deployments

While efficient automated deployment tools like Puppet and Capistrano are a big step in the right direction, they are not a complete solution for an automated deployment process. This post explores some of the less-discussed issues that are just as important for automated, fast, repeatable and scalable deployments.
Rapid build and integration with tests

Use source control [...]


Disaster Recovery: Impressive RPO and RTO objectives set by Google Apps Operations

Unless you are running a fly-by-night shop, disaster recovery (DR) should be one of the top issues for your operations team. In a “Scalable architecture” world, the complexity of DR can become a disaster in itself.
Yesterday Google announced that it finally has a DR plan for Google Apps. While this is nice, [...]
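For readers new to the acronyms: the recovery point objective (RPO) is the most data, measured as time, that you can afford to lose, and the recovery time objective (RTO) is how long you can afford to be down. A minimal sketch, assuming a hypothetical setup that replicates to a standby site at a fixed interval and has a known failover time (the `DrSetup` class and its numbers are illustrative, not Google's), shows how the two targets can be checked:

```python
from dataclasses import dataclass


@dataclass
class DrSetup:
    # Hypothetical parameters, for illustration only.
    replication_interval_s: float  # how often data is copied to the standby site
    failover_time_s: float         # time to detect a failure and switch to the standby


def worst_case_data_loss_s(setup: DrSetup) -> float:
    # With periodic replication, a failure just before the next copy
    # loses up to one full interval of writes.
    return setup.replication_interval_s


def meets_objectives(setup: DrSetup, rpo_s: float, rto_s: float) -> bool:
    """True if worst-case data loss fits the RPO and failover fits the RTO."""
    return worst_case_data_loss_s(setup) <= rpo_s and setup.failover_time_s <= rto_s


if __name__ == "__main__":
    setup = DrSetup(replication_interval_s=300, failover_time_s=1800)
    print(meets_objectives(setup, rpo_s=600, rto_s=3600))  # True
    print(meets_objectives(setup, rpo_s=60, rto_s=3600))   # False
```

Tightening the RPO means replicating more often (down to synchronous replication for an RPO of zero); tightening the RTO means automating detection and failover.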


The Reddit problem: Learning from mistakes

Reddit has a very interesting post about what not to do when trying to build a scalable system. While the error is tragic, I think it's an excellent design mistake to learn from.
Though the post lacked a detailed technical report, we might be able to reconstruct what happened. They mentioned they are using the MemcacheDB datastore, with [...]


Scalability links for Feb 28th 2010

State of current NoSQL databases: A very detailed post about many NoSQL solutions. A lot of work went into this one.
Truth about joins: The Google App Engine datastore’s limitation of not allowing joins might soon be a thing of the past. Simple joins may now be possible on GAE if you are using Java. Its [...]


Cassandra as a communication medium – A service Registry and Discovery tool

A few weeks ago, while I was mulling over what kind of service registry/discovery system to use for a scalable application deployment platform, I realized that for mid-size organizations with a complex set of services, building one from scratch may be the only option.
I also found out that many AWS/EC2 customers have already been using S3 and [...]
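The core idea of a registry/discovery service fits in a few lines. This is a minimal, hypothetical sketch: a Python dict stands in for the shared datastore (Cassandra, in the post), and the `ServiceRegistry` class, its heartbeat-plus-TTL scheme and the addresses are assumptions for illustration, not the design from the full post.

```python
import time
from typing import Callable, Dict, List


class ServiceRegistry:
    """In-memory sketch of a heartbeat-based service registry.

    Hypothetical: in a real deployment a shared datastore (such as
    Cassandra) would hold these rows; here a dict stands in for it.
    """

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.time):
        self.ttl_s = ttl_s
        self.clock = clock
        # service name -> {instance address -> last heartbeat timestamp}
        self._rows: Dict[str, Dict[str, float]] = {}

    def heartbeat(self, service: str, address: str) -> None:
        # Registering and refreshing are the same write: "I am alive now".
        self._rows.setdefault(service, {})[address] = self.clock()

    def discover(self, service: str) -> List[str]:
        # Only instances that sent a heartbeat within the TTL count as alive.
        now = self.clock()
        instances = self._rows.get(service, {})
        return sorted(a for a, seen in instances.items() if now - seen <= self.ttl_s)


if __name__ == "__main__":
    fake_now = [0.0]
    registry = ServiceRegistry(ttl_s=30, clock=lambda: fake_now[0])
    registry.heartbeat("search", "10.0.0.1:8080")
    fake_now[0] = 20
    registry.heartbeat("search", "10.0.0.2:8080")
    fake_now[0] = 40
    print(registry.discover("search"))  # ['10.0.0.2:8080']
```

Relying on heartbeat timestamps rather than explicit deregistration means a crashed instance simply ages out of discovery results once its TTL passes.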


Talk on “database scalability”

This is a very interesting talk by Jonathan Ellis on database scalability. He designed and implemented multi-petabyte storage for Mozy and is currently the project chair for Apache Cassandra.

What every developer should know about database scalability, PyCon 2010

Scalability is not improving latency, but increasing throughput
But overall performance shouldn’t degrade
Throw [...]
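The distinction between latency and throughput can be made concrete with Little's law, which says the average number of requests in flight equals throughput times latency. A minimal sketch, assuming idealized horizontal scaling where adding servers does not change per-request latency (real systems rarely scale this cleanly); the function name and numbers are illustrative:

```python
def throughput_rps(servers: int, concurrency_per_server: int, latency_s: float) -> float:
    """Requests per second from Little's law: in_flight = throughput * latency.

    Assumes (hypothetically) perfect horizontal scaling: every server keeps
    the same per-request latency no matter how many servers are added.
    """
    in_flight = servers * concurrency_per_server
    return in_flight / latency_s


if __name__ == "__main__":
    latency = 0.05  # 50 ms per request, unchanged as we scale out
    for n in (1, 2, 4):
        print(n, "servers:", throughput_rps(n, concurrency_per_server=10, latency_s=latency), "req/s")
```

Each request still takes 50 ms, so no single user sees a faster response; the system as a whole simply serves more of them.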


Scalable logging using Syslog

Syslog is a commonly used transport mechanism for system logs, but people sometimes forget that it can be used for many other purposes as well.
Take, for example, the interesting challenge of aggregating web server logs from 100 different servers into one server and then figuring out how to merge them. If you have [...]
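For comparison, if each server's log has already been collected as a separate, time-ordered file, merging them is a classic k-way merge. A minimal sketch using Python's `heapq.merge`, assuming a hypothetical log format where every line starts with a UTC ISO-8601 timestamp; this illustrates the merge problem, not necessarily the syslog setup the full post describes:

```python
import heapq
from typing import Iterable, Iterator


def timestamp_key(line: str) -> str:
    # Hypothetical log format: each line starts with a UTC ISO-8601 timestamp,
    # e.g. "2010-03-13T10:00:01Z web01 GET /index.html".
    # Fixed-width UTC timestamps sort correctly as plain strings.
    return line.split(" ", 1)[0]


def merge_logs(per_server_logs: Iterable[Iterable[str]]) -> Iterator[str]:
    """Merge logs that are each already time-ordered into one time-ordered stream."""
    return heapq.merge(*per_server_logs, key=timestamp_key)


if __name__ == "__main__":
    web01 = ["2010-03-13T10:00:01Z web01 GET /", "2010-03-13T10:00:05Z web01 GET /a"]
    web02 = ["2010-03-13T10:00:03Z web02 GET /b"]
    for line in merge_logs([web01, web02]):
        print(line)
```

Because `heapq.merge` is lazy and keeps only one pending line per input, it can merge logs from 100 servers without loading them all into memory.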


SimpleDB now allows you to tweak consistency levels

We discussed Brewer’s Theorem a few days ago, and how it’s challenging to achieve consistency, availability and partition tolerance at the same time in any distributed system. We also discussed how many distributed datastores allow the CAP trade-offs to be tuned to meet specific operational goals.
Amazon SimpleDB, which was released as an “Eventually Consistent” datastore, today launched a [...]
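One common way distributed datastores expose this trade-off is the Dynamo-style quorum rule: with N replicas, a read that contacts R replicas is guaranteed to overlap the W replicas that acknowledged the latest write when R + W > N (ignoring failure-handling tricks such as sloppy quorums). SimpleDB does not expose N, R and W directly, so this is only a minimal sketch of the general idea, with a hypothetical helper name:

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Dynamo-style quorum rule: a read of R replicas must overlap the
    W replicas of the latest acknowledged write when R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
        print(f"N={n} R={r} W={w} consistent={is_strongly_consistent(n, r, w)}")
```

Lower R and W favor latency and availability at the cost of possibly stale reads; raising either buys consistency at the cost of waiting on more replicas.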


NoSQL in the Twitter world

NoSQL solutions have one thing in common: they are generally designed for horizontal scalability. So it's no wonder that a lot of applications in the “twitter” world have picked NoSQL-based datastores for their persistence layer. Here is a collection of these apps from the MyNoSQL blog.

Twitter uses Cassandra
MusicTweets used Redis [ Ref ] – The [...]
