You don't have to be Google to use NoSQL
Ted Dziuba has a post titled "I Can't Wait for NoSQL to Die". His basic argument is that one has to be operating at Google's scale to really benefit from NoSQL. I think he is missing the point.
Here are my observations.
- This is similar to the argument the traditional DB vendors were making when companies started switching away from the likes of Oracle/DB2 to MySQL. The difference is that back then it was large, established database vendors against the smaller (open-source) ones, and now it's RDBMS vs non-RDBMS datastores.
- Why NoSQL: The biggest difference between an RDBMS and a NoSQL datastore is that NoSQL data structures have no predefined schemas. That doesn't mean developers don't have to think about the data structure before using a NoSQL solution, but it does give them the opportunity to add new columns that weren't anticipated at design time, with little or no impact on the applications using the store. You can add and remove columns on the fly in most RDBMSs as well, but such schema changes are usually treated as significant operations. Also keep in mind that while NoSQL datastores can add columns at the row level, RDBMS solutions can only do it at the table level.
- Scalability: There are basically two ways to scale any web application.
- The first way is to build the app and leave the scalability issues for later (and let the DBAs figure them out). This is an expensive, iterative process that takes time to perfect. The issues around scalability and availability can be so complex that not all of them can be predicted until the application is in production.
- The second way is to train the programmers to architect the database so that it can scale better once it hits production. There is a significant upfront cost, but it pays off over time.
- NoSQL is the third way of doing it.
- It restricts programmers by allowing only those operations and data structures that can scale.
- Programmers who have figured out how to work within it have found that these kinds of restrictions provide significantly higher horizontal scalability than a traditional RDBMS.
- Because the database is architected before the product is launched, it also reduces outages and post-deployment migrations.
- High Availability: NoSQL is not just about scalability. It's also about high availability at a lower cost.
- While Ted did mention that some operations in Cassandra require a restart, he didn't mention that the nodes don't all have to be restarted at the same time. A Cassandra cluster remains available even when a number of its nodes are down. This is a common theme across most NoSQL datastores. [CASSANDRA-44]
- High availability over long distances with flaky network connections is not trivial to implement with traditional RDBMS-based databases.
- You don't have to be Google to see the benefits of using NoSQL.
- If you are using S3 or SimpleDB on AWS, or the datastore on Google's App Engine, then you are already using NoSQL. Many smaller startups are finding AWS/GAE cheaper than hosting their own servers.
- One can still choose an RDBMS solution such as RDS, but it doesn't provide the high availability and scalability that S3/SimpleDB offer out of the box.
- While scalability to terabytes may not be a requirement for many smaller organizations, high availability is absolutely essential for most organizations today. RDBMS-based solutions can provide it, but setting up multi-master replication across two datacenters is non-trivial.
- Migration from RDBMS to NoSQL is not simple: I think Ted is right that not everyone will succeed in cutting over from the RDBMS world to the non-RDBMS world in one weekend. Reports of websites switching over to NoSQL overnight are sometimes grossly exaggerated. Most of these companies have been working on the move for months, if not years, and they run extensive scalability, performance, availability and disaster-recovery tests before putting it into production.
- RDBMS is not going anywhere: I also agree with Ted that RDBMS is not going anywhere anytime soon, especially in organizations that already use it. In fact, most NoSQL datastores still haven't figured out how to provide the level of security that traditional RDBMSs do. I think that's the core reason Google still uses an RDBMS for some of its operational needs.
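The row-level versus table-level distinction described under "Why NoSQL" can be illustrated with a toy sketch. It assumes Python's built-in sqlite3 module as a stand-in for an RDBMS and a plain dict of dicts as a stand-in for a schemaless datastore; the table, row keys and the `sql_columns` helper are hypothetical, and real NoSQL stores differ considerably in detail.

```python
import sqlite3

# RDBMS stand-in: every row of "users" has exactly the columns the table defines.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES ('u1', 'alice')")

# Adding a column is a table-level change: existing rows get it too (as NULL).
db.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
db.execute("INSERT INTO users VALUES ('u2', 'bob', '@bob')")

# Schemaless stand-in (hypothetical): each row is its own dict, so columns are per row.
store = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob", "twitter": "@bob"},  # only this row has "twitter"
}

def sql_columns(conn, table):
    """Column names the RDBMS enforces for every row of `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

print(sql_columns(db, "users"))  # ['id', 'name', 'twitter']
print(db.execute("SELECT * FROM users WHERE id = 'u1'").fetchone())  # ('u1', 'alice', None)
print(sorted(store["u1"]), sorted(store["u2"]))  # ['name'] ['name', 'twitter']
```

The point of the contrast is that the `ALTER TABLE` touched the definition of every row, while the dict-based rows simply carry whatever columns were written to them.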
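The "third way" under Scalability, restricting programmers to operations that scale, can be sketched as a key-value interface that exposes only get, put and delete by key. Because every operation names exactly one key, each request can be routed to a single node by hashing that key, and no operation ever needs to coordinate across nodes. The `PartitionedKV` class and node names below are hypothetical; simple modulo hashing is used for brevity, whereas Cassandra places keys on a token ring (consistent hashing) so that adding a node moves far less data.

```python
import hashlib

class PartitionedKV:
    """Toy key-value store exposing only single-key operations."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # Each "node" is an in-memory dict standing in for a separate server.
        self.nodes = {name: {} for name in self.node_names}

    def node_for(self, key):
        # md5 gives a hash that is stable across processes, unlike built-in hash().
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.node_names[int(digest, 16) % len(self.node_names)]

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)

    def delete(self, key):
        self.nodes[self.node_for(key)].pop(key, None)

    # Deliberately absent: joins, range scans and multi-key transactions.
    # Each of those would need coordination across nodes.


kv = PartitionedKV(["node-a", "node-b", "node-c"])
for i in range(9):
    kv.put(f"user:{i}", {"id": i})
print(kv.get("user:4"))  # {'id': 4}
print({name: len(rows) for name, rows in kv.nodes.items()})  # per-node counts depend on the hash
```

The restriction is the design choice: giving up joins and scans is what lets each node answer its share of requests independently.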
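The High Availability point, that a Cassandra cluster keeps serving requests while some nodes are down or being restarted, rests on replication. The toy model below assumes N replicas per key, a write quorum W and a read quorum R with R + W > N, so every read quorum overlaps the latest successful write; the newest timestamp wins on read. It is a sketch of the general quorum idea, not Cassandra's implementation, and the `Replica` and `QuorumStore` names and the logical clock are hypothetical simplifications.

```python
class Replica:
    """One storage node holding key -> (timestamp, value); may be down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class QuorumStore:
    """Toy quorum replication: writes need W acks, reads need R replies."""

    def __init__(self, replicas, w, r):
        n = len(replicas)
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("need 1 <= W, R <= N")
        self.replicas, self.w, self.r = replicas, w, r
        self.clock = 0  # logical timestamp standing in for client timestamps

    def write(self, key, value):
        self.clock += 1
        acks = 0
        for rep in self.replicas:
            if rep.up:  # down replicas simply miss this write
                rep.data[key] = (self.clock, value)
                acks += 1
        # A failed write is not rolled back on the replicas that did accept it.
        if acks < self.w:
            raise RuntimeError(f"write failed: {acks} acks, need {self.w}")

    def read(self, key):
        replies = [rep.data.get(key) for rep in self.replicas if rep.up]
        if len(replies) < self.r:
            raise RuntimeError(f"read failed: {len(replies)} replies, need {self.r}")
        versions = [v for v in replies if v is not None]
        # Newest timestamp wins, resolving stale copies on lagging replicas.
        return max(versions, key=lambda v: v[0])[1] if versions else None


nodes = [Replica("a"), Replica("b"), Replica("c")]
cluster = QuorumStore(nodes, w=2, r=2)  # N=3, R+W=4 > 3
cluster.write("k", "v1")
nodes[2].up = False        # one node down, e.g. during a rolling restart
cluster.write("k", "v2")   # still succeeds with 2 acks
nodes[2].up = True         # node "c" is back but holds the stale "v1"
nodes[0].up = False        # a different node goes down
print(cluster.read("k"))   # v2
```

With three replicas and quorums of two, the cluster tolerates any single node being down for both reads and writes, which is the property that makes one-node-at-a-time restarts possible.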
Finally, it's my personal opinion that cloud computing and the commoditization of storage and servers were the key catalysts for the launch of so many NoSQL implementations. The ability to control infrastructure through APIs gave developers a strong incentive to build datastores that could scale dynamically as well. While Oracle and MySQL are not going anywhere anytime soon, the "NoSQL" movement is definitely here to stay, and I won't be surprised if it evolves further along the way.