Posts

Showing posts from March, 2010

You don’t have to be Google to use NoSQL

Ted Dziuba has a post titled “I can’t wait for NoSQL to die”. The basic argument he makes is that you have to be at Google’s size to really benefit from NoSQL. I think he is missing the point. Here are my observations. This is similar to the argument the traditional DB vendors made when companies started switching away from the likes of Oracle/DB2 to MySQL. The difference between then and now is that it used to be large established database vendors against the smaller (open-source) ones; now it’s RDBMS vs. non-RDBMS datastores. Why NoSQL: the biggest difference between an RDBMS and a NoSQL datastore is that NoSQL datastores have no pre-defined schemas. That doesn’t mean developers don’t have to think about the data structure before using a NoSQL solution, but it does give them the opportunity to add new columns, not thought of at design time, with little or no impact on the applications using the data. You can add a…
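A minimal sketch of that schema flexibility, using MongoDB as one NoSQL example via pymongo. The server location, collection, and field names here are hypothetical, chosen only for illustration:

```python
# Sketch: adding a "column" nobody thought of at design time.
# Assumes a MongoDB server on localhost; names are hypothetical.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
users = client.example_db.users

# Documents in the same collection need not share a schema.
users.insert_one({"name": "alice", "email": "alice@example.com"})

# Later, a new field can appear with no ALTER TABLE and no migration:
users.insert_one({"name": "bob", "email": "bob@example.com",
                  "twitter_handle": "@bob"})  # unknown at design time

# Existing documents and the code reading them are unaffected.
for doc in users.find():
    print(doc.get("twitter_handle", "n/a"))
```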

Scalability links for March 20th 2010 – Lots of datastore-related items

Interesting news items for the week:
- Subversion at Google Code is now replicated across multiple datacenters, is 3 times faster, and uses the Paxos algorithm to guarantee consistency
- VMware hired the Redis creator
- Amazon S3 Versioning is ready for production

Presentations, talks and opinions:
- Pregel – Google’s graph DB infrastructure. The first public talk (as far as I know) on Pregel is scheduled to happen at SIGMOD 2010
- Spanner – Google’s plan to build a single worldwide cluster
- Integrating Apache Mahout with Lucene and Solr
- Redis overview
- Thoughts on Drizzle and Thoughts on thoughts on Drizzle

From NoSQL Live in Boston:
- HyperGraphDB
- Neo4j
- riak
- Tokyo Cabinet
- MongoDB with Groovy
- Riak and Cassandra – this started a sh!t storm which has since coole…

Spanner: Google’s next Massive Storage and Computation infrastructure

MapReduce, Bigtable and Pregel have their origins in Google, and they all deal with “large systems”. But all of them may be dwarfed in size and complexity by a new project Google is working on, which was mentioned briefly (maybe unintentionally) at an event last year. Instead of caching data closer to the user, it looks like Google is trying to take “the data” to the user. If you use GMail or a Google Docs service, then with this framework Google could, auto-magically, “move” one of the master copies of your data to the nearest Google datacenter without really having to cache anything locally. And because they are building one single datastore cluster around the world, instead of building hundreds of smaller ones for different applications, it looks like they may not need dedicated clusters for specific projects anymore. Below is the gist of “Spanner” from a talk by Jeff Dean at a symposium held at Cornell. Take a look at the rest of the slides if you are in…
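To make the “move the data to the user” idea concrete, here is a purely illustrative sketch of master-placement-by-latency. Everything in it is hypothetical; nothing here describes how Spanner actually works:

```python
# Hypothetical sketch: pick the datacenter closest to the user and
# migrate the master copy of their data there. Names and latency
# numbers are made up for illustration.
observed_latency_ms = {   # e.g., measured from the user's recent requests
    "us-east": 120,
    "europe-west": 35,
    "asia-east": 210,
}

def nearest_datacenter(latencies):
    """Return the datacenter with the lowest observed latency."""
    return min(latencies, key=latencies.get)

current_master = "us-east"
best = nearest_datacenter(observed_latency_ms)
if best != current_master:
    print(f"migrate master copy: {current_master} -> {best}")
```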

Pregel: Google’s other data-processing infrastructure

Inside Google, MapReduce is used for 80% of all data processing needs. That includes indexing web content, running the clustering engine for Google News, generating reports for popular queries (Google Trends), processing satellite imagery, language model processing for statistical machine translation, and even mundane tasks like data backup and restore. The other 20% is handled by a lesser-known infrastructure called “Pregel”, which is optimized to mine relationships from “graphs”. According to Wikipedia, a “graph” is a collection of vertices or ‘nodes’ and a collection of ‘edges’ that connect pairs of ‘nodes’. Depending on the requirements, a ‘graph’ can be undirected, which means there is no distinction between the two ‘nodes’ of an edge, or it can be directed from one ‘node’ to another. While you can calculate something like PageRank with MapReduce very quickly, you need more complex algorithms to mine some other kinds of…
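Pregel’s API is not public, but here is a minimal sketch of the vertex-centric, superstep-based model it popularized, computing PageRank on a toy directed graph. The graph data and superstep count are made up:

```python
# Vertex-centric PageRank sketch in the spirit of Pregel's model:
# in each superstep, every vertex "sends" its rank along its out-edges
# and then updates its own state from the messages it received.
graph = {            # vertex -> list of outgoing edges (toy data)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
rank = {v: 1.0 / len(graph) for v in graph}
DAMPING, SUPERSTEPS = 0.85, 30

for _ in range(SUPERSTEPS):
    # Message passing: each vertex splits its rank across its out-edges.
    incoming = {v: 0.0 for v in graph}
    for v, out_edges in graph.items():
        share = rank[v] / len(out_edges)
        for target in out_edges:
            incoming[target] += share
    # State update: each vertex recomputes its rank from its messages.
    rank = {v: (1 - DAMPING) / len(graph) + DAMPING * incoming[v]
            for v in graph}

print(rank)
```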

Product: NorthScale memcached and Membase server

NorthScale, an organization formed by leaders of the memcached open source project, today launched a couple of interesting products. The first one is an enhanced distribution of the standard open-source memcached server which also does “secure multi-tenancy, dynamic cluster scaling and browser-based administration”. The second product, “Membase server”, is a distributed key-value datastore built on top of memcached, but it uses a relational DB for persistence underneath. The features provided seem to be geared towards organizations which lack (or prefer not to spend on) the in-house technical expertise to test, develop and manage advanced caching technologies. One of the biggest customers of NorthScale happens to be Zynga, which has been using it to power its FarmVille and Cafe World applications since December last year. BTW, they also have a “labs” site which gives some idea of what other stuff they have been cooking: Moxi – a memcached proxy; libconflate – “a library…
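Because these products speak the memcached wire protocol, the same client code works against plain memcached, MemcacheDB, or Membase. A minimal sketch using the python-memcached library; the host and port are hypothetical:

```python
# Sketch: one memcached-protocol client, many compatible backends.
# Assumes a memcached-compatible server listening on 127.0.0.1:11211.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

mc.set("user:42:profile", {"name": "alice"})   # non-string values are pickled
print(mc.get("user:42:profile"))               # -> {'name': 'alice'}
```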

HBase at Adobe

Cosmin Lehene has a wonderful pair of posts about how a team at Adobe selected, tested and implemented an HBase-based datastore for production use: Part 1 and Part 2. It’s interesting how much they thought about failure and about backups for the backups, and how, in spite of all that, things still broke. Building something new based on cutting-edge technology is not for the faint-hearted. It needs to be supported by the organization, led by those who can see the future, and backed by a team of experts who are always ready for challenges.

Scalable Tools: Murder: a BitTorrent-based file transfer tool for faster deployments

Deploying to one server can be done with a single script running a bunch of ssh/scp commands. If you have a few more servers, you can run it in a loop sequentially, or fork the processes to do them in parallel (see the sketch below). At some point, though, it becomes unmanageable, especially if you have the challenge of updating multiple datacenters at the same time. So how does a company like Twitter do its releases? Murder is an interesting P2P/BitTorrent-based tool which Twitter uses to distribute files during its own software updates. Here are some more details: “Murder is a method of using Bittorrent to distribute files to a large amount of servers within a production environment. This allows for scalable and fast deploys in environments of hundreds to tens of thousands of servers where centralized distribution systems wouldn't otherwise function.” A “murder” is the term normally used for a flock of crows, which in this case applies to a bunch of servers doing something. In order to do a…
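Here is a minimal sketch of the “fork the processes” approach mentioned above, pushing one artifact to many hosts with parallel scp. Hostnames and paths are hypothetical; at hundreds or thousands of hosts, every copy still comes from one machine, which is exactly the bottleneck a BitTorrent-based tool like Murder removes:

```python
# Sketch: parallel scp-based deploy. Hosts and paths are made up.
import subprocess

hosts = ["app01.example.com", "app02.example.com", "app03.example.com"]
artifact = "release-1.2.3.tar.gz"

# Fork one scp per host instead of copying sequentially.
procs = [subprocess.Popen(["scp", artifact, f"{h}:/opt/deploy/"])
         for h in hosts]

# Wait for all copies and report any host that failed.
failures = [h for h, p in zip(hosts, procs) if p.wait() != 0]
if failures:
    print("deploy failed on:", failures)
```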

Scalability links for March 13th 2010

For some reason there has been a disproportionately high number of news items on Cassandra lately. Some of those are included below, along with some other interesting updates which you might have missed.
- Rackspace and Drizzle: It’s time to rethink everything
- HAProxy 1.4 – Now supports MySQL health checks. This is a big deal for users using HAProxy to load-balance MySQL servers.
- Video: Doug Cutting: Founding of Hadoop
- Horizontal scaling, aka load balancers – This is a video. As they say, a picture is worth a thousand words.
- How FarmVille Scales – The Follow-up
- App Engine joins IPv6 – This is the odd one out. I don’t think anyone other than Google is worried about IPv6 at this point. IPv4 is really running out, and IPv6 address space is easy to get. If you have customers on IPv6, it’s very helpful if the server is also on IPv6. But the infrastructure may not be there yet…
- Automated, faster, repeatable, scalable deployments
- Datas…

Automated, faster, repeatable, scalable deployments

While efficient automated deployment tools like Puppet and Capistrano are a big step in the right direction, they are not a complete solution for an automated deployment process. This post explores some of the less-discussed practices which are just as important for automated, fast, repeatable, scalable deployments (a sketch follows the list).

Rapid build and integration with tests:
- Use source control to build an audit trail: put everything possible in it, including configurations and deployment scripts.
- Continuous builds triggered by code check-ins can detect and report problems early.
- Use tools which provide targeted feedback about build failures. That reduces noise and improves overall quality faster.
- The faster the build happens after a check-in, the better the chances of bugs getting fixed quickly. Delays can be costly, since broken builds can impact other developers as well.
- Build smaller components (fail fast).
- Continuous integration tests of all components can detect err…
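As a concrete sketch of “continuous builds triggered by code check-ins” with targeted feedback, here is a small script a post-commit hook or CI job could run. The component names and build commands are hypothetical:

```python
# Sketch: build each component after a check-in, fail fast, and name
# the failing component instead of reporting a generic "build broke".
# Component names and the make targets are made up for illustration.
import subprocess
import sys

components = ["libcore", "webapp", "deploy-scripts"]  # smaller pieces fail fast

for component in components:
    result = subprocess.run(["make", "-C", component, "test"])
    if result.returncode != 0:
        # Targeted feedback reduces noise: the owners of this component
        # are the ones who need to act.
        print(f"BUILD FAILED in {component}; notifying its owners")
        sys.exit(1)

print("all components green")
```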

Disaster Recovery: Impressive RPO and RTO objectives set by Google Apps Operations

Unless you are running a fly-by-night shop, DR (disaster recovery) should be one of the top issues for your operations team. In a “scalable architecture” world, the complexity of DR can become a disaster in itself. Yesterday Google announced that it now finally has a DR plan for Google Apps. While this is nice, one should always take such messages with a pinch of salt until they prove they can do it. Look at the DR plan for Google App Engine, which was also in place, yet the service still suffered an outage of more than two hours because of incomplete documentation, insufficient training, and probably the lack of someone able to make a quick, decisive call at the time of failure. But back to Google Apps for now. These guys are planning for an RPO of 0 seconds, which means multiple datacenters must be in a consistent state at all times. And they want the RTO to be instant failover as well! This is an incredible DR plan, and it requires technical expertise in all seven layers of the OSI model to achieve it…
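An RPO of 0 seconds implies that every write is durable in more than one datacenter before it is acknowledged, i.e., synchronous cross-datacenter replication. Below is a deliberately simplified sketch of that idea; the replica API is invented for illustration and says nothing about how Google actually implements it:

```python
# Sketch: synchronous replication as the price of RPO = 0.
# The Replica class and its commit() method are hypothetical.
class Replica:
    def __init__(self, name):
        self.name, self.log = name, []

    def commit(self, record):
        self.log.append(record)   # stand-in for a durable write
        return True

replicas = [Replica("dc-east"), Replica("dc-west")]

def write(record):
    # Acknowledge only after *all* datacenters hold the record.
    # An async design would ack after one copy, trading RPO for latency.
    if all(r.commit(record) for r in replicas):
        return "ack"
    raise IOError("replication failed; do not ack")

print(write({"doc": 7, "rev": 3}))
```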

The Reddit problem: Learning from mistakes

Reddit has a very interesting post about what not to do when trying to build a scalable system. While the error is tragic, I think it’s an excellent design mistake to learn from. Though the post lacked a detailed technical report, we may be able to reconstruct what happened. They mentioned they are using the MemcacheDB datastore, with 2GB of RAM per node, to keep some data which they need very often. Don’t confuse MemcacheDB with memcached. While memcached is a distributed cache engine, MemcacheDB is actually a persistent datastore. And because they both use the same protocol, applications often use memcached libraries to connect to a MemcacheDB datastore. Both memcached and MemcacheDB rely on the clients to figure out how the keys are distributed across multiple nodes. Reddit chose MD5 to hash the keys for their key/value pairs. The algorithm Reddit used to identify which node in the cluster a key should go to could have been dependent on the number of nodes in the system. For…
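To see why node-count-dependent placement is dangerous, here is a small sketch of the classic hash(key) mod N scheme. The keys are made up; the point is that changing N remaps most keys, so a cluster that grows suddenly “loses” data written under the old N:

```python
# Sketch: MD5-based key placement that depends on the node count.
import hashlib

def node_for(key, num_nodes):
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % num_nodes

keys = [f"link:{i}" for i in range(10)]          # hypothetical keys
before = {k: node_for(k, 2) for k in keys}       # 2-node cluster
after = {k: node_for(k, 3) for k in keys}        # after adding a third node

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys now map to a different node")
```

Consistent hashing exists precisely to limit this remapping, moving only roughly 1/N of the keys when a node is added.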