Showing posts from March 1, 2010

The Reddit problem: Learning from mistakes

Reddit has a very interesting post about what not to do when trying to build a scalable system. While the error is tragic, I think its an excellent design mistakes to learn from. Though the post lacked detailed technical report, we might be able to recreate what happened. They mentioned they are using MemcacheDB datastore, with 2GB RAM per node, to keep some data which they need very often. Dont be confused between MemcacheDB and memcached . While memcached is a distributed cache engine, MemcacheDB is actually a persistent datastore. And because they both use the same protocol, applications often use memcached libraries to connect to MemcacheDB datastore. Both memcached and MemcacheDB rely on the clients to figure out how the keys are distributed across multiple nodes. Reddit chose MD5 to generate the keys for their key/value pairs. The algorithm Reddit used to identify which node in the cluster a key should go to could have been dependent on the number of nodes in the system. For