Product: NorthScale memcached and Membase server

NorthScale, a company formed by leaders of the memcached open source project, today launched a couple of interesting products. The first is an enhanced distribution of the standard open source memcached server which also does “secure multi-tenancy, dynamic cluster scaling and browser-based administration”. The second, “Membase server”, is a distributed key-value datastore built on top of memcached, but one that uses a relational DB for persistence underneath.

The features provided seem to be geared towards organizations which lack (or prefer not to spend on) the in-house technical expertise to test, develop and manage advanced caching technologies.

One of the biggest customers of NorthScale happens to be Zynga, which has been using it to power its FarmVille and Cafe World applications since Dec last year.

BTW, they also have a “labs” site which gives some idea of what other stuff they have been cooking.

  • Moxi – memcached proxy
  • libconflate – “a library that helps you configure large installations of applications easily. It’s designed to be easy to use, safe, and autonomous while still leveraging centralized resources to allow you to have your clusters easily adapt to change.”

The Reddit problem: Learning from mistakes

Reddit has a very interesting post about what not to do when trying to build a scalable system. While the error is tragic, I think it’s an excellent design mistake to learn from.

Though the post lacked a detailed technical report, we can piece together what probably happened. They mentioned they are using the MemcacheDB datastore, with 2GB of RAM per node, to keep some data which they need very often.

Don’t confuse MemcacheDB with memcached. While memcached is a distributed cache engine, MemcacheDB is actually a persistent datastore. And because they both speak the same protocol, applications often use memcached client libraries to connect to a MemcacheDB datastore.

Both memcached and MemcacheDB rely on the clients to figure out how keys are distributed across multiple nodes. Reddit chose MD5 to hash their keys. The algorithm Reddit used to map a key to a node in the cluster could have depended on the number of nodes in the system. For example, one popular way to pick a node is the “modulo” function: key “k” is stored on node “n” where “n = k modulo 3” on a three-node cluster. [ If k=101, then n=2 ]
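The modulo scheme described above is simple enough to sketch in a few lines. This is just an illustration (the node and key names are made up, and Reddit’s exact code isn’t public): hash the key with MD5, then take the hash modulo the number of nodes.

```python
import hashlib

NODES = ["node0", "node1", "node2"]  # hypothetical 3-node cluster

def node_for(key: str) -> str:
    # Hash the key with MD5, interpret the digest as an integer,
    # and pick a node via "k modulo number-of-nodes".
    k = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[k % len(NODES)]

# Every client that runs this same function will route a given
# key to the same node -- as long as NODES never changes.
print(node_for("user:12345"))
```

The scheme needs no coordination between clients, which is exactly why it is so popular, and also why its failure mode (below) is so easy to walk into.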


Though MemcacheDB uses BDB to persist data, it seems they heavily relied on keeping all of the data in RAM. At some point they probably hit the upper limit on what could be cached in RAM, which caused disk I/O, which in turn resulted in slower response times. In a scalable architecture, one should simply be able to add new nodes and have the system scale out.

Unfortunately, though this algorithm works beautifully during regular operation, it fails as soon as you add or remove a node (i.e., as soon as the number of nodes changes). At that point you can’t guarantee that the data you previously stored on a given node will still map to the same node.
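You can see how badly modulo sharding behaves on a resize with a quick experiment (key names made up): count how many keys land on a different node when the cluster grows from 3 to 4 nodes. Since a key stays put only when its hash gives the same remainder mod 3 and mod 4, roughly three quarters of all keys move.

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Same MD5 + modulo placement as before, parameterized on cluster size.
    k = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return k % num_nodes

keys = [f"key:{i}" for i in range(10000)]
moved = sum(1 for key in keys if node_for(key, 3) != node_for(key, 4))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 75%
```

For a cache that just means a burst of misses; for a persistent store it means most lookups suddenly go to a node that has never seen the data.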

And while this algorithm may still be acceptable for “memcached” cache clusters, where a remapped key is just a cache miss, it’s really bad for MemcacheDB, which is persistent and therefore needs something like “consistent hashing” so that keys keep mapping to the nodes that actually hold their data.


Reddit today announced that they have increased RAM on these MemcacheDB servers from 2GB to 6GB, which allows 94% of their DB to be kept in memory. But they have realized their mistake (they probably figured this out a long time back) and are thinking about how to fix it. The simplest solution, adding a few nodes, requires re-hashing their keys, which would take days according to their estimate. And of course just adding nodes without using some kind of “consistent hashing” is still not a scalable solution.

I personally learned two things:

  • Don’t mix up MemcacheDB and memcached. They are not designed to solve the same problem.
  • Don’t just replace memcached with MemcacheDB without thinking twice.

There are many different products out there today which do a better job at scaling, so I won’t be surprised if they abandon MemcacheDB completely as well.

Scaling updates for Feb 10, 2010

Lots of interesting updates today.

But I would like to first mention the fantastic work the Cloud Computing group at UCSB is doing to make the App Engine framework more open. They have done significant work to make AppScale “work” with different kinds of data sources, including HBase, Cassandra, Voldemort, MongoDB, Hypertable, MySQL and MemcacheDB. AppScale is actively looking for folks interested in working with them to make it stable and production ready.

  • GAE 1.3.1 released: I think the biggest news in this release is that the 1000-row query limit has been removed. You still have to deal with the 30-second processing limit per HTTP request, but at least the row limit is gone. They have also introduced support for automatic, transparent datastore API retries for most operations. This should dramatically increase the reliability of datastore queries and reduce the amount of work developers have to do to build this auto-retry logic themselves.
  • Elastic search is a Lucene-based indexing product which does roughly what Solr does, with the key difference that it is designed to scale across multiple servers. Very interesting product. I’m going to try this out soon.
  • MemcacheDB: A distributed key-value store which is designed to be persistent. It speaks the memcached protocol, but it’s actually a datastore (using Berkeley DB) rather than a cache.
  • Nasuni seems to have come up with NAS software which uses cloud storage as the persistent datastore, with the capability to cache frequently accessed data locally for faster access.
  • Guys at Flickr have two interesting posts you should glance over. “Using, Abusing and Scaling MySQL at Flickr” seems to be the first in a series of posts about how Flickr scales using MySQL. The next one in the series is “Ticket Servers: Distributed Unique Primary Keys on the Cheap”.
  • Finally, a fireside chat by Mike Schroepfer, VP of Engineering, about Scaling Facebook.