Is auto-sharding ready for auto-pilot?

James Golick makes a point that a lot of people miss. He doesn’t believe the auto-sharding features that NoSQL datastores provide are ready for full auto-pilot yet, and he argues that good developers have to think about sharding as part of the design architecture, regardless of which datastore they pick.

If you take at face value the marketing materials of many NoSQL database vendors, you’d think that with a horizontally scalable data store, operations engineering simply isn’t necessary. Recent high profile outages suggest otherwise.

MongoDB, Redis-cluster (if and when it ships), Cassandra, Riak, Voldemort, and friends are tools that may be able to help you scale your data storage to varying degrees. Compared to sharding a relational database by hand, using a partitioned data store may even reduce operations costs at scale. But fundamentally, no software can use system resources that aren’t there.

At the very least, one has to understand how auto-sharding works in a given NoSQL datastore, and how easy it is to set up, maintain, back up, and restore. “Rebalancing” can be an expensive operation because data has to move between shards, and if shards are separated by distance or high-latency links, some designs will cope better than others.
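To get a feel for why rebalancing cost depends on the design, here is a minimal sketch that compares two key-placement schemes when a fifth shard is added to a four-shard cluster. It is illustrative only and is not how any particular datastore mentioned above implements sharding. The names `stable_hash`, `modulo_shard`, `HashRing`, and `moved_fraction`, the shard names, and the choice of 100 virtual nodes per shard are all assumptions made up for this example.

```python
import hashlib
from bisect import bisect_right


def stable_hash(key: str) -> int:
    """Hash a string to a 64-bit integer that is stable across runs."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def modulo_shard(key: str, n_shards: int) -> int:
    """Naive placement: shard index is hash modulo the shard count."""
    return stable_hash(key) % n_shards


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, shards, vnodes=100):
        # Each shard owns `vnodes` points scattered around the ring.
        self._ring = sorted(
            (stable_hash(f"{shard}#{v}"), shard)
            for shard in shards
            for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # A key belongs to the first ring point after its hash, wrapping around.
        i = bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[i][1]


def moved_fraction(before, after, keys) -> float:
    """Fraction of keys whose shard changes between two placement functions."""
    moved = sum(1 for k in keys if before(k) != after(k))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]

# Growing from 4 to 5 shards under each scheme.
modulo_moved = moved_fraction(
    lambda k: modulo_shard(k, 4), lambda k: modulo_shard(k, 5), keys
)

ring4 = HashRing(["s0", "s1", "s2", "s3"])
ring5 = HashRing(["s0", "s1", "s2", "s3", "s4"])
ring_moved = moved_fraction(ring4.shard_for, ring5.shard_for, keys)

print(f"modulo placement: {modulo_moved:.0%} of keys move")
print(f"hash ring:        {ring_moved:.0%} of keys move")
```

With a uniform hash, modulo placement relocates about four keys out of five when going from 4 to 5 shards, while the ring relocates only the keys that the new shard takes over, which is in the neighborhood of one fifth. If every moved key has to cross a slow or distant link, that difference is the kind of design detail worth understanding before trusting the auto-pilot.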