Talk on “database scalability”

This is a very interesting talk by Jonathan Ellis on database scalability. He designed and implemented multi-petabyte storage for Mozy and is currently the project chair for Apache Cassandra.

  • Scalability is not improving latency, but increasing throughput
  • But overall performance shouldn’t degrade
  • Throw hardware, not people at the problem
  • Traditional databases use b-tree indexes. But requires the entire index to be in-memory at the same place.
  • Easy bandaid #1– SSD storage is better for b-tree indexes which need to hit disk
  • Easy bandaid #2 – Buy faster server every 2 years. As long as your userbase doesn’t grow faster that Moore’s law
  • Easy bandaid #3 – Use caching to handle hotspots (Distributed)
  • Memcache server failures can change where hashing keys are kept
  • Consistent hashing solves the problem by mapping keys to tokens. The tokens can move around to more or less server. Apps would be able to figure out which keys are where.