March 18, 2010

Spanner: Google’s next Massive Storage and Computation infrastructure

MapReduce, Bigtable and Pregel all originated at Google, and all of them deal with “large systems”. But they may all be dwarfed in size and complexity by a new project Google is working on, which was mentioned briefly (maybe unintentionally) at an event last year.

Instead of caching data closer to the user, it looks like Google is trying to take “the data” to the user. If you use GMail or a Google Docs service, then with this framework Google could, auto-magically, “move” one of the master copies of your data to the nearest Google data center without really having to cache anything locally. And because they are building one single datastore cluster around the world, instead of hundreds of smaller ones for different applications, it looks like they may no longer need dedicated clusters for specific projects.

Below is the gist of “Spanner” from a talk by Jeff Dean at a symposium held at Cornell. Take a look at the rest of the slides if you are interested in some impressive statistics on hardware performance and reliability.

  • Spanner: Storage & computation system that spans all our datacenters
    • Single global namespace
      • Names are independent of location(s) of data
      • Similarities to Bigtable: table, families, locality groups, coprocessors,…
      • Differences: hierarchical directories instead of rows, fine-grained replication
      • Fine-grained ACLs, replication configuration at the per-directory level
    • Support mix of strong and weak consistency across datacenters
      • Strong consistency implemented with Paxos across tablet replicas
      • Full support for distributed transactions across directories/machines
    • Much more automated operation
      • System automatically moves and adds replicas of data and computation based on constraints and usage patterns
      • Automated allocation of resources across entire fleet of machines.
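
To make the slide's ideas a bit more concrete, here is a toy sketch in Python of a location-independent namespace with per-directory replication settings and usage-driven replica placement. This is not Spanner's API or design, which has not been published; every name here (`Namespace`, `Directory`, `min_replicas`, `rebalance`, the datacenter labels) is made up for illustration, and a real system would also need consensus (e.g. Paxos), ACLs and data movement, none of which is modeled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical unit of replication, addressed by a location-independent path."""
    path: str
    min_replicas: int = 2                            # assumed per-directory setting
    replicas: set = field(default_factory=set)       # datacenters holding a copy
    reads: Counter = field(default_factory=Counter)  # read counts per datacenter


class Namespace:
    """Toy single global namespace: clients use paths, never datacenter names."""

    def __init__(self, datacenters):
        self.datacenters = list(datacenters)
        self.directories = {}

    def create(self, path, min_replicas=2):
        # Initial placement is arbitrary here: the first few datacenters.
        d = Directory(path, min_replicas)
        d.replicas = set(self.datacenters[:min_replicas])
        self.directories[path] = d
        return d

    def read(self, path, from_dc):
        # Record where the request came from; return the current replica set.
        d = self.directories[path]
        d.reads[from_dc] += 1
        return d.replicas

    def rebalance(self, path):
        # Move replicas to the datacenters that read this directory most often,
        # keeping existing replicas only to satisfy the min_replicas constraint.
        d = self.directories[path]
        if not d.reads:
            return d.replicas
        wanted = [dc for dc, _ in d.reads.most_common(d.min_replicas)]
        for dc in sorted(d.replicas):
            if len(wanted) >= d.min_replicas:
                break
            if dc not in wanted:
                wanted.append(dc)
        d.replicas = set(wanted)
        return d.replicas


if __name__ == "__main__":
    ns = Namespace(["us-east", "eu-west", "asia-south"])
    ns.create("/mail/alice", min_replicas=2)
    for dc, n in [("asia-south", 5), ("eu-west", 2), ("us-east", 1)]:
        for _ in range(n):
            ns.read("/mail/alice", dc)
    print(sorted(ns.rebalance("/mail/alice")))  # prints ['asia-south', 'eu-west']
```

The point of the sketch is only that the path `/mail/alice` stays the same while its replicas follow the user, which is the “take the data to the user” idea described above.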




Flow » Blog Archive » Daily Digest for March 20th - The zeitgeist daily said...

[...] Shared Spanner: Google’s next Massive Storage and Computation infrastructure. [...]

Scalability links for March 20th 2010 – Lots of datastore related items | Scalable web architectures said...

[...] Spanner – Google’s plan to build a single worldwide cluster [...]

Chris Alexander - Data Geek’s Dream: Google Spanner said...

[...] more on Spanner, check out these two posts from The Register and Scalable Web Architecture blog. [...]

Google and Its Star Fleet Plumbing : Beyond Search said...

[...] is looking to employ many more servers in years to come. The company has designed a system called Spanner to automate management across data centers. Servers may reach from 1 million to 10 million [...]