Spanner: Google’s next Massive Storage and Computation infrastructure
MapReduce, Bigtable and Pregel all originated at Google, and all of them deal with “large systems”. But they may all be dwarfed in size and complexity by a new project Google is working on, which was mentioned briefly (maybe unintentionally) at an event last year.
Instead of caching data closer to the user, it looks like Google is trying to take “the data” itself to the user. If you use Gmail or a Google Docs service, then with this framework Google could automagically “move” one of the master copies of your data to the nearest Google data center without really having to cache anything locally. And because they are building one single datastore cluster around the world, instead of hundreds of smaller ones for different applications, it looks like they may not need dedicated clusters for specific projects anymore.
Below is the gist of “Spanner” from a talk by Jeff Dean at a symposium held at Cornell. Take a look at the rest of the slides if you are interested in some impressive statistics on hardware performance and reliability.
- Spanner: Storage & computation system that spans all our datacenters
  - Single global namespace
    - Names are independent of location(s) of data
    - Similarities to Bigtable: tables, families, locality groups, coprocessors, …
    - Differences: hierarchical directories instead of rows, fine-grained replication
    - Fine-grained ACLs, replication configuration at the per-directory level
  - Support for a mix of strong and weak consistency across datacenters
    - Strong consistency implemented with Paxos across tablet replicas
    - Full support for distributed transactions across directories/machines
  - Much more automated operation
    - System automatically moves and adds replicas of data and computation based on constraints and usage patterns
    - Automated allocation of resources across entire fleet of machines
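To make the "names are independent of location" and "per-directory replication configuration" points a little more concrete, here is a minimal toy sketch in Python. It is not Spanner's API, which the slides do not show: the class names, the field names, the datacenter labels and the rule that a directory inherits the policy of its nearest ancestor are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DirectoryPolicy:
    """Hypothetical per-directory settings; the field names are illustrative."""
    replicas: List[str]                               # datacenters holding a replica
    readers: List[str] = field(default_factory=list)  # a toy ACL


class Namespace:
    """Toy global namespace: names map to policies, not to physical locations."""

    def __init__(self) -> None:
        self._policies: Dict[str, DirectoryPolicy] = {}

    def set_policy(self, path: str, policy: DirectoryPolicy) -> None:
        self._policies[path.rstrip("/") or "/"] = policy

    def policy_for(self, path: str) -> Optional[DirectoryPolicy]:
        # Assumption: a directory without its own policy inherits the
        # policy of its nearest ancestor that has one.
        current = path.rstrip("/") or "/"
        while True:
            if current in self._policies:
                return self._policies[current]
            if current == "/":
                return None
            current = current.rsplit("/", 1)[0] or "/"

    def move_replica(self, path: str, src: str, dst: str) -> None:
        # The name stays the same; only the replica placement changes.
        policy = self._policies[path]
        policy.replicas = [dst if dc == src else dc for dc in policy.replicas]


ns = Namespace()
ns.set_policy("/mail/alice", DirectoryPolicy(["us-east", "eu-west"], ["alice"]))
ns.move_replica("/mail/alice", "us-east", "asia-south")
inbox_policy = ns.policy_for("/mail/alice/inbox")
```

After the move, a client still looks up `/mail/alice/inbox` by the same name, while the policy it resolves to now lists a different set of datacenters. In the system described on the slides this kind of move would be triggered automatically from constraints and usage patterns rather than by an explicit call.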

Design Goals: ~10^6 to 10^7 machines, ~10^13 directories, ~10^18 bytes of storage (that’s about 888 pebibytes, or one exabyte), spread at 100s to 1000s of locations around the world, ~10^9 client machines. Simply staggering. I guess now we also have a formal definition of "Google Scale" which the Google PR people always go on about.
This comment was originally posted on Hacker News
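As a rough sanity check on the design-goal figures quoted in the comment above, the arithmetic can be spelled out in a few lines of Python. The inputs are the order-of-magnitude numbers from the comment. The variable names are made up for this sketch, and the derived per-machine and per-directory averages are back-of-the-envelope values, not anything stated in the talk.

```python
# Design-goal figures as quoted in the comment (orders of magnitude only).
MACHINES_LOW, MACHINES_HIGH = 10**6, 10**7
DIRECTORIES = 10**13
STORAGE_BYTES = 10**18
CLIENTS = 10**9

PIB = 2**50  # one pebibyte (binary unit)
PB = 10**15  # one petabyte (decimal unit)

storage_pib = STORAGE_BYTES / PIB  # roughly 888, the figure in the comment
storage_pb = STORAGE_BYTES // PB   # 1000 decimal petabytes, i.e. one exabyte

# Ranges are (with 10^7 machines, with 10^6 machines).
clients_per_machine = (CLIENTS // MACHINES_HIGH, CLIENTS // MACHINES_LOW)
bytes_per_machine = (STORAGE_BYTES // MACHINES_HIGH, STORAGE_BYTES // MACHINES_LOW)

# Average directory size if storage were spread evenly (an assumption).
bytes_per_directory = STORAGE_BYTES // DIRECTORIES
```

So the "888" figure only works in binary units. The same numbers give 100 to 1000 clients per machine, about 100 GB to 1 TB of stored data per machine, and an average of about 100 KB per directory.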
Jeff Dean is like a god around here. Fact: Jeff Dean was forced to invent asynchronous APIs one day when he optimized a function so that it returned before it was invoked. Also, gcc -O4 sends your code directly to Jeff Dean for a rewrite.
This comment was originally posted on Hacker News
See http://www.google.co.uk/search?q=googol
This comment was originally posted on Hacker News
It’s one machine per 100 to 1000 clients. There are many ways to interpret that. On one hand, it’s a measure of how much computing Google is doing for clients. On the other, it’s an indication of the level of Google’s costs per client, and indirectly how much revenue they make from those clients. It’s almost government-scale infrastructure. It’s approaching the point where Google is running a machine per street. I wouldn’t be surprised if a government made a grab for the local section of this infrastructure in some country within the next couple of decades, whether it’s to control their citizens, or to wrest control from the corporation on behalf of their citizens.
This comment was originally posted on Hacker News
Must tell my boss about it. "Spanner" is German for peeping tom.
This comment was originally posted on Reddit
[...] Shared Spanner: Google’s next Massive Storage and Computation infrastructure. [...]
Ha! Spanner is my online gaming name.
This comment was originally posted on Reddit
So Google’s next big platform is called something very similar to "Spammer". It makes a lot of sense for an advertising company that at the same time is the third biggest email provider in the world.
This comment was originally posted on Reddit
Another PR news article.
This comment was originally posted on Reddit
This has skynet written all over it
This comment was originally posted on Reddit
[...] Spanner – Google’s plan to build a single worldwide cluster [...]
Exactly what I thought. Where’s the friggin source code, google?
This comment was originally posted on Reddit
‘Spanner’ is also a word you can call someone in Britain to offend them. I think when Linus chose the name ‘git’ for a project he was on to something.
This comment was originally posted on Reddit
You could, you know, [read the slides](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
This comment was originally posted on Reddit
This site is reposting all of Reddit’s comments on its own page? Huh?
This comment was originally posted on Reddit
Now we can finally define Google scale
Overheard on Hacker News – "Jeff Dean is like a god around here. Fact: Jeff Dean was forced to invent asynchronous APIs one day when he optimized a function so that it returned before it was invoked."
This comment was originally posted on FriendFeed
[...] more on Spanner, check out these two posts from The Register and Scalable Web Architecture blog. [...]