Showing posts from March 17, 2010

Pregel: Google’s other data-processing infrastructure

Inside Google, MapReduce is used for 80% of all data-processing needs. That includes indexing web content, running the clustering engine for Google News, generating reports for popular queries (Google Trends), processing satellite imagery, language-model processing for statistical machine translation, and even mundane tasks like data backup and restore. The other 20% is handled by a lesser-known infrastructure called “Pregel”, which is optimized to mine relationships from “graphs”.

According to Wikipedia, a “graph” is a collection of vertices, or ‘nodes’, and a collection of ‘edges’ that connect pairs of nodes. Depending on the requirements, a graph can be undirected, which means there is no distinction between the two nodes joined by an edge, or directed, which means each edge points from one node to another. While you can calculate something like PageRank with MapReduce very quickly, you need more complex algorithms to mine some other kinds of