Posted on November 17, 2010 at 12:12 am by Royans
When I heard about interesting use cases of how “Sawzall” is used to hack through huge amounts of log data within Google, I thought of two things. Apache Pig, which is “a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. [...]
Posted on November 4, 2010 at 5:47 am by Royans
While trying to figure out how to do real-time log analysis in my own organization, I realized that most map-reduce frameworks are designed to run as batch jobs in a time-delayed manner rather than return results instantly, like a SQL query to a MySQL database. Some frameworks are bucking the trend. Yahoo! Lab! [...]
Posted on March 18, 2010 at 10:09 pm by Royans
MapReduce, Bigtable and Pregel have their origins in Google, and they all deal with “large systems”. But all of them may be dwarfed in size and complexity by a new project Google is working on, which was mentioned briefly (maybe unintentionally) at an event last year. Instead of caching data closer to the user, it [...]
datastore, eventually consistent, framework, google, mapreduce, replication, scalability
Posted on March 17, 2010 at 11:16 pm by Royans
Inside Google, MapReduce is used for 80% of all data processing needs. That includes indexing web content, running the clustering engine for Google News, generating reports for popular queries (Google Trends), processing satellite imagery, language model processing for statistical machine translation, and even mundane tasks like data backup and restore. The other 20% [...]
Posted on January 19, 2010 at 9:44 pm by Royans
After filing in 2004, Google finally got its patent on “System and method for efficient large-scale data processing” approved yesterday. GigaOM pointed out that if Google really wants to enforce it, it would have to go after the many vendors who implement “mapreduce” in some form in their applications and databases. Google’s intentions of [...]