Sawzall and the PIG
When I heard about interesting use cases of how “Sawzall” is used to hack through huge amounts of log data within Google, I was thinking about two things.

Apache PIG, which is “a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.”

CEP (complex event processing), which consists of processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. [Also look at Esper.]

Google has opened parts of this framework in a project called “Szl”.

Sawzall is a procedural language developed for parallel analysis of very large data sets (such as logs). It prov