Posts

Showing posts from November 16, 2010

Sawzall and the PIG

When I heard interesting use cases of how “Sawzall” is used to hack through huge amounts of log data within Google, I was thinking about two things.

Apache PIG, which is “a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.”

CEP (complex event processing), which consists of processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. (Also look at Esper.)

Google has opened parts of this framework in a project called “Szl”. Sawzall is a procedural language developed for parallel analysis of very large data sets (such as logs). It prov
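To make the "parallel analysis of logs" idea a bit more concrete, here is a rough sketch of the general pattern in Python. It is not Sawzall syntax and not how Szl is implemented; the log format (`<status> <bytes>` per line) and the names `process_record`, `aggregate` and `merge` are made up for illustration. The point is only that each record is examined on its own and emits values into aggregation tables, so chunks can be processed independently and their partial tables merged afterwards.

```python
from collections import Counter
from functools import reduce


def process_record(line):
    """Look at one log line in isolation and emit (key, value) pairs.

    Assumed (hypothetical) line format: "<status> <bytes>".
    Malformed lines emit nothing.
    """
    parts = line.split()
    if len(parts) != 2 or not parts[1].isdigit():
        return []
    status, size = parts
    return [("requests:" + status, 1), ("bytes:" + status, int(size))]


def aggregate(chunk):
    """Sum everything emitted by the records of one chunk."""
    table = Counter()
    for line in chunk:
        for key, value in process_record(line):
            table[key] += value
    return table


def merge(left, right):
    """Combine two partial tables; order does not matter for sums."""
    merged = Counter(left)
    merged.update(right)
    return merged


# Two chunks stand in for log shards handled by different machines.
chunks = [
    ["200 512", "404 128"],
    ["200 1024", "garbage"],
]
totals = reduce(merge, (aggregate(chunk) for chunk in chunks), Counter())
print(dict(totals))
```

Because the per-record step keeps no state between lines and the merge is a plain sum, the chunks could be handed to separate workers without changing the result.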

Presentation: “OrientDB, the database of the web”

I knew there was something called “OrientDB”, but didn’t know much about it until I went through these slides. Here is what I learned in one sentence. It’s an easy-to-install NoSQL (schemaless) datastore that requires absolutely no configuration, supports ACID transactions, can be used as a document store, a graph store and a key-value store, can be queried using SQL-like and JSON syntax, supports indexing and triggers, and has been benchmarked to do 150,000 inserts on commodity hardware. That’s a lot of features.