Showing posts from November 3, 2010

Real-Time MapReduce using S4

While trying to figure out how to do real-time log analysis in my own organization, I realized that most map-reduce frameworks are designed to run as batch jobs in a time-delayed manner rather than respond instantaneously like a SQL query to a MySQL database. Some frameworks are bucking the trend. Yahoo! Labs recently announced that their “Advertising Sciences” group has built a general-purpose, real-time, distributed, fault-tolerant, scalable, event-driven, expandable platform called “S4”, which allows programmers to easily implement applications for processing continuous unbounded streams of data. S4 clusters are built using low-cost commodity hardware and leverage many technologies from Yahoo!’s Hadoop project. S4 is written in Java and uses the Spring Framework to build a software component architecture. Over a dozen pluggable modules have been created so far. Why do we need a real-time map-reduce framework? Applications such as personalization, user fee

Storage options on App Engine

For those who think Google App Engine only has one kind of datastore, the one built around “Bigtable”, think again. Nick Johnson goes into the details of all the other options available, with their pros and cons, in his post. App Engine provides more data storage mechanisms than is apparent at first glance. All of them have different tradeoffs, so it's likely that one - or more - of them will suit your application well. Often, the ideal solution involves a combination, such as the datastore and memcache, or local files and instance memory. The storage options he lists: Datastore, Memcache, Instance memory, Local files. Read more: Original post from Nick
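To make the "datastore and memcache" combination concrete, here is a minimal read-through cache sketch. It does not use the App Engine APIs: plain Python dictionaries stand in for both the durable store and the cache, and the class name `ReadThroughStore` and its methods are hypothetical names chosen for illustration only.

```python
class ReadThroughStore:
    """Hypothetical stand-in: one dict plays the slow durable store,
    another plays the fast volatile cache."""

    def __init__(self, datastore):
        self.datastore = datastore   # stand-in for the durable datastore
        self.cache = {}              # stand-in for the cache layer
        self.datastore_reads = 0     # counts how often the slow path is hit

    def get(self, key):
        # Fast path: serve from the cache when the key is present.
        if key in self.cache:
            return self.cache[key]
        # Slow path: read the durable store, then remember the result.
        self.datastore_reads += 1
        value = self.datastore.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the durable store and drop any stale cached copy.
        self.datastore[key] = value
        self.cache.pop(key, None)


store = ReadThroughStore({"greeting": "hello"})
first = store.get("greeting")    # goes to the datastore stand-in
second = store.get("greeting")   # served from the cache stand-in
```

The tradeoff this illustrates is the general one: the cache absorbs repeated reads, while the durable store remains the source of truth, so writes invalidate the cached copy rather than trusting it.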