Real-Time MapReduce using S4
While trying to figure out how to do real-time log analysis in my own organization, I realized that most map-reduce frameworks are designed to run as batch jobs in a time-delayed manner, rather than return results instantly the way a SQL query against a MySQL database does. A few frameworks are bucking that trend.

Yahoo! Labs recently announced that their “Advertising Sciences” group has built a general-purpose, real-time, distributed, fault-tolerant, scalable, event-driven, expandable platform called “S4”, which allows programmers to easily implement applications for processing continuous, unbounded streams of data. S4 clusters are built on low-cost commodity hardware and leverage many technologies from Yahoo!’s Hadoop project. S4 is written in Java and uses the Spring Framework to build a software component architecture. Over a dozen pluggable modules have been created so far.

Why do we need a real-time map-reduce framework? Applications such as personalization, user fee