Posted on December 18, 2010, 3:09 pm by Royans
A few weeks ago I mentioned that Yahoo! Labs was working on something called S4 for real-time data analysis. Yesterday they released an 8-page paper with a detailed description of how and why they built it. Here is the abstract from the paper. It's interesting to note that the authors compared S4 with MapReduce and explained [...]
Read the rest of this entry »
Posted on December 3, 2010, 6:24 am by Royans
Found an interesting new open source project which I hadn’t heard about before. Kafka is a distributed publish-subscribe messaging system that LinkedIn uses as the foundation of its activity stream processing. It is designed to support the following: persistent messaging with O(1) disk structures that provide constant time [...]
Read the rest of this entry »
Posted on November 9, 2010, 7:08 am by Royans
Ever since I saw a demo of this tool, I’ve been on edge, waiting for it to be open sourced so that I could use it. The problem it’s trying to solve is a real pain point which most webops folks would understand. Yesterday the folks at StumbleUpon finally opened it up. It’s released under the LGPLv3 license. [...]
Read the rest of this entry »
Posted on October 6, 2010, 9:17 pm by Royans
Dynamic infrastructure can be challenging if apps and scripts can’t keep up with it. At Ingenuity we observed this problem when we started moving towards virtualization and SOA (service-oriented architecture). Remembering server names became impractical, and error-free manual configuration changes became impossible. While there are some tools which solve parts of this specific [...]
Read the rest of this entry »
cassandra, cfmap, dashboard, discover, distributed, publish
Posted on June 1, 2010, 6:32 pm by Royans
Most of us who deal with traditional databases take auto-increments for granted. While auto-increments are simple on consistent clusters, they can become a challenge in a cluster of independent nodes which don’t use the same source for the unique IDs. An even bigger challenge is to do it in such a way that they are roughly [...]
Read the rest of this entry »