Ever since I saw a demo of this tool, I've been on edge, waiting for it to be open-sourced so that I could use it. The problem it's trying to solve is a real pain point that most webops folks would understand.
Yesterday the folks at StumbleUpon finally opened it up. It's released under the LGPLv3 license. You can find the source here and the documentation here.
At StumbleUpon, we have found this system tremendously helpful to:
- Get real-time state information about our infrastructure and services.
- Understand outages or how complex systems interact together.
- Measure SLAs (availability, latency, etc.)
- Tune our applications and databases for maximum performance.
- Do capacity planning.
Cosmin Lehene has a wonderful pair of posts about how a team at Adobe selected, tested, and implemented an HBase-based datastore for production use.
It's interesting how much they thought about failure and about backups for the backups, and how, in spite of all that, things still broke. Building something new on cutting-edge technology is not for the faint-hearted. It needs to be supported by the organization, led by those who can see the future, and backed by a team of experts who are always ready for challenges.