Dynamic infrastructure can be challenging if apps and scripts can't keep up with it. At Ingenuity we ran into this problem when we started moving toward virtualization and SOA (service-oriented architecture). Remembering server names became impractical, and error-free manual configuration changes became impossible.
While some tools solve parts of this problem, we couldn't find any open-source tool that could both publish and discover the state of a system in a distributed, scalable and fault-tolerant way. ZooKeeper, which came closest to what we needed, is a fully consistent system that was not designed to be used across multiple data centers over high-latency, unstable network connections. We wanted a system that would not only stay up during network outages, but also sync up state from different data centers once they reconnect.
We built a few different tools to solve our scalability problems. One of them is Cfmap, which we are open-sourcing today to help others facing the same problem.
So what is cfmap?
Built on Cassandra, cfmap is designed to be a scalable, eventually consistent and fault-tolerant repository of state information. It provides a set of REST APIs and UIs to publish and discover the state of an entity or a group of entities with great ease. The APIs are simple enough that you will probably end up writing your own custom agents for your servers and processes rather than using the agent bundled with the tool.
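To give a feel for how small a custom agent can be, here is a Python sketch that gathers a few host properties and posts them to cfmap. The property names (host, type, appname, stats_host_loadavg*) are borrowed from the cfquery output shown later in this post. The endpoint URL and the assumption that cfmap accepts form-encoded key=value pairs over HTTP POST are hypothetical, so check the actual REST API before relying on anything like this.

```python
import os
import socket
import urllib.parse
import urllib.request

# Hypothetical endpoint: the real cfmap REST path and parameters may differ.
CFMAP_URL = "http://cfmap.example.com/cfmap/update"


def collect_host_state():
    """Gather a few host properties, named after keys seen in cfquery output."""
    load1, load5, load15 = os.getloadavg()  # Unix only
    return {
        "host": socket.gethostname(),
        "type": "host",
        "appname": "os",
        "stats_host_loadavg1m": f"{load1:.2f}",
        "stats_host_loadavg5m": f"{load5:.2f}",
        "stats_host_loadavg15m": f"{load15:.2f}",
    }


def publish(state, url=CFMAP_URL, timeout=5):
    """POST the properties as form-encoded key=value pairs (assumed wire format)."""
    data = urllib.parse.urlencode(state).encode("ascii")
    req = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `publish(collect_host_state())` from cron or a simple loop would make every run double as a heartbeat; the point is that the agent is little more than "collect a dict, send a dict".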
We have been using cfmap internally for a few months and the results are promising. Here is an example of what cfmap's dashboard looks like on our network (I've changed some names to protect the actual resource names). Here is another dashboard, running publicly, that you can use today as a demo.
Cfmap lets you quickly drill down to a filtered set of servers or apps and export them as JSON or in a shell-greppable format. These two export formats make dashboarding and scripting trivial.
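The shell-greppable export is easy to consume from other languages too. Judging from the cfquery output shown later in this post, each line has the form `<record-id>:<key>=<value>`. The Python sketch below groups such lines into one dictionary per record. The line format is inferred from that sample rather than from a specification, so treat the parsing rules as an assumption.

```python
from collections import defaultdict


def parse_greppable(text):
    """Group '<record-id>:<key>=<value>' lines into {record_id: {key: value}}.

    Only the first ':' and the first '=' on each line act as separators, so
    values such as '2.6.32-00007-g56678ec' or '127.0.0.1' pass through
    untouched. Lines missing either separator are skipped.
    """
    records = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        record_id, sep, pair = line.partition(":")
        key, eq, value = pair.partition("=")
        if not (sep and eq and record_id and key):
            continue  # blank or malformed line
        records[record_id][key] = value
    return dict(records)


sample = """\
52cb892bc339f286bacbcfe9a8c8b4a6:host=anorien
52cb892bc339f286bacbcfe9a8c8b4a6:ip=127.0.0.1
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_totalmem=501
"""
records = parse_greppable(sample)
print(records["52cb892bc339f286bacbcfe9a8c8b4a6"]["stats_host_totalmem"])  # prints 501
```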
The image above shows a small set of applications from our dev cluster, sorted by the time the apps were deployed. In addition to host names, app status and version information, it lists the time each app sent its last heartbeat. What is not visible here is that cfmap also records certain changes in a "log", which can be used to understand how a particular record changed over time.
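The same ordering, plus a simple staleness check, can be reproduced in a script. The Python sketch below takes records shaped as `{record_id: {key: value}}` and sorts them newest deployment first. It assumes, based on the sample output later in this post, that `deployed_date` and `checked` hold Unix timestamps in seconds and that `checked` reflects the most recent heartbeat; both that interpretation and the `max_age` threshold are assumptions.

```python
import time


def by_deploy_time(records):
    """Return (record_id, props) pairs, most recently deployed first."""
    return sorted(
        records.items(),
        key=lambda item: int(item[1].get("deployed_date", 0)),
        reverse=True,
    )


def stale_records(records, max_age=300, now=None):
    """Return sorted ids whose last heartbeat ('checked', assumed epoch
    seconds) is older than max_age seconds, or that have no heartbeat."""
    now = time.time() if now is None else now
    stale = []
    for record_id, props in records.items():
        checked = props.get("checked")
        if checked is None or now - int(checked) > max_age:
            stale.append(record_id)
    return sorted(stale)
```

Records with no `checked` value are treated as stale on purpose: an app that never reported is usually worth a look.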
While the REST interface is easy to use, you can also interact with cfmap through "cfquery", the command-line tool that comes with it. Cfquery can be used both to publish and to search results. Let's look at some examples.
Here is an example of how one could extract a list of all the hosts in cfmap.
rkt@torque:~/cc/cfmap/bin$ ./cfquery.pl -c view | grep ":host=" | cut -d':' -f2
host=team50
host=ip-10-205-15-124
host=torque
host=anorien
Here is a more elaborate example showing how cfmap output can be used as part of other scripts. In this case, the query specifies only the host "anorien", and the result is a dump of all the properties set by that host. A few extra commands quickly extract specific properties, which can then serve as a data source for other tools (such as monitoring).
rkt@torque:~/cc/cfmap/bin$ ./cfquery.pl -c view -p "host=anorien"
52cb892bc339f286bacbcfe9a8c8b4a6:port=0
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_freeswap=1999
52cb892bc339f286bacbcfe9a8c8b4a6:host=anorien
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_loadavg5m=0
52cb892bc339f286bacbcfe9a8c8b4a6:cfqversion=1.1
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_estconn=4
52cb892bc339f286bacbcfe9a8c8b4a6:type=host
52cb892bc339f286bacbcfe9a8c8b4a6:deployed_date=1286217400
52cb892bc339f286bacbcfe9a8c8b4a6:version=2.6.32-00007-g56678ec
52cb892bc339f286bacbcfe9a8c8b4a6:ip=127.0.0.1
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_pscount=101
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_loadavg15m=0
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_loadavgentities=0
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_freemem=3
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_loadavg1m=0
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_totalswap=1999
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_totalmem=501
52cb892bc339f286bacbcfe9a8c8b4a6:appname=os
52cb892bc339f286bacbcfe9a8c8b4a6:checked=1286331427
rkt@torque:~/cc/cfmap/bin$ ./cfquery.pl -c view -p "host=anorien" | grep stats_host_totalmem
52cb892bc339f286bacbcfe9a8c8b4a6:stats_host_totalmem=501
rkt@torque:~/cc/cfmap/bin$ ./cfquery.pl -c view -p "host=anorien" | grep stats_host_totalmem | cut -d'=' -f2
501
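As an example of feeding these properties into monitoring, here is a Python sketch of a Nagios-style memory check built on the `stats_host_freemem` and `stats_host_totalmem` values above. The output does not state units, so the check only uses their ratio. The 10%/5% thresholds and the message strings are illustrative assumptions, not something cfmap provides.

```python
def memory_status(props, warn=0.10, crit=0.05):
    """Classify free memory for one host from its cfmap properties.

    props is a {key: value} dict like the one printed above for 'anorien'.
    Returns (exit_code, message) with Nagios-style codes:
    0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    """
    try:
        free = float(props["stats_host_freemem"])
        total = float(props["stats_host_totalmem"])
    except (KeyError, ValueError):
        return 3, "UNKNOWN - memory stats missing or malformed"
    if total <= 0:
        return 3, "UNKNOWN - total memory reported as zero"
    ratio = free / total
    if ratio < crit:
        return 2, f"CRITICAL - {ratio:.1%} memory free"
    if ratio < warn:
        return 1, f"WARNING - {ratio:.1%} memory free"
    return 0, f"OK - {ratio:.1%} memory free"


# Values copied from the 'anorien' output above (3 / 501).
anorien = {"stats_host_freemem": "3", "stats_host_totalmem": "501"}
print(memory_status(anorien))  # prints (2, 'CRITICAL - 0.6% memory free')
```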
A few other interesting features
- Schema-less design – cfmap provides a simple schema-less datastore that could be used for other purposes as well. Note, however, that because it was designed to maintain "state" rather than to serve as a general-purpose datastore, it reserves a few keywords that have special meaning.
- Low overhead to add/delete cfmap nodes – Since it's built on Cassandra, adding new nodes is as simple as adding new Cassandra servers.
- Configurable – The recommended production setup is to host cfmap (which comes with a bundled version of Cassandra) on 3 or more servers, put them all behind a single round-robin DNS entry, and let DNS load balancing take care of the rest.
- If you want even more redundancy, set up something like HAProxy on each node to monitor cfmap and redirect traffic to alternate cfmap nodes when failures (or GC pauses) happen.
- The default setup doesn't enforce consistency during reads or writes, so that cfmap keeps operating smoothly even during massive network or system failures. If you wish, you can tweak the consistency and replication settings to suit your needs.
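When tuning consistency, the usual Cassandra rule of thumb applies: with a replication factor N, a read that waits for R replicas is guaranteed to see a write acknowledged by W replicas when R + W > N. The Python sketch below encodes that arithmetic for Cassandra's ONE, QUORUM and ALL levels (QUORUM being floor(N/2) + 1). It is a back-of-the-envelope helper, not part of cfmap, and it deliberately ignores other Cassandra consistency levels.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond at a given consistency level."""
    n = replication_factor
    levels = {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}
    try:
        return levels[level.upper()]
    except KeyError:
        raise ValueError(f"unsupported consistency level: {level}") from None


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when every read replica set overlaps every write set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, reads_see_latest_write(read, write, 3))
# prints:
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Higher levels buy consistency at the cost of availability: with QUORUM on N=3, a request fails once two of the three replicas are unreachable, which is exactly the kind of outage a relaxed default is meant to ride out.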
Cfmap is still a very early prototype, but we welcome others to play with it.