Cassandra : inverted index
Cassandra is the only NOSQL datastore I’m aware of, which is scalable, distributed, self replicating, eventually consistent, schema-less key-value store running on java which doesn’t have a single point of failure. HBase could also match most of these requirements, but Cassandra is easier to manage due to its tiny footprint.
The one thing Cassandra doesn’t do today is indexing columns.
Lets take a specific example to explain the problem. Lets say there are 100 rows in the datastore which have 5 columns each. If you want to find the row which says “Service=app2â€, you will have to iterate one row at a time which is like full database scan. In a 100 row datastore if only one row had that particular column, it could take on an average about 50 rows before you find your data.
While I’m sure there is a good reason why this doesn’t exist yet, the application inserting the data could build such an inverted index itself even today. Here is an example of how a table of inverted index would look like.
If you want to find the “status†of all rows where “Service=app2â€, all you have to do is find the list of keys by making a single call to this table. The second call would be to get all the columns values for that row. Even if you have 100 different rows in a table, finding that one particular row, matching your search query, could now be done in two calls.
Of course there is a penalty you have to pay. Every time you insert one row of data, you would also have to insert multiple rows to build the inverted index. You would also have to update the inverted index yourself if any of the column values are updated or deleted. Cassandra 0.5.0 which was recently released has been benchmarked to insert about 10000 rows per second on a 4 core server with 2GB of RAM. If you have an average of 5 columns per row, that is about 1.5k actual row inserts per second (that includes 5 rows of inserts/updates required for an inverted index). For more throughput you always have an option to add more servers.
Facebook and Digg are both extensively using Cassandra in their architectures. Here are some interesting reading materials on Cassandra if you’d like to explore more.
- Digg: Looking to the future of Cassandra
- Facebook: Structured storage system on a P2P Network
- Jonathan Ellis’ cassandra reading list
- An example of how “delicous†schema would look like in cassandra : asenchi
- Cassandra : Articles and presentations
- Getting started
- WTF is a supercolumn
- Cassandra internals
Comments
But I could be wrong, and would appreciate if you could shed a little light on why you think it can still do 10k per second.
Note that I'm not saying this invalidates the NoSQL approach. I'm pretty well known as a NoSQL *advocate* myself. It's just something people have to be aware of as they're working through their scale/feature/CAP requirements.
The solution I came up with, addresses a particular problem I was having with using cassandra in managing a glorified service state registry which I was building. I wanted Cassandra’s availability and partition-tolerance features and didn’t really care about its scalability because I knew what my dataset size would be.
The implementation doesn’t require “read-extend-writeâ€. You can just “write†to the column-family directly because I know what key to add. I also don’t have any use case to to a scan entire index… all I need is last N rows to find something I need from an index. As far as Cassandra is concerned, that table is just a huge ordered list of rows, so I don’t think there should be a performance issue.
I think you could make an argument that indexes could be out of sync from data at times since they are not in the same column Family, but again the application I’m building is ok with minor issues like that.
My understanding is that what I’m doing is not too far away from how others are using Cassandra. Some sample schema designs from Evan @ twitter shows that they might also use similar way of indexing data using Cassandra. http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
rkt
I have been doing something not exactly the same but it is in effect an index for an intranet search engine that i have been migrating to cassandra from oracle.
My implementation supports what i'd like to say is "full-text" search, but is still in early beta.
@Jeff some valid points there especially on the concurrent updates when i first attempted to create a simple index i ran into those issues but simple code changes and additions solved those issues.
My solution was to maintain a column family for index logs where i keep track of what is being written with other useful info including a status of success or failure. Since its a search engine which is continually running i simply use a few methods in a class to periodically check for error statuses in the log. Each success log is deleted at some point and i check my column families to see if any rows or columns were written that are marked as failed in the log. If no record is found in the column family the value should have been written then i retry and then continue to check for inconsistency in other column families related to the failed update and try to correct them. It's extra work but it has worked out quite well to so far. My data set is only 100 GB right now but it's growing day by day and i've hammered out quite a lot of issues and she's fairly stable to this end and quite fast one searches which is my priority :-)