<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Cassandra : inverted index</title>
	<atom:link href="http://www.royans.net/arch/cassandra-inverted-index/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.royans.net/arch/cassandra-inverted-index/</link>
	<description>Building reliable, high performance, highly available clusters</description>
	<lastBuildDate>Mon, 12 Jul 2010 14:35:55 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
		<item>
		<title>By: Royans</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-697</link>
		<dc:creator>Royans</dc:creator>
		<pubDate>Sun, 07 Mar 2010 16:57:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-697</guid>
		<description>Jeff,

The solution I came up with addresses a particular problem I had while using Cassandra to manage a glorified service state registry I was building. I wanted Cassandra’s availability and partition-tolerance features and didn’t really care about its scalability because I knew what my dataset size would be.

The implementation doesn’t require “read-extend-write”. I can just “write” to the column family directly because I know what key to add. I also don’t have any use case that requires scanning the entire index… all I need is the last N rows to find what I need from an index. As far as Cassandra is concerned, that table is just a huge ordered list of rows, so I don’t think there should be a performance issue.

I think you could make an argument that indexes could be out of sync with the data at times since they are not in the same column family, but again the application I’m building is OK with minor issues like that.

My understanding is that what I’m doing is not too far from how others are using Cassandra. Some sample schema designs from Evan @ Twitter show that they might also use a similar way of indexing data in Cassandra. http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/

rkt</description>
		<content:encoded><![CDATA[<p>Jeff,</p>
<p>The solution I came up with addresses a particular problem I had while using Cassandra to manage a glorified service state registry I was building. I wanted Cassandra’s availability and partition-tolerance features and didn’t really care about its scalability because I knew what my dataset size would be.</p>
<p>The implementation doesn’t require “read-extend-write”. I can just “write” to the column family directly because I know what key to add. I also don’t have any use case that requires scanning the entire index… all I need is the last N rows to find what I need from an index. As far as Cassandra is concerned, that table is just a huge ordered list of rows, so I don’t think there should be a performance issue.</p>
<p>I think you could make an argument that indexes could be out of sync with the data at times since they are not in the same column family, but again the application I’m building is OK with minor issues like that.</p>
<p>My understanding is that what I’m doing is not too far from how others are using Cassandra. Some sample schema designs from Evan @ Twitter show that they might also use a similar way of indexing data in Cassandra. <a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/" rel="nofollow">http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/</a></p>
<p>rkt</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Darcy</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-696</link>
		<dc:creator>Jeff Darcy</dc:creator>
		<pubDate>Sun, 07 Mar 2010 16:32:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-696</guid>
		<description>The problem of updating an inverted index is much worse than merely an extra update per new cell.  The first extra bit of pain is dealing with concurrent updates; a simple read-extend-write of the index now falls prey to the classic problem of N-1/N concurrent updates being lost.  The next bit of pain is dealing with a really big index.  If you have a few thousand rows, let alone a billion, updates of a simple index are going to be extremely inefficient.  Now you&#039;ll have to maintain your index as a B-tree or some such using multiple rows.  Now you get to the most painful part of all: concurrent multi-row updates.  If you want to support something as simple as SQL&#039;s &quot;ORDER BY x LIMIT n&quot; for a potentially large dataset with just about any of the simpler distributed key/value or column stores, it&#039;s going to be rather unpleasant.

Note that I&#039;m not saying this invalidates the NoSQL approach.  I&#039;m pretty well known as a NoSQL *advocate* myself.  It&#039;s just something people have to be aware of as they&#039;re working through their scale/feature/CAP requirements.</description>
		<content:encoded><![CDATA[<p>The problem of updating an inverted index is much worse than merely an extra update per new cell.  The first extra bit of pain is dealing with concurrent updates; a simple read-extend-write of the index now falls prey to the classic problem of N-1/N concurrent updates being lost.  The next bit of pain is dealing with a really big index.  If you have a few thousand rows, let alone a billion, updates of a simple index are going to be extremely inefficient.  Now you&#8217;ll have to maintain your index as a B-tree or some such using multiple rows.  Now you get to the most painful part of all: concurrent multi-row updates.  If you want to support something as simple as SQL&#8217;s &#8220;ORDER BY x LIMIT n&#8221; for a potentially large dataset with just about any of the simpler distributed key/value or column stores, it&#8217;s going to be rather unpleasant.</p>
<p>Note that I&#8217;m not saying this invalidates the NoSQL approach.  I&#8217;m pretty well known as a NoSQL *advocate* myself.  It&#8217;s just something people have to be aware of as they&#8217;re working through their scale/feature/CAP requirements.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scalability updates for Feb 18, 2010 &#124; Scalable web architectures</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-448</link>
		<dc:creator>Scalability updates for Feb 18, 2010 &#124; Scalable web architectures</dc:creator>
		<pubDate>Fri, 19 Feb 2010 05:17:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-448</guid>
		<description>[...] backend for Lucene ? This seems to solve the problem of building reverse index on cassandra which I previously blogged [...]</description>
		<content:encoded><![CDATA[<p>[...] backend for Lucene ? This seems to solve the problem of building reverse index on cassandra which I previously blogged [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Ellis</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-327</link>
		<dc:creator>Jonathan Ellis</dc:creator>
		<pubDate>Sun, 07 Feb 2010 18:34:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-327</guid>
		<description>@Royans Sorry, I read too fast.  You&#039;re right. :)</description>
		<content:encoded><![CDATA[<p>@Royans Sorry, I read too fast.  You&#8217;re right. <img src='http://www.royans.net/arch/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Royans</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-326</link>
		<dc:creator>Royans</dc:creator>
		<pubDate>Sun, 07 Feb 2010 16:46:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-326</guid>
		<description>@jonathan : I assumed that if 1 row of data needs 5 more rows of inserts/updates, that would make a total of 6 rows of inserts/updates per row of data inserted. 10k/6 =~ 1.5K.
But I could be wrong, and would appreciate it if you could shed a little light on why you think it can still do 10k per second.
</description>
		<content:encoded><![CDATA[<p>@jonathan : I assumed that if 1 row of data needs 5 more rows of inserts/updates, that would make a total of 6 rows of inserts/updates per row of data inserted. 10k/6 =~ 1.5K.<br />
But I could be wrong, and would appreciate it if you could shed a little light on why you think it can still do 10k per second.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Royans</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-325</link>
		<dc:creator>Royans</dc:creator>
		<pubDate>Sun, 07 Feb 2010 16:43:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-325</guid>
		<description>@adrian : Of course. Cassandra looked like a better fit for me at the time, but I&#039;m sure you can do this on any kind of key-value store.
</description>
		<content:encoded><![CDATA[<p>@adrian : Of course. Cassandra looked like a better fit for me at the time, but I&#8217;m sure you can do this on any kind of key-value store.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Ellis</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-323</link>
		<dc:creator>Jonathan Ellis</dc:creator>
		<pubDate>Sun, 07 Feb 2010 14:55:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-323</guid>
		<description>Correction: the 10k rows/s number _is_ with 5-column rows.</description>
		<content:encoded><![CDATA[<p>Correction: the 10k rows/s number _is_ with 5-column rows.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adrian</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-320</link>
		<dc:creator>Adrian</dc:creator>
		<pubDate>Sun, 07 Feb 2010 10:44:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-320</guid>
		<description>I&#039;m pretty sure you can implement inverted indexes with HBase, too.</description>
		<content:encoded><![CDATA[<p>I&#8217;m pretty sure you can implement inverted indexes with HBase, too.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
