<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Scalable web architectures</title>
	<atom:link href="http://www.royans.net/arch/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.royans.net/arch</link>
	<description>Building reliable, high performance, highly available clusters</description>
	<lastBuildDate>Sat, 13 Mar 2010 15:42:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
		<item>
		<title>Comment on The Reddit problem: Learning from mistakes by Scalability links for March 13th 2010 &#124; Scalable web architectures</title>
		<link>http://www.royans.net/arch/reddit-learning-from-mistakes/comment-page-1/#comment-737</link>
		<dc:creator>Scalability links for March 13th 2010 &#124; Scalable web architectures</dc:creator>
		<pubDate>Sat, 13 Mar 2010 15:42:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/reddit-learning-from-mistakes/#comment-737</guid>
		<description>[...] I suspected, Reddit is now moving to Cassandra – Another [...]</description>
		<content:encoded><![CDATA[<p>[...] I suspected, Reddit is now moving to Cassandra – Another [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Automated, faster, repeatable, scalable deployments by Scalability links for March 13th 2010 &#124; Scalable web architectures</title>
		<link>http://www.royans.net/arch/automated-faster-repeatable-scalable-deployments/comment-page-1/#comment-736</link>
		<dc:creator>Scalability links for March 13th 2010 &#124; Scalable web architectures</dc:creator>
		<pubDate>Sat, 13 Mar 2010 15:39:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/automated-faster-repeatable-scalable-deployments/#comment-736</guid>
		<description>[...] Automated, faster, repeatable, scalable deployments&#160; [...]</description>
		<content:encoded><![CDATA[<p>[...] Automated, faster, repeatable, scalable deployments&#160; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Cassandra : inverted index by Royans</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-697</link>
		<dc:creator>Royans</dc:creator>
		<pubDate>Sun, 07 Mar 2010 16:57:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-697</guid>
		<description>Jeff,

The solution I came up with addresses a particular problem I had using Cassandra to manage a glorified service state registry that I was building. I wanted Cassandra’s availability and partition-tolerance features and didn’t really care about its scalability because I knew what my dataset size would be.

The implementation doesn’t require “read-extend-write”. You can just “write” to the column family directly because I know what key to add. I also don’t have any use case that requires scanning the entire index… all I need is the last N rows to find what I need from an index. As far as Cassandra is concerned, that table is just a huge ordered list of rows, so I don’t think there should be a performance issue.

I think you could make an argument that indexes could be out of sync with the data at times since they are not in the same column family, but again the application I’m building is OK with minor issues like that.

My understanding is that what I’m doing is not too far from how others are using Cassandra. Some sample schema designs from Evan @ Twitter show that they might also use a similar way of indexing data in Cassandra. http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/

rkt</description>
		<content:encoded><![CDATA[<p>Jeff,</p>
<p>The solution I came up with addresses a particular problem I had using Cassandra to manage a glorified service state registry that I was building. I wanted Cassandra’s availability and partition-tolerance features and didn’t really care about its scalability because I knew what my dataset size would be.</p>
<p>The implementation doesn’t require “read-extend-write”. You can just “write” to the column family directly because I know what key to add. I also don’t have any use case that requires scanning the entire index… all I need is the last N rows to find what I need from an index. As far as Cassandra is concerned, that table is just a huge ordered list of rows, so I don’t think there should be a performance issue.</p>
<p>I think you could make an argument that indexes could be out of sync with the data at times since they are not in the same column family, but again the application I’m building is OK with minor issues like that.</p>
<p>My understanding is that what I’m doing is not too far from how others are using Cassandra. Some sample schema designs from Evan @ Twitter show that they might also use a similar way of indexing data in Cassandra. <a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/" rel="nofollow">http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/</a></p>
<p>rkt</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Cassandra : inverted index by Jeff Darcy</title>
		<link>http://www.royans.net/arch/cassandra-inverted-index/comment-page-1/#comment-696</link>
		<dc:creator>Jeff Darcy</dc:creator>
		<pubDate>Sun, 07 Mar 2010 16:32:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/cassandra-inverted-index/#comment-696</guid>
		<description>The problem of updating an inverted index is much worse than merely an extra update per new cell.  The first extra bit of pain is dealing with concurrent updates; a simple read-extend-write of the index now falls prey to the classic problem of N-1/N concurrent updates being lost.  The next bit of pain is dealing with a really big index.  If you have a few thousand rows, let alone a billion, updates of a simple index are going to be extremely inefficient.  Now you&#039;ll have to maintain your index as a B-tree or some such using multiple rows.  Now you get to the most painful part of all: concurrent multi-row updates.  If you want to support something as simple as SQL&#039;s &quot;ORDER BY x LIMIT n&quot; for a potentially large dataset with just about any of the simpler distributed key/value or column stores, it&#039;s going to be rather unpleasant.

Note that I&#039;m not saying this invalidates the NoSQL approach.  I&#039;m pretty well known as a NoSQL *advocate* myself.  It&#039;s just something people have to be aware of as they&#039;re working through their scale/feature/CAP requirements.</description>
		<content:encoded><![CDATA[<p>The problem of updating an inverted index is much worse than merely an extra update per new cell.  The first extra bit of pain is dealing with concurrent updates; a simple read-extend-write of the index now falls prey to the classic problem of N-1/N concurrent updates being lost.  The next bit of pain is dealing with a really big index.  If you have a few thousand rows, let alone a billion, updates of a simple index are going to be extremely inefficient.  Now you&#8217;ll have to maintain your index as a B-tree or some such using multiple rows.  Now you get to the most painful part of all: concurrent multi-row updates.  If you want to support something as simple as SQL&#8217;s &#8220;ORDER BY x LIMIT n&#8221; for a potentially large dataset with just about any of the simpler distributed key/value or column stores, it&#8217;s going to be rather unpleasant.</p>
<p>Note that I&#8217;m not saying this invalidates the NoSQL approach.  I&#8217;m pretty well known as a NoSQL *advocate* myself.  It&#8217;s just something people have to be aware of as they&#8217;re working through their scale/feature/CAP requirements.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Reddit problem: Learning from mistakes by Royans</title>
		<link>http://www.royans.net/arch/reddit-learning-from-mistakes/comment-page-1/#comment-548</link>
		<dc:creator>Royans</dc:creator>
		<pubDate>Fri, 05 Mar 2010 14:59:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/reddit-learning-from-mistakes/#comment-548</guid>
		<description>Jereny, thanks for the comments. I&#039;m not surprised by the plan to move to Cassandra. Best of luck!</description>
		<content:encoded><![CDATA[<p>Jereny, thanks for the comments. I&#8217;m not surprised by the plan to move to Cassandra. Best of luck!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Reddit problem: Learning from mistakes by Jereny Edberg</title>
		<link>http://www.royans.net/arch/reddit-learning-from-mistakes/comment-page-1/#comment-534</link>
		<dc:creator>Jereny Edberg</dc:creator>
		<pubDate>Fri, 05 Mar 2010 07:46:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/reddit-learning-from-mistakes/#comment-534</guid>
		<description>Hey there.  I&#039;m the guy that wrote the post.  Your assessment is spot on.

We didn&#039;t use libketama because it didn&#039;t exist at the time (a few years ago) and also because it just wasn&#039;t something people thought about.  We could switch to it now, but as you said, we are going to drop memcacheDB, so there is no need to &quot;fix&quot; it.  Also, the version of the memcached protocol that is the frontend for memcacheDB is old, and doesn&#039;t support the binary protocol.  Most of the memcached libraries that support ketama require the binary protocol (not all of them).

So yeah, it looks like we&#039;ll probably go to Cassandra in the very near future.</description>
		<content:encoded><![CDATA[<p>Hey there.  I&#8217;m the guy that wrote the post.  Your assessment is spot on.</p>
<p>We didn&#8217;t use libketama because it didn&#8217;t exist at the time (a few years ago) and also because it just wasn&#8217;t something people thought about.  We could switch to it now, but as you said, we are going to drop memcacheDB, so there is no need to &#8220;fix&#8221; it.  Also, the version of the memcached protocol that is the frontend for memcacheDB is old, and doesn&#8217;t support the binary protocol.  Most of the memcached libraries that support ketama require the binary protocol (not all of them).</p>
<p>So yeah, it looks like we&#8217;ll probably go to Cassandra in the very near future.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Reddit problem: Learning from mistakes by Royans</title>
		<link>http://www.royans.net/arch/reddit-learning-from-mistakes/comment-page-1/#comment-523</link>
		<dc:creator>Royans</dc:creator>
		<pubDate>Tue, 02 Mar 2010 18:47:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/reddit-learning-from-mistakes/#comment-523</guid>
		<description>Based on what I see, I think Reddit&#039;s datastore predates libketama, and that they knew about this issue a long time ago. BTW, the Reddit blog has entries from 2005, and it looks like libketama was announced in 2007. But that&#039;s just a guess, and I would love for someone to confirm it.</description>
		<content:encoded><![CDATA[<p>Based on what I see, I think Reddit&#8217;s datastore predates libketama, and that they knew about this issue a long time ago. BTW, the Reddit blog has entries from 2005, and it looks like libketama was announced in 2007. But that&#8217;s just a guess, and I would love for someone to confirm it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Reddit problem: Learning from mistakes by Igor</title>
		<link>http://www.royans.net/arch/reddit-learning-from-mistakes/comment-page-1/#comment-522</link>
		<dc:creator>Igor</dc:creator>
		<pubDate>Tue, 02 Mar 2010 18:42:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/reddit-learning-from-mistakes/#comment-522</guid>
		<description>I did not quite get Reddit&#039;s mistake: what stopped them from using libketama for consistent hashing with MemcacheDB?
It&#039;s part of almost every memcached client now, so undoubtedly they must have had it.</description>
		<content:encoded><![CDATA[<p>I did not quite get Reddit&#8217;s mistake: what stopped them from using libketama for consistent hashing with MemcacheDB?<br />
It&#8217;s part of almost every memcached client now, so undoubtedly they must have had it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Scalable logging using Syslog by Martin Scholl</title>
		<link>http://www.royans.net/arch/scalable-logging-using-syslog/comment-page-1/#comment-500</link>
		<dc:creator>Martin Scholl</dc:creator>
		<pubDate>Sat, 27 Feb 2010 07:21:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/scalable-logging-using-syslog/#comment-500</guid>
		<description>There is a great syslog replacement: http://www.rsyslog.com/ . rsyslogd is part of most distributions and is open source software.

Rsyslog can forward logs over TCP instead of UDP, so losing logs in transit is much less of a problem. What&#039;s more, rsyslog can buffer on disk when an upstream rsyslogd is down. All in all, rsyslog is the better syslog.</description>
		<content:encoded><![CDATA[<p>There is a great syslog replacement: <a href="http://www.rsyslog.com/" rel="nofollow">http://www.rsyslog.com/</a> . rsyslogd is part of most distributions and is open source software.</p>
<p>Rsyslog can forward logs over TCP instead of UDP, so losing logs in transit is much less of a problem. What&#8217;s more, rsyslog can buffer on disk when an upstream rsyslogd is down. All in all, rsyslog is the better syslog.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on More on Amazon S3 versioning (webinar) by Andy, CloudBerry Lab</title>
		<link>http://www.royans.net/arch/more-on-amazon-s3-versioning-webinar/comment-page-1/#comment-499</link>
		<dc:creator>Andy, CloudBerry Lab</dc:creator>
		<pubDate>Sat, 27 Feb 2010 06:43:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.royans.net/arch/more-on-amazon-s3-versioning-webinar/#comment-499</guid>
		<description>I always enjoy learning what other people think about Amazon Web Services and how they use them. Check out my very own tool, CloudBerry Explorer, which helps manage S3 on Windows. It is freeware. http://s3.cloudberrylab.com/ The new version comes with versioning support!</description>
		<content:encoded><![CDATA[<p>I always enjoy learning what other people think about Amazon Web Services and how they use them. Check out my very own tool, CloudBerry Explorer, which helps manage S3 on Windows. It is freeware. <a href="http://s3.cloudberrylab.com/" rel="nofollow">http://s3.cloudberrylab.com/</a> The new version comes with versioning support!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

