I knew there was something called "OrientDB", but didn't know much about it until I went through these slides. Here is what I learned in one sentence: it's an easy-to-install, schema-less NoSQL datastore that requires absolutely no configuration, supports ACID transactions, can be used as a document store, a graph store and a key-value store, can be queried using SQL-like and JSON syntax, supports indexing and triggers, and has been benchmarked at 150,000 inserts on commodity hardware. That's a lot of features.
Humor is not what this website is about, but sometimes it doesn't matter how the message is wrapped to get it across some brains. I'm a big NoSQL fan, but I also understand where some of the specific implementations are weak. I have nothing against MongoDB, but this is just too funny not to share.
A few of us joined in at the new Twitter office in downtown SF (right next to Moscone Center) and were shown, for the first time, what Twitter is doing about "Twitter Annotations". We probably created the first set of 3rd-party applications around this new API. During the hackathon I spent some time wearing my "scalable web architecture" hat, thinking about what I could learn from this experience, which I've summarized below.
Twitter Annotations, from a developer's viewpoint, is just an extension to the existing APIs which now allows posting additional structured content along with a tweet. The content stays within the context of the tweet and is retweeted/shared automatically with the main tweet. Twitter has some recommendations on how annotations should be structured; for example, they were talking about a "type", which sounded very much like Open Graph's "type/category" concept, with the difference that Twitter has left the field open to any kind of "type" users want. Facebook, if I remember right, had strongly recommended that users stick to a small set of "categories/types" which they published. Twitter accepted these annotations in multiple formats, of which I tried the "simple" and "JSON" protocols. The "JSON" way was the most recommended/used medium of annotation during the whole hackathon. While the annotation structure (using JSON) did allow multiple "types" in the same tweet, there were a few limitations which were slightly constricting. The first big one was that the current implementation allowed only 512 bytes in the annotations field. The second was that the structure, while JSON, only supported a few levels of depth in the structured annotation. This was extremely restrictive for the use case I was trying to hack up.
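For illustration, here is roughly what a JSON annotation payload looked like during the hackathon, along with a check against the 512-byte limit mentioned above. The API was unreleased and still changing, so treat the structure and attribute names below as my own reconstruction rather than a spec.

```python
import json

# Rough reconstruction of a JSON annotation payload; the "type" keys and
# attribute names here are my own guesses, not an official format.
annotations = [
    {"review": {"title": "Inception", "rating": "5",
                "url": "http://example.com/review/42"}},
    {"place": {"venue": "Moscone Center", "city": "San Francisco"}},
]

payload = json.dumps(annotations, separators=(",", ":"))

# The hackathon build only accepted 512 bytes in the annotations field, so it
# pays to check the serialized size before posting the status update.
size = len(payload.encode("utf-8"))
if size > 512:
    raise ValueError("annotation payload is %d bytes, over the 512-byte limit" % size)
print(size, "bytes")
```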
There were a few things I learnt during the whole 32-hour experience. The first one was that Twitter had actually hosted these half-baked APIs on http://api.twitter.com and http://www.twitter.com, which I'm glad to say are still accessible using my account from outside Twitter's buildings. Of course the hackers (we) had to be white-listed to get access, but from an operations viewpoint this is extremely gutsy, since one bad ACL code fragment could expose a lot of uncooked APIs to the whole world. This approach of testing is not new to Twitter and is frequently used for A/B testing in newer (more agile) organizations around the world.
The second was the fact that while Cassandra is in use at Twitter, they don't use it as the primary datastore for all the tweets (yet). They have other uses for it which they didn't elaborate on. The version of Cassandra they use is close to 0.6.2, which was just released. It also looked (from my discussions with one engineer) like Cassandra treats rack-awareness and datacenter-awareness in slightly different ways. In the documentation I had read previously, the two were the same for all practical purposes. In other words, I need to research this a little more, since optimizations in this area can boost Cassandra's performance across datacenters.
The third was that while Twitter uses cutting-edge tools for a lot of different things, they don't have service discovery nailed yet. They are playing with ZooKeeper, and I believe they will use it eventually, but it's not there yet. This by itself is amazing, because without service discovery the management of configuration and the rollout of configuration changes becomes centralized, which has its own advantages and disadvantages. At the organization I work for, we are playing with Cassandra as a service publication/discovery tool for monitoring and consuming services. The short discussion I had with Twitter folks about using Cassandra in such a way validated the work I'm doing with it. But I'm still puzzled why others are not thinking about Cassandra (or another eventually consistent datastore) for service discovery. It sounded like ZooKeeper might be overkill for my organization, but I should take a look at it again.
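To make the idea concrete, here is a minimal sketch of how a service registry could be laid out in a column-oriented store like Cassandra: one row per service name, one column per instance, with the column value holding the last heartbeat. Plain Python dicts stand in for the actual client calls, and the layout and names are my own illustration, not Twitter's or my employer's actual schema.

```python
import time

# One row per service name; one column per instance endpoint;
# column value = the instance's last heartbeat timestamp.
registry = {}

def publish(service, endpoint):
    # An instance announces itself by (re)writing its heartbeat column.
    registry.setdefault(service, {})[endpoint] = time.time()

def discover(service, max_age=30.0):
    # Consumers read the whole row and ignore instances with stale heartbeats.
    now = time.time()
    return [endpoint for endpoint, beat in registry.get(service, {}).items()
            if now - beat <= max_age]

publish("app2", "10.0.0.5:8080")
publish("app2", "10.0.0.6:8080")
print(discover("app2"))  # ['10.0.0.5:8080', '10.0.0.6:8080']
```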
The final thing I'd like to write about is the hackathon itself. It's amazing how Twitter organized this hackathon, got a group of hackers to play with their new APIs and gave them the ability to demo their hacks to the likes of Paul Graham and Ron Conway. In return they got very interesting product ideas and use cases for a feature which is still unpolished and unreleased. But more importantly, they also got a bunch of hackers to intentionally and unintentionally break the feature and discover some serious and some very annoying bugs. They also got feedback on what does and doesn't resonate with developers. In a way this is similar to what some other organizations (including Google) already do with their alpha/beta programs, but nothing beats the velocity of hacking up 10 to 20 almost-ready products around a brand new feature in less than 32 hours.
- Twitter annotations API
- Early look at Twitter's Annotations
- Realtime SemanticWeb with Twitter Annotations
- Annotations Hackfest
- What Twitter Annotations mean
- Flagthis: Using existing semantic data to annotate links in Twitter
P.S.: I'm terribly sorry for spamming my Twitter followers who were bombarded with Twitter test messages for two days. Next time I'll pick a test account 🙂
I had the opportunity to listen to Tony Tam at MongoSF talking about why and how they moved Wordnik from MySQL to MongoDB. Both the slides and the video of the talk are attached at the end of this post.
Wordnik is a new kind of "word" repository which is much more "current" than traditional dictionaries. In addition to showing words which have not yet reached the mainstream, they give tons of other related information and statistics about any given word.
They had started looking for alternatives to MySQL after they hit 4 billion rows in MyISAM and started noticing locks on tables exceeding tens of seconds during inserts. They wanted to insert 1,500 "values" per second and fell short of it even with elaborate workarounds.
The core requirements for a replacement were very simple. The datastore they were about to replace was heavily read-centric, at a rate of about 10,000 reads for every write; they had almost no application logic in the database (no stored procedures); there were no primary-key/foreign-key constraints to worry about; and consistency wasn't an absolute must.
Of all the NoSQL solutions they tried, they eventually decided to go with MongoDB.
After repeated failures with virtual servers, the final MongoDB server configuration was a non-virtualized 32GB server instance with 2×4 cores, connected to external FC/SAN storage for high disk throughput. Since the data structure they were planning to move from MySQL was not heavily relational, they were able to construct a hierarchical representation of the same data for MongoDB.
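To illustrate the kind of hierarchical representation this enables, here is a made-up sketch of how several relational rows (a word plus its definitions and example sentences) could be folded into a single MongoDB document. Wordnik's actual schema was not shown in the talk, so every field name below is hypothetical.

```python
# Hypothetical document for a single word; in MySQL the same information
# would have been spread across several related tables and required joins.
word_document = {
    "_id": "serendipity",
    "partOfSpeech": "noun",
    "definitions": [
        {"source": "example-dictionary",
         "text": "the faculty of making fortunate discoveries by accident"},
    ],
    "examples": [
        {"sentence": "Finding that cafe was pure serendipity.", "year": 2010},
    ],
}
```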
To make the migration easy, they made an extensive effort to make the application work with both MySQL and MongoDB. They wanted the ability to change the backend datastore with just a flip of a switch. To avoid too many complications, they decided to freeze writes to MySQL during migration without disabling the website entirely, so users were able to use the application with minimal impact (since the application didn't require many writes to begin with). The 4-billion-row migration took one whole day. During the data migration process they were able to execute up to 100,000 inserts per second [ I hope I got my notes right 🙂 ]
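The "flip of a switch" idea roughly amounts to hiding both datastores behind a thin facade and letting a single flag pick the backend, with a second flag to freeze writes during the migration window. This is only a sketch of the pattern as I understood it, not Wordnik's actual code.

```python
# Illustrative facade: the application only talks to word_store(), and one
# flag decides whether calls land on MySQL or MongoDB.
USE_MONGODB = True
WRITES_FROZEN = False

class MysqlWordStore:
    def get(self, word): ...
    def put(self, word, doc): ...

class MongoWordStore:
    def get(self, word): ...
    def put(self, word, doc): ...

def word_store():
    return MongoWordStore() if USE_MONGODB else MysqlWordStore()

def save_word(word, doc):
    # Writes are rejected while the migration copies data across.
    if WRITES_FROZEN:
        raise RuntimeError("writes are frozen during migration")
    word_store().put(word, doc)
```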
At the end of the migration they slowly moved frontend application servers, one node at a time, to the MongoDB datastore. They did find some issues (which they expected), and continued tweaking the setup until they got a stable, well-tuned system, ready for all the write load.
Things learnt from the move
- MongoDB's reads were so good that the memcached layer wasn't needed anymore
- "Key" names for each object in a row take up space. If you use a key name which is 10 bytes long, then after 1,000 inserts you would have wasted 10,000 bytes just on that key name (see the sketch after this list). Based on Tony's estimate, MongoDB used up 4 times the storage they had used in MySQL.
- Queries to read and write data are much more readable. The JSON is much more structured than some of the SQL queries it replaced.
- MongoDB is a resource-intensive system. It needs a lot of CPU, memory and disk. Sharding should be investigated for larger datastores. [ From what I heard, auto-sharding is not exactly production-ready yet... but things could have changed since I researched this ]
- Every "document" in MongoDB has a 4MB size limit by default.
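To put the key-name overhead in perspective, here is a small sketch comparing the serialized size of the same record with descriptive versus abbreviated key names. JSON length is used as a rough stand-in for MongoDB's BSON, so the exact numbers are only indicative.

```python
import json

# The same record with descriptive vs. abbreviated key names; the difference
# is paid on every single document you store.
verbose = {"partOfSpeech": "noun", "definitionText": "a fortunate accident"}
terse = {"p": "noun", "d": "a fortunate accident"}

per_doc = len(json.dumps(verbose)) - len(json.dumps(terse))
print(per_doc, "extra bytes per document")
print(per_doc * 1000000, "extra bytes across a million documents")
```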
All 20 of the MongoSF slide decks are up; you can find them here.
I'll have a more detailed post about what I think about it in a few days, but until then chew on these slides from bit.ly, which also uses it to power its backend datastore.
- Interesting news items for the week
- Presentations, talks and opinions
- Pregel – Google's Graph DB infrastructure
- The first public talk (as far as I know) on Pregel is scheduled to happen at SIGMOD 2010
- Spanner – Google's plan to build a single worldwide cluster
- Integrating Apache Mahout with Lucene and Solr
- Redis overview
- Thoughts on Drizzle and Thoughts on thoughts on Drizzle
- From NoSQL Live in Boston
- Riak and Cassandra – this started a sh!t storm which has since cooled down.
- Pregel – Google's Graph DB infrastructure
NoSQL solutions have one thing in common: they are generally designed for horizontal scalability. So it's no wonder that a lot of applications in the "Twitter" world have picked NoSQL-based datastores for their persistence layer. Here is a collection of these apps from the MyNoSQL blog.
- Twitter uses Cassandra
- MusicTweets used Redis [ Ref ] – The site is dead, but you can still read about it
- Tstore uses CouchDB
- Retwis uses CouchDB
- Retwis-RB uses Redis and Sinatra?? – No idea what Sinatra is. Will have to look into it. [ Update: Sinatra is a Ruby web framework, not a datastore ]
- Floxee uses MongoDB
- Twidoop uses Hadoop
- Swordfish, built on top of Tokyo Cabinet, comes with a Twitter clone app.
- Tweetarium uses Tokyo Cabinet
Do you know of any more?
Cassandra is the only NoSQL datastore I'm aware of which is a scalable, distributed, self-replicating, eventually consistent, schema-less key-value store running on Java and which doesn't have a single point of failure. HBase could also match most of these requirements, but Cassandra is easier to manage due to its tiny footprint.
The one thing Cassandra doesn't do today is index columns.
Let's take a specific example to explain the problem. Let's say there are 100 rows in the datastore with 5 columns each. If you want to find the row which says "Service=app2", you will have to iterate over one row at a time, which amounts to a full database scan. In a 100-row datastore where only one row has that particular column value, it could take on average about 50 rows before you find your data.
While I'm sure there is a good reason why this doesn't exist yet, the application inserting the data could build such an inverted index itself even today. Here is an example (sketched below in Python-dict form) of how an inverted index table could look.
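The sketch reconstructs the layout from the description above: a primary column family keyed by row key, and an inverted-index column family whose row keys are "column=value" strings and whose column names are the keys of the matching data rows. All keys and values are made up.

```python
# Primary data: one row per host, five columns each (values made up).
data = {
    "host1": {"Service": "app1", "Status": "up",   "DC": "east", "Rack": "r1", "Owner": "ops"},
    "host2": {"Service": "app2", "Status": "down", "DC": "east", "Rack": "r2", "Owner": "ops"},
    "host3": {"Service": "app2", "Status": "up",   "DC": "west", "Rack": "r1", "Owner": "dev"},
}

# Inverted index: the row key is a "column=value" string, and the column
# names are the keys of the data rows that match it.
inverted_index = {
    "Service=app1": {"host1": ""},
    "Service=app2": {"host2": "", "host3": ""},
    "Status=up":    {"host1": "", "host3": ""},
    "Status=down":  {"host2": ""},
}
```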
If you want to find the "Status" of all rows where "Service=app2", all you have to do is find the list of matching keys by making a single call to this table. The second call would get all the column values for those rows. Even if you have 100 different rows in a table, finding the particular rows matching your search query can now be done in two calls.
Of course, there is a penalty you have to pay. Every time you insert one row of data, you also have to insert multiple rows to build the inverted index. You also have to update the inverted index yourself if any of the column values are updated or deleted. Cassandra 0.5.0, which was recently released, has been benchmarked at about 10,000 row inserts per second on a 4-core server with 2GB of RAM. If you have an average of 5 columns per row, that is about 1.5k actual row inserts per second (which includes the 5 rows of inserts/updates required for the inverted index). For more throughput you always have the option to add more servers.
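For completeness, here is a self-contained sketch of the write and read paths just described. Plain dicts again stand in for the Cassandra client calls; in practice the writes would be batch mutations against the two column families.

```python
data = {}            # row key -> {column: value}
inverted_index = {}  # "column=value" -> {matching row key: ""}

def insert_row(key, columns):
    # One logical insert fans out into 1 data write plus N index writes.
    data[key] = dict(columns)
    for col, val in columns.items():
        inverted_index.setdefault("%s=%s" % (col, val), {})[key] = ""

def find(col, val, wanted_column):
    # Two reads: the index row yields the matching keys, then fetch each row.
    matches = inverted_index.get("%s=%s" % (col, val), {})
    return {key: data[key].get(wanted_column) for key in matches}

insert_row("host2", {"Service": "app2", "Status": "down"})
insert_row("host3", {"Service": "app2", "Status": "up"})
print(find("Service", "app2", "Status"))  # {'host2': 'down', 'host3': 'up'}
```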
Facebook and Digg are both extensively using Cassandra in their architectures. Here are some interesting reading materials on Cassandra if you'd like to explore more.
- Digg: Looking to the future of Cassandra
- Facebook: Structured storage system on a P2P Network
- Jonathan Ellis' Cassandra reading list
- An example of how a "delicious" schema would look in Cassandra: asenchi
- Cassandra: Articles and presentations
- Getting started
- WTF is a supercolumn
- Cassandra internals