MongoDB: Migration from MySQL at Wordnik
I had the opportunity to listen to Tony Tam at MongoSF talking about why and how they moved Wordnik from MySQL to MongoDB. Both the slides and the video of the talk are attached to the end of this post.
Wordnik is a new kind of “word” repository which is much more “current” than traditional dictionaries. In addition to showing words that have not yet reached the mainstream, it gives tons of other related information and statistics about any given word.
They had started looking for alternatives to MySQL after they hit 4 billion rows in MyISAM and began noticing table locks lasting tens of seconds during inserts. They wanted to insert 1,500 “values” per second and fell short of that even with elaborate workarounds.
The core requirement for a replacement was very simple. The datastore they were about to replace was read-centric, at a rate of 10,000 reads for every write; they had almost no application logic in the database (no stored procedures); there were no primary-key/foreign-key constraints to worry about; and consistency wasn’t an absolute must.
Of all the NoSQL solutions they tried, they eventually decided to go with MongoDB.
After repeated failures with virtual servers, the final MongoDB configuration was a non-virtualized server with 32 GB of RAM and 2x4 cores, connected to external FC/SAN storage for high disk throughput. Since the data structure they were planning to move from MySQL was not heavily relational, they were able to construct a hierarchical representation of the same data for MongoDB.
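To make that modeling idea concrete, here is a minimal Python sketch of how normalized rows might be folded into one hierarchical document. The tables and field names are hypothetical, not Wordnik's actual schema.

```python
# Hypothetical normalized rows, as they might come out of MySQL:
# a word row plus child rows from separate definition and example tables.
word_row = {"id": 42, "text": "serendipity"}
definition_rows = [
    {"word_id": 42, "pos": "noun", "text": "finding good things by chance"},
]
example_rows = [
    {"word_id": 42, "source": "blog", "sentence": "Pure serendipity led me here."},
]

# One document per word: the child rows become embedded arrays, so a
# single read fetches what previously took several JOINed queries.
word_doc = {
    "_id": word_row["id"],
    "text": word_row["text"],
    "definitions": [
        {"pos": d["pos"], "text": d["text"]} for d in definition_rows
    ],
    "examples": [
        {"source": e["source"], "sentence": e["sentence"]} for e in example_rows
    ],
}
```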
To make the migration easy, they put extensive effort into making the application work with both MySQL and MongoDB, so that they could change the backend datastore with the flip of a switch. To avoid too many complications, they decided to freeze writes to MySQL during the migration without disabling the website entirely, so users were able to use the application with minimal impact (the application didn’t require many writes to begin with). Migrating the 4 billion rows took one whole day, and during the data migration process they were able to execute up to 100,000 inserts per second [ I hope I got my notes right :) ]
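As an illustration of that “flip of the switch”, here is a rough Python sketch of a storage-interface pattern that could support both backends. The class and method names are hypothetical, not Wordnik's code.

```python
# The application talks only to a small storage interface; a single
# config value picks the backend, so switching never touches app code.

class MySQLWordStore:
    def get_word(self, text):
        ...  # e.g. SELECT with JOINs across words/definitions tables

class MongoWordStore:
    def get_word(self, text):
        ...  # e.g. db.words.find_one({"text": text})

def make_store(backend):
    # One config flag decides which implementation the app uses.
    return MongoWordStore() if backend == "mongodb" else MySQLWordStore()

store = make_store("mongodb")  # flip to "mysql" to fall back
```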
At the end of the migration they slowly moved the frontend application servers, one node at a time, to the MongoDB datastore. They did find some issues (which they expected), and continued tweaking the setup until they got a stable, well-tuned system ready for the full write load.
Things learnt from the move
- MongoDB’s read performance was so good that the memcached layer wasn’t needed anymore
- “Key” names for each field in a document take up space. If you use a key name that is 10 bytes long, then after 1,000 inserts you have spent 10,000 bytes on that key name alone (see the sketch after this list). Based on Tony’s estimate, MongoDB used up 4 times the storage they had used in MySQL.
- Queries for reading and writing data are much more readable; the JSON is much more structured than some of their SQL queries.
- MongoDB is a resource-intensive system. It needs a lot of CPU, memory, and disk. Sharding should be investigated for larger datastores. [ From what I heard, auto-sharding is not exactly fully production-ready though… but things could have changed since I researched this ]
- Every “document” in MongoDB has a 4 MB size limit by default.
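To illustrate the key-name overhead point from the list above, here is a small Python sketch using the bson module that ships with pymongo (assumed installed). The field names are made up; the point is that every document stores its own key names.

```python
# Compare the encoded size of the same data under long vs. short key names.
import bson

long_keys = bson.encode({"definitionText": "x", "partOfSpeechLabel": "n"})
short_keys = bson.encode({"d": "x", "p": "n"})

per_doc_overhead = len(long_keys) - len(short_keys)
print(f"{per_doc_overhead} extra bytes per document")
print(f"~{per_doc_overhead * 1000} extra bytes per 1,000 documents")
```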
All 20 of the MongoSF slide decks are up; you can find them here.
Comments
Well, first of all, I suppose I need to have at least one blog entry per year, so this one just makes it. For the last few months, I've been putting more and more data structures into MongoDB instead of MySQL, and I just put a task into the backlog to...