May 31, 2010

@twitter annotations: What I learnt at the hackfest…

A few of us gathered at the new Twitter office in downtown SF (right next to Moscone Center) and were shown, for the first time, what Twitter is doing with “Twitter Annotations”. We probably created the first set of 3rd-party applications around this new API. During the hackathon I spent some time wearing my “scalable web architecture” hat and thinking about what I could learn from the experience, which I’ve summarized below.

From a developer’s viewpoint, Twitter annotations are just an extension to the existing APIs which now allows posting additional structured content along with a tweet. The content stays within the context of the tweet and will be retweeted/shared automatically with the main tweet. Twitter has some recommendations on how annotations should be structured; for example, they were talking about a “type”, which sounded very much like Open Graph’s “type/category” concept, with the difference that Twitter has left the field open to any kind of “type” users want. Facebook, if I remember right, had strongly recommended that users stick to a small set of “categories/types” which they published. Twitter accepts these annotations in multiple formats, of which I tried the “simple” and “JSON” protocols. The “JSON” way was the most recommended/used medium of annotation during the whole hackathon. While the annotation structure (using JSON) did allow multiple “types” in the same tweet, there were a few limitations which were slightly constricting. The first big one was that the current implementation allowed only 512 bytes in the annotations field. The second was that the structure, while it is JSON, only supported a few levels of depth in the structured annotation. This was extremely restrictive for the use case I was trying to hack up.
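To make this concrete, here is roughly what posting an annotated tweet looked like from my notes. Treat it as a sketch: the endpoint path and the “annotations” parameter name are what I remember from the hackathon, not a published spec, and the OAuth signing (which is required in practice) is left out.

    import json
    import requests  # OAuth signing is omitted from this sketch

    # A free-form "type" ("movie") with a couple of attributes; Twitter left
    # the choice of type entirely up to the developer.
    annotations = [
        {"movie": {"title": "Inception", "rating": "PG-13"}}
    ]

    payload = {
        "status": "Just watched a great film",
        # The serialized structure had to fit in ~512 bytes and only a few
        # levels of nesting were accepted at the time of the hackathon.
        "annotations": json.dumps(annotations),
    }

    # Endpoint and parameter name are assumptions from my notes, not a final API.
    resp = requests.post("http://api.twitter.com/1/statuses/update.json", data=payload)
    print(resp.status_code)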

There were a few things I learnt during the whole 32-hour experience. The first one was that Twitter had actually hosted these half-baked APIs on http://api.twitter.com and http://www.twitter.com, which, I’m glad to say, are still accessible using my account from outside Twitter’s buildings. Of course the hackers (we) had to be white-listed to get access, but from an operations viewpoint this is extremely gutsy, since one bad ACL code fragment could expose a lot of uncooked APIs to the whole world. This approach to testing is not new to Twitter and is frequently used for A/B testing in newer (more agile) organizations around the world.
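I have no idea how Twitter actually gates these endpoints internally, but the general shape of account-level whitelisting is worth sketching, if only to show how small the code path is that stands between an unreleased API and the public. Everything below (the feature name, the whitelist set, the decorator) is made up for illustration:

    # Hypothetical sketch of account-level gating for an unreleased API.
    # None of this reflects Twitter's implementation; all names are invented.
    from functools import wraps

    WHITELISTED_ACCOUNTS = {"rkt", "hackathon_user_42"}  # e.g. loaded from config

    def feature_gate(feature_name):
        def decorator(handler):
            @wraps(handler)
            def wrapped(request, *args, **kwargs):
                if request.user not in WHITELISTED_ACCOUNTS:
                    # One bad check here and the uncooked API leaks to the world.
                    return {"status": 403, "error": feature_name + " not enabled"}
                return handler(request, *args, **kwargs)
            return wrapped
        return decorator

    @feature_gate("tweet_annotations")
    def update_status_with_annotations(request):
        return {"status": 200}  # the real handler would parse and store annotations

    class Request:
        def __init__(self, user):
            self.user = user

    print(update_status_with_annotations(Request("someone_else")))  # -> 403 response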

The second was the fact that while Cassandra is in use at Twitter, they don’t use it as the primary datastore for all the tweets (yet). They have other uses for it which they didn’t elaborate on. The version of Cassandra they use is close to 0.6.2, which just got released. It also looked (from my discussions with one engineer) like Cassandra treats rack-awareness and datacenter-awareness in slightly different ways. In the documentation I had read previously, the two were the same for all practical purposes. In other words, I need to research this a little more, since optimizations in this area can boost Cassandra’s performance across datacenters.
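Until I do that research, here is a toy illustration (my own, not Cassandra code) of the distinction I want to pin down: a rack-aware placement tries to spread replicas across racks, while a datacenter-aware placement first guarantees copies in distinct datacenters.

    # Toy replica-placement illustration; real Cassandra strategies are more involved.
    from collections import namedtuple

    Node = namedtuple("Node", "name dc rack")

    NODES = [
        Node("n1", "dc1", "r1"), Node("n2", "dc1", "r2"),
        Node("n3", "dc2", "r1"), Node("n4", "dc2", "r2"),
    ]

    def place_replicas(nodes, replication_factor, spread_by):
        """Greedily pick nodes whose `spread_by` value ('rack' or 'dc') differs first."""
        chosen, seen = [], set()
        for node in nodes:
            key = getattr(node, spread_by)
            if key not in seen:
                chosen.append(node)
                seen.add(key)
            if len(chosen) == replication_factor:
                return chosen
        for node in nodes:                      # fill up once every rack/dc is covered
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == replication_factor:
                break
        return chosen

    print([n.name for n in place_replicas(NODES, 3, "rack")])  # ['n1', 'n2', 'n3']
    print([n.name for n in place_replicas(NODES, 3, "dc")])    # ['n1', 'n3', 'n2']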

The third was that while Twitter uses cutting-edge tools for a lot of different things, they don’t have service discovery nailed yet. They are playing with ZooKeeper, and I believe they will use it eventually, but it’s not there yet. This by itself is amazing, because without service discovery, the management and rollout of configuration changes becomes centralized, which has its own advantages and disadvantages. At the organization where I work, we are playing with Cassandra as a service publication/discovery tool for monitoring and consuming services. The short discussion I had with Twitter folks about using Cassandra in such a way validated the work I’m doing with it. But I’m still puzzled why others are not thinking about Cassandra (or another eventually-consistent datastore) for service discovery. It sounded like ZooKeeper might be overkill for my organization, but I should take a look at it again.
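For what it’s worth, the shape of what we’re experimenting with looks roughly like the sketch below. The registry interface and the heartbeat/TTL scheme are ours (and heavily simplified); the dict stands in for the eventually-consistent store, and none of this describes anything Twitter showed us.

    # Simplified service publication/discovery sketch over an eventually-consistent store.
    # The in-memory dict is a stand-in for the real datastore.
    import time

    class Registry:
        def __init__(self, store, ttl=30):
            self.store = store        # service -> {endpoint: last_heartbeat_timestamp}
            self.ttl = ttl

        def publish(self, service, endpoint):
            # Services re-publish periodically; stale entries simply age out.
            self.store.setdefault(service, {})[endpoint] = time.time()

        def discover(self, service):
            now = time.time()
            entries = self.store.get(service, {})
            # A slightly stale view is acceptable: the worst case is one retry,
            # which is why an eventually-consistent store seems good enough here.
            return [ep for ep, seen in entries.items() if now - seen < self.ttl]

    registry = Registry(store={})
    registry.publish("tweet-search", "10.0.0.12:8080")
    registry.publish("tweet-search", "10.0.0.13:8080")
    print(registry.discover("tweet-search"))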

The fourth was that Twitter employs a lot of very smart/passionate people who are amazingly good with most of the network/application stack. They dove into things like browser/JavaScript/cookie behavior and then switched to dissecting network traffic with sniffer tools to debug a possible OAuth implementation bug and other weird things. This just adds to the current popular wisdom that scalability/stability/security can’t be done in small, isolated silos.

The fifth and final thing I’d like to write about is the hackathon itself. It’s amazing how Twitter organized this hackathon, got a group of hackers to play with their new APIs, and gave them the ability to demo their hacks to the likes of Paul Graham and Ron Conway. In return Twitter got very interesting product ideas and use cases for a feature which is still unpolished and unreleased. More importantly, they also got a bunch of hackers to intentionally and unintentionally break the feature and discover some serious and some very annoying bugs. They also got feedback on what does and doesn’t resonate with developers. In a way this is similar to what some other organizations (including Google) already do with their alpha/beta programs, but nothing beats the velocity of hacking up 10 to 20 almost-ready products around a brand new feature in less than 32 hours.


P.S.: I’m terribly sorry for spamming my Twitter followers, who were bombarded with test messages for two days. Next time I’ll pick a test account :)

May 26, 2010

MongoDB: Migration from MySQL at Wordnik

I had the opportunity to listen to Tony Tam at MongoSF talking about why and how they moved Wordnik from MySQL to MongoDB. Both the slides and the video of the talk are attached at the end of this post.

Wordnik is a new kind of “word” repository which is much more “current” than traditional dictionaries. In addition to showing words which have not yet reached the mainstream, it gives tons of other related information and statistics about any given word.

They started looking for alternatives to MySQL after they hit 4 billion rows in MyISAM and started noticing table locks exceeding tens of seconds during inserts. They wanted to insert 1,500 “values” per second and fell short of that even with elaborate workarounds.

The core requirements for a replacement were very simple. The datastore they were about to replace was read-centric, at the rate of about 10,000 reads for every write; they had almost no application logic in the database (no stored procedures); there were no primary-key/foreign-key constraints to worry about; and consistency wasn’t an absolute must.

Of all the NoSQL solutions they tried, they eventually decided to go with MongoDB.

After repeated failures with virtual servers, the final MongoDB server configuration was a non-virtualized 32GB machine with 2x4 cores, connected to external FC/SAN storage for high disk throughput. Since the data structure they were planning to move from MySQL was not heavily relational, they were able to construct a hierarchical representation of the same data for MongoDB.
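As a rough illustration of that reshaping (my own toy example, not Wordnik’s actual schema), a word row plus its related definition rows can collapse into a single embedded document:

    # Toy example of collapsing relational rows into one hierarchical MongoDB document.
    from pymongo import MongoClient

    # Rows roughly as they might sit in MySQL: one `words` row, many `definitions` rows.
    word_row = {"id": 42, "text": "staycation"}
    definition_rows = [
        {"word_id": 42, "source": "web", "text": "a vacation spent at home"},
        {"word_id": 42, "source": "user", "text": "time off without travel"},
    ]

    # The same data as one embedded document: no joins needed at read time.
    word_doc = {
        "_id": word_row["id"],
        "text": word_row["text"],
        "definitions": [{"source": d["source"], "text": d["text"]} for d in definition_rows],
    }

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
    client.wordnik_demo.words.insert_one(word_doc)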

To make the migration easy, they made an extensive effort to make the application work with both MySQL and MongoDB. They wanted the ability to change the backend datastore with just the flip of a switch. To avoid too many complications, they decided to freeze writes to MySQL during migration without disabling the website entirely, so users were able to use the application with minimal impact (since the application didn’t require many writes to begin with). The 4-billion-row migration took one whole day. During the data migration process they were able to execute up to 100,000 inserts per second [ I hope I got my notes right :) ]
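The talk didn’t show code, but the “flip of a switch” pattern usually looks something like the sketch below; the class names and the config flag are invented, since Wordnik’s actual abstraction wasn’t shown.

    # Hypothetical dual-backend data-access layer behind a single flag.
    class MySQLWordStore:
        def get_word(self, word_id):
            ...  # SELECT ... FROM words WHERE id = %s

    class MongoWordStore:
        def get_word(self, word_id):
            ...  # db.words.find_one({"_id": word_id})

    BACKEND = "mongodb"  # the "switch": flipped from "mysql" once the migration finished

    def word_store():
        return MongoWordStore() if BACKEND == "mongodb" else MySQLWordStore()

    # Application code only ever calls word_store(), so cutting a frontend node over
    # is a configuration change and a restart, not a code change.
    word = word_store().get_word(42)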

At the end of the migration they slowly moved frontend application servers one node at a time to MongoDB datastore. They did find some issues (which they expected), and continued tweaking the setup until they got a stable, well tuned system, ready for all the write load.

Things learnt from the move

  • MongoDB’s read performance was so good that the memcached layer wasn’t needed anymore
  • “Key” names for each field in a document take up space. If you use a key name that is 10 bytes long, then after 1,000 inserts you have spent 10,000 bytes just on that key name (see the short sketch after this list). Based on Tony’s estimate, MongoDB used up 4 times the storage they had used in MySQL.
  • Queries for reading and writing data are much more readable; the JSON-style syntax is much more structured than some of their SQL queries.
  • MongoDB is a resource-intensive system. It needs a lot of CPU, memory and disk. Sharding should be investigated for larger datastores. [ From what I heard, auto-sharding is not exactly production-ready yet… but things could have changed since I researched this ]
  • Every “document” in MongoDB has a 4MB size limit by default.
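The key-name overhead in the second bullet is easy to see with a back-of-the-envelope calculation; this is my own illustration, not a number from the talk:

    # Every MongoDB document stores its field names, so long keys are paid for per insert.
    inserts = 1000
    key_name = "dictionary"                 # a 10-byte key name
    print(inserts * len(key_name))          # 10000 bytes spent just on this key name
    # Using a 1-byte key such as "d" instead would cut that to 1000 bytes.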

All 20 of the MongoSF slide decks are up; you can find them here.

May 23, 2010

MongoDB : The bit.ly implementation

I was at the MongoSF conference a few weeks ago, and 10gen just hosted one in NY as well. I was taken aback by MongoDB’s simplicity and the promise it holds.

I’ll have a more detailed post about what I think of it in a few days, but until then, chew on these slides from bit.ly, which also uses MongoDB to power its backend datastore.

May 20, 2010

Google Storage: What it really is…

Yesterday Google formally announced Google Storage to a few (5,000?) of us at Google I/O. Here is the gist of it as I see it, from the various discussions/talks I attended.

To begin with, I have to point out that there is almost nothing new in what Google has proposed to provide. Amazon has been doing this for years with S3. The key difference is that if you are a Google customer, you won’t have to look elsewhere for a storage service like this one.

Let’s get the technical details out of the way:

  • It tries to implement a strong consistency model (the C and A of CAP: consistent and available), which means the data you store is automatically replicated in a consistent way across multiple datacenters
    • Currently it replicates to multiple locations within the US. In the future, they plan to replicate across continents.
    • Currently there are no controls over how replication happens or where the replicas go. They plan to learn from usage during the beta period and develop controls over time.
  • There are two basic building blocks
    • Buckets – containers for objects
        All objects within a bucket are stored in a flat namespace. However, the tools understand “/” and “*” (wildcards) and do the right thing when used correctly
    • Objects – the objects/files inside those buckets
  • Implements RESTful APIs (GET/PUT/POST/DELETE/HEAD/etc); see the rough request sketch after this list
    • All resources are identified by a URI
  • There is no theoretical size limit on buckets. However, a 100GB per-account limit will be imposed during the beta phase.
  • It’s of course built on Google’s very well tested, scalable, highly available infrastructure
  • It provides multiple, flexible authentication and sharing models
    • Does support standard public/private key based auth
    • Will also have integration with some kind of groups feature, which will allow objects to be shared with or controlled by multiple identities.
    • ACLs can be applied to both Buckets and Objects
      • Buckets
        • Control who can list objects
        • Who can create/delete objects
        • Who can read/write into the bucket
      • Objects
        • Who can read
        • Who can read/write
  • Tools
    • There were two tools mentioned during the talk
      • GS Manager looks like a web application that allows an admin to manage the service
      • GS util (gsutil) is more like the shell tools AWS provides for S3.
        • As I mentioned before, gsutil accepts wildcards
          • So something like this is possible
            • gsutil cp gs://gs2010/*  /home/rkt/gs2010
  • The service was created with “data liberation” as one of the goals. As the previous command shows, it takes a single line to transfer all of your data out.
  • A resume feature (for when the connection breaks during a big upload) is not available yet, but that’s on the roadmap.
  • The groups feature was discussed a lot, but it’s not ready in the current release
  • Versioning is not available. It wasn’t clear whether it’s on the roadmap or how long it will be before it’s implemented.
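To give a feel for the RESTful API mentioned in the list above, here is a rough request sketch. The endpoint shown (commondatastorage.googleapis.com) and the shape of the Authorization header are from my notes and should be treated as assumptions; check the official docs before relying on them.

    # Rough sketch of bucket/object access over the RESTful API; endpoint and
    # auth header format are assumptions from my notes, not official documentation.
    import requests

    ENDPOINT = "https://commondatastorage.googleapis.com"  # assumed beta endpoint
    BUCKET = "gs2010"
    AUTH = {"Authorization": "GOOG1 <access-key>:<signature>"}  # placeholder signature

    # List the objects in a bucket (GET on the bucket URI).
    resp = requests.get(ENDPOINT + "/" + BUCKET, headers=AUTH)
    print(resp.status_code, resp.text[:200])

    # Upload an object (PUT on the object URI).
    resp = requests.put(ENDPOINT + "/" + BUCKET + "/notes.txt",
                        data=b"hello from the beta", headers=AUTH)
    print(resp.status_code)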

A few other notes:

  • It’s not clear how this plays with the “storage service” Google currently provides for Gmail/Docs. From what I heard, this is not related to that storage service at all, and there are no plans to integrate the two.
  • The service is free during the beta period to all developers who get access to it, but when it’s released it will follow a pricing model similar to others in the industry. The pricing model is already published on their website
  • The speakers and the product managers didn’t comment on whether storage access from Google App Engine would be charged (or at what rate)
  • They do provide MD5 checksums as a way of verifying that an object on the client is the same as the object on the server, but MD5 is not used for storing the files themselves (so the MD5-collision issue shouldn’t be a problem). A quick verification sketch follows this list.
  • The US Navy is already using this service, with about 80TB of data on Google Storage, and from what I heard they seemed pretty happy talking about it.
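The client side of that MD5 verification is straightforward; here is a minimal sketch (how the server exposes its checksum, e.g. via a response header, is an assumption):

    # Minimal client-side MD5 verification after a download.
    import hashlib

    def local_md5(path, chunk_size=1 << 20):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    server_md5 = "d41d8cd98f00b204e9800998ecf8427e"   # e.g. taken from a response header
    if local_md5("/home/rkt/gs2010/notes.txt") == server_md5:
        print("object verified")
    else:
        print("mismatch: re-download")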

I suspect this product will be in beta for a while before they release it out in the open.