December 20, 2009

Cassandra for service registry/discovery service

My last post was about my struggle to find a good distributed ESB/Service-discovery solution built over open source tools which was simple to use and maintain. Thanks to reader comments (Dan especially) and some other email exchanges, it seems like building a custom solution is unavoidable if I really want to keep things simple.

Dan suggested that I could use DNS to find seed locations for config store which would work very well in a distributed network. If security wasn’t a concern this seed location could have been on S3 or SimpleDB, but the requirement that it needs to be secured on internal infrastructure forced me to investigate simple replicated/eventually-consistent databases which could be hosted internally in different data centers with little or no long term administration cost.

My search led me to investigate a few different NoSQL options.

But the one I finally settled on as a possible candidate was Cassandra. Unlike some of the others, Cassandra was simple to install and set up, since our application platform is based on Java. The fact that Facebook uses it to store 50TB of data across 150 servers helped convince us it was stable as well.

The documentation on this project isn’t as extensive as I would have liked, but I did get it running pretty fast. Building a service registry/discovery service on top of it is what’s next on my mind.
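To make the idea concrete, here is a toy sketch (Python, with a plain dict standing in for a Cassandra ColumnFamily; all names are hypothetical) of the registry shape I have in mind: one row per service, one column per live endpoint, with heartbeat timestamps and a TTL providing liveness:

```python
import time

class ServiceRegistry:
    """Toy service registry: rows keyed by service name, columns keyed by
    endpoint, column values holding the last heartbeat timestamp."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        # {service_name: {"host:port": last_heartbeat}}
        self.rows = {}

    def announce(self, service, endpoint, now=None):
        # A provider (re-)announces itself; equivalent to a column insert.
        now = time.time() if now is None else now
        self.rows.setdefault(service, {})[endpoint] = now

    def discover(self, service, now=None):
        # Return only endpoints whose heartbeat is within the TTL.
        now = time.time() if now is None else now
        cols = self.rows.get(service, {})
        return sorted(ep for ep, ts in cols.items() if now - ts <= self.ttl)

registry = ServiceRegistry(ttl_seconds=30)
registry.announce("billing", "10.0.0.5:8080", now=100)
registry.announce("billing", "10.0.0.6:8080", now=110)
print(registry.discover("billing", now=120))  # both endpoints still live
print(registry.discover("billing", now=135))  # 10.0.0.5's heartbeat expired
```

Providers would simply re-announce on a timer, and stale endpoints age out on their own — which is exactly the behavior an eventually-consistent store is comfortable with.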

More on Cassandra

If you are interested in learning more about Cassandra, I recommend listening to this talk by Avinash Lakshman (Facebook) and reading a few other posts listed here.

Cassandra: Articles

  • Cassandra -- Getting Started: Cassandra data model from a Java perspective

  • Using Cassandra’s Thrift interface with Ruby

  • Cassandra and Thrift on OS X: one, two, three

  • Looking to the Future with Cassandra: how Digg migrated their friends+diggs data set to Cassandra from mysql

  • Building Scalable Databases: Denormalization, the NoSQL Movement and Digg

  • WTF is a SuperColumn? An Introduction to the Cassandra Data Model

  • Meet Scalandra: Scala wrapper for Cassandra

  • Cassandra and Ruby: A Love Affair? - Engine Yard's walk-through of the Cassandra gem

  • Up and Running with Cassandra: featuring data model examples of a Twitter clone and a multi-user blog, and ruby client code

  • Facebook Engineering notes and Cassandra introduction and LADIS 2009 paper

  • ArchitectureInternals

  • ArchitectureGossip

Cassandra: Presentations

  • Cassandra in Production at Digg from NoSQL East 09

  • Introduction to Cassandra at OSCON 09

  • What Every Developer Should Know About Database Scalability: presentation on RDBMS vs. Dynamo, BigTable, and Cassandra

  • IBM Research's scalable mail storage on Cassandra

  • NOSQL Video - NOSQL Slides: More on Cassandra internals from Avinash Lakshman.

  • Video of a presentation about Cassandra at Facebook: covers the data model of Facebook's inbox search and a lot of implementation details. Prashant Malik and Avinash Lakshman presenting.

  • Cassandra presentation at sigmod: mostly the same slides as above

If any of you have worked on Cassandra, please let me know how that has been working out for you.

    November 28, 2009

    Service registry (ESB) for scalable web applications.

    This blog post is the result of my futile attempts at understanding how others have solved the problem of automatic service discovery.

    How do organizations with a huge collection of custom applications design scalable web applications without having to hardcode server names and port numbers in configuration files?

    I believe the terminology I’m hinting at is either “Service Registry” or “Enterprise Service Bus”, both part of the whole SOA (service-oriented architecture) world.

    The organization I work for has a limited multicast-based service announcement/discovery infrastructure, but it is not widely used across all the applications. Besides the fact that multicast routing can become complicated (ACL management of yet another set of network addresses), it’s also not a solution that works when parts of an application reside on the cloud. Amazon’s EC2, for instance, doesn’t allow multicast traffic between its hosts.

    Microsoft Azure .NET services (part of the Azure platform) provides a service registry (or proxy) to which internal and external applications can connect to provide and consume services. The design doesn’t allow direct connections between provider and consumer, which makes this a massive single point of failure. I agree any kind of registry has this problem, but the fact that you need this service to be up all the time for every single request makes it extremely risky.

    There are at least two open source projects by the Apache foundation which touch this topic: one is ServiceMix and the other is Synapse. I’ve also spoken to a few commercial entities who do this, and wasn’t really convinced that it’s worth spending big bucks on.

    The reason I’m puzzled is that I don’t see a single open source project being widely used in this area. I’ve been a REST kind of guy and hate the whole SOAP world… and was hoping there would be something simple which could be set up and used without pulling my hair out.

    My contacts at some of the larger organizations lead me to believe that they all use proprietary solutions for their infrastructure. Is this really such a complex problem?

    If you have an ESB in your network, please do drop in a line about it.

    October 26, 2009

    Amazon launches Relational Database as a service

    It’s hard to say why I didn’t see this coming, even after Amazon launched Hadoop and Hive as a service. There is a huge demand for a relational database on the cloud, and a lot of middlemen are raking in a lot of cash.

    Today Amazon launched something they call “RDS”. The service basically provides an AWS-managed MySQL instance which includes backups and an API-based option to add more nodes. The cheapest RDS service is about 11 cents an hour, and the most expensive is a little over 3 dollars an hour.

    From cloudave
    Amazon RDS is nothing but a MySQL 5.1 database instance that is exclusively for a particular user and can be accessed via a single API call. The user gets all the capabilities of MySQL database with an additional ability to scale up based on the needs. It rids the customers of any need for time consuming database administration tasks. The patches are applied automatically with the database backed up automatically with a possibility for the user to set the retention period.

    August 25, 2009

    Private clouds: By Amazon

    A few days ago I blogged about how VMware is going to do a huge push into “private clouds” around the VMware 2009 conference. But little did we know that Amazon had something up its sleeve as well, which it announced today.

    AWS now supports creation of a Virtual Private Cloud with private address space (including RFC 1918) which can be locked down by a VPN connection to your organization only. You still get most of the benefits of Amazon’s cheap hardware pricing, but you get to lock down the infrastructure for security reasons.

    Regardless of how you see it, this is huge for IT and the developer community. Some may love it, and I’m sure some will be pretty angry at Amazon for trying to commoditize security and making it look as if network security were as simple as that.

    With VMware’s announcements next week, there is no doubt in my mind that for at least the next year there will be a significant push towards “private clouds”.

    July 28, 2009

    Vmware: internal + external “private” clouds

    Last year at the VMware 2008 conference they discussed something called vCloud. Before VMware 2009, they will be announcing external cloud providers around that platform which allow internal clouds to extend their infrastructure to external clouds.

    What VMware is trying to do is allow organizations to build cloud networks with the possibility of moving a few services/components to external clouds.

    To make this seamless, the VMware vSphere tool, which currently allows internal cloud management, will be enhanced to manage instances on the external cloud almost as if they were part of the internal cloud. In fact, if the rumors are true, they will even support vMotion across to external cloud providers (restrictions apply).

    VMware is getting on the cloud bandwagon in a big way… just take a look at the number of sessions they have mentioning cloud.

    July 20, 2009

    Scalability for dummies

    Alex Barrera has a very interesting post about how frustrating it is to figure out that you have a problem and how much trouble it is to fix it after the product is live.

    I am there, I am suffering the redesign phase (twice now). It’s hard, it’s lonely, it’s discouraging and frustrating, but it needs to be done. I just wrote this post so that outsiders can get a glimpse of what it is to be there and how it affects the whole company, not just the tech department. Scalability problems aren’t something you can discard as being ONLY technical; its roots might be technical but its effects will shake the whole company.

    The post actually reminded me of this post by Marton Trencseni which talks about the phases of improvement in scalability architecture a product goes through and digs a little deeper into what could have prevented it.

    For startups or companies which are just prototyping new ideas, the goal can sometimes be just to “test the waters”, and the product owners don’t care much about allocating/reserving enough resources for engineering to build it the “right way”. And there is a good reason for that as well, since a lot of prototypes (or early products) die off soon after launch because of issues completely unrelated to scalability. It’s hard to decide whether to test the idea first or devote a lot of resources to get it done the right way from day one.

    July 17, 2009

    Weekend reading material



    1. Redis: a key-value database. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements.
    2. HBase: the Hadoop database. It’s an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.
    3. Sherpa
    4. BigTable
    5. Voldemort: basically just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper like ActiveRecord or Hibernate, it provides horizontal scalability and much higher availability, but at a great loss of convenience. Large applications under internet-type scalability pressure often consist of a number of functionally partitioned services or APIs, which may manage storage resources across multiple data centers using storage systems which are themselves horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible, since all the data is not available in any single database. A typical pattern is to introduce a caching layer, which will require hashtable semantics anyway. For these applications Voldemort offers a number of advantages.
    6. Dynamo: a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
    7. Cassandra: a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
    8. Hypertable: an open source project based on published best practices and the authors’ own experience in solving large-scale data-intensive tasks.
    9. HDFS: the Hadoop Distributed File System is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets.
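Several of the systems above (Dynamo, Voldemort, Cassandra) partition keys across nodes with consistent hashing, which is worth seeing in miniature: nodes and keys hash onto the same ring, a key belongs to the first node at or after its position, and adding a node only moves a fraction of the keys. A rough illustrative sketch (node names and vnode count are made up):

```python
import bisect
import hashlib

def ring_hash(value):
    # Deterministic position on the hash ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=8):
        # Each node gets several virtual points on the ring for balance.
        self.ring = sorted(
            (ring_hash("%s#%d" % (node, i)), node)
            for node in nodes for i in range(replicas)
        )
        self.points = [h for h, _ in self.ring]

    def node_for(self, key):
        # First virtual point clockwise from the key's hash owns it.
        idx = bisect.bisect(self.points, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = Ring(["node-a", "node-b", "node-c"])
keys = ["user:%d" % i for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

# Add a fourth node: only the keys falling into its ring segments move.
bigger = Ring(["node-a", "node-b", "node-c", "node-d"])
moved = sum(1 for k in keys if bigger.node_for(k) != before[k])
print("%d of %d keys moved" % (moved, len(keys)))  # a fraction, not all
```

Contrast this with naive `hash(key) % n` sharding, where changing `n` reshuffles nearly every key.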


    1. Eventually Consistent 
    2. Bunch of Links at bytepawn
    3. Fallacies of Distributed Computing

    Is Yahoo launching a cloud storage solution: MObStor?

    While the rest of the world is busy with Microsoft and Google, Yahoo might be preparing to launch MObStor, which they tout as the “Unstructured Storage for the Internet”.

    While comparing MObStor to the various cloud computing storage solutions already available, Navneet Joneja, Sr. Product Manager, mentions Facebook’s Haystack to describe MObStor’s architectural design. He also points out that though Facebook’s Haystack was optimized to store photographs, MObStor was optimized for a diverse set of use cases.

    It’s a REST-based, browser-accessible API with a simple security model and content-agnostic storage features. The focus of this service seems to be fast, reliable, secure storage, with the option of allowing customers to layer additional services on top of the core service. It claims it will be optimized for high performance and high availability (who doesn’t?).

    Here is more from the Yahoo Developer Network Blog

    Facebook's Haystack is based on commodity storage. While MObStor does support commodity storage, it doesn't require it. Instead, we have a storage-layer abstraction we call the ObjectStore. The ObjectStore encapsulates the key storage operations we need to perform, and allows us to have many underlying physical object stores. This allows us to mix, for example, filer-based storage with commodity storage. The upper layers have the routing intelligence that determines which ObjectStore a given piece of data is stored in. However, like Haystack, we do support high request rates using our own optimized ObjectStore written to run on commodity hardware - with one important difference. While Haystack identifies every object using a 64-bit photo key, all objects in MObStor are accessible through logical (i.e., client-supplied) URLs, not object IDs.

    In MObStor, the storage layer maintains the mapping between logical URLs and physical storage, and can use any means to do so - the implementation is encapsulated within the storage layer. Needless to say, this operation is a potential performance bottleneck, so we've carefully optimized the algorithms used and the hardware that they run on.

    Now with Amazon, Google, Microsoft and Yahoo in the picture the last shoe might finally drop.

    July 12, 2009

    CouchDB scalability issues ? (updated)

    Jonathan Ellis started up a storm when he posted an entry about CouchDB about 6 months ago. He questioned some of CouchDB’s claims and made an attempt to warn users who don’t understand practical issues around CouchDB very well.

    After reading his post and some comments, it looked like he was specifically concerned about CouchDB’s ability to distribute/scale a growing database automatically.

    It’s a good read if you are curious. He has stopped accepting comments on his blog, but that shouldn’t stop you from commenting here.

    As Jan pointed out in the comments, Jonathan is assuming “distributed” means “auto-scaling”, which is not true.

    -- links from the blog: Cassandra, dynomite, Sawzall, Pig

    July 11, 2009

    Cloud architecture: Notes from an Amazon talk


    Some notes from a talk I was at. I didn’t get time to write them up in detail, but hey, something is better than nothing… right?

    Design for failure

            - handle failure
                - use elastic ip addresses
                - use multiple amazon ec2 availability zones
                - create multiple database slaves across multiple zones
                - use real-time monitoring (amazon cloudwatch)
                - use amazon EBS for persistent file system
                    - snapshot database to s3 (from ebs)

    Loose coupling sets you free

            - independent components
            - design everything as a blackbox
            - de-coupling for hybrid models
            - loadbalance-clusters
            - use SQS as buffers to queue messages. Allows elasticity
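The “SQS as buffers” point is the crux of loose coupling, so here is a toy sketch of the pattern with an in-memory deque standing in for SQS (function names are illustrative, not the real SQS API): the front end enqueues work and returns immediately, and workers drain the queue at their own pace, so the two tiers only share the queue, not each other’s availability:

```python
from collections import deque

queue = deque()  # stand-in for an SQS queue

def handle_request(order_id):
    # Web tier: accept fast, defer the slow work to the queue.
    queue.append({"order": order_id})
    return "accepted"

def worker_drain(process):
    # Worker tier: pull whatever has accumulated. Elasticity just means
    # running more (or fewer) copies of this loop on more instances.
    handled = 0
    while queue:
        process(queue.popleft())
        handled += 1
    return handled

results = []
for oid in (1, 2, 3):
    handle_request(oid)
print(worker_drain(results.append))  # 3
print(results)
```

If the workers fall behind, the queue simply grows instead of requests failing — which is exactly the elasticity the bullet above is getting at.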

    Design for dynamism

            - build for changes in infrastructure 
                - Don't assume health of fixed location of components
                - Use designs that are resilient to reboot and re-launch
                - Bootstrap your instances
                - Enable dynamic configuration
                    - Enable Self discovery
                        (puppet, chef, ?)
                - Free auto-scaling features (by triggers)
                - Use Elastic loadbalancing on multiple layers
                - Use configurations in SimpleDB to bootstrap instances

    Build security in every layer

            - Physical is free
            - network is easy
                - Can configure app to talk to only web and db layers... etc. Everything can be automated.
            - The rest can be added
                - Create distinct Security Groups for each Amazon EC2 cluster
                - Use group-based rules for controlling access between layers
                - Restrict external access to specific IP ranges
                - Encrypt data "at-rest" in Amazon S3
                - Encrypt data "in-transit" (SSL)
                - Consider encrypted file systems in EC2 for sensitive data

    Don't fear constraints

            - More RAM ?
                Distribute load across machines. Shared distributed cache
            - Better IOPS on my database ?
                Multiple read-only replicas / sharding / DB clustering
            - Your server has better config ?
                Implement elasticity
            - Static IP ?
                Boot script for software reconfiguration from SimpleDB


    Leverage aws storage solutions

            - Amazon S3: for large static objects (what's the maximum size per object?)
            - Amazon CloudFront: content distribution
            - Amazon SimpleDB: simple data indexing/querying
            - Amazon EC2 local disk drive: transient data
            - Amazon EBS: RDBMS persistent storage + S3 Snapshots

    Is the percentage of company bloggers/Twitter users inversely proportional to company size?

    Small organizations often keep a very active online presence. For them, any news is good news. Larger organizations, however, try to be the opposite of that and control information.

    What I’ve been trying to understand is how, in spite of all that, companies like Google and Microsoft still manage to have a huge online presence.

    No.Of.TwitterAccounts= (Size.Of.Company)^(1/2)  ?

    For example, today Google announced a list of all of its Twitter accounts on one page.

    How do they do it ?

    General - our central account - for Blogger fans - user tips & updates - news, tips, tricks on our visual image search - latest headlines via Google News - from our feed reader team - news & notes from Google's personalized homepage - news of interest to students using Google - for YouTube fans - en Espanol - solutions for IT and workplace productivity

    Geo-related - Google SketchUp news - SketchUp's 3D Warehouse - 3D modeling to build your favorite places - Earth & Maps tools for nonprofits & orgs - uses, tips, mashups -Android app for the night sky

    Ads-related - for online publishers - looking out for AdWords questions and tech issues - Google Guide for AdWords Help Forum - insights for website effectiveness - re building display ads - for retail advertisers - for U.K. tech advertisers - for German AdWords customers - for German ad agencies - info for Portuguese-language publishers - AdWords news & tips in Russian - Spanish updates from the Inside

    AdWords blog - AdWords API tips

    Developer & technical - from our research scientists - Google Webmaster Central - latest updates for Google developer products - Data APIs provide a standard protocol for reading and writing web data - web apps run on Google infrastructure - our initiative for complete import/export of all data - about using Google Maps embedded in websites - Google's largest annual developer event

    Culture, People - notes from our @Google speaker series - the voice of Google recruiters

    Country or Region - Google activities in Australia & New Zealand - Google in Germany - Latin America (en Espanol) - Notes on Google policy issues in Italy

    July 04, 2009

    Cell phone speeds, reliability in US

    Novarum and PC World did a cell phone service provider test across the nation to compare the three big cell giants.

    I was very shocked and surprised at how crappy the AT&T wireless network’s reliability is in the city I live in. No wonder people have been constantly complaining about service problems.

    I wish Apple had gone with Verizon for the iPhone… I used Verizon for years (before I switched to AT&T) and was pretty happy with them.

    Novarum test results; click for full-size image.

    June 27, 2009

    Monitoring Cloud health

    Both Amazon and Google (and probably others as well) provide web pages which monitor their service status. The one I go to when I need to compare availability and detect service problems is CloudStatus by Hyperic.

    They try to monitor most of the individual services provided by Google (Engine, Datastore, Memcache, Fetch) and Amazon (EC2, S3, SQS, SDB, FPS).

    On top of online graphs, you can also subscribe to Twitter status updates, which can be really helpful during a real outage.

    June 25, 2009

    BSET search engine relevance test results

    A few days ago I launched a tool called BSET – Blackbox Search Engine Testing tool – to evaluate how good Bing really is. If you watch the stats on the page, it’s clear which search engine is being consistently picked as the winner.

    The results were collected from 518 unique source IP addresses (some were just NATs from larger organizations). 251 users executed just 1 query each, 111 users executed 2 queries, and the rest executed more than that.

    A total of 808 results were submitted just for the “standard web search” category, and of those, 44% of the submissions were in favor of Google. 32% of them were for Yahoo. Only about 28% went to Microsoft’s new search engine, “Bing”.

    Between Google and Yahoo, a user is 15% more likely to pick Google than Yahoo. Between Google and Bing, a user will pick Google 21% more frequently than Bing.

    The results may not be staggering for folks who have been following search engine trends over the last few weeks, but for me, to see the results from this random test is surprising considering the amount of money Microsoft plans to pump into Bing’s advertisement. I wish I had done this test before Bing was launched to find out how different MSN is from Bing…

    So why is Google better?

    Since search results were pulled using published search APIs from the search providers, and because these APIs may not always show the same results users see on the real search page, it could be argued that these results are inaccurate.

    Another problem I noticed is that different search engines behave differently when there are spelling errors in the search. For example, look at the results for “steven hakwing” (I was looking for Stephen Hawking) on the 3 search engines:

    Bing – Bing tells you that you could have spelt it wrong, and shows results for “steven hawking” instead.

    Yahoo – Yahoo warns me that I should probably correct my spelling to “Stephen Hawking” but shows the search results for “steven hakwing”

    Google – Google suggests that I could be looking for “Steven Hawking”, but actually shows me results for “Stephen Hawking” which is what I really wanted.

    Since I didn’t use spell-suggestion APIs to correct the search terms before they were submitted, it could be argued that my tests are biased towards Google, which does auto-correction. But as an end-user, I could argue that I want to see what I intended to type and not what I actually typed. I think the ability to predict what users are thinking is one of the core reasons why Google has a lead over other search engines.
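All three behaviours above start from the same primitive: measuring how far the typo is from a dictionary word. A minimal Levenshtein-distance sketch — the tiny vocabulary here is a toy stand-in for a real dictionary, and real engines add query popularity on top of raw distance:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def suggest(query, vocabulary):
    # Pick the vocabulary entry closest to what the user typed.
    return min(vocabulary, key=lambda w: edit_distance(query, w))

vocab = ["stephen hawking", "steven hawking", "steve jobs"]
print(suggest("steven hakwing", vocab))  # "steven hawking"
```

Notice this naive version lands on “steven hawking”, the closest string — mapping it onward to the more popular “Stephen Hawking”, as Google does, takes usage data that pure edit distance can’t supply.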

    And as for Bing’s cash-back plan, a friend of mine said that he’d be happy to use Bing to buy something.. as soon as he figures out what he really wants on Google or Yahoo.

    I welcome your comments or feedback, especially if you have ideas on how I could improve the tests.

    June 20, 2009

    Building BlackboxSET on GAE/java

    Last week I spent a few hours building a search engine testing tool called “BlackboxSET”. The purpose of the tool was to allow users to see search results from three different search providers and vote for the best set of results without knowing the source of the results. The hope was that the search engine which presents the best set of results at the top of the page would stand out. What we found was interesting: though Google’s search scores aren’t significantly better than Yahoo’s or Bing’s, it is the current leader on BlackboxSET.

    But this post is about what it took to build BlackboxSET on GAE, which as you can see is a relatively simple application. The entire app was built in a few hours of late-night hacking, and I decided to use Google’s AppEngine infrastructure to learn a little more about GAE.

    Primary goals

    1. Ability to randomly show results from the three search engines
    2. Persist data collected after the user votes
    3. Report the results using a simple pie chart in real time if possible

    Backend design

    1. Each time the user does a new search, a random sequence is generated on the server which represents the order in which the user will see the results in the browser.
    2. When the user clicks the ‘Vote’ button, the browser makes a call to the server to log the result and retrieve the source of the search results.
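The two steps above can be sketched roughly like this (structure illustrative; the real app kept the ordering in the GAE session object, as described below): the server draws a random ordering per search, keeps it server-side under a session id, and only reveals which engine won when the vote comes back:

```python
import random

ENGINES = ["google", "yahoo", "bing"]
sessions = {}  # stand-in for server-side session storage

def new_search(session_id):
    # Step 1: pick a random ordering and remember it server-side.
    order = random.sample(ENGINES, k=3)
    sessions[session_id] = order
    # The browser only ever sees anonymous column labels.
    return ["column-%d" % i for i in range(3)]

def record_vote(session_id, column_index):
    # Step 2: on vote, reveal which engine was behind the chosen column.
    order = sessions.pop(session_id)  # one vote per search
    return order[column_index]

columns = new_search("s1")
print(columns)                 # ['column-0', 'column-1', 'column-2']
winner = record_vote("s1", 1)
print(winner in ENGINES)       # True
```

The whole design hinges on `sessions` surviving between the two calls — which, as the problems below show, is exactly what GAE made tricky.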

    Decisions and observations made while trying to build this on GAE

    1. Using Java was the obvious choice, since I didn’t know Python.
    2. And since I hadn’t played with encrypted cookies, the decision was made to persist the randomized order in the session object, which looked pretty straightforward.
    3. Since user sessions are relatively short, and since session objects in GAE/java are persisted to memcache automatically, it was decided not to interact with memcache directly. This particular feature of GAE/java is not documented clearly, and from what I’ve heard from Google engineers it’s something they don’t openly recommend relying on. But it works, and I have used it in the past without any problems.
    4. When the voting results from the browser are sent to the server, the server logs them without any processing in a simple table in the datastore. The plan was to keep sufficient information in these event logs so that if the app does get hacked/gamed, the additional information will help us filter out events which should be rejected. It unfortunately also means that to extract anything interesting from this data, one would have to spend a lot of computational resources parsing it.
    5. The Google Chart API was used for graphing. This was a no-brainer. But because GAE limits the number of rows per datastore query to 1000, I had to limit the chart API to look at only the last 1000 results. GAE now provides a “Task” feature which I think can be used for offline processing, but I haven’t used it yet.

    Problems I ran into – I had designed the app to resist gaming, but was not adequately prepared for some of the other challenging problems related to horizontal scalability.

    1. The first problem was that processing 1000 rows of voting logs to generate a graph for each person was taking up to 10 to 15 seconds on GAE infrastructure. The options I had were to either reduce the log sample size requested from the datastore (something smaller than 1000), or to cache the results for a period of time so that not all users were impacted by the problem. I went with the second option.
    2. The second problem was sort of a showstopper. Some folks were reporting inaccurate search results… in some cases the same set of search results was shown in two out of three columns. This was bad. Even weirder was the fact that it never happened when I was running the app on my desktop inside the GAE sandbox. Also mysterious was that the problems didn’t show up until the load started picking up (thanks to a few folks who twittered it out).
      1. The root cause of these issues could be the way I assumed session objects are persisted and replicated in GAE/java. I assumed that when I persist an object in the app’s session object, it is synchronously replicated to memcache.
      2. I also assumed that if multiple instances of the app were brought up by GAE under heavy load, it would try to do some kind of sticky loadbalancing. Sticky loadbalancing is an expensive affair, so in hindsight I should have expected this problem. However, I didn’t know that GAE infrastructure would start loadbalancing across multiple instances even at 2 requests per second, which seems too low.
      3. Since the randomization data cannot be stored in a cookie (without encrypting it), I had to store it on the server. And from the point when the user is presented with a set of search results to the point when the user votes on it, it would be nice to keep the user on the same app instance. Since GAE was switching users between instances (loadbalancing based on load), I had to find a more reliable way to persist the randomization information.
      4. The solution I implemented was twofold. First, I reduced the number of interactions between the browser and the backend server from 4 to 2 HTTP requests. This effectively reduced the probability of users switching app instances during the most critical part of the app’s operation. The second change was that I decided not to use the session object and instead used memcache directly, to make the randomization data persist a little more reliably.
      5. In hindsight, I think encrypted cookies would have been a better approach for this particular application. It completely side-steps the requirement of keeping session information on the server.
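For what it’s worth, the encrypted-cookie alternative from the last point can be as small as this sketch: sign the randomization data and hand it to the browser, so no server instance has to remember anything. This is a minimal HMAC-signing sketch, not a full implementation — a real version should also encrypt the payload and add a timestamp, and the secret here is a placeholder:

```python
import hashlib
import hmac
import json

SECRET = b"replace-with-a-real-secret"  # placeholder key

def make_cookie(payload):
    # Serialize the payload and append an HMAC so the client can't tamper.
    body = json.dumps(payload, sort_keys=True)
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "|" + sig

def read_cookie(cookie):
    # Recompute the signature; reject anything that doesn't verify.
    body, _, sig = cookie.rpartition("|")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cookie")
    return json.loads(body)

cookie = make_cookie({"order": ["bing", "google", "yahoo"]})
print(read_cookie(cookie)["order"])
```

The randomization order rides along with every request, so it no longer matters which app instance serves the vote.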

    I’m sure this is not the end of all the problems. If there is an update I’ll definitely post it here. If there are any readers who are curious about anything specific please let me know and I’ll be happy to share my experiences.

    June 17, 2009

    BlackboxSET – Blackbox Search Engine Testing

    The launch of Bing has shaken the Google kingdom a little bit. I for one have been doubting my own support for Google’s search engine. And I know others who swear by Yahoo’s search engine, a trust I don’t share. To make such testing easier, I spent a few hours last night creating a tool which allows you to search something against the 3 top search engines and lets you decide which one is best. At the end of the exercise you should be able to find out whether you are doing the right thing by sticking with your personal search engine.

    May the best search engine win.

    June 16, 2009

    Steps to migrate your webapp to AWS

    Most web applications need at least the following services to be self-sufficient: computational power, storage, webserver/CDN, database, messaging, loadbalancer and monitoring.

    Here are the tried and tested steps as recommended by the AWS folks.

    1. Move static web content to S3 storage first. Images, CSS stylesheets, javascript files, HTML, etc. can all be moved to S3. It’s easier to move some static content than others, so there might be some work required to understand how to break up web content to move parts of it into the cloud.
    2. The content on S3 can be served by the Amazon CloudFront service, which is Amazon’s CDN (content delivery network) service. Once you persist your data on S3, your users will get those objects from the S3 servers located closest to them.
    3. Move the application and webserver layer to the EC2 infrastructure. This step will require you to figure out how to automate deployments into cloud infrastructure.
    4. Once your apps are in the cloud, you can start working on building out availability zones to make your infrastructure tolerant to failures of Amazon datacenters. For example, if you have apps deployed across the US and Europe and the US datacenters have problems, the European datacenters would be able to absorb the shock and keep your services available.
    5. Start using Amazon’s auto-scaling functionality to add/remove infrastructure automatically depending on the load on the system.
    6. The most complicated part might be moving your databases to the AWS cloud. If you plan to keep your databases on an RDBMS (MySQL/Postgres), then you should try EBS (Elastic Block Storage) and figure out how to take snapshots to S3. You should also figure out how to do DB replication across availability zones to keep your site available during single-datacenter failures.
    7. At this point, since most of your application components are in the cloud, you should be able to start using new Amazon services to make your service even better. One possible example is SQS, which allows frontend applications to queue requests for other parts of the application (or DB) for asynchronous processing.
    8. Investigate the possibility of moving more of the DB components to S3 and SimpleDB to reduce the need for an RDBMS as much as possible. S3 is ideal for storing large objects while SimpleDB is ideal for small stubs of data. A lot of applications use these two services together.
    9. After your apps are all configured on AWS, this would be a good time to set up monitoring. Amazon provides the CloudWatch service, which allows you to monitor your applications.
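The queueing pattern from step 7 can be sketched locally. This is not SQS code; a BlockingQueue stands in for the queue, and the names are my own:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The frontend enqueues work and returns immediately; a worker drains
// the queue asynchronously. With SQS the queue would live outside the
// app, but the shape of the code is the same.
public class AsyncPipeline {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Frontend: enqueue and return without waiting for the result.
    public void submit(String request) {
        queue.add(request);
    }

    // Worker: pull the next request off the queue (blocks when empty).
    public String takeNext() throws InterruptedException {
        return queue.take();
    }
}
```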

    Issues to worry about: moving to the cloud can be full of small potholes. If you understand and anticipate them, it will be easier for you to move. Here are some you should be careful about.

    1. The S3 service is “eventually consistent”, which means that data saved to an S3 server may not be immediately available on read. It’s also possible that if the same content is updated on two different S3 servers at the same time, one of the writes would be lost. This is not always bad, and if you understand it you will realize that there are ways around it.
    2. The loadbalancer service Amazon provides doesn’t support SSL.
    3. SimpleDB has a per-row maximum size limitation. This is why SimpleDB is better for keeping searchable metadata that references the complete data, which can be kept in S3.
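For issue 1, one common way to cope with eventual consistency is to retry a read briefly until the object becomes visible. A minimal sketch, with the retry count and pause as arbitrary illustrative values:

```java
import java.util.function.Supplier;

// Retry a read a few times with a short pause until the object appears.
// The Supplier would wrap, say, an S3 GET that returns null until the
// write has replicated.
public class EventualRead {
    public static <T> T readWithRetry(Supplier<T> read, int attempts, long pauseMs)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            T value = read.get();       // the possibly-stale read
            if (value != null) {
                return value;
            }
            Thread.sleep(pauseMs);      // give replication a moment to catch up
        }
        return null;                    // still not visible; caller decides
    }
}
```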

    Parts of this post were summarized from Jinesh’s talk at the “AWS Start-up Tour 2009”.

    June 15, 2009

    Opera Unite: web server built in ?


    There seems to be a lot of talk about the “Opera Unite” launch, and everyone is so pumped up about the new feature: a “webserver built into the web browser”.

    This is just like Twitter: I think it might be a great idea for a few, but for the masses it might turn out to be just overblown hype. Most of us who have used a recent OS have sharing features, and we have always been on the lookout for better firewalls to block them. Now here comes a browser which wants to do the same thing, and for some reason doesn’t expect firewalls to impact it?

    Have all the security concerns gone away all of a sudden? While the world is switching to lighter OSes and browsers, Opera is trying to build a kitchen sink.

    That being said, I think it’s a bold step on Opera’s part, and I have to give Opera credit for its “unique” idea, regardless of how useful I think it’s going to be.

    June 14, 2009

    Working with Google App engine’s datastore

    I heard a great set of Google App Engine datastore-related talks at the Google I/O conference. I think this is one of the best talks I heard, and it is now on YouTube. You should watch it if you are working with or planning to work with Google App Engine in the near future. Click on this link if you can’t see the embedded video.


    May 27, 2009

    Google wave : Let the predictions begin…

    During the keynote today Google premiered something completely brand new called Google Wave. From the look of it, it seemed like a next-gen SMTP+XMPP protocol which allows email-like messages, instant communication and collaboration using a fully distributed architecture (similar to SMTP). The focus was on collaboration and notification.

    During the whole demo I was thinking just two things.

    1) Twitter is screwed!
    2) Ditto Facebook!

    The solution proposed has a side effect of trying to solve the spamming issue as well.

    The key here is that they are not releasing an app on which people can log in when it’s launched… they are instead releasing a protocol and possibly a working open-source server which users can deploy and get running quickly.

    …won’t happen overnight… but if they build this into Gmail, which has a large adoption rate, it could become the next big hot thing pretty fast.

    More here….

    Google app engine review (Java edition)

    For the last couple of weekends I’ve been playing with Google App Engine, (Java edition) and was pleasantly surprised at the direction it has taken. I was also fortunate enough to see some Google Engineers talk on this subject at Google I/O which helped me a lot to compile all this information.

    But before I get into the details, I’d like to warn you that I’m not a developer, let alone a Java developer. My experience with Java has been limited to prototyping ideas and wasting time (and now probably yours too).

    Developing on GAE isn’t very different from other Java-based development environments. I used the Eclipse plugin to build and test the GAE apps in the sandbox on my laptop. For the most part everything you did before will work, but there are limitations introduced by GAE which try to force you to write code which is scalable.

    1. Threads can’t be created, but one can modify the existing thread’s state.
    2. Direct network connections are not allowed; URLConnection can be used instead.
    3. Direct file system writes are not allowed. Use memory, memcache or the datastore instead. (Apps can read files which are uploaded as part of the app.)
    4. Java2D is not allowed, but certain Images APIs and software rendering are allowed.
    5. Native code is not allowed; only pure Java libraries are allowed.
    6. There is a JRE class whitelist which you can refer to to know which classes are supported by GAE.

    GAE runs inside a heavily modified version of the Jetty/Jasper servlet container, currently using Sun’s 1.6 JVM (client mode). Most of what you did to build webapps still applies, but because of the limitations of what can work on GAE, you should explicitly check for libraries and frameworks which are known to work. If you are curious whether the library/framework you use for your webapp will work in GAE, check out this page for the official list of known/working options (will it play in app engine).

    Now the interesting part. Each request gets a maximum of 30 seconds in which it has to complete, or GAE will throw an exception. If you are building a web application which requires a large number of datastore operations, you have to figure out how to break requests into small chunks such that each completes within 30 seconds. You also have to figure out how to detect failures so that clients can reissue requests that fail.
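The chunking idea can be sketched as a deadline-aware loop that returns a cursor the client can resume from. This is a generic sketch, not GAE API code; the names and the budget value are my own:

```java
import java.util.List;

// Process items until a self-imposed deadline well under the 30-second
// limit, then return the index to resume from so the client can reissue
// the request with that cursor.
public class ChunkedJob {
    // Returns the index of the first unprocessed item, or -1 when done.
    public static int processUntilDeadline(List<Runnable> work, int start, long budgetMs) {
        long deadline = System.currentTimeMillis() + budgetMs;
        for (int i = start; i < work.size(); i++) {
            if (System.currentTimeMillis() >= deadline) {
                return i;               // hand this cursor back to the client
            }
            work.get(i).run();          // one datastore operation, say
        }
        return -1;                      // everything processed
    }
}
```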

    But this limitation has a silver lining. Though you are limited by how long a request can take to execute, you are not currently limited by the number of simultaneous requests (you can get to 32 simultaneous threads on a free account, and can go higher if you want to pay). Theoretically you should be able to scale horizontally to as many requests per second as you want. There are a few other factors, like how you architect your data in the datastore, which can still limit how many operations per second you can do. Some of the other GAE limits are listed here.

    You have to use Google’s datastore APIs to persist data to maximize GAE’s potential. You could still use S3, SimpleDB or your favorite cloud DB storage, but the high latency would probably kill your app first.

    The datastore is where GAE gets very interesting and departs significantly from most traditional Java webapp development experiences. Here are a few quick things which took me a while to figure out.

    1. Datastore is schemaless (I’m sure you knew this already)
    2. It’s built over Google’s BigTable infrastructure (you knew this as well…).
    3. It looks like SQL, but don’t be fooled. It’s so crippled that you won’t recognize it from two feet away. After a week of playing with GAE I know there are at least 2 to 3 ways to query this data, and the various syntaxes are confusing. (I’ll give an update once I figure this whole thing out.)
    4. You can have the datastore generate keys for your entities, or you can assign them yourself. If you decide to create your own keys (which has its benefits, BTW) you need to figure out how to build them in such a way that they don’t accidentally collide.
    5. Creation of “uniqueness” index is not supported.
    6. Nor can you do joins across tables. If you really need a join, you have to do it in the app. I heard there are some folks coming out with libraries which can fake a relational data model over the datastore… I don’t have more information on that right now.
    7. The amount of datastore CPU (in addition to regular app CPU) you use is monitored. So if you create a lot of indexes, you had better be ready to pay for it.
    8. Figuring out how to index your data isn’t rocket science. Single-column indexes are automatically built for you; multi-column indexes need to be configured in the app. The GAE sandbox running on your desktop/laptop figures out which indexes you need by monitoring your queries, so for the most part you may not have to do much. When you upload the app, the config file listing which indexes are required is uploaded with it. In GAE Python, there are ways to tell Google not to index some fields.
    9. Index creation on GAE takes a long time for some reason, even for small tables. This is a known issue, but not a show stopper in my personal opinion.
    10. Figuring out how to break up/store/normalize/denormalize your data to best use GAE’s datastore will probably be one of the most interesting challenges you have to deal with.
    11. The problem gets trickier if you have a huge amount of data to process in each request. There are strict CPU resource timeouts which currently look slightly buggy to me (or work in a way I don’t understand yet). If a single query takes over a few seconds (5 to 10) it generally fails for me. And if the same HTTP request generates a lot of datastore queries, there is a 30-second limit on the HTTP request after which the request will be killed.
    12. From what I understand, the datastore is optimized for reads, and writes are expensive. Not only do indexes have to be updated, each write needs to be written to disk before the operation is considered complete. That brings in physical limitations on how fast you can process data if you are planning to write a lot of it. Breaking data into multiple tables is probably a better way to go.
    13. There is no way to drop a table or a datastore. You currently have to delete it 1000 rows at a time using your app. This is one of the biggest issues brought up by developers, and it’s possible it will be fixed soon.
    14. There is no way to delete an application either…
    15. There is a Python script to upload large amounts of data to the GAE datastore. Unfortunately, one needs to understand how the data model designed for the Java app looks in the Python world. This has been a blocker for me, but I’m sure I could have figured it out using Google Groups if I really wanted to.
    16. If I understand correctly, the datastore is built on top of 4 large BigTables.
    17. If I understand correctly, though GAE’s datastore architecture supports transactions, its master-master replication across multiple datacenters has some caveats which need to be understood. GAE engineers explained that two-phase commit and Paxos are better at handling data consistency across datacenters but suffer from heavy latency, because of which they are not used for GAE’s datastore currently. They hope/plan to add some kind of support for a more reliable data consistency mechanism.
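For item 4 (assigning your own keys), the main trick is building key names so that different inputs can never collapse into the same string. A minimal sketch, with a format of my own invention:

```java
// Build a key name from parts using a separator that is escaped inside
// the parts, so ("a:b", "c") and ("a", "b:c") can never collide.
public class KeyNames {
    public static String build(String kind, String owner, String name) {
        return esc(kind) + ":" + esc(owner) + ":" + esc(name);
    }

    // Escape the backslash first, then the separator itself.
    private static String esc(String part) {
        return part.replace("\\", "\\\\").replace(":", "\\:");
    }
}
```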

    Other than the datastore, I’d like to mention a few other key things which are central elements of the GAE architecture.

    1. Memcache support is built in. I was able to use it within a minute of figuring out that it’s possible. Hitting the datastore is expensive, and if you can get by with just using memcache, that’s what is recommended.
    2. Session persistence exists, and sessions are persisted to both memcache and the datastore. However it’s disabled by default, and GAE engineers recommend staying away from it. Managing sessions is expensive, especially if you are hitting the datastore very frequently.
    3. Apps can send emails (there are paid/free limits)
    4. Apps can make HTTP requests to outside world using URLConnection
    5. Apps get google authentication support out of the box. Apps don’t have to manage user information or build login application/module to create user specific content.
    6. Currently GAE doesn’t provide a way to choose which datacenter (or country) to host your app from (Amazon allows users to choose US or EU). They are actively working to solve this problem.

    That’s all for now; I’ll keep you updated as things move along. If you are curious about something very specific, please leave a comment here or at the GAE Java Google group.

    May 17, 2009

    New EC2 features: Elastic Load Balancing, Auto Scaling, and Monitoring


    If you have not used EC2 for some reason, chances are that those reasons don’t exist anymore. More information is available in the following places.

    1. AWS Blog
    2. All things Distributed
    3. Right Scale

    March 01, 2009

    Safari crossed the 10% mark?

    Apple released some statistics showing that, thanks to the Safari 4 beta, Safari has crossed the 10% threshold for the first time. Though that might be true, I don’t see it sticking there. Safari 4’s javascript execution was fast, but I found Chrome to be faster. I for one have already abandoned Safari 4 on my Windows machine. That doesn’t mean I hate it… it just means I’m not convinced it’s the best yet.

    Friendfeed using Mysql for Schema-less data

    Bret has a nice little article talking about why most people should still stick with known, tested database engines even if the data stored is not relational. FriendFeed uses a simple table to keep attribute-value pairs and separate tables to keep indexes for each attribute which needs indexing.

    The design is very simple and reasonable, and makes an effective argument against using a cloud DB (or something like CouchDB) when you can get what you need with tried and tested engines.
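The layout Bret describes can be sketched as one opaque entity table plus one index per attribute. Plain maps stand in for the MySQL tables here, and the class and method names are my own:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Entities are stored schemalessly by id; each indexed attribute gets
// its own "index table" mapping attribute values back to entity ids.
public class SchemalessStore {
    private final Map<String, Map<String, String>> entities = new HashMap<>();
    private final Map<String, Map<String, Set<String>>> indexes = new HashMap<>();

    public void put(String id, Map<String, String> attrs, Set<String> indexedAttrs) {
        entities.put(id, attrs);
        for (String attr : indexedAttrs) {           // maintain the index tables
            String value = attrs.get(attr);
            indexes.computeIfAbsent(attr, a -> new HashMap<>())
                   .computeIfAbsent(value, v -> new HashSet<>())
                   .add(id);
        }
    }

    // Look up entity ids by an indexed attribute value.
    public Set<String> findBy(String attr, String value) {
        return indexes.getOrDefault(attr, new HashMap<>())
                      .getOrDefault(value, new HashSet<>());
    }
}
```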

    February 28, 2009

    Experimenting with SimpleDB

    A few years ago I wrote a simple online bookmarking tool called Flagthis. The tool allowed one to bookmark sites using a javascript bookmarklet from the bookmark tab. The problem it was trying to solve is that most links people bookmark are never used again if they are not checked out within the next few days. The tool helps the user ignore bookmarks which were not used in the last 30 days.

    The initial version of this tool used a MySQL database. The original application architecture was very simple, and other than the database it could have scaled horizontally. Over the weekend I played a little with SimpleDB and was able to convert my code to use SimpleDB in a matter of hours.


    Here are some things I observed during my experimentation

    1. It’s not a relational database.
    2. You can’t do joins in the database. If joins have to be done, they have to be done in the application, which can be very expensive.
    3. De-normalizing data is recommended.
    4. Schemaless: you can add new columns (which are actually just new row attributes) anytime you want.
    5. You have to create your own unique row identifiers. SimpleDB doesn’t have a concept of auto-increment.
    6. All attributes are auto-indexed. I think in Google App Engine you had to specify which columns need indexing; I’m wondering if this increases the cost of using SimpleDB.
    7. Data is automatically replicated across Amazon’s huge SimpleDB cloud. But they only guarantee something called “eventual consistency”, which means data which is “put” into the system is not guaranteed to be available on the next “get”.
    8. I couldn’t find a GUI-based tool to browse my SimpleDB the way some S3 browsers do. I’m sure someone will come up with something soon. [Update: Jeff blogged about some SimpleDB tools here]
    9. There are limits imposed by SimpleDB on the amount of data you can put in. Look at the tables below.


    Storage limits:

    Attribute                             Maximum
    domains                               100 active domains
    size of domain                        10 GB
    attributes per domain                 250,000,000
    attributes per item                   256 attributes
    size per attribute                    1,024 characters


    Query limits:

    Attribute                             Maximum
    items returned in a query response    250 items
    seconds a query may run               5 seconds
    attribute names per query predicate   1 attribute name
    comparisons per predicate             10 operators
    predicates per query expression       10 predicates
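Since SimpleDB has no auto-increment (item 5 above), item names have to be generated by the application. A minimal sketch using random UUIDs, with an optional timestamp prefix of my own design for rough ordering:

```java
import java.util.UUID;

// SimpleDB item names are caller-supplied; a random UUID is the
// simplest collision-safe choice.
public class ItemNames {
    public static String next() {
        return UUID.randomUUID().toString();
    }

    // Variant with a zero-padded millisecond prefix so names sort
    // roughly by creation time.
    public static String nextOrdered(long nowMillis) {
        return String.format("%013d-%s", nowMillis, UUID.randomUUID());
    }
}
```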

    Other related discussions (do check out CouchDB)

    Has Techmeme run out of news?

    A lot of us go to Techmeme for our hourly fix, but for the last few hours things haven’t been quite the same. Come to think of it, the quality of news on Techmeme could be an indicator of what’s left to come in the tech industry.

    The first couple of news items have nothing to do with technology in general, and the third news item is a few days old already. The three items after that are the same old news in different wrapping.


    Either the weekend is getting to me, or this is the lull before the storm.