Posted on July 22, 2010 at 8:22 pm by Royans
If you have read this blog before, you know how much I admire those who use continuous deployment in production. Doing that at scale is even more impressive. But the message that sometimes gets lost is that continuous deployment may not be for everyone. Most continuous integration environments usually do all of their deployments from [...]
Posted on June 22, 2010 at 1:44 pm by Royans
Interesting observations and thoughts on web operations, collected from a few sessions at the Velocity 2010 conference (most are from a talk by Theo Schlossnagle, author of “Scalable Internet Architectures”). Optimization: Don’t over-optimize; it could take precious resources away from critical functions. Don’t scale early. Planning for more than 10 times the load you currently [...]
Posted on May 31, 2010 at 12:07 pm by Royans
A few of us gathered at the new Twitter office in downtown SF (right next to Moscone Center) and were shown, for the first time, what Twitter is doing with “Twitter Annotations”. We probably created the first set of third-party applications around this new API. During the hackathon I spent some time to [...]
Posted on March 27, 2010 at 3:13 pm by Royans
Ted Dziuba has a post titled “I Can’t Wait for NoSQL to Die”. His basic argument is that one has to be the size of Google to really benefit from NoSQL. I think he is missing the point. Here are my observations. This is similar to the argument the traditional DB vendors [...]
CAP, NOSQL, architecture, cassandra, cloud, database, datastore, eventually consistent, highavailabilityarchitecture, CAP, datastore.nosql
Posted on March 9, 2010 at 12:58 am by Royans
While efficient automated deployment tools like Puppet and Capistrano are a big step in the right direction, they are not a complete solution for an automated deployment process. This post will explore some of the less-discussed issues that are just as important for automated, fast, repeatable, scalable deployments. Rapid Build and Integration with tests Use Source [...]
Posted on March 5, 2010 at 7:44 am by Royans
Unless you are running a fly-by-night shop, DR (disaster recovery) should be one of the top issues for your operations team. In a “scalable architecture” world, the complexity of DR can become a disaster in itself. Yesterday Google announced that it finally has a DR plan for Google Apps. While this is nice, [...]
Posted on March 1, 2010 at 9:30 pm by Royans
Reddit has a very interesting post about what not to do when trying to build a scalable system. While the error is tragic, I think it’s an excellent design mistake to learn from. Though the post lacked a detailed technical report, we might be able to recreate what happened. They mentioned they are using the MemcacheDB datastore, [...]
Posted on February 14, 2010 at 3:33 pm by Royans
Large distributed systems run into a problem that smaller systems don’t usually have to worry about. “Brewer’s CAP Theorem” [Ref 1] [Ref 2] [Ref 3] defines this problem in a very simple way. It states that though it’s desirable to have Consistency, High-Availability and Partition-tolerance in every system, unfortunately no system can [...]
Posted on February 5, 2010 at 1:44 am by Royans
Most of the newer, successful web startups have one thing in common: they release smaller changes more often. Being in operations, I am often surprised at how these organizations manage such a feat without breaking their websites. Here are some notes from someone at Flickr about how they do it. The two most important parts of [...]
Posted on February 3, 2010 at 1:27 am by Royans
Networking devices at the edge have become smarter over time. So have the firewalls and switches used inside networks. Whether we like it or not, web applications have grown to depend on them. It’s impossible to build a flawless product, which is why it’s standard practice to disable all unused services [...]