Showing posts from 2010

Switching roles: next stop Google

January 2011 will start a little differently for me after 10 years. I’ve accepted a position in the Google Apps Enterprise group and will be joining them early next month. Other than the fun stuff I do outside my regular job, I’ve been in IT-related roles for as long as I can remember. And while IT has been very challenging and is an exciting field to be in, I feel that it’s time for a little exploration. I will deeply miss all of my friends at Ingenuity, some of whom I’ve worked with for over 10 years, but I'm ready for my next challenge.

Switching roles: next stop Google

January 2011 will start a little differently for me after 10 long years. I’ve accepted a position in the Google Apps Enterprise group and will be joining them early next month. Other than the fun stuff I do outside my regular job, I’ve been doing IT-related stuff for as long as I can remember. And while IT has been a very challenging and exciting field to be in, I feel that it’s time for a little exploration. My scalable web architecture blog and this personal blog will continue to stay up, but I’m not sure at this point how my new job will impact the frequency at which I post here.

S4: Distributed Stream Computing Platform

A few weeks ago I mentioned that Yahoo! Labs was working on something called S4 for real-time data analysis. Yesterday they released an 8-page paper with a detailed description of how and why they built it. Here is the abstract from the paper. It’s interesting to note that the authors compared S4 with MapReduce and explained that MapReduce was too optimized for batch processing and wasn’t the best place to do real-time computation. They also made an architectural decision not to build a system which can do both offline (batch) processing and real-time processing, since they feared such a system would end up being good at neither. S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or mo

REST APIs for cloud management and the launch

I found the top two stories on scalebig last night interesting enough to dig a little deeper. The one which surprised me the most was William Vambenepe’s post about why he thinks that REST APIs don’t matter in the context of cloud management. While REST might be ideal for many different things, including web-based applications which are accessed mostly by browsers, Amazon chose to avoid REST for most of its infrastructure management APIs. Has this lack of RESTfulness stopped anyone from using it? Has it limited the scale of systems deployed on AWS? Does it limit the flexibility of the Cloud offering and somehow force people to consume more resources than they need? Has it made the Amazon Cloud less secure? Has it restricted the scope of platforms and languages from which the API can be invoked? Does it require more experienced engineers than competing solutions? I don’t see any sign that the answer is “yes” to any of these questions. Considering the

Providing Dynamic DNS over “Amazon Route 53” ( a hackathon )

In hindsight, yesterday’s “Route 53” announcement was not completely unexpected. Amazon is an IaaS provider and it’s in their own interest to automate infrastructure as much as possible. After tackling monitoring and CloudFront features, DNS was one of the more obvious targets for improvement. So when I was trying to pick a challenge for this morning’s hackathon, I picked one around the “Amazon Route 53” service. At the end of the day I had an almost functional public dynamic DNS service using “Route 53” as the DNS service and Twitter’s OAuth service for authentication. The final hack is up here. You are most welcome to play with it and/or use it. After the initial creation of a user in the system (with a little help from Twitter’s OAuth), the end user is free to use the browser-based web application or the lynx- or curl-based REST interface to add/create/update host records. The current version only supports “A” records, but it would be expa

Amazon Route 53 : Programmable DNS is finally here

Managing DNS has been considered an art by many. If you manage your own DNS records and run your own external DNS servers, I’m sure you have some stories to share. Unfortunately, unlike most other infrastructure on the internet, DNS screw-ups can get very costly, especially because caching policies tend to keep your mistakes alive long after you have rolled back your changes. The unforgiving nature of DNS has pushed most people, except a few hardcore sysadmins, to avoid DNS hell and choose a managed service to do it for them. Domain name registrars like Network Solutions, MyDomain and GoDaddy already provide these DNS services, but I can’t recall any of them providing APIs to make these changes automatically. DynDNS does provide an API to change DNS mappings, but it costs 15 bucks a year for a single host. There might be others which I’m not aware of, but the bottom line is that there is no standard, and it’s not cheap. Customers on AWS today unfortunately have t

Scalability links for December 4th

Scalability links for December 4th: Presenting - La Brea - An interesting tool which could be used to understand how failures, latency and other annoying issues can impact an application. The tool allows one to insert system calls into an existing application without recompiling the original application. What's new in Cassandra 0.7: Secondary indexes - I finally see an example of the promised land !! :) Can't wait to try this out. NoCAP Part III GigaSpaces clustering explained.. - Devops - The War Is Over - if You Want It - Great Introductory Video on Scalability from Harvard Computer Science - Strategy: Google Sends Canary Requests into the Data Mine - This is another way of testing code thrown out by continuous deployments. Very nice. Very Low-Cost, Low-Power Servers - Better Workflow Management in CDH with Oozie 2 - Facebook at 13 Million Queries Per Second Recommends: Minimize Request Variance - Keeping Customers Happy - Another New Elastic Load Balancer Feature -

AWS Cloudwatch is now really open for business

In a surprise move, Amazon today released a bunch of new features for its CloudWatch service, some of which, until now, were provided by third-party service providers. Basic Monitoring of Amazon EC2 instances at 5-minute intervals at no additional charge. Elastic Load Balancer Health Checks - Auto Scaling can now be instructed to automatically replace instances that have been deemed unhealthy by an Elastic Load Balancer. Alarms - You can now monitor Amazon CloudWatch metrics, with notification to the Amazon SNS topic of your choice when the metric falls outside of a defined range. Auto Scaling Suspend/Resume - You can now push a "big red button" in order to prevent scaling activities from being initiated. Auto Scaling Follow the Line - You can now use scheduled actions to perform scaling operations at particular points in time, creating a time-based scaling plan. Auto Scaling Policies - You now have more fine-grained control over the modifications

Kafka : A high-throughput distributed messaging system.

Found an interesting new open source project which I hadn’t heard about before. Kafka is a messaging system used by LinkedIn as the foundation of their activity stream processing. Kafka is a distributed publish-subscribe messaging system. It is designed to support the following: Persistent messaging with O(1) disk structures that provide constant time performance even with many TB of stored messages. High-throughput: even with very modest hardware Kafka can support hundreds of thousands of messages per second. Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics. Support for parallel data load into Hadoop. Kafka is aimed at providing a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) are a key ingredient in many of
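The partitioning point above, spreading messages over servers while keeping order within each partition, can be sketched in a few lines of Python. This is a toy model and not Kafka's API: `partition_for`, `ToyLog` and the CRC32 hash are hypothetical names and choices made only for illustration.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyLog:
    """In-memory stand-in for a partitioned, append-only message log."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)  # partition id -> ordered messages

    def publish(self, key: str, message: str) -> int:
        # The same key always hashes to the same partition, so events for
        # one key are appended (and later read) in publish order.
        partition = partition_for(key, self.num_partitions)
        self.partitions[partition].append((key, message))
        return partition

    def consume(self, partition: int) -> list:
        # A consumer assigned to a partition sees that partition's messages in order.
        return list(self.partitions[partition])
```

Ordering is only guaranteed per partition in this sketch: events for two different keys may land on different partitions and be consumed in any relative order.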

The unbiased private vs AWS ROI worksheet

One of my problems with most cloud ROI worksheets is that they are heavily weighted toward use cases where resource usage is very bursty. But what if your resource requirements aren’t bursty? And what if you have a use case where you have to maintain a small IT team to manage some on-site resources due to compliance and other issues? In his latest post, Richard shares his worksheet for everyone to play with.

Google App Engine 1.4.0 pre-release is out

The complete announcement is here, but here are the changes for the Java SDK. The two big changes I liked are the new “always on” feature and the fact that the “tasks” feature has graduated out of beta/testing. The Always On feature allows applications to pay and keep 3 instances of their application always running, which can significantly reduce application latency. Developers can now enable Warmup Requests. By specifying a handler in an app's appengine-web.xml, App Engine will attempt to send a Warmup Request to initialize new instances before a user interacts with it. This can reduce the latency an end-user sees for initializing your application. The Channel API is now available for all users. Task Queue has been officially released, and is no longer an experimental feature. The API import paths that use 'labs' have been deprecated. Task queue storage will count towards an application's overall storage quota, and will thus

How to setup Amazon Cloudfront ( learning with experimentation )

I have some experience with Akamai’s WAA (Web Application Accelerator) service, which I’ve been using in my professional capacity for a few years now, and I’ve been curious about how CloudFront compares with it. Until a few weeks ago, CloudFront didn’t have a key feature which I think was critical for it to win over traditional CDN customers. “ Custom origin ” is an amazing new feature which I finally got to test last night, and here are my notes for those who are curious as well. The test application which I tried to convert was my news aggregator portal. The application consists of a rapidly changing front page (it changes a few times a day), a collection of old pages archived in a subdirectory and some other webpage elements like headers, footers, images, style-sheets etc. While Amazon CloudFront does have a presence on the AWS management console, it only supports S3 buckets as origins. Since my application didn’t have any components which requi

Netflix: Dev and Ops internals

I’ve seen a number of posts from Netflix folks talking about their architecture in recent weeks. Part of that is due to an ongoing effort to expand their business, for which they seem to be hiring like crazy. Here is yet another interesting deck of slides, which covers both Dev and Ops. It is one of the most interesting decks I’ve seen in the recent past. View more presentations from Adrian Cockcroft .

The Cloud: Watch your step ( Google App engine limitations )

Any blog which promotes the concept of cloud infrastructure would be doing an injustice if it didn’t provide references to implementations where it failed horribly. Here is an excellent post by Carlos Ble where he lists all the problems he faced on Google App Engine (Python). He lists 13 different limitations, most of which are well-known facts, and then lists some more frustrating reasons why he had to dump the solution and look for an alternative. The tone is understandable, and while it might look like App-Engine-bashing, I see it as a great story which others could learn from. For us, GAE has been a failure like Wave or Buzz were but this time, we have paid it with our money. I've been too stubborn just because this great company was behind the platform but I've learned an important lesson: good companies make mistakes too. I didn't do enough spikes before developing actual features. I should have performed more proofs of concept before inv

Sawzall and the PIG

When I heard interesting use cases of how “ Sawzall ” is used to hack huge amounts of log data within Google, I thought of two things. The first is Apache PIG, which is “a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.” The second is CEP (complex event processing), which consists of processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. [ Also look at esper ] Google has opened parts of this framework in a project called “ Szl ”. Sawzall is a procedural language developed for parallel analysis of very large data sets (such as logs). It prov

Presentation: “OrientDB, the database of the web”

I knew there was something called “OrientDB”, but didn’t know much about it until I went through these slides. Here is what I learned, in one sentence. It’s an easy-to-install NoSQL (schemaless) datastore that requires absolutely no configuration, supports ACID transactions, can be used as a document store, a graph store and a key-value store, can be queried using SQL-like and JSON syntax, supports indexing and triggers, and has been benchmarked to do 150000 inserts using commodity hardware. That’s a lot of features. OrientDB the database for the Web of lvca - Snoopal

Cloud economics: Not really black and white..

While some of the interest in moving towards the public cloud is based on sound economics, a small segment of this movement is purely due to the “ herd mentality ”. The slide on the right, from a Microsoft publication, shows that larger networks may be less economical on the cloud (at least today). Richard Farley has been discussing this very topic for a few months now. He observed that a medium-sized organization which already has a decent IT infrastructure, including a dedicated IT staff to support it, has a significantly smaller overhead than cloud vendors might make it look. Here is a small snippet from his blog. If you are not afraid to get dirty with numbers, read the rest here . Now, we know we need 300 virtual servers, each of which consumes 12.5% of a physical host. This means we need a total of 37.5 physical hosts. Our vendor tells us these servers can be had for $7k each including tax and delivery with the cabinet. We can’t buy a half server, a
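The arithmetic in the quoted snippet can be checked with a short sketch. The inputs (300 virtual servers, 12.5% of a host each, $7k per host) come from the quote; rounding up to whole hosts and the resulting hardware total are assumptions made here for illustration, since the excerpt cuts off before the next step.

```python
import math

VIRTUAL_SERVERS = 300       # from the quoted snippet
HOST_SHARE_PER_VM = 0.125   # each VM consumes 12.5% of a physical host
COST_PER_HOST_USD = 7000    # quoted vendor price per server


def hosts_required(vms: int, share_per_vm: float) -> tuple:
    """Return (exact fractional hosts, whole hosts to buy)."""
    exact = vms * share_per_vm
    # Assumption: a fractional host cannot be bought, so round up.
    return exact, math.ceil(exact)


exact_hosts, whole_hosts = hosts_required(VIRTUAL_SERVERS, HOST_SHARE_PER_VM)
hardware_cost_usd = whole_hosts * COST_PER_HOST_USD
```

With these inputs, `exact_hosts` is 37.5, `whole_hosts` is 38 and `hardware_cost_usd` is 266000. This covers server hardware only; staff, power and other line items from the worksheet are not modeled.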

Cassandra: What is HintedHandoff ?

Nate has a very good post about how Cassandra is different from a lot of other distributed data-stores. In particular, he explains that every node in a Cassandra cluster is identical to every other node. After using Cassandra for a few months, I can tell you for a fact that it’s true. It does come at a price though. Because it’s so decentralized, if you want to make a schema change, for example, the configuration files of all of the nodes in the cluster need to be updated at the same time. Some of these problems will go away when 0.7 finally comes out. While it is true that Cassandra doesn’t have a concept of a single “master” server, each node participating in the cluster does actually act as master and slave for parts of the entire key range. The actual % size of the key range owned by an individual node depends on the replication factor, on how one picks the keys and on which partitioner algorithm was selected. The fun starts when a node, which could be the master for a range of ke
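To make the "master and slave for parts of the key range" idea concrete, here is a toy token ring in Python. It is a simplified model, not Cassandra's code: `ToyRing`, `token_for`, the 32-bit token space and the MD5-based hash are all illustrative assumptions.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key onto a 32-bit ring (MD5 here is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyRing:
    """Nodes own the key range ending at their token; replicas follow clockwise."""

    def __init__(self, node_tokens: dict, replication_factor: int) -> None:
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]
        self.replication_factor = replication_factor

    def replicas(self, key: str) -> list:
        # First node whose token is >= the key's token acts as the "master"
        # for that key; wrap around to the start of the ring if none is.
        start = bisect.bisect_left(self.tokens, token_for(key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]
```

For example, `ToyRing({"a": 0, "b": 2**30, "c": 2**31, "d": 3 * 2**30}, 3).replicas("user:42")` returns three consecutive nodes on the ring. Every node is the first replica for some keys and a secondary replica for others, which is why changing the replication factor or the hash changes how much of the key range each node holds.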

OpenTSDB – Distributed time series database

Ever since I saw a demo of this tool, I’ve been on edge, waiting for it to be open sourced so that I could use it. The problem it’s trying to solve is a real pain point which most webops folks would understand. Yesterday the folks at StumbleUpon finally opened it up. It’s released under the LGPLv3 license. You can find the source here and the documentation here . At StumbleUpon, we have found this system tremendously helpful to: Get real-time state information about our infrastructure and services. Understand outages or how complex systems interact together. Measure SLAs (availability, latency, etc.) Tune our applications and databases for maximum performance. Do capacity planning.

AWS cloudfront grows up… a little. Now allows Custom origins.

CloudFront has come a long way from its humble beginnings. Here is what Jeff had to say when he announced that it’s out of “beta” …. First, we've removed the beta tag from CloudFront and it is now in full production. During the beta period we listened to our customers and added a number of important features including Invalidation , a default root object , HTTPS access , private content , streamed content , private streamed content , AWS Management Console support , request logging , and additional edge locations . We've also reduced our prices . There's now an SLA (Service Level Agreement) for CloudFront. If availability of your content drops below 99.9% in any given month, you can apply for a service credit equal to 10% of your monthly bill. If the availability drops below 99% you can apply for a service credit equal to 25% of your monthly bill. While all this is a big step forward, it’s probably not enough for the more advanced CDN users to
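The SLA tiers in the quote reduce to a small lookup, sketched below. The thresholds and percentages are the ones quoted above; the function names and the sample bill are hypothetical, and the real agreement may define availability and eligibility in more detail than this sketch does.

```python
def service_credit_percent(availability_percent: float) -> int:
    """Credit tier for a month's availability, per the quoted SLA text."""
    if availability_percent < 99.0:
        return 25
    if availability_percent < 99.9:
        return 10
    return 0


def service_credit_usd(monthly_bill_usd: float, availability_percent: float) -> float:
    """Credit amount as a share of the monthly bill (hypothetical helper)."""
    return monthly_bill_usd * service_credit_percent(availability_percent) / 100
```

A month at 99.5% availability on a hypothetical $200 bill would qualify for a $20 credit; a month at exactly 99.9% would not qualify, because the quote says "drops below".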

Building your first cloud application on AWS

Building your first web application on AWS is like shopping for a car at Pep Boys, part by part . While manuals to build one might be on aisle 5, the experience of having built one already is harder to buy. Here are some interesting logistical questions which I don’t think get enough attention when people discuss issues around building a new AWS-based service. Picking the right Linux distribution : Switching OS distributions may not be simple if your applications need custom scripts. Picking and sticking with a single distribution will save a lot of lost time. Automated server builds : There are many ways to skin this cat. Chef , Puppet , Cfengine are all good... What’s important is to pick one early in the game. Multi-Availability Zone support: Find out if multi-availability-zone support is important. This can impact the overall architecture of the solution and the tools used to implement it. Data consistency requirements : Similar to the Multi-AZ support q

Riak MapReduce: A story in Three Acts

  Riak MapReduce: A Story In Three Acts

Rapid prototyping with solr

Extreme prototyping with Solr by Eric Hatcher At ApacheCon this week I presented “Rapid Prototyping with Solr” .  This is the third time I’ve given a presentation with the same title.  In the spirit of the rapid prototyping theme, each time I’ve created a new prototype just a day or so prior to presenting it.  At Lucene EuroCon the prototype used attendee data, a treemap visualization, and a cute little Solr-powered “app” for picking attendees at random for the conference giveaways.  For a recent Lucid webinar the prototype was more general purpose, bringing in and making searchable rich documents and faceting on file types with a pie chart visualization. This time around, the data set I chose was’s catalog of datasets , which fit with the ApacheCon open source aura, and Lucid Imagination’s support of Open Source for America , which helps to advocate for open source in the US Federal Government.  The prototype built includes faceting browsing, query

Shipping Trunk : For web applications

I had briefly blogged about this presentation from Velocity 2010 before. I wish they had released the video for this session. I went through this slide deck again today to see if Paul mentioned any of the problems organizations like ours are dealing with in their transition from quarterly releases to weekly/continuous releases. One of the key observations Paul made during his talk is that most organizations still treat web applications as desktop software and have very strict quality controls, which may not be as necessary since releasing changes for a web app in a SaaS (Software as a Service) model is much cheaper than releasing patches for traditional desktop software. Here are some of the other points he made. For really detailed info check out the [ slides ]. Deploy frequently, facilitating rapid product iteration Avoid large merges and the associated integration testing Easily perform A/B testing of functionality Run QA and beta testing on production hardware

Real-Time MapReduce using S4

While trying to figure out how to do real-time log analysis in my own organization, I realized that most map-reduce frameworks are designed to run as batch jobs in a time-delayed manner rather than be instantaneous like a SQL query to a MySQL DB. There are some frameworks which are bucking the trend. Yahoo! Labs recently announced that their “Advertising Sciences” group has built a general purpose, real-time, distributed, fault-tolerant, scalable, event driven, expandable platform called “S4” which allows programmers to easily implement applications for processing continuous unbounded streams of data. S4 clusters are built using low-cost commoditized hardware, and leverage many technologies from Yahoo!’s Hadoop project. S4 is written in Java and uses the Spring Framework to build a software component architecture. Over a dozen pluggable modules have been created so far. Why do we need a real-time map-reduce framework? Applications such as personalization, user fee

Storage options on app engine

For those who think Google App Engine only has one kind of datastore, the one built around “ bigtable ”, think again. Nick Johnson goes into the details of all the other options available, with their pros and cons, in his post. App Engine provides more data storage mechanisms than is apparent at first glance. All of them have different tradeoffs, so it's likely that one - or more - of them will suit your application well. Often, the ideal solution involves a combination, such as the datastore and memcache, or local files and instance memory. The storage options he lists: Datastore, Memcache, Instance memory, Local Files. Read more: Original post from Nick

Is auto-sharding ready for auto-pilot ?

James Golick makes a point which a lot of people miss. He doesn’t believe the auto-sharding features NoSQL datastores provide are ready for full auto-pilot yet, and argues that good developers have to think about sharding as part of the design architecture, regardless of which datastore they pick. If you take at face value the marketing materials of many NoSQL database vendors, you'd think that with a horizontally scalable data store, operations engineering simply isn't necessary. Recent high profile outages suggest otherwise. MongoDB, Redis-cluster (if and when it ships), Cassandra, Riak, Voldemort, and friends are tools that may be able to help you scale your data storage to varying degrees. Compared to sharding a relational database by hand, using a partitioned data store may even reduce operations costs at scale. But fundamentally, no software can use system resources that aren't there. At the very least one has to understand how auto sharding in a NoSQL datastore works, how easy is it to setu

Why Membase Uses Erlang

It’s totally worth it. Erlang (Erlang/OTP really, which is what most people mean when they say “Erlang has X”) does out of the box a lot of things we would have had to either build from scratch or attempt to piece together existing libraries to do. Its dynamic type system and pattern matching (ala Haskell and ML) make Erlang code tend to be even more concise than Python and Ruby, two languages known for their ability to do a lot in few lines of code. The single largest advantage to us of using Erlang has got to be its built-in support for concurrency . Erlang models concurrent tasks as “ processes ” that can communicate with one another only via message-passing (which makes use of pattern matching!), in what is known as the actor model of concurrency. This alone makes an entire class of concurrency-related bugs completely impossible. While it doesn’t completely prevent deadlock , it turns out to be pretty difficult to miss a potential deadlock scenario when you wr

Cassandra's future @facebook and links to other NoSQL slides

I heard an unconfirmed rumor that Facebook is moving away from Cassandra . Not sure why, or to what, but rumors like this are a concern regardless. After Twitter 's backing off , and Digg's troubles , which were indirectly linked to either Cassandra's maturity as a production solution or their understanding of Cassandra's capability, it might raise more eyebrows if Facebook really does abandon Cassandra. Cassandra was created at Facebook, which open sourced it, but it’s my understanding that today most of the development on the open source Cassandra happens outside its walls. Rackspace is a big sponsor (though it may not be the largest anymore) of the open source project, and Riptano , which has built a whole company around the open source project, has done a tremendous job of promoting it. Scalability links for October 30th : SD Forum Membase Talk Slides - Slides from "Membase" Talk on Oct 26th at SDForum. NoSQL Database Architectures & Hypertable - NoSQL Data

Scalability links for October 28th

Scalability links for October 28th: New Features in Cassandra 0.7 - The biggest change for me in 0.7 is the presence of secondary indexes (inverted indexes). I'll be switching cfmap to 0.7 very soon... Do you have an Elephant and Pig in your data center? Hadoop momentum continues - Hadoop is taking over the data analysis world in a way LAMP once did. Pregel: Graph Processing at Large-Scale - Some slides on Pregel "A System for large-scale graph processing" High-End Varnish – 275 thousand requests per second. - Very impressive. RESTful Cassandra - REST APIs for Cassandra are the one thing which is missing. This might fill the gap.

Scalability links for October 21st

Scalability links for October 21st: VMware and Google Launch 1st Series of Development Tools - VMware has collected some interesting assets. One of them was SpringSource. I'm slightly surprised to see Google and VMware working together. I guess both of them think there is something they can gain from it. OpenStack: An Open Cloud Initiative Makes its 1st Release - OpenStack has promised a lot. Let's see if it can deliver. I'm optimistic. AWS Free Tier: 750 hours of EC2 for free - This is big. They are going after Google App Engine I think. But it sucks for those who are already on AWS since this is available only to new users. SPOF (Single Point of Failure) Analysis - I can see why this can be a science by itself. SPOF detection and impact analysis are very important in production systems. Data Center Automation Startup Puppet Labs Acquires Open Source Project The Marionette Collective - The Marionette Collective aka. mcollective is a framework to build

Scalability links for October 18th

Scalability links for October 18th: Foursquare MongoDB Outage Post Mortem - Detailed analysis of what caused the Foursquare (MongoDB) outages. SURGE Recap - Interesting takeaways from a scalability conference. The one new takeaway I noticed is "Make use of academic literature". Netflix Migration to the Cloud - Very interesting (technical) information about why Netflix moved to the cloud. Phoebus: Erlang-based Implementation of Google's Pregel - Is Phoebus an attempt at an open source version of Pregel? Why Riak Search Matters... - Didn't understand Riak until this post compared it with Lucene and Solr. Itching to try it out... one more NoSQL experiment isn't going to kill me. Google at USENIX Symposium on Operating Systems Design and Implementation (OSDI ‘10) - I've mentioned one of the papers listed here already. The other two seem to be interesting as well. OpenTSDB: A Distributed, Scalable Monitoring System on Top of HBase - I sa

Scaling Graphite by using Cfmap as the data transport

Graphite is an extremely promising system and resource graphing tool which tries to take RRD to the next level. Here are some of the most interesting features of Graphite which I liked. Updates can happen to records in the past (RRD doesn’t allow this, I think). Creation of new datasets is trivial with whisper/carbon (it’s part of the Graphite framework). Graphite allows federated storage (multiple servers across multiple datacenters, for example). Monitoring and graphing resources across data-centers is tricky, however, especially because WAN links cannot be trusted. While losing some data may be ok for some folks, it may not be acceptable for others. Graphite tries to solve this problem by providing an option to federate data across multiple servers which could each be kept in separate datacenters. Another way to solve this problem is by using a data transport which is resilient to network failures. Since Cfmap (thanks to Cassandra ) is a distributed, eventuall
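As an illustration of why creating a new dataset with carbon is trivial, here is a minimal sketch that formats one datapoint as a line of carbon's plaintext format (`path value timestamp`). The helper name and the metric path are hypothetical, and actually sending the line to a carbon listener is left out.

```python
import time
from typing import Optional


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one datapoint as 'path value timestamp' followed by a newline."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    if timestamp is None:
        # Default to "now"; an older timestamp would record a point in the past.
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"
```

A call such as `carbon_line("dc1.web01.load", 0.42, 1288000000)` produces a single text line. Because the line is plain text, it can be carried over any transport and replayed later, which is the property a network-resilient transport would rely on.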

Cfmap: Publishing, discovering and dashboarding infrastructure state

Dynamic infrastructure can be challenging if apps and scripts can’t keep up with it. At Ingenuity we observed this problem when we started moving towards virtualization and SOA (service-oriented architecture). Remembering server names became impractical, and error-free manual configuration changes became impossible. While there are some tools which solve parts of this specific problem, we couldn’t find any open source tool which could be used to both publish and discover the state of a system in a distributed, scalable and fault-tolerant way. ZooKeeper, which comes pretty close to what we needed, is a fully consistent system which was not designed to be used across multiple data centers over high-latency, unstable network connections. We wanted a system which could not only stay up during network outages, but also sync up the state from different data-centers when they are reconnected. We built a few different tools to solve our scalability problems, one of which is a tool called Cfmap