Showing posts from January, 2010

The real concerns about Cloud infrastructure (as it is today)

While “private clouds may not be the future”, they are definitely needed today. Here are some of the top issues bothering organizations that have been thinking about moving into the cloud. Some of these issues are based on Craig Bolding’s talk, “Guide to cloud security”. Unlike your own data center, you will never know what the cloud vendors are running, how they back up, or what their DR plans are. They will say you shouldn’t care, but do you remember what happened to the T-Mobile customers on Danger? Uptime, availability and responsiveness are less predictable than in a self-hosted environment. In most cases the cloud vendors may not even choose to let customers know about major maintenance if they don’t anticipate any issues. Organizations that manage their own infrastructure would always try to avoid making two major, interdependent changes at the same time. Multi-tenancy means you may have to worry about a noisy neighbor. Multi-tenancy could als

Fixing GSLB (Global Server load balancing)

The standard DNS protocol allows DNS servers to respond with multiple addresses in replies to simple DNS lookup queries. This, together with the way the order of records changes in every reply, is collectively known as the “Round Robin DNS” technique for load balancing across a set of servers. Though a lot of organizations use Round Robin DNS to load balance across servers in the same datacenter, some also try to use it as an HA solution by load balancing across multiple datacenters. With such an implementation, the impact of a failure in one of the datacenters could be limited, and with a slight change of DNS configuration (removing the IP of the datacenter which went down) the site could become fully operational again. It would be nicer if the DNS servers could monitor and remove servers which are inactive or are throwing errors of some kind. This is what GSLBs are all about. But what they really excel at, which regular DNS servers can’t do, is that
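
To make the rotation and health-check idea above concrete, here is a minimal Python sketch of a toy resolver. It is not how any particular DNS server or GSLB product works: the `RoundRobinResolver` class, the `is_healthy` probe and the documentation-range IP addresses are all made up for illustration.

```python
from collections import deque


class RoundRobinResolver:
    """Toy model of a DNS server that rotates its records and hides unhealthy ones."""

    def __init__(self, addresses, is_healthy):
        self._records = deque(addresses)
        self._is_healthy = is_healthy  # hypothetical health probe: address -> bool

    def resolve(self):
        # Reply with every healthy address in the current order...
        answer = [ip for ip in self._records if self._is_healthy(ip)]
        # ...then rotate so the next reply starts with a different record.
        self._records.rotate(-1)
        return answer


# Assumed setup: two datacenters, one server currently failing its health check.
down = {"192.0.2.20"}
resolver = RoundRobinResolver(
    ["192.0.2.10", "192.0.2.20", "198.51.100.10"],
    is_healthy=lambda ip: ip not in down,
)

first = resolver.resolve()
second = resolver.resolve()
print(first)
print(second)
```

Plain Round Robin DNS only does the rotation part; the `is_healthy` filter stands in for the monitoring that a GSLB adds on top.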

Cloud computing in 1963 ( actually Timesharing )

Found this on Feld Thoughts. It’s not really about cloud computing, but it is about making efficient use of computational resources, which is one of the goals of today’s “cloud computing” as well. This magnificent video from 1963, “Timesharing: A Solution to Computer Bottlenecks”, has MIT Professor Fernando Corbato explaining how timesharing works to MIT Science Reporter John Fitch (who has one of those magnificent deep reporter voices).

AppScale, an OpenSource GAE implementation

If you don’t like EC2, you have the option to move your app to a new vendor. But if you don’t like GAE (Google App Engine), there aren’t any solutions which can replace GAE easily. AppScale might change that. AppScale is an open-source implementation of the Google AppEngine (GAE) cloud computing interface from the RACELab at UC Santa Barbara. AppScale enables execution of GAE applications on virtualized cluster systems. In particular, AppScale enables users to execute GAE applications using their own clusters with greater scalability and reliability than the GAE SDK provides. Moreover, AppScale executes automatically and transparently over cloud infrastructures such as the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Eucalyptus, the open-source implementation of the AWS interfaces. The list of supported infrastructures is very impressive. However, the key, in my personal opinion, would be stability and compatibility with current GAE APIs. Learn mor

Videos on scalable web architectures

If you are like me, you are already following all the talks and presentations published on YouTube. But if you have not been, nothing stops you from starting now. A new “Videos” page has been added to this blog to list the latest YouTube videos related to scalable web architectures. Please leave a comment if you have a favorite online lecture or presentation which is not listed here.

Scalability Updates for Jan 26th 2010

A few interesting updates for today:
* Derrick Harris made an insightful observation that cloud providers are pairing up with CDNs. I won’t be surprised if some consolidation happens in this arena.
* Cassandra 0.5.0 is released. I can’t wait to try it out.
* Paper: Keyspace: A consistently replicated, highly-available key-value store
* Paper: PaxosLease: Diskless Paxos for leases
* Cloudkick: A cross cloud-platform monitoring service. A short review here.
* A post by Reuven Cohen, “Oversubscribing the cloud”, talks about why “quotas” are important in a cloud infrastructure. He thinks there is a non-linear relation between capacity and customer demand. If it is non-linear, does that mean things should get more expensive as demand increases?
* I hadn’t heard of a “graph database” until today. HyperGraphDB is one, and it’s a distributed database.

Hive @Facebook

Hive is a data warehouse infrastructure built over Hadoop. It provides tools to enable easy data ETL, a mechanism to put structure on the data, and the capability to query and analyze large data sets stored in Hadoop files. Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. At a user group meeting, Ashish Thusoo from the Facebook data team spoke about how Facebook uses Hive for its data processing needs. Problem: Facebook is a free service and has been experiencing rapid growth in the last few years. The amount of data it collects, which used to be around 200GB per day in March 2008, has now grown to 15TB per day. Facebook realized early on that ins

Scalability Killers (The art of scalability)

Top 10 scalability killers from The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise:
* Thinking scalability is just about technology
* Overuse of synchronous calls
* Failure to weed or seed soon enough
* Inappropriate use of databases
* Cesspools instead of swim lanes
* Reliance on vertical scale
* Failure to learn from history
* Changing development methodologies to fix problems
* Too little caching, too late
* Overreliance on third parties to scale

Private clouds not the future ?

James Hamilton is one of the leaders in this industry and has written a very thought-provoking post about private clouds not being the future. This is what he said about private clouds when compared to existing non-cloud solutions:
* A fix, not the future (reference to an InformationWeek post)
* Runs at lower utilization levels
* Consumes more power
* Less efficient environmentally
* Runs at higher costs
Though I agree with most of his comments, I’m not convinced by the generalization of the conclusions. In particular, what is the maximum number of servers one needs to own, beyond which outsourcing will become a liability? I suspect this is not a very high number today, but it will grow over time. Hardware costs: The scale at which Amazon buys infrastructure is just mind-boggling, but organizations buying in bulk could get a pretty good deal from those same vendors as well. It’s not clear to me how many servers one has to buy to get discounts like the ones Amazon gets.

HAProxy : Load balancing

Designing any scalable web architecture would be incomplete without investigating “load balancers”. There used to be a time when selecting and installing load balancers was an art in itself. Not anymore. A lot of organizations today use Apache web servers as a proxy server (and also as a load balancer) for the backend application clusters. Though Apache is the most popular web server in the world, it is also considered overweight if all you want to do is proxy a web application. The huge codebase which Apache comes with, and the separate modules which need to be compiled and configured with it, could soon become a liability. HAProxy is a tiny proxying engine which doesn’t have all the bells and whistles of Apache, but is highly qualified to act as an HTTP/TCP proxy server. Here are some of the other wonderful things I liked about it: Extremely tiny codebase. Just two runtime files to worry about, the binary and the configuration file. Compiles in seconds. 10 seconds the

Google App Engine and Social Apps


ESI: Edge Side Includes

Web page caching gets tricky once personalization is involved. Let’s take Twitter’s public_timeline, for example, which seems to be perfect for caching. Unfortunately, when a user is logged in, it also shows the user’s information. Caching that particular page in its entirety on the web server may not be an option in such scenarios. Another scenario is where parts of a page expire faster than others (and require different cache TTLs). Here again, caching the whole page doesn’t help. Edge Side Includes (ESI) is a markup language specifically designed to help web servers assemble dynamic content at the web layer. <esi:include src=""/> The above ESI tag is similar to tags in jsp/php/etc. which allow one page to refer to another page for parts of its content. By breaking up the page into smaller objects, the web server can apply different TTL settings (and user validation) to different parts of the content. Twitter used to (and may still) use “Varnis
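
The per-fragment TTL idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how Varnish or any other ESI processor is implemented: `FragmentCache`, `assemble`, `fake_origin` and the `/timeline` and `/whoami` URLs are all invented for the example, and only the simplest self-closing form of the include tag is handled.

```python
import re
import time

# Matches the simple self-closing form: <esi:include src="..."/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]*)"\s*/>')


class FragmentCache:
    """Caches each fragment separately, honoring its own TTL."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch  # hypothetical origin fetcher: src -> (body, ttl_seconds)
        self._clock = clock
        self._store = {}     # src -> (body, expires_at)

    def get(self, src):
        now = self._clock()
        cached = self._store.get(src)
        if cached and cached[1] > now:
            return cached[0]
        body, ttl = self._fetch(src)
        if ttl > 0:  # a TTL of 0 means "never cache", e.g. per-user content
            self._store[src] = (body, now + ttl)
        return body


def assemble(template, cache):
    """Replace every ESI include tag with the (possibly cached) fragment body."""
    return ESI_INCLUDE.sub(lambda m: cache.get(m.group(1)), template)


calls = []


def fake_origin(src):
    # Stand-in for the backend: the public timeline is cacheable, user info is not.
    calls.append(src)
    if src == "/timeline":
        return "<ul>public tweets</ul>", 60
    return "<p>Hello, user</p>", 0


cache = FragmentCache(fake_origin, clock=lambda: 0.0)  # fixed clock keeps the demo deterministic
page = '<esi:include src="/timeline"/><esi:include src="/whoami"/>'
html_first = assemble(page, cache)
html_second = assemble(page, cache)
print(html_first)
print(calls)
```

On the second request only the personalized fragment goes back to the origin; the timeline fragment is served from the cache until its own TTL runs out.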

Google patents Map reduce “System and method for efficient large-scale data processing”

After filing in 2004, Google finally got its patent on “System and method for efficient large-scale data processing” approved yesterday. Gigaom pointed out that if Google really wants to enforce it, it would have to go after many different vendors who are implementing “mapreduce” in some form in their applications and databases. Google’s intentions for the patent are not clear, but this is what one of its spokespersons said: “Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops. While we do not comment about the use of this or any part of our portfolio, we feel that our behavior to date has been inline with our corporate values and priorities.”

Heroku platform for scalable web applications

I’m so locked up in my own Java world that I didn’t realize something as cool as this existed in the Ruby world. Heroku is the instant Ruby platform. Deploy any Ruby app instantly with a simple and familiar git push. Take advantage of advanced features like HTTP caching, memcached, rack middleware, and instant scaling built into every app. Never think about hosting or servers again. From a layman’s point of view, Heroku looks like a Ruby version of GAE (Google App Engine). It has some of the same features as GAE. But unlike GAE, Heroku actually talks about its architecture in great detail. They use Nginx as the front-end HTTP reverse proxy server and Varnish for caching right behind Nginx. They wrote their own custom software to “route” requests between the web frontend and the backend services. The actual user code runs on the “Dyno Grid”, where each dyno looks like a self-contained Ruby instance with user code (compiled slugs). There could be

Dilbert and the cloud


Architecting for the Cloud: Best practices

Amazon has published another “Best practices” document. This one covers almost the entire collection of services. It’s biased towards AWS (obviously), but it’s still one of the best summaries of the various services Amazon offers today. Just the diagram above tells a lot about how the various AWS services interact with each other. Here is another small section from the document.
AWS specific tactics to automate your infrastructure:
* Define Auto-scaling groups for different clusters using the Amazon Auto-scaling feature in Amazon EC2.
* Monitor your system metrics (CPU, Memory, Disk I/O, Network I/O) using Amazon CloudWatch and take appropriate actions (launching new AMIs dynamically using the Auto-scaling service) or send notifications.
* Store and retrieve machine configuration information dynamically: Utilize Amazon SimpleDB to fetch config data during boot-time of an instance (e.g. database connection strings). SimpleDB may also be used to st

Monitoring large-scale application clusters

Most software engineering organizations build applications with some hooks in place to allow functional tests. Some organizations continuously build and test all software automatically at check-in. And then there are those who have learnt from mistakes, and have built a suite of tests which get triggered at startup to look for problems which could indicate a failed initialization. The next step in building a scalable web application is creating some form of self-monitoring logic (sometimes called a watchdog) which periodically tests the application (or monitors its performance statistics) for problems worth escalating to the operations team. Arnon has a couple of interesting posts (1, 2) on the topic which I came across today. He summarized the whole suggestion using 3 acronyms, and I’m going to add one more to make it complete. BBIT: Build time Built in Tests. PBIT: Power-on Built in Tests. CBIT: Continuous Built in Tests. IBIT: Initiated Built in Tests. The organizat
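
As a rough illustration of the acronyms above, the runtime categories (PBIT, CBIT, IBIT) could share one small registry, as in this Python sketch. BBIT is left out on the assumption that it runs in the build pipeline and not inside the application. The `BuiltInTests` class, the check names and the `escalate` hook are hypothetical and not taken from Arnon’s posts.

```python
class BuiltInTests:
    """Registry of self-tests grouped by when they are supposed to run."""

    def __init__(self, escalate):
        self._checks = {"PBIT": [], "CBIT": [], "IBIT": []}
        self._escalate = escalate  # hypothetical hook, e.g. notify the operations team

    def register(self, kind, name, check):
        self._checks[kind].append((name, check))

    def run(self, kind):
        """Run every check of one kind and escalate each failure."""
        failures = []
        for name, check in self._checks[kind]:
            try:
                ok = bool(check())
            except Exception:
                ok = False  # a check that crashes counts as a failed check
            if not ok:
                failures.append(name)
                self._escalate(kind, name)
        return failures


alerts = []
bit = BuiltInTests(escalate=lambda kind, name: alerts.append((kind, name)))

queue_depth = 250  # assumed metric that a watchdog would sample
bit.register("PBIT", "config_loaded", lambda: True)
bit.register("CBIT", "queue_depth_ok", lambda: queue_depth < 100)
bit.register("IBIT", "db_roundtrip", lambda: True)

startup_failures = bit.run("PBIT")     # once, at power-on
watchdog_failures = bit.run("CBIT")    # would be called periodically by a timer
on_demand_failures = bit.run("IBIT")   # triggered by an operator
print(startup_failures, watchdog_failures, on_demand_failures, alerts)
```

In a real application the CBIT run would sit on a timer or scheduler; that part is omitted here to keep the sketch short.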

Understanding Cloud computing efficiency

Picking a cloud service is, unfortunately, at times far more complex than picking a brand new car. I remember how torn I was between a Honda hybrid, which came with some tax rebates and a carpool sticker, and a non-hybrid one which was significantly cheaper. Understanding the short-term and long-term benefits is the key. Today AWS is not the only game in town. There are lots of other reliable (or some flavor of reliable) options. GoGrid, Joyent, Microsoft and GoogleAppEngine are some. Here are the key differences which one should understand before deciding which one to go for.
* IAAS (Infrastructure as a service) providers like AWS (EC2) and Rackspace provide virtual infrastructure which you can manage and control. In most cases you are billed by a time-unit and you have control to increase or decrease the resources available for your application. PAAS (Platform as a service), on the other hand, only provides APIs for your application. PAAS based infrastructure is usually b