February 27, 2010

Scalability links for Feb 28th 2010

February 26, 2010

Cassandra as a communication medium – A service Registry and Discovery tool

A few weeks ago, while I was mulling over what kind of service registry/discovery system to use for a scalable application deployment platform, I realized that for mid-size organizations with a complex set of services, building one from scratch may be the only option.

I also found out that many AWS/EC2 customers have already been using S3 and SimpleDB to  publish/discover services. That discussion eventually led me to investigate Cassandra as the service registry datastore in an enterprise network.

Here are some of the observations I made as I played with column-oriented Cassandra for this purpose. I welcome feedback from readers if you think I’m doing something wrong or if you think I can improve the design further.

  • The biggest issue I noticed with Cassandra was the absence of an inverted index, which can be worked around as I have blogged here. I later realized there is also something called Lucandra, which I need to look at at some point.
  • The keyspace structure I used was very simple… (I skipped some configuration lines for brevity)

<Keyspace Name="devkeyspace">
    <ColumnFamily CompareWith="UTF8Type" Name="forward" />
    <ColumnFamily CompareWith="UTF8Type" Name="reverse" />
</Keyspace>

  • Using an “OrderPreservingPartitioner” seemed important for doing “range scans”. The order-preserving partitioner keeps objects with similar-looking keys together, which allows bulk reads and writes over a key range. By default Cassandra distributes objects randomly across the cluster, which balances load well but makes range scans impractical.
  • I eventually plan to use this application across two datacenters. The best way to mirror data across datacenters in Cassandra is by using “RackAwareStrategy”. If you select this option, it tells Cassandra to try to pick replicas of each token from different datacenters/racks. The default algorithm uses IP addresses to determine if two nodes are part of the same rack/datacenter, but there are other interesting ways to do it as well.
  • Some of the APIs changed significantly between the versions I was playing with. Cassandra developers will remind you that this is expected in a product which is still at version 0.5. What amazes me, however, is that Facebook, Digg and now Twitter have been using this product in production without bringing everything down.
  • I was eventually able to build a thin java webapp to front-end Cassandra, which provided the REST/json interface for registry/discovery service. This is also the app which managed the inverted indexes.
    • Direct Cassandra access from remote services was disabled for security/stability reasons.
    • The app used DNS to loadbalance queries across multiple servers.
  • My initial performance tests on this cluster performed miserably because I forgot that all of my requests were hitting the same node. The right way to test Cassandra’s capacity is to loadbalance requests across all Cassandra nodes.
    • I also realized that, by default, the logging mode was set to “DEBUG”, which is very verbose. Turning that down sped up response times as well.
  • Playing with different consistency levels for reading and writing was also an interesting experience, especially when I started killing nodes just to see the app break. This is what tweaking CAP is all about.
  • Due to an interesting problem related to “eventual consistency”, Cassandra doesn’t immediately purge data which was marked for deletion or was intentionally changed. In the default configuration that data is kept around for 10 days before it’s completely removed from the system.
  • Some documentation on the core operational aspects of Cassandra exists, but it would be nice if there were more.
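To make the design concrete, here is a toy in-memory model of the registry semantics described above, with the two column families reduced to Python dicts. All hostnames, column names and values are invented for illustration; a real deployment would issue the equivalent reads/writes against Cassandra.

```python
# "forward" maps a service instance to its attributes; "reverse" is the
# hand-built inverted index mapping each attribute back to instances.
forward = {}   # row key -> {column: value}
reverse = {}   # "column=value" -> set of row keys

def register(instance, attrs):
    forward[instance] = dict(attrs)
    for col, val in attrs.items():
        reverse.setdefault(f"{col}={val}", set()).add(instance)

def deregister(instance):
    # Remove the row and keep the inverted index consistent ourselves.
    for col, val in forward.pop(instance, {}).items():
        reverse[f"{col}={val}"].discard(instance)

def discover(col, val):
    # Call 1: reverse-index lookup.  Call 2: fetch each row's columns.
    keys = reverse.get(f"{col}={val}", set())
    return {k: forward[k] for k in keys}

register("app2-host1.example.com", {"Service": "app2", "Port": "8080"})
register("app2-host2.example.com", {"Service": "app2", "Port": "8080"})
register("db1-host1.example.com", {"Service": "db1", "Port": "3306"})
```

The index maintenance on deregister is exactly the extra bookkeeping the application has to own when the datastore doesn’t index columns for you.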

Cassandra was designed as a scalable, highly available datastore. But because of its self-healing and “RackAware” features, it can become an interesting communication medium as well.

Talk on “database scalability”

This is a very interesting talk by Jonathan Ellis on database scalability. He designed and implemented multi-petabyte storage for Mozy and is currently the project chair for Apache Cassandra.

  • Scalability is not improving latency, but increasing throughput
  • But overall performance shouldn’t degrade
  • Throw hardware, not people at the problem
  • Traditional databases use b-tree indexes, but this requires the entire index to live in memory in one place
  • Easy bandaid #1– SSD storage is better for b-tree indexes which need to hit disk
  • Easy bandaid #2 – Buy a faster server every 2 years, as long as your userbase doesn’t grow faster than Moore’s law
  • Easy bandaid #3 – Use caching to handle hotspots (Distributed)
  • Memcache server failures can change where hashing keys are kept
  • Consistent hashing solves the problem by mapping both keys and servers to tokens. Tokens can move between servers as nodes are added or removed, and apps can still figure out which keys live where.
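The consistent-hashing idea from the talk can be sketched in a few lines of Python. The server names are placeholders and MD5 stands in for whatever token function a real system uses; this is an illustration of the technique, not any particular product’s implementation.

```python
import bisect
import hashlib

def token(s):
    # Map any string (key or server name) onto one shared token space.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        # Each server owns the arc of token space ending at its token.
        self.tokens = sorted((token(s), s) for s in servers)

    def server_for(self, key):
        # A key belongs to the first server token at or after its own
        # token, wrapping around the ring.
        i = bisect.bisect(self.tokens, (token(key),))
        return self.tokens[i % len(self.tokens)][1]

ring = Ring(["cache1", "cache2", "cache3"])
```

When a server is added, only the keys falling on its new arc move; everything else keeps hashing to the same node, which is exactly what plain modulo hashing fails to do when a memcache server dies.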

February 25, 2010

Scalable logging using Syslog

Syslog is a commonly used transport mechanism for system logs. But people sometimes forget it could be used for a lot of other purposes as well.

Take, for example, the interesting challenge of aggregating web server logs from 100 different servers into one syslog server and then figuring out how to merge them. If you have built your own tool to do this, you have probably figured out by now how expensive it is to poll all the servers and how out-of-date these logs can get by the time you process them. If you are not inserting them into some kind of datastore which sorts the rows by timestamp, you also have to take up the challenge of building a merge-sort script.
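If each server’s log file is already sorted by timestamp, that merge step doesn’t need a full sort: a k-way merge does it in one streaming pass. A minimal sketch, with invented timestamps and hostnames, and assuming every line starts with a sortable timestamp (e.g. ISO 8601):

```python
import heapq

# Two per-server logs, each already in timestamp order.
server_a = ["2010-02-25T10:00:01 a GET /", "2010-02-25T10:00:03 a GET /x"]
server_b = ["2010-02-25T10:00:02 b POST /y"]

# heapq.merge streams a globally time-ordered sequence without
# loading or re-sorting everything; it only works because each
# input is already sorted.
merged = list(heapq.merge(server_a, server_b))
```

In practice the inputs would be open file handles rather than lists, which is what makes the streaming property matter for 100 servers’ worth of logs.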

There is nothing which stops applications from using syslog as well. If your apps are in Java, you should try out the Syslog appender for log4j [Ref 1] [Ref 2]. Not only do you get central logging, you also get to see a real-time “tail -f” of events as they happen in a merged file. If there are issues anywhere in your network, you have just one place to look. If your logging volume is high, you will have to use other tools (or build your own) to do log analysis.

Here are some things you might have to think about if you plan to use syslog for your environment.

  1. Set up different syslog servers for each of your datacenters, using split DNS or different hostnames.
  2. Try not to send logs across WAN links
  3. Rotate logs on a nightly basis, or depending on the log volume
  4. Reduce amount of logging (don’t do “debug” in production for example)
  5. Write tools to detect change in logging volume in dev/qa environment. If you follow good logging practice, you should be able to identify components which are responsible for the increase very quickly.
  6. Identify log patterns which could be causes of concerns and setup some kind of alerting using your regular monitoring service (nagios for example). Don’t be afraid to use 3rd party tools which do this very well.
  7. Syslog over UDP is non-blocking, but the syslog server can get overloaded if logging volume is not controlled. The most expensive part of logging is disk i/o; if you notice high i/o on the syslog server, reducing the logging volume should be the first fix.
  8. UDP doesn’t guarantee that every log event will make it to the syslog server. Find out if that level of uncertainty in logging is ok for your environment.

Other interesting observations

  1. The amount of changes required in a java app which is already using log4j to log to a syslog server is trivial
  2. Logging to local files can be disabled, which means you don’t have to worry about disk storage on each server.
  3. If you are using or want to use tools like splunk or hadoop/hbase for log analysis, syslog is probably the easiest way to get there.
  4. You can always loadbalance syslog servers by using DNS loadbalancing.
  5. Apache webservers can’t do syslog out of the box, but you can still make it happen
  6. I personally like haproxy more and it does do syslog out of the box.
  7. If you want to log events from startup/shutdown scripts, you can use the “logger” *nix command to send events to the syslog server.

How are logs aggregated in your environment?


February 24, 2010

SimpleDB now allows you to tweak consistency levels

We discussed Brewer’s Theorem a few days ago and how it’s challenging to obtain Consistency, Availability and Partition tolerance in any distributed system. We also discussed that many of the Amazon Web Services distributed datastores allow CAP to be tweaked to attain certain operational goals.

Amazon SimpleDB, which was released as an “Eventually Consistent” datastore,  today launched a few features to do just that.

  • Consistent reads: Select and GetAttributes requests now include an optional Boolean flag “ConsistentRead” which asks the datastore to return consistent results only. If you have noticed scenarios where a read right after a write returned an old value, that shouldn’t happen anymore.
  • Conditional put/puts, delete/deletes: By providing “conditions” in the form of a key/value pair, SimpleDB can now conditionally execute or discard an operation. This might look like a minor feature, but it can go a long way in providing reliable datastore operations.
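To illustrate the semantics (this is a toy model, not the actual SimpleDB API, and the item and attribute names are invented): a conditional put only applies the write if the item’s current value for the expected attribute still matches, which is the building block for optimistic concurrency.

```python
# One item with a "version" attribute used as the condition.
store = {"item1": {"version": "1", "state": "active"}}

def conditional_put(item, attrs, expected_name, expected_value):
    current = store.setdefault(item, {})
    if current.get(expected_name) != expected_value:
        return False          # condition failed, operation discarded
    current.update(attrs)
    return True

# Succeeds: version is still "1".
conditional_put("item1", {"state": "retired", "version": "2"}, "version", "1")
# Discarded: a concurrent writer already bumped the version.
conditional_put("item1", {"state": "active"}, "version", "1")
```

Bumping a version attribute on every write and conditioning on it is the classic way to make read-modify-write safe without locks.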

Even though SimpleDB now enables operations that support a stronger consistency model, under the covers SimpleDB remains the same highly-scalable, highly-available, and highly durable structured data store. Even under extreme failure scenarios, such as complete datacenter failures, SimpleDB is architected to continue to operate reliably. However when one of these extreme failure conditions occurs it may be that the stronger consistency options are briefly not available while the software reorganizes itself to ensure that it can provide strong consistency. Under those conditions the default, eventually consistent read will remain available to use.


February 22, 2010

NoSQL in the Twitter world

NoSQL solutions have one thing in common: they are generally designed for horizontal scalability. So it’s no wonder that a lot of applications in the “twitter” world have picked NoSQL-based datastores for their persistence layer. Here is a collection of these apps from the MyNoSQL blog.

  1. Twitter uses Cassandra
  2. MusicTweets used Redis [ Ref ] – The site is dead, but you can still read about it
  3. Tstore uses CouchDB
  4. Retwis uses CouchDB
  5. Retwis-RB uses Redis and Sinatra ?? – No idea what Sinatra is. Will have to look into it. [ Update: Sinatra is a Ruby web framework, not a datastore ]
  6. Floxee uses MongoDB
  7. Twidoop uses Hadoop
  8. Swordfish, built on top of Tokyo Cabinet, comes with a twitter clone app.
  9. Tweetarium uses Tokyo Cabinet


Do you know of any more ?

February 21, 2010

Eventual consistency is just caching ?

So there is someone who thinks “eventual consistency is just caching”.  Though I liked the idea of discussing this, I don’t agree with Udi’s views on this.

“Cache” is generally used to store data which is more expensive to obtain from the primary location. For example, caching mysql queries is ideal for queries which could take more than a fraction of a second to execute. Another example is caching queries to S3, SimpleDB or Google’s datastore, which could cost money and introduce network latency into the mix. Though most applications are built to use such caches, they are also designed to stay responsive in the absence of the caching layer.

The most important difference between a “cache” and a “datastore” is that data generally flows from the “datastore” to the “cache”, not the other way around. Though one could queue data in the “cache” first and update the datastore later (for performance reasons), that is not how a cache should be used. If you are using a “cache” to queue data for slower storage, you are using the wrong product; there are better queuing solutions (activemq for example) that can do it in a more reliable way.

In most “eventually consistent” systems, there is no concept of primary and secondary nodes. Most nodes on such systems are considered equal and have similar performance characteristics.

Since “caching” solutions are designed for speed, they generally don’t have a concept of “replicas” or allow persistence to disk. Synchronizing between replicas or to disk can be expensive and counterproductive, which is why it’s rare to find these features in “caching” products. But many “eventually consistent” systems do provide a way for developers to request the level of “consistency” (or disk persistence) desired.

Do you have an opinion on this? Please share examples if you have seen a “caching” layer being used as an “eventually consistent datastore”.

Update: Udi mentioned on twitter that “write through caches” are eventually consistent. Sure, they are, if you are talking about a caching layer on top of a persistent layer. One could argue that “caches” are eventually consistent, but the reverse may not be true, which is what his original post claimed.

February 17, 2010

Scalability updates for Feb 18, 2010

Some interesting links for today

More on Amazon S3 versioning (webinar)

If you missed the AWS S3 versioning webcast, I have a copy of the video here. And here are the highlights..


  • You can enable and disable this at the bucket level
  • They don’t think there is a performance penalty for turning on versioning (though it was kind of obvious S3 would be doing slightly extra work to figure out which is the latest version of any object you have)
  • There isn’t any additional cost for using versioning, but you do have to pay for every extra copy of each object.
  • MFA (multi-factor authentication) for deleting objects is not mandatory when versioning is turned on; it has to be enabled separately. This was slightly confusing in the original email I got from AWS.
  • If you are planning to use this, please watch this video. There is a part where they explain what happens if you disable versioning after using the feature. This is something you might like to know about.
  • They use GUID for versioning of each object
  • You can iterate over objects and figure out how many versions you have for each object, but currently it’s not possible to find all objects which have versions older than X date. This is important if you are planning to do garbage collection (cleaning up older copies of data) at a later time.

More References

February 14, 2010

Brewers CAP Theorem on distributed systems

Large distributed systems run into a problem which smaller systems don’t usually have to worry about. “Brewer’s CAP Theorem” [ Ref 1] [ Ref 2] [ Ref 3] defines this problem in a very simple way.

It states that though it’s desirable to have Consistency, High Availability and Partition tolerance in every system, unfortunately no system can achieve all three at the same time.

Consistent: A fully consistent system is one which can guarantee that once you store a state (let’s say “x=y”) in the system, it will report the same state in every subsequent operation until the state is explicitly changed by something outside the system. [Example 1] A single MySQL database instance is automatically fully consistent since there is only one node keeping the state. [Example 2] If two MySQL servers are involved, and the system is designed in such a way that all keys starting with “a” to “m” are kept on server 1, and keys “n” to “z” are kept on server 2, then the system can still easily guarantee consistency. [Example 3] Let’s now set up the DBs as master-master replicas. If one of the databases gets a “row insert” request, that information has to be committed to the second system before the operation is considered complete. To require 100% consistency in such a replicated environment, communication between nodes is paramount. The overall performance of such a system could drop as the number of replicas required goes up.

Available: The databases in [Example 1] and [Example 2] are not highly available. In [Example 1], if the node goes down there would be 100% data loss. In [Example 2], if one node goes down, you would have 50% data loss. [Example 3] is the only setup which solves that problem: a simple MySQL replication [multi-master mode] setup could provide 100% availability. Increasing the number of nodes with copies of the data directly increases the availability of the system. Availability is not just protection from hardware failure; replicas also help in loadbalancing concurrent operations, especially read operations. “Slave” MySQL instances are a perfect example of such “replicas”.

Partition-tolerance: So you got Consistency and Availability by replicating data. Let’s say you had the two MySQL servers from Example 3 in two different datacenters, and you lose the network connectivity between them, making both databases incapable of synchronizing state. Would the two DBs be fully functional in such a scenario? If you somehow do manage to allow read/write operations on both, it can be shown that the two servers won’t be consistent anymore. A banking application which keeps the “state of your account” at all times is a perfect example where it’s bad to have inconsistent bank records. If I withdraw 1000 bucks in California, it should be instantly reflected in the NY branch so that the system accurately knows how much more I can withdraw at any given time. If the system fails to do this, it could potentially cause problems which would make some customers very unhappy. If the bank decides consistency is very important and disables write operations during the outage, it loses “availability” of the cluster, since all the bank accounts at both branches will now be frozen until the network comes up again.

This gets more interesting when you realize that C.A.P. rules don’t have to be applied in an “all or nothing” fashion. Different systems can choose various levels of consistency, availability or partition tolerance to meet their business objective. Increasing the number of replicas, for example, increases high availability but it could at the same time reduce partition tolerance or consistency.
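A small model of what “choosing various levels” means in quorum-replicated systems (a common design, not something the theorem itself mandates): with N replicas, writes wait for W acknowledgements and reads consult R replicas, and whenever R + W > N every read set overlaps the latest write set, so reads are consistent. A brute-force check:

```python
import itertools

def quorums_overlap(n, r, w):
    # True if every possible read set of size r intersects every
    # possible write set of size w among n replicas.
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in itertools.combinations(replicas, r)
               for ws in itertools.combinations(replicas, w))

assert quorums_overlap(3, 2, 2)      # R + W = 4 > 3: reads see latest write
assert not quorums_overlap(3, 1, 1)  # R + W = 2 <= 3: stale reads possible
```

Dialing R or W down buys lower latency and higher availability during failures, at the price of exactly the staleness the smaller quorums allow.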

When you are discussing distributed systems ( or distributed storage ), you should try to identify which of the three rules it is trying to achieve. BigTable, used by Google App engine, and HBase, which runs over Hadoop, claim to be always consistent and highly available. Amazon’s Dynamo, which is used by S3 service and datastores like Cassandra instead sacrifice consistency in favor of availability and partition tolerance.

CAP theorem doesn’t just apply to databases. Even simple web applications which store state in session objects have this problem. There are many “solutions” out there which allow you to replicate session objects and make session data “highly available”, but they all suffer from the same basic problem and it’s important for you to understand it.

Recognizing which of the “C.A.P.” rules your business really needs should be the first step in building any successful distributed, scalable, highly-available system.

February 10, 2010

Scaling updates for Feb 10, 2010

Lots of interesting updates today.

But I would first like to mention the fantastic work the cloud computing group at UCSB is doing to make the App Engine framework more open. They have done significant work at making AppScale “work” with different kinds of data sources including HBase, Cassandra, Voldemort, MongoDB, Hypertable, MySQL and MemcacheDB. AppScale is actively looking for folks interested in working with them to make it stable and production ready.

  • GAE 1.3.1 released: I think the biggest news about this release is that the 1000-row limit has now been removed. You still have to deal with the 30-second processing limit per http request, but at least the row limit is gone. They have also introduced support for automatic transparent datastore api retries for most operations. This should dramatically increase the reliability of datastore queries and reduce the amount of work developers have to do to build this auto-retry logic.
  • Elasticsearch is a Lucene-based indexing product which seems to do what Solr does, with the exception that it can scale across multiple servers. Very interesting product. I’m going to try this out soon.
  • MemcacheDB: A distributed key-value store which is designed to be persistent. It speaks the memcached protocol, but it’s actually a datastore (using Berkeley DB) rather than a cache.
  • Nasuni seems to have come up with NAS software which uses cloud storage as the persistent datastore. It has capability to cache data locally for faster access to frequently accessed data.
  • The guys at Flickr have two interesting posts you should glance over. “Using, Abusing and Scaling MySQL at Flickr” seems to be the first in a series of posts about how Flickr scales using MySQL. The next one in the series is “Ticket Servers: Distributed Unique Primary Keys on the Cheap”
    • Finally a fireside chat by Mike Schroepfer, VP of Engineering,  about Scaling Facebook.

February 08, 2010

Versioning data in S3 on AWS

One of the problems with Amazon’s S3 was the inability to take a “snapshot” of the state of S3 at any given moment. This is one of the most important DR (disaster recovery) steps of any major upgrade which could potentially corrupt data during a release. Until now, applications using S3 would have had to manage versioning of data themselves, but Amazon has now launched a versioning feature built into S3 to do this particular task. In addition, they have introduced an option to protect delete operations on versioned data with MFA (multi-factor authentication).

Versioning allows you to preserve, retrieve, and restore every version of every object in an Amazon S3 bucket. Once you enable Versioning for a bucket, Amazon S3 preserves existing objects any time you perform a PUT, POST, COPY, or DELETE operation on them. By default, GET requests will retrieve the most recently written version. Older versions of an overwritten or deleted object can be retrieved by specifying a version in the request.

The way the AWS Blog describes the feature, it looks like a version is created every time an object is modified, and each object in S3 could have a different number of copies depending on the number of times it was modified.
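That behaviour can be modelled as a toy versioned bucket. This is an illustration of the semantics, not the S3 API; the key names are invented, and the GUID-style version ids are an assumption.

```python
import uuid

bucket = {}   # key -> list of (version_id, value), oldest first

def put(key, value):
    # Every PUT appends a new version instead of overwriting.
    vid = str(uuid.uuid4())
    bucket.setdefault(key, []).append((vid, value))
    return vid

def get(key, version_id=None):
    versions = bucket[key]
    if version_id is None:
        return versions[-1][1]          # plain GET: latest version
    return dict(versions)[version_id]   # or fetch a specific older one

v1 = put("report.csv", "draft")
v2 = put("report.csv", "final")
```

Note that storage grows with every modification, which is why the pricing point below (you pay for each extra copy) matters.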

This reminds me of SVN/CVS-like version control systems, and I wonder how long it will take for someone to build a source code versioning system on S3.

BTW, data requests to a versioned object are priced the same way as regular data, which basically means you are getting this feature for free.


February 06, 2010

Cassandra : inverted index

Cassandra is the only NoSQL datastore I’m aware of which is a scalable, distributed, self-replicating, eventually consistent, schema-less key-value store running on Java without a single point of failure. HBase could also match most of these requirements, but Cassandra is easier to manage due to its tiny footprint.

The one thing Cassandra doesn’t do today is indexing columns.

Let’s take a specific example to explain the problem. Say there are 100 rows in the datastore, with 5 columns each. If you want to find the row which says “Service=app2”, you will have to iterate one row at a time, which amounts to a full database scan. In a 100-row datastore where only one row has that particular column, it would take scanning about 50 rows on average before you find your data.


While I’m sure there is a good reason why this doesn’t exist yet, the application inserting the data could build such an inverted index itself even today. Here is an example of how an inverted index table would look.
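A sketch of what the forward table and the hand-built inverted index could hold, with invented row keys and column values:

```python
# Forward table: one row per entry, columns as key/value pairs.
forward = {
    "host1": {"Service": "app1", "Status": "up"},
    "host2": {"Service": "app2", "Status": "up"},
    "host3": {"Service": "app1", "Status": "down"},
}

# Inverted index: one row per "column=value", listing matching row keys.
reverse = {
    "Service=app1": ["host1", "host3"],
    "Service=app2": ["host2"],
    "Status=up":    ["host1", "host2"],
    "Status=down":  ["host3"],
}

# Call 1: which rows say Service=app2?  Call 2: fetch their columns.
keys = reverse["Service=app2"]
status = {k: forward[k]["Status"] for k in keys}
```

In Cassandra terms, `forward` and `reverse` would each be a column family, and the application is responsible for writing both on every insert.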


If you want to find the “status” of all rows where “Service=app2”, all you have to do is find the list of keys by making a single call to this table. The second call gets all the column values for those rows. Even if you have 100 different rows in a table, finding the particular rows matching your search query can now be done in two calls.

Of course there is a penalty to pay. Every time you insert one row of data, you also have to insert multiple rows to maintain the inverted index, and you have to update the index yourself if any of the column values are updated or deleted. Cassandra 0.5.0, which was recently released, has been benchmarked to insert about 10000 rows per second on a 4-core server with 2GB of RAM. If you have an average of 5 columns per row, that is about 1.5k actual row inserts per second (including the 5 rows of inserts/updates required for the inverted index). For more throughput you always have the option to add more servers.

Facebook and Digg are both extensively using Cassandra in their architectures. Here are some interesting reading materials on Cassandra if you’d like to explore more.

[Updated: Discussion on Google Buzz ]

February 05, 2010

DealNews: Scaling for Traffic Spikes

Last year Dealnews.com unexpectedly got listed on the front page of yahoo.com for a couple of hours. No matter how optimistic one is, unexpected events like these can take down a regular website with almost no effort at all. What is your plan if you get slashdotted? Are you ok with a short outage? What is the acceptable level of service for your website anyway?

One way to handle such unexpected traffic is having multiple layers of cache. The database query cache is one; generating and caching dynamic content is another (maybe using a cronjob). Tools like memcached, varnish and squid can all help to reduce the load on application servers.

Proxy servers (or webservers) in front of application servers play a special role at dealnews. They understood the limitations of the application servers they were using, and the fact that slow client connections mean longer-lasting tcp sessions to the application servers. A proxy server like varnish can off-load that job and take care of content delivery without keeping the application servers busy. In addition, Varnish also acts as a content caching service, which further reduces load on the application servers.

Dealnews’ content is extremely dynamic, because of which it uses a very low TTL of 5 minutes for most of its pages. That may not sound like a lot, but at thousands of pages per second such a cache can do miracles. While caching is great, the one thing every loaded website has to figure out is how to avoid the “cache stampede” when the TTL expires. A “cache stampede” is what happens when hundreds of requests for the same resource hit the server at the same time, forcing the webserver to forward all of them to the app server and the database server because the cache had expired.
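One common way to dampen a stampede (a general technique, not necessarily what dealnews does) is to let a single request regenerate the expired page while everyone else keeps getting the stale copy:

```python
import threading
import time

# value, expiry timestamp; seeded with an already-expired stale copy.
cache = {"page": ("stale html", 0)}
rebuild_lock = threading.Lock()

def get_page(render, ttl=300):
    value, expires = cache["page"]
    if time.time() < expires:
        return value                     # fresh enough, serve from cache
    if rebuild_lock.acquire(blocking=False):
        try:
            # Only the first request past expiry regenerates.
            value = render()
            cache["page"] = (value, time.time() + ttl)
        finally:
            rebuild_lock.release()
    # Losers of the lock race fall through and serve the stale copy.
    return cache["page"][0]
```

The effect is that an expired TTL costs one backend render instead of hundreds, which is the same goal dealnews achieves by decoupling content generation from delivery.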

Dealnews solves this problem by separating content generation from content delivery. They run a process which converts data from more than 300 tables of normalized data into 30 tables of highly redundant, de-normalized data. This data is organized in such a way that the application servers only make queries using primary or unique keys. With such a design, a cluster of MySQL DB servers shouldn’t have any problem handling thousands of queries per second from the front-end application servers.

Twitter drives a lot of traffic to the site, and since a lot of that data is redundant, the site relies heavily on caches. So much so that the site could completely go down if a few memcached servers went down. Dealnews explicitly tested their application with the memcached servers disabled to see what the worst-case scenario was for reinitializing the cache. They then optimized their app to the point where the response time only doubled, from about 0.75 seconds to 1.5 seconds per page, without memcached servers.

Handling 3rd-party content can be tricky. Dealnews treats 3rd-party content as a lower-class citizen. They not only load 3rd-party content at the very end of the page, they also try to use iframes wherever possible to keep those objects from blocking the loading of dealnews.com content.

If you are interested in the video recording or the slides from the talk, click on the following links.


February 04, 2010

Scaling deployments

Most of the newer, successful web startups have one thing in common: they release smaller changes more often. Being in operations, I am often surprised how these organizations manage such a feat without breaking their website. Here are some notes from someone at Flickr about how they do it. The two most important parts of this talk are the observation that Dev, QA and Operations teams have to slightly blend into each other to achieve deployments at such a velocity, and the fact that they are not afraid to break the website by deploying code from trunk.

  • Don’t be afraid to do releases
  • Automate infrastructure (hardware/OS and app deployment)
  • Share version control system
  • Enable one step build
  • Enable one step build and deploy
  • Do small frequent changes
  • Use feature flags (branching without source code branching)
  • Always ship trunk
  • Do private betas
  • Share metrics
  • Provide applications the ability to talk back (IM/IRC) – build logs, deploy logs, alert monitors, real-time traffic can all go here
  • Share runbooks and escalation plans
  • Prepare/Plan for failures
  • Do firedrills
  • No fingerpointing…
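A minimal sketch of the “feature flags” idea from the list above (branching without source code branching); the flag names and allow-list mechanics are invented:

```python
# Ship the feature's code in trunk, but gate it at runtime: turning it
# on is a config change, not a branch merge. Supports private betas
# via a per-flag allow-list.
FLAGS = {
    "new_photo_page": {"enabled": False, "allow_users": {"beta_tester_1"}},
}

def feature_enabled(name, user=None):
    flag = FLAGS.get(name)
    if flag is None:
        return False          # unknown flags are safely off
    return flag["enabled"] or user in flag["allow_users"]

# Private beta: off for everyone except the allow-list.
assert feature_enabled("new_photo_page", "beta_tester_1")
assert not feature_enabled("new_photo_page", "random_user")
```

Flipping `enabled` to True is the "release"; flipping it back is the rollback, which is what makes always shipping trunk survivable.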

February 03, 2010

Scaling PHP : HipHop and Quercus

While PHP is very popular, it unfortunately doesn't perform as well as some of its competitors. One way to make things faster is to write PHP extensions in C++. In this post we will describe two different ways developers can solve this problem; the mileage you get from either model may vary.

Since Facebook mostly runs PHP, it noticed this problem pretty early. But instead of asking its developers to move from PHP to C++, one of their developers hacked up a solution to transform PHP code into C++.

Yesterday, Facebook announced they are opening up HipHop, a source code transformer which changes PHP code into more optimized C++ code and uses g++ to compile it. With some minor sacrifices (no eval support), they were able to get a 50% performance improvement. And since they serve 400 billion page views every month, that kind of saving can free up a lot of servers.

More info on HipHop

Quercus, on the other hand, is a 100% Java implementation of PHP 5. What makes this more interesting is that Quercus can now run in Google App Engine pretty much the same way JSPs can.

Caucho’s Quercus presents a new mixed Java/PHP approach to web applications and services where Java and PHP tightly integrate with each other. PHP applications can choose to use Java libraries and technologies like JMS, EJB, SOA frameworks, Hibernate, and Spring. This revolutionary capability is made possible because 1) PHP code is interpreted/compiled into Java and 2) Quercus and its libraries are written entirely in Java. This architecture allows PHP applications and Java libraries to talk directly with one another at the program level. To facilitate this new Java/PHP architecture, Quercus provides an API and interface to expose Java libraries to PHP.

The demo of Quercus running on GAE was very impressive. Any pure PHP code which doesn’t need to interact with external services would work beautifully without any issues on GAE. But the absence of MySQL on GAE means SQL queries have to be mapped to the datastore (bigtable), which might require a major rewrite of parts of the application. But it’s not impossible, as they have shown by making wordpress run on GAE (crawl might be a better word though).

While Quercus is opensource and as fast as regular PHP code in interpreted mode, the compiler, which is much faster, is not free. Regardless, Quercus is a step in the right direction, and I sincerely hope PHP support on GAE is here to stay.

February 02, 2010

Cloud : Agility vs Security

Networking devices on the edges have become smarter over time. So have the firewalls and switches used internally within the networks. Whether we like it or not, web applications over time have grown to depend on them.

It’s impossible to build a flawless product, which is why it’s standard practice to disable all unused services on a server. Most organizations today try to follow the n-tier approach to create different logical security zones, with the core asset inside the most secure zone. The objective is to make it difficult for an attacker to get to the core asset without breaching multiple sets of firewalls.

Doing frequent system patches, auditing file system permissions and setting up intrusion detection (host or network based)  are some of the other mundane ways of keeping web applications safe from attacks.

Though the cloud has made deployment of on-demand infrastructure simpler, it’s hard to build a walled garden around a customer’s cluster of servers on the cloud in an efficient way anymore. And the absence of such walled gardens and logical security zones means there are more points of entry into the infrastructure which could be exploited. If you replace 10 powerful internal servers with 100 small servers on the cloud, all of a sudden you might have to worry about protecting 100 individual servers instead of a couple of edge devices. In a worst-case scenario, one weak server in the cluster could expose the entire cluster to an attacker. Here are a few other things to think about...

  • Host-based firewalls should allow only the traffic which is required/expected
  • Non-essential services should be shut off on the server
  • Some kind of Intrusion detection might be important to have
  • Keys/passwords should be changed periodically
  • System patches (update OS image) need to be applied periodically
  • Authenticate/Authorize all inter-server communication
  • Maintain audit trail for all changes to images/servers if possible
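As a sketch of the first two bullets, a default-deny host firewall for a single cloud instance might look like the iptables rules below. The open ports are assumptions; adjust them per service, and note this is a config fragment, not a complete hardening script.

```shell
# Default-deny inbound policy: drop everything not explicitly allowed.
iptables -P INPUT DROP
# Allow loopback traffic and replies to connections we initiated.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow only the traffic that is required/expected (assumed ports).
iptables -A INPUT -p tcp --dport 22 -j ACCEPT    # SSH for admin access
iptables -A INPUT -p tcp --dport 443 -j ACCEPT   # application traffic
```

With 100 small servers, rules like these need to be baked into the OS image or pushed by automation rather than maintained by hand.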

An organization which is completely on the cloud may not have an IT department in its current form, but it might still have an operations team which sets the security policies, updates OS images, manages billing, monitors system health (and IDS) and trains developers to do things the right way.

If your infrastructure is on the cloud, do write back with a note about what you do to protect your applications.

Image source: AMagill

February 01, 2010

Evaluating Cloud Computing


Windows Azure

Windows Azure is an application platform provided by Microsoft to allow others to run applications on Microsoft’s “cloud” infrastructure. It’s finally open for business (as of Feb 1, 2010). Below are some links about Azure for those who are still catching up.

Wikipedia: Windows Azure has three core components: Compute, Storage and Fabric. As the names suggest, Compute provides computation environment with Web Role and Worker Role while Storage focuses on providing scalable storage (Blobs, Tables, Queue) for large scale needs.

The hosting environment of Windows Azure is called the Fabric Controller - which pools individual systems into a network that automatically manages resources, load balancing, geo-replication and application lifecycle without requiring the hosted apps to explicitly deal with those requirements.[3] In addition, it also provides other services that most applications require — such as the Windows Azure Storage Service that provides applications with the capability to store unstructured data such as binary large objects, queues and non-relational tables.[3] Applications can also use other services that are a part of the Azure Services Platform.