Scalable web architectures

Building reliable, high performance, highly available clusters

Cassandra : inverted index

Tags: , , , , , ,

Cassandra is the only NOSQL datastore I’m aware of, which is scalable, distributed, self replicating, eventually consistent, schema-less key-value store running on java which doesn’t have a single point of failure. HBase could also match most of these requirements, but Cassandra is easier to manage due to its tiny footprint.

The one thing Cassandra doesn’t do today is indexing columns.

Lets take a specific example to explain the problem. Lets say there are 100 rows in the datastore which have 5 columns each. If you want to find the row which says “Service=app2”, you will have to iterate one row at a time which is like full database scan. In a 100 row datastore if only one row had that particular column, it could take on an average about 50 rows before you find your data.

image

While I’m sure there is a good reason why this doesn’t exist yet, the application inserting the data could build such an inverted index itself even today. Here is an example of how a table of inverted index would look like.

image

If you want to find the “status” of all rows where “Service=app2”, all you have to do is find the list of keys by making a single call to this table. The second call would be to get all the columns values for that row. Even if you have 100 different rows in a table, finding that one particular row, matching your search query, could  now be done in two calls.

Of course there is a penalty you have to pay. Every time you insert one row of data, you would also have to insert multiple rows to build the inverted index. You would also have to update the inverted index yourself if any of the column values are updated or deleted. Cassandra 0.5.0 which was recently released has been benchmarked to insert about 10000 rows per second on a 4 core server with 2GB of RAM. If you have an average of 5 columns per row, that is about 1.5k actual row inserts per second (that includes 5 rows of inserts/updates required for an inverted index). For more throughput you always have an option to add more servers.

Facebook and Digg are both extensively using Cassandra in their architectures. Here are some interesting reading materials on Cassandra if you’d like to explore more.

DealNews: Scaling for Traffic Spikes

Tags: , , ,

Last year Dealnews.com unexpectedly got listed dealnews.comon the front page of yahoo.com for a couple of hours. No matter how optimistic one is, unexpected events like these can take down a regular website with almost no effort at all. What is your plan if you get slashdotted ? Are you ok with a short outage ? What is the acceptable level of service for your website anyway.

One way to handle such unexpected traffic is having multiple layers of cache. Database query cache is one, generating and caching dynamic content is another way (may be using a cronjob). Tools like memcached, varnish, squid can all help to reduce the load on application servers.

Proxy servers ( or webservers ) in front of application servers play a special role in dealnews. They understood the limitations of application servers they were using, and the fact that slow client connections means longer lasting tcp sessions to the application servers. Proxy servers, like varnish, could off-load that job and take care of content delivery without keeping application servers busy. In addition Varnish also acts as a content caching service which further reduces load on the application servers.

Dealnews’ content is extremely dynamic because of which the it uses a very low TTL of 5 minutes for most of its pages. It may not look a lot but at thousands of pages per second, such a cache can do miracles. While caching is great, the one thing every loaded website has to go through is figure out how to avoid the “cache stampede” when the TTL expires. “Cache stampede” is what happens when 100s of request requesting the same resource hit the server at the same time forcing the webserver to forward all 100 request to the app server and the database server because the caches were not good.

Dealnews solves this problem by separating content generation from content delivery. There is a process which they run which converts data from more than 300 tables of normalized data, into 30 tables with highly redundant de-normalized data. This data is kept in such a way that the application servers are required to make queries  using primary keys or unique keys only. With such a design a cluster of Mysql DB servers shouldn’t have any problem handling 1000s of queries per second from the front end application servers.

Twitter drives a lot of traffic and since a lot of that data is redundant, it heavily relies on caches. Its actually so much that the site could completely go down if a few memcached servers go down. Dealnews explicitly tested their application with the memcached servers disabled to see what the worst case scenario was for reinitializing cache. They then optimized their app to the point where the response time only doubled from about 0.75 seconds to 1.5 second per page without memcached servers.

Handling 3rd party content could be tricky. Dealnews treats 3rd party content as lower class citizens. They not only load 3rd party at the very end of the page, they also try to use iframes wherever possible to keep loading of those objects from loading of dealnews.com

If you are interested in the video recording or the slides from the talk, click on the following links.

References

Scaling deployments

Tags: , ,

Most of the newer, successful, web startups have one thing in common. They release smaller changes more often. Being in operations, I am often surprised how these organizations manage such a feat without breaking their website. Here are some notes from someone in flickr about how they do it. The two most important part of this talk is the observation that Dev, Qa and Operations teams have to slightly blend into each other to achieve deployments at such a velocity, and the fact that they are not afraid to break the website by deploying code from trunk.

  • Don’t be afraid to do releases Flickr logo. If you click it, you'll go home
  • Automate infrastructure (hardware/OS and app deployment)
  • Share version control system
  • Enable one step build
  • Enable one step build and deploy
  • Do small frequent changes
  • Use feature flags (branching without source code branching)
  • Always ship trunk
  • Do private betas
  • Share metrics
  • Provide applications ability to talk back (IM/IRC)   – build logs, deploy logs, alert monitors. real time traffic can all go here
  • Share runbooks and escalation plans
  • Prepare/Plan for failures
  • Do firedrills
  • No fingerpointing…

Scaling PHP : HipHop and Quercus

Tags: , , ,

While PHP is very popular, it unfortunately doesn’t perform as some of its competitors. One of the ways to make things faster is to write PHP Extensions in C++. In this post we will describe two different ways developers can solve this problem and the milage you might get from either model may vary.

Since Facebook is mostly running PHP, it noticed this problem pretty early, but instead of asking its developers to move from PHP to C++ one of their developers hacked up a solution to transform PHP code into C++.

Yesterday, Facebook announced they are opening up HipHop, a source code transformer, which changes PHP code into a more optimized C++ code and uses g++ to compile it. With some minor sacrifices (no eval support)  they noticed they were able to get 50% performance improvement. And since they serve 400 billion page views every month, that kind of saving can free up a lot of servers.

More info on HipHop

Quercus on the other hand is a 100% java implementation of PHP 5. What makes this more interesting is that Quercus can now run in Google App Engine pretty much the same way JSPs can.

Caucho’s Quercus presents a new mixed Java/PHP approach to web applications and services where Java  Caucho Technologyand PHP tightly integrate with each other. PHP applications can choose to use Java libraries and technologies like JMS, EJB, SOA frameworks, Hibernate, and Spring. This revolutionary capability is made possible because 1) PHP code is interpreted/compiled into Java and 2) Quercus and its libraries are written entirely in Java. This architecture allows PHP applications and Java libraries to talk directly with one another at the program level. To facilitate this new Java/PHP architecture, Quercus provides and API and interface to expose Java libraries to PHP.

The demo of Quercus running on GAE was very impressive. Any pure PHP code which doesn’t need to interact with external services would work beautifully without any issues on GAE. But the absence of Mysql in GAE means SQL queries have to be mapped to datastore (bigtable) which might require a major rewrite to parts of the application. But its not impossible, as they have shown by making wordpress run on GAE (crawl might be a better word though).

While Quercus is opensource and is as fast as regular PHP code in interpreted mode, the compiler which is way faster is not free. Regardless Quercus is a step in the right direction, and I sincerely hope PHP support on GAE is here to stay.

Cloud : Agility vs Security

Tags: , ,

Networking devices on the edges have become smarter over time. So have the firewalls and switches used internally within the networks. Whether we like it or not, web applications over time have grown to depend on them.

Its impossible to build a flawless product because of which its standard practice to disable all unused services on a server. Most organizations today try to follow the n-tier approach to create different logical security zones with the core asset inside the most secure zone. The objective is to make it difficult for an attacker to get to the core asset without breaching multiple sets of firewalls.

Doing frequent system patches, auditing file system permissions and setting up intrusion detection (host or network based)  are some of the other mundane ways of keeping web applications safe from attacks.

Though cloud has made deployment of on-demand infrastructure simpler, its hard to build a walled garden around customers cluster of servers on the cloud in an efficient way anymore. And the absence of such walled gardens and logical security zones means there are more points of entry into the infrastructure which could be exploited. If you replace 10 powerful internal servers with 100 small servers on the cloud, all of a sudden you might have to worry about protecting 100 individual servers instead of protecting a couple of edge devices. In a worst case scenario, one week server in the cluster could expose the entire cluster to an attacker. Here are a few other things to think about…

  • Host based firewalls should allow only traffic which are required/expected
  • Non-essential services should be shut off on the server
  • Some kind of Intrusion detection might be important to have
  • Keys/passwords should be changed periodically
  • System patches (update OS image) need to be applied periodically
  • Authenticate/Authorize all inter-server communication
  • Maintain audit trail for all changes to images/servers if possible

An organization which is completely on the cloud may not have an IT department in its current form, but it might still have an operations team which makes the security policies,  updates OS images, manages billing, monitors system health (and IDS) and trains developers to do the things in the right way.

If your infrastructure is on the cloud, do write back with a note about what you do to protect your applications.

Image source: AMagill

Evaluating Cloud Computing

Tags: ,

Dilbert.com

Windows Azure

Tags:

Windows Azure is an application platform provided by Microsoft to allow others to run applications on Microsoft’s “cloud” infrastructure. Its finally open for business (as of Feb 1, 2010). Below are some links about Azure for those who are still catching up.Windows Azure logo.jpg

Wikipedia: Windows Azure has three core components: Compute, Storage and Fabric. As the names suggest, Compute provides computation environment with Web Role and Worker Role while Storage focuses on providing scalable storage (Blobs, Tables, Queue) for large scale needs.

The hosting environment of Windows Azure is called the Fabric Controller – which pools individual systems into a network that automatically manages resources, load balancing, geo-replication and application lifecycle without requiring the hosted apps to explicitly deal with those requirements.[3] In addition, it also provides other services that most applications require — such as the Windows Azure Storage Service that provides applications with the capability to store unstructured data such as binary large objects, queues and non-relational tables.[3] Applications can also use other services that are a part of the Azure Services Platform.

The real concerns about Cloud infrastructure (as it is today)

Tags: , , ,

While “private clouds may not be the future” they are definitely needed today. Here are some of the top issues bothering some organizations which have been thinking about going into the cloud. Some of issues were based on Craig Bolding’s talk on “Guide to cloud security”.cluod

  • Unlike your own data center, you will never know what the cloud vendors are running, or how they backup, or what their DR plans are. They will say you shouldn’t care, but do you remember what happened to the Tmobile customer’s on Danger ?
  • Uptime, availability and responsiveness is less predictable than in a self hosted environment. In most cases the cloud vendors may not even choose to let customers know about major maintenance if they don’t anticipate any issues. Organizations who manage their own infrastructure would always try to avoid doing two major changes which have interdependencies.
  • Multi-Tenancy means you may have to worry about a noisy neighbor.
  • Muti-Tenancy could also lead one to interesting issues which were never thought about before. What if there was a way to do an “injection attack”. Depending on how Multi-Tenancy is implemented, you could potentially touch other customers data.
  • Infrastructure and platform lock-in issues are worrying for many organizations who are thinking long term. Most cloud vendors don’t really have a long history to show their track record.
  • Change control and detailed change log is missing.
  • Individual customers don’t have much decision making power on what a vendor should do next. In a privately hosted environment the stake holders are asked before something is done, but in larger infrastructure, you are a small fish in a huge pond.
  • Most cloud vendors have multiple layers of cloud infrastructure dependent on each other. Its hard to understand how issues around one type of cloud could impact others. This is especially true from Security view point. A bad flaw in a lower layer of the architecture could impact all other platforms built over it.
  • Moving applications to cloud means dealing with a different style of programming designed for horizontal scalability, data consistency issues, health monitoring, load balancing, managing state, etc.
  • Identify management is still in early stages. Integration with corporate Identify management infrastructure would be important to make it easy for individuals from large organizations on external clouds.
  • Who takes care of scrubbing disks when data is moved around ? What about data on backup tapes ? This is very important in application handling highly sensitive data.
  • Just like credit card fraud, one has to worry about CPU time fraud. Is the current billing and reporting good enough to help large organizations figure out what is real and what could be fraud ? They need a real-time fraud detection mechanism. And what about loss of service due to DOS attacks ? Who pays for that ?
  • Need a better mechanism to bill large corporations.
  • On the non-technical side, there are a lot of questions related to SLAs, Compliance issues, Terms of services, Legal issues around cross border services, and even questions about whether law enforcement have a different set of rules when search and seizure is required.
  • Not too far from being another form of “outsourcing”.

Photo credit: akakumo

Fixing GSLB (Global Server load balancing)

Tags: , , ,

Standard DNS protocol allows DNS servers to respond with multiple addresses in the replies for simple DNS lookup queries. This, and the way that the order of records is changed in every reply is collectively known as the “Round Robin DNS” technique to load balance across a set of servers.

Though a lot of organizations are using Round Robin DNS to load balance across servers in the same datacenter, some are also trying to use it as an HA solution by load balancing across multiple datacenters. In an event of a failure in one of the datacenter, using such an implementation, the impact could be limited, and with a slight change of DNS configuration (removing the IP of the datacenter which went down) the site could become fully operational again.

It would be nicer if the DNS servers could monitor and remove servers which are inactive or are throwing  errors of some kind. This is what GSLBs are all about. But what they really excel at, which regular DNS servers can’t do, is that they can figure out (in a slightly unscientific way) where a user is located geographically. This allows it to figure out which datacenter is closest to the end user. If a customer in Asia can get to a datacenter within Asia, instead of coming all the way to US, it could save the customer at least 200ms of latency which can significantly improve bandwidth and response from the website.

Though GSLBs, today, are very popular among the larger service providers there are some interesting drawbacks which can limit its usefulness. The core problem is that GSLBs use source IP within theGSLB_Architecture.PNG DNS request to figure out where the customer is located. This works beautifully if the customers laptop is sending these out directly, and in most cases will also work if the customer is using his/her ISP’s DNS server. Unfortunately if the customer uses some free public DNS service like the one google provides which recursively looks up the DNS records for the user, then GSLB would find datacenters which are closest to the DNS server requesting the information instead of the actual end user. A similar problem exists if the user is forced to use a DNS server over a VPN link. Read this post for a better understanding of this problem (Why DNS Based GSLB doesn’t work)

A few days ago Google came out with a solution to this problem which was announced here (A  proposal to extend DNS protocol). They don’t mention GSLB, but there is no doubt this will help solve the GSLB issue mentioned above. Unfortunately, I’m also sure that Google has other, more important reasons, to push for this change. They are interested in location information to “provide better services” (location-aware advertising).

DNS is the system that translates an easy-to-remember name like www.google.com to a numeric address like 74.125.45.104. These are the IP addresses that computers use to communicate with one another on the Internet.

By returning different addresses to requests coming from different places, DNS can be used to load balance traffic and send users to a nearby server. For example, if you look up www.google.com from a computer in New York, it may resolve to an IP address pointing to a server in New York City. If you look up www.google.com from the Netherlands, the result could be an IP address pointing to a server in the Netherlands. Sending you to a nearby server improves speed, latency, and network utilization.

Currently, to determine your location, authoritative nameservers look at the source IP address of the incoming request, which is the IP address of your DNS resolver, rather than your IP address. This DNS resolver is often managed by your ISP or alternately is a third-party resolver like Google Public DNS. In most cases the resolver is close to its users, in which case the authoritative nameservers will be able to find the nearest server. However, some DNS resolvers serve many users over a wider area. In these cases, your lookup for www.google.com may return the IP address of a server several countries away from you. If the authoritative nameserver could detect where you were, a closer server might have been available.

Our proposed DNS protocol extension lets recursive DNS resolvers include part of your IP address in the request sent to authoritative nameservers. Only the first three octets, or top 24 bits, are sent providing enough information to the authoritative nameserver to determine your network location, without affecting your privacy.

Regard less, its a step in the right direction and would significantly help in making web applications more available.

References

Cloud computing in 1963 ( actually Timesharing )

Tags:

Found this on Feld Thoughts. Its not really about cloud computing, but they are interested in making efficient use of computational resources, which is one of the goals of today’s “cloud computing”  as well.

This magnificent video is from 1963 Timesharing: A Solution to Computer Bottlenecks where MIT Professor Fernando Corbato explains how timesharing works to MIT Science Reporter John Fitch (who has one of those magnificent deep reporter voices).

Get Adobe Flash playerPlugin by wpburn.com wordpress themes

© 2009 Scalable web architectures. All Rights Reserved.

This blog is powered by Wordpress and Magatheme by Bryan Helmig.