January 31, 2010

The real concerns about Cloud infrastructure (as it is today)

While “private clouds may not be the future”, they are definitely needed today. Here are some of the top issues bothering organizations that have been thinking about moving into the cloud. Some of these issues are based on Craig Bolding’s talk, “Guide to cloud security”.

  • Unlike your own data center, you will never know what the cloud vendors are running, how they back up, or what their DR plans are. They will say you shouldn’t care, but do you remember what happened to T-Mobile customers on Danger?
  • Uptime, availability and responsiveness are less predictable than in a self-hosted environment. In most cases the cloud vendors may not even choose to tell customers about major maintenance if they don’t anticipate any issues. Organizations that manage their own infrastructure would always try to avoid making two major changes with interdependencies at the same time.
  • Multi-Tenancy means you may have to worry about a noisy neighbor.
  • Multi-Tenancy could also lead to interesting issues that were never thought about before. What if there were a way to do an “injection attack”? Depending on how Multi-Tenancy is implemented, you could potentially touch other customers’ data.
  • Infrastructure and platform lock-in issues are worrying for many organizations thinking long term. Most cloud vendors don’t really have a long history to show a track record.
  • Change control and detailed change logs are missing.
  • Individual customers don’t have much decision-making power over what a vendor should do next. In a privately hosted environment the stakeholders are consulted before something is done, but on a large shared infrastructure you are a small fish in a huge pond.
  • Most cloud vendors have multiple layers of cloud infrastructure dependent on each other. It’s hard to understand how issues in one layer could impact the others. This is especially true from a security viewpoint: a bad flaw in a lower layer of the architecture could impact every platform built on top of it.
  • Moving applications to the cloud means dealing with a different style of programming designed for horizontal scalability, data consistency issues, health monitoring, load balancing, managing state, etc.
  • Identity management is still in its early stages. Integration with corporate identity management infrastructure will be important to make external clouds usable for individuals from large organizations.
  • Who takes care of scrubbing disks when data is moved around? What about data on backup tapes? This is very important for applications handling highly sensitive data.
  • Just like credit card fraud, one has to worry about CPU-time fraud. Is the current billing and reporting good enough to help large organizations figure out what is real usage and what could be fraud? They need a real-time fraud detection mechanism. And what about loss of service due to DoS attacks? Who pays for that?
  • Need a better mechanism to bill large corporations.
  • On the non-technical side, there are a lot of questions related to SLAs, compliance issues, terms of service, legal issues around cross-border services, and even questions about whether law enforcement operates under a different set of rules when search and seizure is required.
  • Not too far from being another form of “outsourcing”.


Fixing GSLB (Global Server load balancing)

The standard DNS protocol allows DNS servers to respond with multiple addresses in the reply to a simple DNS lookup query. This, combined with the way the order of the records is rotated in every reply, is collectively known as the “Round Robin DNS” technique to load balance across a set of servers.
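To see what this looks like from the client side, here is a quick Python sketch; the hostname is just a placeholder for any name that publishes several A records, and note that the rotation itself happens on the DNS server side (the local resolver library may re-sort the results):

    import socket

    # Resolve a name that is assumed to publish multiple A records.
    # "www.example.com" is a placeholder; substitute any multi-homed name.
    results = socket.getaddrinfo("www.example.com", 80, proto=socket.IPPROTO_TCP)

    # Each entry is (family, type, proto, canonname, sockaddr); printing the
    # sockaddr shows every address the name resolved to.
    for family, socktype, proto, canonname, sockaddr in results:
        print(sockaddr[0])

Running this repeatedly against a name with several A records will often show the same addresses coming back in a different order, which is what spreads naive clients across the listed servers.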

Though a lot of organizations are using Round Robin DNS to load balance across servers in the same datacenter, some are also trying to use it as an HA solution by load balancing across multiple datacenters. In the event of a failure in one of the datacenters, with such an implementation the impact could be limited, and with a slight change to the DNS configuration (removing the IP of the datacenter which went down) the site could become fully operational again.
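That failover step can also be scripted. Below is a minimal sketch of the idea, assuming a hypothetical /healthcheck URL on each datacenter front end; the datacenter names and IPs are placeholders, not anything from a real deployment:

    import urllib.request

    # Hypothetical datacenter front-end IPs (placeholders only).
    DATACENTERS = {
        "us-east": "192.0.2.10",
        "eu-west": "198.51.100.20",
        "ap-south": "203.0.113.30",
    }

    def healthy(ip, timeout=3):
        """Return True if the datacenter answers a basic HTTP health probe."""
        try:
            urllib.request.urlopen("http://%s/healthcheck" % ip, timeout=timeout)
            return True
        except OSError:
            return False

    # The surviving IPs are what you would keep publishing as A records;
    # a failed datacenter simply drops out of the answer set.
    records_to_publish = [ip for ip in DATACENTERS.values() if healthy(ip)]
    print(records_to_publish)

Actually pushing the updated record set back into the zone still has to go through whatever interface the DNS provider offers.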

It would be nicer if the DNS servers could monitor the servers and remove the ones that are inactive or throwing errors of some kind. This is what GSLBs are all about. But what they really excel at, which regular DNS servers can’t do, is figuring out (in a slightly unscientific way) where a user is located geographically. This allows them to work out which datacenter is closest to the end user. If a customer in Asia can get to a datacenter within Asia, instead of coming all the way to the US, it could save that customer at least 200ms of latency, which can significantly improve the effective bandwidth and responsiveness of the website.
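Here is a minimal sketch of that GSLB decision, with a made-up region table and a stubbed-out lookup_region() helper standing in for a real GeoIP database; real GSLB appliances combine this with their own geo data and health checks:

    # Placeholder region-to-datacenter mapping; the IPs are documentation
    # addresses, not real datacenters.
    REGION_TO_DATACENTER = {
        "asia": "203.0.113.30",
        "europe": "198.51.100.20",
        "americas": "192.0.2.10",
    }

    def lookup_region(source_ip):
        """Stub for a GeoIP lookup; a real one would consult a geo database."""
        return "americas"

    def gslb_answer(source_ip):
        # The source IP of the DNS query is the only locality signal the GSLB
        # has, which is exactly where the problem described next comes from.
        region = lookup_region(source_ip)
        return REGION_TO_DATACENTER.get(region, REGION_TO_DATACENTER["americas"])

    print(gslb_answer("192.0.2.55"))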

Though GSLBs are very popular today among the larger service providers, there are some interesting drawbacks which can limit their usefulness. The core problem is that GSLBs use the source IP of the DNS request to figure out where the customer is located. This works beautifully if the customer’s laptop is sending these requests out directly, and in most cases it will also work if the customer is using his/her ISP’s DNS server. Unfortunately, if the customer uses a free public DNS service like the one Google provides, which recursively looks up the DNS records on the user’s behalf, then the GSLB will find the datacenter closest to the DNS server requesting the information instead of the one closest to the actual end user. A similar problem exists if the user is forced to use a DNS server over a VPN link. Read this post for a better understanding of this problem (Why DNS Based GSLB doesn’t work).
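One way to see this effect for yourself is to ask two different recursive resolvers for the same GSLB-fronted name and compare the answers. The sketch below assumes the third-party dnspython package (2.x, which provides resolve()); the second nameserver IP is a placeholder for your ISP’s resolver:

    import dns.resolver  # third-party "dnspython" package (2.x)

    def answers_via(nameserver, name="www.google.com"):
        """Ask a specific recursive resolver for the A records of a name."""
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [nameserver]
        return sorted(record.address for record in resolver.resolve(name, "A"))

    # The two record sets often differ, because the authoritative side only
    # sees the resolver's IP, not yours.
    print(answers_via("8.8.8.8"))     # Google Public DNS
    print(answers_via("192.0.2.53"))  # placeholder: your ISP's resolver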

A few days ago Google came out with a solution to this problem, announced here (A proposal to extend DNS protocol). They don’t mention GSLB, but there is no doubt this will help solve the GSLB issue mentioned above. Unfortunately, I’m also sure that Google has other, more important reasons to push for this change: they are interested in location information to “provide better services” (location-aware advertising).

DNS is the system that translates an easy-to-remember name like www.google.com to a numeric address like 74.125.45.104. These are the IP addresses that computers use to communicate with one another on the Internet.

By returning different addresses to requests coming from different places, DNS can be used to load balance traffic and send users to a nearby server. For example, if you look up www.google.com from a computer in New York, it may resolve to an IP address pointing to a server in New York City. If you look up www.google.com from the Netherlands, the result could be an IP address pointing to a server in the Netherlands. Sending you to a nearby server improves speed, latency, and network utilization.

Currently, to determine your location, authoritative nameservers look at the source IP address of the incoming request, which is the IP address of your DNS resolver, rather than your IP address. This DNS resolver is often managed by your ISP or alternately is a third-party resolver like Google Public DNS. In most cases the resolver is close to its users, in which case the authoritative nameservers will be able to find the nearest server. However, some DNS resolvers serve many users over a wider area. In these cases, your lookup for www.google.com may return the IP address of a server several countries away from you. If the authoritative nameserver could detect where you were, a closer server might have been available.

Our proposed DNS protocol extension lets recursive DNS resolvers include part of your IP address in the request sent to authoritative nameservers. Only the first three octets, or top 24 bits, are sent providing enough information to the authoritative nameserver to determine your network location, without affecting your privacy.
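To make the 24-bit truncation concrete, here is a small sketch of the masking step a resolver would perform before forwarding the query; it only illustrates the arithmetic, not the actual protocol extension:

    import ipaddress

    def truncate_to_24(client_ip):
        """Keep only the first three octets (the /24 network) of an IPv4 address."""
        network = ipaddress.ip_network(client_ip + "/24", strict=False)
        return str(network.network_address)

    # Only 198.51.100.0 would be passed along to the authoritative nameserver;
    # the last octet, which identifies the individual host, is dropped.
    print(truncate_to_24("198.51.100.73"))  # -> 198.51.100.0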

Regardless, it’s a step in the right direction and will significantly help in making web applications more available.
