REST APIs for cloud management and the Database.com launch

I found the top two stories on scalebig last night interesting enough to dig a little deeper. The one that surprised me the most was William Vambenepe’s post about why he thinks REST APIs don’t matter in the context of cloud management. While REST might be ideal for many things, including web-based applications accessed mostly through browsers, Amazon chose to avoid REST for most of its infrastructure management APIs.

Has this lack of RESTfulness stopped anyone from using it? Has it limited the scale of systems deployed on AWS? Does it limit the flexibility of the Cloud offering and somehow force people to consume more resources than they need? Has it made the Amazon Cloud less secure? Has it restricted the scope of platforms and languages from which the API can be invoked? Does it require more experienced engineers than competing solutions?

I don’t see any sign that the answer is “yes” to any of these questions. Considering the scale of the service, it would be a multi-million-dollar blunder if indeed one of them had a positive answer.

Here’s a rule of thumb. If most invocations of your API come via libraries for object-oriented languages that more or less map each HTTP request to a method call, it probably doesn’t matter very much how RESTful your API is.

The Rackspace people are technically right when they point out the benefits of their API compared to Amazon’s. But it’s a rounding error compared to the innovation, pragmatism and frequency of iteration that distinguishes the services provided by Amazon. It’s the content that matters.
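To make that rule of thumb concrete, here is a minimal sketch of what “mapping each HTTP request to a method call” looks like from the caller’s side. I’m using boto3 purely as an illustration; the library choice, region and filter values are my own assumptions, not something taken from the quoted post.

```python
# A minimal sketch of the rule of thumb above: from the caller's point of view,
# an SDK method looks the same whether the service underneath exposes a RESTful
# API or a query-style API.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# One method call; the library decides how to serialize it into HTTP
# (parameters, signing, retries). The caller never sees the wire format.
response = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)

for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["InstanceType"])
```

Whether the bytes on the wire are RESTful resources or query-string actions is invisible at this level, which is exactly the point Vambenepe is making.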

And the other big news was of course the launch of a new cloud datastore by Salesforce at Database.com. Interestingly, they decided to brand it with its own website instead of making it part of their existing set of services. It’s possible they did this to distance the new service from the impression that it’s only useful for applications which need other Salesforce services. For more in-depth technical information continue reading here.

The infrastructure promises automatic tuning, upgrades, backups and replication to remote data centers, and automatic creation of sandboxes for development, test and training. Database.com offers enterprise search services, allowing developers to access a full-text search engine that respects enterprise security rules.

In terms of pricing, Database.com access will be free for 3 users, up to 100,000 records and 50,000 transactions per month. The platform will cost $10 per month for each set of 100,000 records beyond that, and another $10 per month for each set of 150,000 transactions beyond that benchmark. The enterprise-level services will be an additional $10 per user per month and will include user identity, authentication and row-level security access controls.
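To make the arithmetic concrete, here is a rough sketch of that pricing model. The tier sizes and prices come straight from the figures above; rounding partial blocks up and the way the enterprise add-on is modeled are my own assumptions.

```python
import math

# Free tier quoted above: up to 100,000 records and 50,000 transactions/month.
FREE_RECORDS = 100_000
FREE_TRANSACTIONS = 50_000

def monthly_cost(records: int, transactions: int, enterprise_users: int = 0) -> int:
    """Estimate the monthly Database.com bill in dollars (illustrative only)."""
    extra_record_blocks = max(0, math.ceil((records - FREE_RECORDS) / 100_000))
    extra_txn_blocks = max(0, math.ceil((transactions - FREE_TRANSACTIONS) / 150_000))
    return 10 * extra_record_blocks + 10 * extra_txn_blocks + 10 * enterprise_users

# 450,000 records and 500,000 transactions/month:
# 4 extra record blocks ($40) + 3 extra transaction blocks ($30) = $70/month.
print(monthly_cost(450_000, 500_000))
```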

Other references: Dreamforce 2010 – Database.com launch interview with Eric Stahl.

Cloud economics: Not really black and white…

While some of the interest in moving towards the public cloud is based on sound economics, a small segment of this movement is driven purely by “herd mentality”.

The slide on the right, from a Microsoft publication, shows that larger networks may be less economical on the cloud (at least today).

Richard Farley has been discussing this very topic for a few months now. He observed that a medium-sized organization which already has a decent IT infrastructure, including a dedicated IT staff to support it, has significantly smaller overhead than cloud vendors might make it look.

Here is a small snippet from his blog. If you are not afraid to get your hands dirty with numbers, read the rest here.

Now, we know we need 300 virtual servers, each of which consumes 12.5% of a physical host.  This means we need a total of 37.5 physical hosts.  Our vendor tells us these servers can be had for $7k each including tax and delivery with the cabinet.  We can’t buy a half server, and want to have an extra server on hand in case one breaks.  This brings our total to 39 at a cost of $273k.  Adding in the cost of the cabinet, we’re up to $300k.

There are several non-capital costs we now have to factor in.  Your vendor will provide warranty, support and on-site hardware replacement service for the cabinet and servers for $15k per year.  Figure you will need to allocate around 5% of the time of one of your sys admins to deal with hardware issues (i.e., coordinating repairs with the vendor) at a cost of around $8k per year in salary and benefits.  Figure power and cooling for the cabinet will also cost $12k per year.  In total, your non-capital yearly costs add up to $35k.
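If you want to play with those numbers, here is a small sketch that redoes the arithmetic from the snippet. Everything comes from the quoted figures except the three-year amortization at the end, which is my own illustrative assumption.

```python
import math

virtual_servers = 300
vms_per_host = 8              # each VM consumes 12.5% of a physical host
server_price = 7_000          # per server, including tax and delivery

# Can't buy half a server, plus one spare on hand.
hosts_needed = math.ceil(virtual_servers / vms_per_host) + 1   # 39
server_capex = hosts_needed * server_price                     # $273k
total_capex = 300_000                                          # incl. the cabinet, per the post

yearly_opex = 15_000 + 8_000 + 12_000   # warranty/support + admin time + power/cooling = $35k

print(hosts_needed, server_capex, yearly_opex)
print(total_capex / 3 + yearly_opex)    # ~$135k/year if the hardware is amortized over 3 years
```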

One thing the post doesn’t clearly articulate is that while long-term infrastructure is cheaper to host in a private cloud, it may still be more economical to use the public cloud for short-term, resource-intensive projects.

James Hamilton: Data center infrastructure innovation

Summary from James Hamilton’s keynote talk at Velocity 2010.

  • Pace of innovation – The pace of datacenter innovation is increasing. The strong focus on infrastructure innovation is increasing reliability, reducing resource consumption and ultimately driving down cost.
  • Where does the money go ?
    • 54% on servers, 8% on networking, 21% on power distribution, 13% on power, 5% on other infrastructure requirements
    • 34% costs related to power
    • Cost of power is trending up
  • Cloud efficiency – server utilization in our industry is in the 10 to 15% range
    • Avoid holes in the infrastructure use
    • Break jobs into smaller chunks, queue them wherever possible
  • Power distribution – 11 to 12% lost in distribution
    • Rules to minimize power distribution losses
      • Oversell power – set up more servers than the available power could support at full load; 100% of servers are never busy at once in a regular datacenter (see the sketch after this list)
      • Avoid voltage conversions
      • Increase efficiency of conversions
      • High voltage as close to load as possible
      • Size voltage regulators to load and use efficient parts
      • High voltage direct current a small potential gain
  • Mechanical Systems – One of the biggest savings is in cooling
    • What parts are involved ? – Cooling towers, heat exchangers, pumps, evaporators, compressors, condensers and so on.
    • The efficiency of these systems, and the power they require, depends on the difference between the desired temperature and the current room temperature
    • Separate hot and cold aisles… insulate them (don’t break the fire codes)
    • Increase the operating temperature of servers
      • Most are between 61°F and 84°F
      • Telco standard is 104F (Game consoles are even higher)
  • Temperature
    • Limiting factors to high temp operation
      • Higher fan power trade-off
      • More semiconductor leakage current
      • Possible negative failure rate impact
    • Avoid direct expansion cooling entirely
      • Air side economization 
      • Higher data center temperature
      • Evaporative cooling
    • Requires filtration
      • Particulate and chemical pollution
  • Networking gear
    • Current networks are over-subscribed
      • Forces workload placement restrictions
      • Goal: all points in datacenter equidistant.
    • Mainframe model goes commodity
      • Competition at each layer rather than vertical integration
    • Openflow: open S/W platform
      • Moving from a distributed control plane to central control
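As promised in the “oversell power” item above, here is a minimal sketch of why overselling works: servers rarely draw their nameplate power at the same time, so a rack’s power budget can host more of them than a worst-case calculation suggests. The wattage figures and the 70% expected-peak ratio are invented for illustration; they are not numbers from the keynote.

```python
rack_power_budget_w = 10_000
nameplate_per_server_w = 400        # worst-case draw per server
expected_peak_per_server_w = 280    # what servers actually draw at their busiest (~70%)

conservative = rack_power_budget_w // nameplate_per_server_w        # 25 servers
oversubscribed = rack_power_budget_w // expected_peak_per_server_w  # 35 servers

print(conservative, oversubscribed)
# Power capping/throttling is still needed as a safety net for the rare moment
# when too many servers peak together.
```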

Cloud: Agility vs Security

Networking devices on the edges have become smarter over time. So have the firewalls and switches used internally within the networks. Whether we like it or not, web applications over time have grown to depend on them.

It’s impossible to build a flawless product, which is why it’s standard practice to disable all unused services on a server. Most organizations today try to follow the n-tier approach to create different logical security zones, with the core asset inside the most secure zone. The objective is to make it difficult for an attacker to get to the core asset without breaching multiple sets of firewalls.

Frequent system patching, auditing file system permissions and setting up intrusion detection (host or network based) are some of the other mundane ways of keeping web applications safe from attacks.

Though the cloud has made deployment of on-demand infrastructure simpler, it’s hard to build a walled garden around a customer’s cluster of servers on the cloud in an efficient way anymore. And the absence of such walled gardens and logical security zones means there are more points of entry into the infrastructure which could be exploited. If you replace 10 powerful internal servers with 100 small servers on the cloud, all of a sudden you might have to worry about protecting 100 individual servers instead of protecting a couple of edge devices. In a worst-case scenario, one weak server in the cluster could expose the entire cluster to an attacker. Here are a few other things to think about…

  • Host-based firewalls should allow only traffic that is required/expected (see the sketch after this list)
  • Non-essential services should be shut off on the server
  • Some kind of Intrusion detection might be important to have
  • Keys/passwords should be changed periodically
  • System patches (update OS image) need to be applied periodically
  • Authenticate/Authorize all inter-server communication
  • Maintain audit trail for all changes to images/servers if possible
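Here is a minimal sketch of the first item on that list: allow only the traffic you expect and leave everything else blocked. It uses boto3, and the ports, CIDRs and IDs are made-up illustrations; adapt them to your own topology.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="web-tier",
    Description="Only expected traffic for the web tier",
    VpcId="vpc-12345678",   # hypothetical VPC id
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        # HTTPS from anywhere
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        # SSH only from a trusted office/VPN range
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},
    ],
)
# Anything not listed above stays blocked; security groups deny inbound traffic by default.
```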

An organization which is completely on the cloud may not have an IT department in its current form, but it might still have an operations team which sets the security policies, updates OS images, manages billing, monitors system health (and IDS) and trains developers to do things the right way.

If your infrastructure is on the cloud, do write back with a note about what you do to protect your applications.

Image source: AMagill

The real concerns about Cloud infrastructure (as it is today)

While “private clouds may not be the future” they are definitely needed today. Here are some of the top issues bothering some organizations that have been thinking about moving into the cloud. Some of these issues are based on Craig Bolding’s talk on “Guide to cloud security”.

  • Unlike your own data center, you will never know what the cloud vendors are running, how they back up, or what their DR plans are. They will say you shouldn’t care, but do you remember what happened to T-Mobile customers on Danger ?
  • Uptime, availability and responsiveness is less predictable than in a self hosted environment. In most cases the cloud vendors may not even choose to let customers know about major maintenance if they don’t anticipate any issues. Organizations who manage their own infrastructure would always try to avoid doing two major changes which have interdependencies.
  • Multi-Tenancy means you may have to worry about a noisy neighbor.
  • Multi-tenancy could also lead to interesting issues which were never thought about before. What if there were a way to do an “injection attack” ? Depending on how multi-tenancy is implemented, you could potentially touch other customers’ data.
  • Infrastructure and platform lock-in issues are worrying for many organizations who are thinking long term. Most cloud vendors don’t really have a long history to show their track record.
  • Change control and detailed change logs are missing.
  • Individual customers don’t have much decision-making power over what a vendor should do next. In a privately hosted environment the stakeholders are consulted before something is done, but in a larger infrastructure, you are a small fish in a huge pond.
  • Most cloud vendors have multiple layers of cloud infrastructure dependent on each other. It’s hard to understand how issues around one type of cloud could impact others. This is especially true from a security viewpoint: a bad flaw in a lower layer of the architecture could impact all the platforms built over it.
  • Moving applications to cloud means dealing with a different style of programming designed for horizontal scalability, data consistency issues, health monitoring, load balancing, managing state, etc.
  • Identity management is still in its early stages. Integration with corporate identity management infrastructure would be important to make external clouds easy to use for individuals from large organizations.
  • Who takes care of scrubbing disks when data is moved around ? What about data on backup tapes ? This is very important in applications handling highly sensitive data.
  • Just like credit card fraud, one has to worry about CPU time fraud. Is the current billing and reporting good enough to help large organizations figure out what is real and what could be fraud ? They need a real-time fraud detection mechanism. And what about loss of service due to DOS attacks ? Who pays for that ?
  • Need a better mechanism to bill large corporations.
  • On the non-technical side, there are a lot of questions related to SLAs, Compliance issues, Terms of services, Legal issues around cross border services, and even questions about whether law enforcement have a different set of rules when search and seizure is required.
  • Not too far from being another form of “outsourcing”.

Photo credit: akakumo

Private clouds not the future ?

James Hamilton is one of the leaders in this industry and has written a very thought-provoking post about private clouds not being the future. This is what he said about private clouds when compared to existing non-cloud solutions.

  • A fix, Not the future (reference to an InformationWeek post)
  • Runs at lower utilization levels
  • Consumes more power
  • Less efficient environmentally
  • Runs at higher costs

Though I agree with most of his comments, I’m not convinced by the generalization of the conclusions. In particular, what is the maximum number of servers one needs to own beyond which outsourcing becomes a liability? I suspect this is not a very high number today, but it will grow over time.

Hardware costs: The scale at which Amazon buys infrastructure is just mind-boggling, but organizations buying in bulk could get a pretty good deal from those same vendors as well. It’s not clear to me how many servers one has to buy to get discounts like Amazon’s.

Utilization levels: Cloud providers optimize utilization by making sure all the servers are being used all the time. It’s also important to remember that because they are trying to maximize utilization, they don’t buy all the servers for all of their customers the moment those customers sign up.

At scale, with high customer diversity, a wonderful property emerges: non-correlated peaks. Whereas each company has to provision to support their peak workload, when running in a shared cloud the peaks and valleys smooth. The retail market peaks in November, taxation in April, some financial business peak on quarter ends and many of these workloads have many cycles overlaid some daily, some weekly, some yearly and some event specific. For example, the death of Michael Jackson drove heavy workloads in some domains but had zero impact in others.
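Here is a toy illustration of that point. The workload numbers are invented; all they are meant to show is that the sum of the individual peaks is larger than the peak of the sum.

```python
# Arbitrary demand curves for three customers over six periods.
retail   = [40, 45, 50, 60, 95, 55]   # peaks in one period (say, November)
taxation = [90, 40, 35, 30, 30, 30]   # peaks in a different period (April)
finance  = [50, 50, 85, 50, 50, 85]   # peaks at quarter ends

# Each company provisioning for its own peak:
separate = max(retail) + max(taxation) + max(finance)                  # 95 + 90 + 85 = 270
# A shared cloud provisioning for the combined peak:
shared = max(r + t + f for r, t, f in zip(retail, taxation, finance))  # 180

print(separate, shared)
```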

This is something which bothers Enterprise IT departments everywhere when they are building private clouds. Can they get away with buying fewer servers than the organization really needs, and at times say “no” to some departments when they run out of computing power ? It’s hard to beat the scale of shared clouds.

The other reason utilization levels are low in private clouds is that most organizations don’t have computationally intensive batch jobs which could be run while servers would otherwise sit idle. On Amazon one could even bid a lower price for unused EC2 capacity (spot instances); see the sketch below.
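As a sketch of what that bidding looks like, here is a hypothetical spot request for an interruptible batch job. boto3 and every concrete value (price, AMI id, instance type, count) are my own assumptions, not a recommendation.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.request_spot_instances(
    SpotPrice="0.05",              # the most we are willing to pay per instance-hour
    InstanceCount=4,
    Type="one-time",               # fine for interruptible batch work
    LaunchSpecification={
        "ImageId": "ami-12345678",     # hypothetical batch-worker image
        "InstanceType": "m3.large",
    },
)

for req in response["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```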

This is a tough problem and I don’t think private clouds can outperform shared clouds.

Power usage: Inefficient cooling and power conversion losses can quickly make hosting infrastructure more expensive. Having domain experts definitely helps, and that’s not something smaller organizations can easily afford either.

Platform: There isn’t any stable, proven, internal cloud infrastructure platform which comes cheap. VMware’s ROI calculator might claim it’s cheap, but I’m not convinced yet. The xen/kvm options look very stable, but they don’t come with decent management tools. In short, there is a lot of work which needs to be done just to pick a platform.

A private Hadoop cluster is still cloud infrastructure. A lot of organizations are now switching to similar batch-processing-based clouds which can be shared across different kinds of jobs. And there are still others who could decide to invest in smarter deployment and automation scripts to fully utilize their private infrastructure without using virtualization.

Overhead of the shared cloud: The larger an organization is, the more difficult it is for it to migrate to a shared cloud. In fact, migrating an existing live RDBMS-based application over to the cloud would be impossible without significant resources to re-architect the whole application and datastore. These organizations also have extensive, well-tested security policies and guidelines in place, all of which would have to be thrown out if they put their data on a public network over which they have no control. But I do believe this is a temporary problem which will be resolved over time in favor of shared clouds.

Cost: Cloud infrastructure providers are not non-profit organizations. Though they are here to make money, they would still be significantly cheaper for many. But do your homework and make sure you and your management team are OK with giving up infrastructure control for some cost savings.

That being said, here are my predictions for the next couple of years.

  1. Expect to see more non-virtualized, application clouds in the enterprise.
  2. Expect the shared cloud providers to get even more cost effective over time as competition increases.
  3. See more open source initiatives to build tools which manage private cloud infrastructures.
  4. See more interesting tools which give end users the ability to visualize the actual cost of the resources they are using. Making the cost more transparent could guide developers to design smarter applications.

Private clouds: By Amazon

A few days ago I blogged about how VMware is going to make a huge push into “private clouds” around the VMware 2009 conference. But little did we know that Amazon had something up its sleeve as well: it made its own announcement today.

AWS now supports creation of a Virtual Private Cloud with a private address space (including RFC 1918 ranges) which can be locked down by a VPN connection to your organization only. You still get most of the benefit of Amazon’s cheap hardware pricing, but you get to lock down the infrastructure for security reasons.
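As a rough sketch of what carving out that private address space looks like, here is a hypothetical example using boto3. The CIDR blocks are arbitrary, and the VPN attachment that actually restricts access to your organization is a separate, more involved set of calls that I have left out.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# An RFC 1918 range for the whole VPC, split into a couple of subnets.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")   # e.g. application tier
ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24")   # e.g. database tier

print(vpc_id)
```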

Regardless of how you see it, this is huge for IT and the developer community. Some may love it, and I’m sure some will be pretty angry at Amazon for trying to commoditize security and making it look as if network security were as simple as that.

With VMware’s announcements next week, there is no doubt in my mind that for at least the next year there will be a significant push towards “private clouds”.

VMware: internal + external “private” clouds

Last year at the VMware 2008 conference they discussed something called vCloud. Before VMware 2009, they will be announcing external cloud providers around that platform, which allow internal clouds to extend their infrastructure to external clouds.

What VMware is trying to do is allow organizations to build cloud networks with the possibility of moving a few services/components to external clouds.

To make this seamless, the VMware vSphere tool, which currently allows internal cloud management, will be enhanced to manage instances on the external cloud almost as if they were part of the internal cloud. In fact, if the rumors are true, they will even support vMotion across to external cloud providers (restrictions apply).

VMware is getting on the cloud bandwagon in a big way… just take a look at the number of sessions they have mentioning cloud.

Weekend reading material

 

Products/Ideas

  1. Redis – http://code.google.com/p/redis/ : Redis is a key-value database. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements (see the sketch after this list).
  2. HBase – http://hadoop.apache.org/hbase/ : HBase is the Hadoop database. It’s an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.
  3. Sherpa – http://research.yahoo.com/node/2139
  4. BigTable – http://labs.google.com/papers/bigtable-osdi06.pdf
  5. Voldemort – It is basically just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper like active-record or hibernate this will provide horizontal scalability and much higher availability but at great loss of convenience. For large applications under internet-type scalability pressure, a system likely consists of a number of functionally partitioned services or APIs, which may manage storage resources across multiple data centers using storage systems which may themselves be horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible since all the data is not available in any single database. A typical pattern is to introduce a caching layer which will require hashtable semantics anyway. For these applications Voldemort offers a number of advantages.
  6. Dynamo – A highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.  To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
  7. Cassandra – Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
  8. Hypertable – Hypertable is an open source project based on published best practices and our own experience in solving large-scale data-intensive tasks.
  9. HDFS – The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.
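As mentioned in the Redis item above, here is a minimal sketch of the data model it describes: plain string values plus lists and sets with atomic push/pop operations. It assumes the redis-py client and a Redis server on localhost, neither of which is part of the original list.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# memcached-style string values
r.set("user:42:name", "alice")
print(r.get("user:42:name"))                 # b'alice'

# a list used as a work queue: push on one end, pop from the other, atomically
r.lpush("jobs", "resize-image-1", "resize-image-2")
print(r.rpop("jobs"))                        # b'resize-image-1'

# a set with atomic membership operations
r.sadd("online_users", "alice", "bob")
print(r.sismember("online_users", "bob"))    # True
```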

Blog/Posts/Links

  1. Eventually Consistent 
  2. Bunch of Links at bytepawn
  3. Fallacies of Distributed Computing