The unbiased private vs AWS ROI worksheet

One of my problems with most cloud ROI worksheets is that they are heavily weighted toward use cases where resource usage is very bursty. But what if your resource requirements aren’t bursty? And what if you have a use case where you must maintain a small IT team to manage some on-site resources due to compliance and other issues?


In his latest post, Richard shares his worksheet for everyone to play with.



Cloud economics: Not really black and white

While some of the interest in moving toward the public cloud is based on sound economics, a segment of this movement is driven purely by “herd mentality”.

The slide on the right, from a Microsoft publication, shows that larger networks may be less economical in the cloud (at least today).

Richard Farley has been discussing this very topic for a few months now. He observes that a medium-sized organization which already has decent IT infrastructure, including dedicated IT staff to support it, carries significantly less overhead than cloud vendors might make it appear.

Here is a small snippet from his blog. If you are not afraid to get dirty with the numbers, read the rest here.

Now, we know we need 300 virtual servers, each of which consumes 12.5% of a physical host.  This means we need a total of 37.5 physical hosts.  Our vendor tells us these servers can be had for $7k each including tax and delivery with the cabinet.  We can’t buy a half server, and want to have an extra server on hand in case one breaks.  This brings our total to 39 at a cost of $273k.  Adding in the cost of the cabinet, we’re up to $300k.

There are several non-capital costs we now have to factor in.  Your vendor will provide warranty, support and on-site hardware replacement service for the cabinet and servers for $15k per year.  Figure you will need to allocate around 5% of the time of one of your sys admins to deal with hardware issues (i.e., coordinating repairs with the vendor) at a cost of around $8k per year in salary and benefits.  Figure power and cooling for the cabinet will also cost $12k per year.  In total, your non-capital yearly costs add up to $35k.
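The arithmetic in the snippet can be reproduced in a few lines; all figures below are taken directly from the quoted worksheet.

```python
import math

# All figures come from the worksheet snippet above.
virtual_servers = 300
vms_per_host = 8             # each VM consumes 12.5% of a physical host
cost_per_server = 7_000      # USD, including tax and delivery

hosts_needed = math.ceil(virtual_servers / vms_per_host)   # 37.5 rounds up to 38
hosts_bought = hosts_needed + 1                            # plus one spare = 39
server_capex = hosts_bought * cost_per_server              # 39 * $7k = $273k

# Non-capital yearly costs: vendor support, sys-admin time, power and cooling.
yearly_opex = 15_000 + 8_000 + 12_000                      # = $35k

print(f"hosts: {hosts_bought}, server capex: ${server_capex:,}, "
      f"yearly opex: ${yearly_opex:,}")
```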

One thing the post doesn’t clearly articulate is that while long-term infrastructure is cheaper to host in a private cloud, it may still be more economical to use the public cloud for short-term, resource-intensive projects.
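A rough break-even sketch makes the point. The private-cloud figures come from the worksheet snippet above; the $25k/month public-cloud bill for equivalent capacity is a made-up round number, not something from the original post.

```python
# Hypothetical break-even sketch: full capex up front vs. pay-as-you-go.
CAPEX = 300_000          # servers + cabinet, paid on day one (from the snippet)
YEARLY_OPEX = 35_000     # warranty, admin time, power and cooling (from the snippet)

def private_total(months):
    # Owning hardware means paying the full capex on day one,
    # no matter how short the project is.
    return CAPEX + YEARLY_OPEX * months / 12

def cloud_total(months, monthly_bill=25_000):  # assumed cloud price
    return monthly_bill * months

for m in (3, 12, 36):
    winner = "cloud" if cloud_total(m) < private_total(m) else "private"
    print(f"{m:>2} months: private ${private_total(m):,.0f} "
          f"vs cloud ${cloud_total(m):,.0f} -> {winner}")
```

Under these assumed rates the cloud wins for projects of roughly a year or less, and owning wins over a multi-year horizon.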

James Hamilton: Data center infrastructure innovation

Summary from James Hamilton’s keynote talk at Velocity 2010.

  • Pace of innovation – The datacenter pace of innovation is increasing. The strong focus on infrastructure innovation is driving down costs, increasing reliability and reducing resource consumption.
  • Where does the money go?
    • 54% on servers, 8% on networking, 21% on power distribution, 13% on power, 5% on other infrastructure requirements
    • 34% costs related to power
    • Cost of power is trending up
  • Cloud efficiency – server utilization in our industry is in the 10 to 15% range
    • Avoid holes in the infrastructure use
    • Break jobs into smaller chunks, queue them wherever possible
  • Power distribution – 11 to 12% lost in distribution
    • Rules to minimize power distribution losses
      • Oversell power – set up more servers than the available power supports; 100% of servers are never required in a regular datacenter
      • Avoid voltage conversions
      • Increase efficiency of conversions
      • High voltage as close to load as possible
      • Size voltage regulators to load and use efficient parts
      • High voltage direct current a small potential gain
  • Mechanical systems – One of the biggest savings is in cooling
    • What parts are involved? – Cooling towers, heat exchangers, pumps, evaporators, compressors, condensers… and so on.
    • The efficiency of these systems, and the power required to run them, depends on the difference between the desired temperature and the current room temperature
    • Separate hot and cold aisles… insulate them (don’t break the fire codes)
    • Increase the operating temperature of servers
      • Most are run between 61 and 84F
      • Telco standard is 104F (Game consoles are even higher)
  • Temperature
    • Limiting factors to high temp operation
      • Higher fan power trade-off
      • More semiconductor leakage current
      • Possible negative failure rate impact
    • Avoid direct expansion cooling entirely
      • Air side economization 
      • Higher data center temperature
      • Evaporative cooling
    • Requires filtration
      • Particulate and chemical pollution
  • Networking gear
    • Current networks are over-subscribed
      • Forces workload placement restrictions
      • Goal: all points in datacenter equidistant.
    • Mainframe model goes commodity
      • Competition at each layer rather than vertical integration
    • Openflow: open S/W platform
      • Distributed control plane to central control
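The utilization figure above translates directly into cost per hour of useful work. A quick illustration: the 10–15% figure is James’s; the all-in hourly server cost below is an invented round number used only to show the shape of the effect.

```python
# Only the 10-15% utilization range comes from the talk; the hourly
# all-in server cost is a hypothetical round number.
all_in_cost_per_server_hour = 0.50   # assumed: hardware + power + space

costs = {}
for utilization in (0.10, 0.15, 0.60):
    # The lower the utilization, the more each hour of useful work costs.
    costs[utilization] = all_in_cost_per_server_hour / utilization
    print(f"{utilization:.0%} utilized -> ${costs[utilization]:.2f} per useful hour")
```

At 10% utilization every useful hour costs six times what it would at 60%, which is why filling the “holes” matters so much.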

Private clouds not the future?

James Hamilton is one of the leaders in this industry and has written a very thought-provoking post about private clouds not being the future. This is what he said about private clouds compared to existing non-cloud solutions.

  • A fix, Not the future (reference to an InformationWeek post)
  • Runs at lower utilization levels
  • Consumes more power
  • Less efficient environmentally
  • Runs at higher costs

Though I agree with most of his comments, I’m not convinced by the generalization of the conclusions. In particular: what is the maximum number of servers one needs to own, beyond which outsourcing becomes a liability? I suspect this number is not very high today, but it will grow over time.

Hardware costs: The scale at which Amazon buys infrastructure is just mind-boggling, but organizations buying in bulk could get pretty good deals from those same vendors as well. It’s not clear to me how many servers one has to buy to get discounts like Amazon’s.

Utilization levels: Cloud providers optimize utilization by making sure all their servers are in use all the time. It’s also important to remember that because they are trying to maximize utilization, they don’t buy servers for all of their customers up front when those customers sign up.

At scale, with high customer diversity, a wonderful property emerges: non-correlated peaks. Whereas each company has to provision to support their peak workload, when running in a shared cloud the peaks and valleys smooth. The retail market peaks in November, taxation in April, some financial business peak on quarter ends and many of these workloads have many cycles overlaid some daily, some weekly, some yearly and some event specific. For example, the death of Michael Jackson drove heavy workloads in some domains but had zero impact in others.

This is something which bothers enterprise IT departments everywhere when they build private clouds. Can they get away with buying fewer servers than the organization really needs, and at times say “no” to some departments when they run out of computing power? It’s hard to beat the scale of shared clouds.
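The smoothing effect Hamilton describes can be shown with a toy simulation. The tenant count, loads, and seed below are all invented; only the qualitative effect (shared capacity needed is far below the sum of individual peaks) is the point.

```python
import random

random.seed(1)  # arbitrary seed, just so the sketch is reproducible

# Toy model of "non-correlated peaks": 20 tenants, each with a baseline
# load of 10 units and one random peak month of 100 units.
TENANTS, BASELINE, PEAK = 20, 10, 100

loads = []
for _ in range(TENANTS):
    months = [BASELINE] * 12
    months[random.randrange(12)] = PEAK   # each tenant peaks in a random month
    loads.append(months)

# Provisioning separately: every tenant must buy capacity for its own peak.
separate_capacity = TENANTS * PEAK

# Provisioning together: buy only for the worst *combined* month.
shared_capacity = max(sum(month) for month in zip(*loads))

print(f"separate: {separate_capacity}, shared: {shared_capacity}")
```

Because the peaks rarely coincide, the shared pool needs far less hardware than the sum of what each tenant would buy alone.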

The other reason utilization levels are low in private clouds is that most organizations don’t have computationally intensive batch jobs which could take advantage of servers while they would otherwise sit idle. On Amazon, one can even bid a lower price for unused EC2 capacity.

This is a tough problem and I don’t think private clouds can outperform shared clouds.

Power usage: Inefficient cooling and power conversion losses can quickly make hosting infrastructure more expensive. Having domain experts definitely helps, and that’s not something smaller organizations can easily afford either.

Platform: There isn’t any stable, proven, internal cloud infrastructure platform which comes cheap. VMware’s ROI calculator might claim it’s cheap, but I’m not convinced yet. The Xen/KVM options look very stable, but they don’t come with decent management tools. In short, there is a lot of work to be done just to pick a platform.

A private Hadoop cluster is still cloud infrastructure. A lot of organizations are now switching to similar batch-processing-based clouds which can be shared across different kinds of jobs. And there are still others who may decide to invest in smarter deployment and automation scripts to fully utilize their private infrastructure without using virtualization.

Overhead of the shared cloud: The larger an organization is, the more difficult it is to migrate to a shared cloud. In fact, migrating an existing live RDBMS-based application to the cloud would be impossible without significant resources to re-architect the whole application and datastore. These organizations also have extensive, well-tested security policies and guidelines in place, all of which would have to be thrown to the dogs if they had to put their data on a public network over which they have no control. But I do believe this is a temporary problem which will be resolved over time in favor of shared clouds.

Cost: Cloud infrastructure providers are not non-profit organizations. Though they are here to make money, they will still be significantly cheaper for many. But do your homework, and make sure you and your management team are OK with giving up infrastructure control for some cost savings.

That being said, here are my predictions for the next couple of years.

  1. Expect to see more non-virtualized application clouds in the enterprise.
  2. Expect the shared cloud providers to get even more cost effective over time as competition increases.
  3. See more open source initiatives to build tools which manage private cloud infrastructures.
  4. See more interesting tools which give end users the ability to visualize the actual cost of the resources they are using. Making costs more transparent could guide developers to design smarter applications.