Shipping Trunk: For web applications

I blogged briefly about this presentation from Velocity 2010 before. I wish they had released the video for this session. I went through the slide deck again today to see whether Paul mentioned any of the problems organizations like ours face in the transition from quarterly releases to weekly or continuous releases.

One of the key observations Paul made during his talk is that most organizations still treat web applications like desktop software and apply very strict quality controls. Those controls may not be as necessary, since releasing a change to a web app delivered as SaaS (Software as a Service) is much cheaper than releasing a patch for traditional desktop software.

Here are some of the other points he made. For really detailed info, check out the slides.

  • Deploy frequently, facilitating rapid product iteration
  • Avoid large merges and the associated integration testing
  • Easily perform A/B testing of functionality
  • Run QA and beta testing on production hardware
  • Launch big features without worrying about your infrastructure
  • Provide all the switches your operations team needs to manage the deployed system
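The A/B testing point above is easier to picture with a little code. This is only a rough sketch of one common approach (hashing a user id into a stable bucket), not something taken from Paul's slides. The function names `bucket` and `variant` and the experiment name are made up for illustration.

```python
import hashlib


def bucket(user_id: str, experiment: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) for one experiment."""
    # Hash the experiment name together with the user id so the same user
    # can land in different buckets for different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def variant(user_id: str, experiment: str, percent_b: int) -> str:
    """Return "B" for roughly percent_b percent of users, otherwise "A"."""
    return "B" if bucket(user_id, experiment) < percent_b else "A"


if __name__ == "__main__":
    # Hypothetical experiment: show the new code path to about 10% of users.
    for user in ("alice", "bob", "carol"):
        print(user, variant(user, "new_checkout", percent_b=10))
```

Because the assignment depends only on the user id and the experiment name, a user sees the same variant on every request without any stored state, and the percentage can be raised gradually.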

Slides: Always ship Trunk: Managing Change In Complex Websites

Continuous deployments may not be for everyone: Culture

If you have read this blog before, you know how much I admire those who use continuous deployments in production. Doing that at scale is even more impressive. But the message which sometimes gets lost is that continuous deployments may not be for everyone.

Most continuous integration environments do all of their deployments from trunk, which means every check-in has to be production quality. Digg’s Andrew Bayer gives a good explanation of how they do code reviews and pre-code check-ins before code is merged into trunk.

Site uptime and reliability depend on a comprehensive QA process to protect against unintentional mistakes. And for rapid deployments one has to abandon manual QA processes in favor of 100% automated testing, with the goal of getting close to 100% code coverage. That's hard if the code is not written in a way which can be tested easily.

image

But unit and integration tests alone cannot guarantee quality. In addition to testing code which has been implemented in the application, there need to be tests that look for things which shouldn’t be implemented. For example, it would be nice to have tests that look for non-parameterized SQL calls in parts of the code where they shouldn’t exist. If you know there is a wrong way to do something, write a test case for it so that it's caught as soon as someone does it.
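Here is a rough sketch of what such a "wrong way" test could look like for Python code. It uses the standard `ast` module to flag `.execute(...)` calls whose query argument is built with an f-string, concatenation, `%` formatting or `str.format`. The helper name `find_unparameterized_sql` is my own, and the check is deliberately simple and conservative: it will miss queries assembled in a variable first, and it flags any f-string even if it has no placeholders.

```python
import ast
from typing import List


def find_unparameterized_sql(source: str, filename: str = "<string>") -> List[int]:
    """Return line numbers of .execute(...) calls whose first argument is
    built dynamically instead of being passed with bound parameters."""
    offenders = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        is_execute_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        )
        if not is_execute_call:
            continue
        query = node.args[0]
        dynamic = (
            isinstance(query, ast.JoinedStr)      # f"... {value}"
            or isinstance(query, ast.BinOp)       # "..." + value or "..." % value
            or (
                isinstance(query, ast.Call)
                and isinstance(query.func, ast.Attribute)
                and query.func.attr == "format"   # "...".format(value)
            )
        )
        if dynamic:
            offenders.append(node.lineno)
    # ast.walk does not visit nodes in line order, so sort before returning.
    return sorted(offenders)


if __name__ == "__main__":
    sample = 'cur.execute("SELECT * FROM users WHERE id = %s" % user_id)\n'
    print(find_unparameterized_sql(sample))
```

A test suite could run this over every source file in the repository and fail the build when the returned list is not empty.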

Some of this would be easy to do if you already follow a test-driven development process where you have to write tests before you write code.

The biggest difference between an organization which follows continuous deployment and one which doesn’t is in how QA is done. QA becomes a shared responsibility where everyone has to contribute. No matter how many tools or guidelines one publishes, if the teams using this process don’t believe in it, the quality and availability of the website will suffer. Pascal-Louis Perez (from KaChing) used a diagram like the one here to explain how this “culture” is at the heart of continuous deployment.

“Culture” also explains why most of the older organizations that follow a more traditional form of deployment are having a hard time understanding and adapting to this process.

Are you using continuous deployments in your environment? What was your biggest hurdle?

Automated, faster, repeatable, scalable deployments

While efficient automated deployment tools like Puppet and Capistrano are a big step in the right direction, they are not a complete solution for an automated deployment process. This post will explore some of the less discussed issues which are just as important for automated, fast, repeatable, scalable deployments.

Rapid Build and Integration with tests

  • Use Source control to build an audit trail: Put everything possible in it, including configurations and deployment scripts.
  • Continuous builds triggered by code check-ins can detect and report problems early
    • Use tools which provide targeted feedback about build failures. It reduces noise and improves overall quality faster
    • The faster the build happens after a check-in, the better the chances that bugs get fixed quickly. Delays can be costly since broken builds could impact other developers as well
    • Build smaller components (fail fast)
  • Continuous integration tests of all components can detect errors which may not be caught at build time.

Automated database changes

Can database changes be automated? This is probably one of the most interesting challenges for automation, especially if the app requires data migrations which can’t be rolled back. While it would be nice to have only incremental changes introduced into each deployment (which are guaranteed to be forward and backward compatible), there might be some need for non-trivial changes once in a while. As long as there is a process to separate the trivial from the non-trivial changes, it might be possible to automate most of the database changes.

Tracking which migrations have been applied and which are pending is a very application-specific problem for which there are no silver bullets.
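That said, the basic bookkeeping is not complicated. Below is a minimal sketch, assuming SQLite and a hypothetical `schema_migrations` table that records each applied version. The function name `apply_pending` and the migration scripts are illustrative only, and this handles only the simple forward-only case, not rollbacks or non-trivial data migrations.

```python
import sqlite3


def apply_pending(conn: sqlite3.Connection, migrations: dict) -> list:
    """Apply migrations not yet recorded in schema_migrations, in sorted order.

    `migrations` maps a sortable version string to a SQL script.
    Returns the versions that were applied in this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version in sorted(migrations):
        if version in applied:
            continue  # already applied in an earlier deployment
        conn.executescript(migrations[version])
        # Record the version so the next deployment skips it.
        # Note: the script and this bookkeeping row are not one atomic unit here.
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    scripts = {
        "001": "CREATE TABLE users (id INTEGER PRIMARY KEY);",
        "002": "ALTER TABLE users ADD COLUMN email TEXT;",
    }
    print(apply_pending(conn, scripts))  # first run applies both
    print(apply_pending(conn, scripts))  # second run finds nothing pending
```

Running the same deployment twice is harmless with this kind of tracking, which is most of what you need for the trivial, incremental changes. The non-trivial ones still need a human to decide.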

 

Configuration management

 
Environment-specific properties

It's not unusual to have different sets of configuration for dev and production. But creating different build packages for different target environments is not the right solution. If you need to change properties between environments, pick a better way to do it.

  • Either externalize the configuration properties to a file/directory location outside your app folder, such that repeated deployments don’t overwrite properties.
  • Or, update the right properties automatically during deployment using a deployment framework that supports it.

Pushing at deployment time or pulling at run time

In some cases pulling new configuration files dynamically after application startup might make more sense. This is especially true for applications on an infrastructure like AWS/EC2. If the application is already deployed on the base OS image, it will come up automatically when the system boots. Some folks keep only minimal information in the base OS image and download the latest configuration from a datastore like S3. In a private network where using S3 is not possible, you could replace it with some kind of shared store like SVN, NFS, FTP, SCP, HTTP, etc.
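Whether the file is pushed at deployment time or pulled at startup, the application side can look the same: one build package with sane defaults, overlaid by whatever external file it finds. The following is a minimal sketch of that idea. The environment variable `APP_CONFIG_PATH`, the JSON format and the property names are all assumptions for illustration.

```python
import json
import os

# Defaults shipped inside the build package; safe for a developer machine.
DEFAULTS = {"db_host": "localhost", "db_port": 5432, "debug": True}


def overlay(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied on top."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def load_config(env_var: str = "APP_CONFIG_PATH") -> dict:
    """Return DEFAULTS overlaid with an external JSON file, if one is configured.

    The file lives outside the app folder, so redeploying the app never
    overwrites environment-specific values.
    """
    path = os.environ.get(env_var)
    if not path or not os.path.exists(path):
        return dict(DEFAULTS)
    with open(path, encoding="utf-8") as handle:
        return overlay(DEFAULTS, json.load(handle))


if __name__ == "__main__":
    print(load_config())
```

In the pull model, a small startup step would first download the latest file from the shared store to the path named by the environment variable, and then the application loads it exactly as above.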

Deployment frameworks

 
3rd Party frameworks
  • Fabric – Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
  • Puppet – Put simply, Puppet is a system for automating system administration tasks.
  • Capistrano – It is designed with repeatability in mind, letting you easily and reliably automate tasks that used to require login after login and a small army of custom shell scripts. (Also check out Webistrano.)
  • Bcfg2 – Bcfg2 helps system administrators produce a consistent, reproducible, and verifiable description of their environment, and offers visualization and reporting tools to aid in day-to-day administrative tasks.
  • Chef – Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure.
  • Slack – slack is an evolution from the usual "put files in some central directory" that is fairly common practice.
  • Kokki – System configuration management framework influenced by Chef

Custom or Mixed frameworks

The tools listed above are not the only set of tools available. Simple bash/sh scripts, Ant scripts, even tools like CruiseControl and Hudson can be used for automated deployments. Here are some other interesting observations:

  • Building huge monolithic applications is a thing of the past. Understanding how to break them up into self-contained, less inter-dependent components is the challenge.
  • If all of your servers get the same exact copy of application and configuration, then you don’t need to worry about configuration management. Just find a tool which deploys files fast.
  • If your deployments have a lot of inter-dependencies between components then choose a tool which gives you a visual interface of the deployment process if required.
  • Don’t be shy about writing wrapper scripts to automate more tasks.

Push/Pull/P2P Frameworks

Grig has an interesting post about Push vs Pull where he lists the pros and cons of both systems. What he forgot to mention is P2P, which is the way Twitter is going for its deployments. P2P has advantages from both the Push and Pull architectures but comes with its own set of challenges. I haven’t seen an open source tool using P2P yet, but I’m sure it's not too far out.

Outage windows

Though deployments are easier with long outage windows, those are hard to come by. In an ideal world one would have a parallel set of servers which one could cut over to with the flip of a switch. Unfortunately, if user data is involved this is almost impossible to do. The next best alternative is to do “rolling updates” in small batches of servers. This can be challenging because the deployment tool needs to make sure the app has really completed initialization before it moves on to the next set of servers.
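To make the "wait for initialization" part concrete, here is a minimal sketch of a rolling update loop. It is not how any particular tool does it. The `deploy` and `is_ready` callables are placeholders you would supply yourself (for example, an SSH push and an HTTP health check), and the timeout values are arbitrary.

```python
import time
from typing import Callable, List, Sequence


def rolling_update(
    servers: Sequence[str],
    batch_size: int,
    deploy: Callable[[str], None],
    is_ready: Callable[[str], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
) -> List[str]:
    """Deploy to servers in batches, waiting for each batch to report ready.

    Returns the servers updated successfully. Raises TimeoutError and stops
    the rollout if a batch does not become ready within timeout_s.
    """
    updated: List[str] = []
    for start in range(0, len(servers), batch_size):
        batch = list(servers[start:start + batch_size])
        for server in batch:
            deploy(server)
        deadline = time.monotonic() + timeout_s
        pending = set(batch)
        while pending:
            # Keep only the servers that have not finished initializing yet.
            pending = {s for s in pending if not is_ready(s)}
            if not pending:
                break
            if time.monotonic() >= deadline:
                raise TimeoutError(f"not ready after deploy: {sorted(pending)}")
            time.sleep(poll_s)
        updated.extend(batch)
    return updated


if __name__ == "__main__":
    deployed = []
    hosts = ["web1", "web2", "web3", "web4", "web5"]
    # Stand-in callables: "deploying" records the host, and a host is
    # considered ready as soon as it has been deployed to.
    print(rolling_update(hosts, 2, deployed.append, lambda h: h in deployed, poll_s=0))
```

Stopping at the first batch that fails to come up keeps the rest of the servers on the old version, so a bad build only ever takes out one small batch.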

This can be further complicated by the fact that at times there are version dependencies between different applications. In such cases there needs to be a robust infrastructure to facilitate discovery of the right version of applications.

Conclusion

Deployment automation, in my personal opinion, is about the process, not the tool. If you have any interesting observations, ideas or comments, please feel free to write to me or leave a comment on this blog.

Scaling deployments

Most of the newer, successful web startups have one thing in common: they release smaller changes more often. Being in operations, I am often surprised how these organizations manage such a feat without breaking their website. Here are some notes from someone at Flickr about how they do it. The two most important parts of this talk are the observation that the Dev, QA and Operations teams have to blend into each other slightly to achieve deployments at such a velocity, and the fact that they are not afraid to break the website by deploying code from trunk.

  • Don’t be afraid to do releases
  • Automate infrastructure (hardware/OS and app deployment)
  • Share version control system
  • Enable one step build
  • Enable one step build and deploy
  • Do small frequent changes
  • Use feature flags (branching without source code branching)
  • Always ship trunk
  • Do private betas
  • Share metrics
  • Give applications the ability to talk back (IM/IRC): build logs, deploy logs, alert monitors and real-time traffic can all go here
  • Share runbooks and escalation plans
  • Prepare/Plan for failures
  • Do firedrills
  • No fingerpointing…
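The feature flag item in the list above ("branching without source code branching") is the one that makes always shipping trunk practical, so here is a minimal sketch of the idea. This is not Flickr's implementation. The class `FeatureFlags`, the flag name `new_homepage` and the in-memory store are all assumptions for illustration.

```python
class FeatureFlags:
    """In-memory flag store. A real system would back this with a config
    file or datastore that operations can change without a deploy."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so unfinished code stays dark.
        return bool(self._flags.get(name, False))

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def render_homepage(flags: FeatureFlags) -> str:
    """Both code paths live in trunk; the flag decides which one runs."""
    if flags.is_enabled("new_homepage"):
        return "new homepage"
    return "old homepage"


if __name__ == "__main__":
    flags = FeatureFlags()
    print(render_homepage(flags))    # unfinished feature ships dark
    flags.set("new_homepage", True)  # flipped later, with no new deploy
    print(render_homepage(flags))
```

The same switch doubles as an operations tool: if the new code misbehaves under load, turning the flag off is faster than rolling back a release.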