While efficient automated deployment tools like Puppet and Capistrano are a big step in the right direction, they are not the complete solution for an automated deployment process. This post explores some of the less-discussed issues that are just as important for automated, fast, repeatable, and scalable deployments.
Can database changes be automated? This is probably one of the most interesting challenges for automation, especially if the app requires data migrations which can't be rolled back. While it would be nice to introduce only incremental changes into each deployment (changes guaranteed to be forward and backward compatible), non-trivial changes will be needed once in a while. As long as there is a process to separate the trivial changes from the non-trivial ones, it should be possible to automate most database changes.
Tracking which migrations have been applied and which are pending is a very application-specific problem for which there are no silver bullets.
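One common approach, sketched here with a hypothetical file layout (simulated in a temporary directory), is to keep migrations as ordered files and record each applied one in a log, so the pending set is just the difference between the two:

```shell
#!/bin/bash
# Sketch of file-based migration tracking. Directory layout and file names
# are hypothetical; a real system would run the SQL instead of echoing.
MIGRATIONS_DIR=$(mktemp -d)
APPLIED_LOG="$MIGRATIONS_DIR/applied.log"

# Sample migrations for the demonstration.
touch "$MIGRATIONS_DIR/001_create_users.sql" "$MIGRATIONS_DIR/002_add_index.sql"
echo "001_create_users.sql" > "$APPLIED_LOG"   # pretend 001 already ran

# Pending = migration files on disk minus those recorded as applied.
pending=$(ls "$MIGRATIONS_DIR" | grep '\.sql$' | sort | comm -23 - <(sort "$APPLIED_LOG"))

for m in $pending; do
    # A real runner would execute the SQL here, then record success.
    echo "applying $m"
    echo "$m" >> "$APPLIED_LOG"
done
```

Rails popularized this idea with a `schema_migrations` table inside the database itself, which has the advantage of keeping the record next to the data it describes.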
It's not unusual to have different sets of configuration for development and production. But creating different build packages for different target environments is not the right solution: build one artifact and vary the configuration per environment instead, for example by externalizing properties into environment-specific files selected at deploy time.
In some cases, pulling new configuration files dynamically after application startup makes even more sense. This is especially true for applications on infrastructure like AWS/EC2: if the application is baked into the base OS image, it will come up automatically when the system boots. Some folks keep only minimal information in the base OS image and use a datastore like S3 to download the latest configuration. In a private network where S3 is not an option, you can substitute some kind of shared store like SVN, NFS, FTP, SCP, or HTTP.
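A minimal sketch of this pattern, assuming a hypothetical `fetch_config` helper and demonstrated against a local `file://` URL in place of S3 or a shared store:

```shell
#!/bin/bash
# Sketch: fetch the latest config at boot from a shared store, falling back
# to a locally cached copy if the store is unreachable. URLs and paths here
# are placeholders for illustration.
fetch_config() {
    local url="$1" dest="$2"
    if curl -fsS "$url" -o "$dest.new"; then
        mv "$dest.new" "$dest"      # atomic swap on success
        cp "$dest" "$dest.cache"    # refresh the last-known-good copy
    elif [ -f "$dest.cache" ]; then
        cp "$dest.cache" "$dest"    # store unreachable: use cached config
    else
        echo "no config available" >&2
        return 1
    fi
}

# Demonstration against a local file:// URL so the sketch is self-contained.
workdir=$(mktemp -d)
echo "db_host=10.0.0.5" > "$workdir/app.conf.remote"
fetch_config "file://$workdir/app.conf.remote" "$workdir/app.conf"
cat "$workdir/app.conf"
```

The cached fallback matters in practice: you don't want a temporarily unreachable config store to prevent instances from booting.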
The tools listed above are not the only options. Simple bash/sh scripts, Ant scripts, and even tools like CruiseControl and Hudson can be used for automated deployments. Here are some other interesting observations:
Grig has an interesting post about Push vs Pull where he lists the pros and cons of both approaches. What he didn't mention is P2P, which is the way Twitter is going for its deployments. P2P has advantages from both the Push and Pull architectures but comes with its own set of challenges. I haven't seen an open-source tool using P2P yet, but I'm sure it's not too far out.
Though deployments are easier with long outage windows, those are hard to come by. In an ideal world one would have a parallel set of servers to cut over to with the flip of a switch. Unfortunately, if user data is involved this is almost impossible to do. The next best alternative is to do "rolling updates" in small batches of servers. The reason this can be challenging is that the deployment tool needs to make sure the app has really completed initialization before it moves on to the next batch of servers.
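The batching-plus-readiness logic can be sketched roughly as follows; `deploy_to` and `is_healthy` are hypothetical stubs standing in for the real deploy step and the application's health check:

```shell
#!/bin/bash
# Sketch of a rolling update in small batches. Server names, the deploy
# step, and the health check are all illustrative stand-ins.
SERVERS=(web1 web2 web3 web4 web5)
BATCH_SIZE=2

deploy_to()  { echo "deploying to $1"; }   # stub: rsync/Capistrano/package install
is_healthy() { true; }                     # stub: e.g. curl -fs "http://$1/health"

roll_batch() {
    for h in "$@"; do deploy_to "$h"; done
    for h in "$@"; do
        # Don't advance until the app has really finished initializing.
        until is_healthy "$h"; do sleep 5; done
        echo "$h is healthy"
    done
}

for ((i = 0; i < ${#SERVERS[@]}; i += BATCH_SIZE)); do
    roll_batch "${SERVERS[@]:i:BATCH_SIZE}"
done
```

A real version would also need a bailout: if a batch never becomes healthy, the rollout should stop rather than march on and take down the whole fleet.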
This can be further complicated by the fact that at times there are version dependencies between different applications. In such cases there needs to be a robust infrastructure to facilitate discovery of the right version of applications.
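One simple form of such a check, sketched with a hypothetical `get_version` stub standing in for a version endpoint or service-registry lookup: before deploying app A, verify that its dependency is already at a compatible version.

```shell
#!/bin/bash
# Sketch: gate a deployment on a dependency's version. The app names,
# version numbers, and lookup mechanism are illustrative only.
get_version() { echo "2.3.1"; }   # stub: e.g. curl -fs "http://$1/version"

version_at_least() {
    # True if $1 >= $2 in dot-separated version ordering.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

required="2.1.0"
actual=$(get_version appB)
if version_at_least "$actual" "$required"; then
    echo "appB $actual satisfies >= $required; safe to deploy appA"
else
    echo "appB $actual is too old; aborting" >&2
    exit 1
fi
```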
Deployment automation, in my personal opinion, is about the process, not the tool. If you have any interesting observations, ideas or comments, please feel free to write to me or leave a comment on this blog.