Abstract: A quick overview of various strategies to ease deployment of web applications, and some common pitfalls and failure modes to avoid. Intended to be broadly technology-agnostic.
Introduction
Over the course of my career, I've worked in a number of different environments, each with its own particular processes and procedures for deploying systems from development to production. Over time, a number of best-practice patterns and common anti-patterns have emerged, which this article will attempt to enumerate and explain. I hope this information will give you pointers and direction to improve your processes so that deployment is made both easier and less error-prone. As much as possible, I will be technology-agnostic; your particular environment may of course vary or require additional steps, but the broader themes listed here should still be helpful.
Step Zero: Intelligent Use of Source Control
This seems like something that is almost forehead-slap obvious, but you should be using source control during development. Working without it is like doing high-wire acrobatics without a safety net. There is no project so small, either in scope or staff, that it cannot benefit from having some sort of source configuration management (SCM) system in place. Selecting which SCM to use (e.g., Subversion, git, Perforce) depends upon your team's development style and environment, and is beyond the scope of this article. In general, the Pragmatic Programmers, O'Reilly, and Apress books covering particular systems tend to be good resources.
Further, source control must be used intelligently to be of much use. Common anti-patterns include having the development environment exist in only one shared space with a single source control checkout, so that people collide over editing the same physical bits of any given file, or never branching/tagging, so that trying to determine the exact state of your system at a given point in the past is an exercise in frustration. If you ever have to ask "Is anyone else editing this file?", you are either using a supremely broken SCM or doing something gravely wrong.
Your SCM setup should enable you to work concurrently and in isolation with a minimum of hassles, allowing easy integration of work done concurrently on the same module or set of modules; easy reproduction of the system as it was at any point in time in the past; and, ideally, easy searching of commit history because all of these scenarios will come up repeatedly in any project of significant scale. For example, if you find yourself trying to diagnose a recurrence of an issue with a particular ticket number, your diagnosis will be vastly sped up if it is easy to find all of the commits related to that ticket number in the past. Similarly, being able to tell easily who exactly added a particular feature months or years ago might make it much faster to track down the organizational knowledge required to extend or fix it in the present.
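As a concrete sketch of the commit-history scenario, assuming a git repository and a team convention of including a ticket identifier in commit messages (both assumptions; your SCM and conventions may differ), finding every related commit can be a one-liner wrapped for reuse:

```python
import subprocess

def commits_for_ticket(repo_path, ticket_id):
    """Return one-line summaries of every commit mentioning a ticket ID.

    Assumes commit messages follow a convention like 'TICKET-1234: fix ...'.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--oneline", f"--grep={ticket_id}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# Example: all commits referencing a hypothetical ticket number.
for line in commits_for_ticket("/path/to/repo", "TICKET-1234"):
    print(line)
```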
The more isolation feasible in the environment, the less coordination overhead is required to work together as a team on a given workload, meaning that your productivity scales more linearly with additional developer resources. Good usage of an SCM can aid this by making it easy to keep individual development environments in sync; a common best practice is to make sure that each developer can run their own stand-alone copy of the execution environment based upon a frequently-updated SCM checkout. As a specific example, with the common LAMP (or similar) technology stack, it is easy for each developer to have an account on a shared machine, with a checkout in their home directory that is used as the root for a vhost/distinct port so that each developer may work in isolation, talking to different ports on the same host.
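As an illustration of the shared-host arrangement just described, a small script might generate one vhost stanza per developer; this is only a sketch, and the usernames, ports, and paths are hypothetical:

```python
# Emit an Apache vhost stanza per developer, each rooted at that developer's
# home-directory checkout and listening on its own port (values hypothetical).
DEVELOPERS = {"alice": 8081, "bob": 8082, "carol": 8083}

TEMPLATE = """Listen {port}
<VirtualHost *:{port}>
    DocumentRoot /home/{user}/checkout/webroot
    ErrorLog /home/{user}/logs/error.log
</VirtualHost>
"""

with open("dev-vhosts.conf", "w") as conf:
    for user, port in sorted(DEVELOPERS.items()):
        conf.write(TEMPLATE.format(user=user, port=port))
```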
Having a particular "branch" or "tag" devoted to staging/testing and production environments is a common best practice. The particulars of how this is done vary by SCM, but every modern SCM should have some facility corresponding to one or the other of the above if not both. Branching/merging tend to be slightly more complex operations in most SCM systems, but the effort expended in learning these operations will repay itself many times over as you begin to be able to take a more sophisticated approach to the various states and stages of your system's evolution over time. See steps two and three below for further discussion along this theme.
Step One: Deployment Planning
Take the time to write out a deployment plan, even if it's just a brief one. At a minimum, your deployment plan should include:
- Name and short purpose description of the project (seems obvious, but depending upon how widely the plan is distributed and how large your organization is, your readers may not automatically know what you're working on or how the proposed production deployment relates to it)
- Names of and contact information for the staff responsible for its development (particularly tech leads and project managers)
- Source location (e.g., links into the SCM's web interface or descriptions of how to retrieve the source for SCM's that don't have a web interface)
- List of affected systems and what resources will be used (i.e., which servers this code will be pushed to, and whether any extra steps must take place, such as running database modification scripts or setting up additional server software or new configuration for existing resources)
- Deployment and Rollback procedures (this may reference standard operating procedures in other documents if there is nothing out of the ordinary for the given deployment)
A wiki is a common tool for this, but even an email to the right people or a mailing list can suffice. One advantage of a persistent document is that it can "grow" over time as the project evolves rather than being reconstructed from scratch on each deployment (i.e., things like project staff or the particular branch of the project's source being used may change only infrequently, but dated/versioned logs might be kept of which revisions were rolled out when, or which revisions required an extra step, etc.). This is particularly important for continuous deployment environments (see next section). It is also easy to maintain a template like this, so that each one follows a common layout and helps staff remember frequently overlooked steps (e.g., helping developers remember to mention system software config changes required or database schema changes).
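As a sketch, such a template might look something like the following; the field names are only suggestions to be adapted to your organization:

```
Project: <name> -- <one-line purpose>
Contacts: <tech lead>, <project manager> (names plus contact info)
Source: <SCM location or retrieval instructions>; branch/tag: <name>
Affected systems: <servers, databases, config or schema changes>
Deployment procedure: <steps, or link to standard operating procedure>
Rollback procedure: <steps, or link to standard operating procedure>
Deployment log:
  <date> -- <revision/tag> -- <who deployed> -- <notes, extra steps>
```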
This may seem like unnecessary overhead (developers who truly enjoy writing documentation are not common), but retaining organizational knowledge about what is deployed where, when and why is crucial to keeping non-trivial systems running over time. Even if the original staff who deployed a system are still with the company (staff turnover being a fact of life for any organization), remembering precisely what was done and why potentially years after the fact is not an easy feat. The effort invested in this now will repay itself in the future.
Step Two: Continuous Deployment or Phased Deployment?
There are two common modes of deployment for web applications. The first models traditional software engineering by using phased deployment, where phases of release correspond to planned or scheduled bundles of new features and bug fixes. A variant of this is the "boxcar" or "feature train" model that ships a defined release on some set schedule ("If it's ready in this six-week window, it goes on the shipment train; if not, it waits for the next one."). This is common for environments that have rigorous quality assurance or change control requirements, as it allows a built-in time period prior to each phase's release for those processes to execute in a regular, repeatable fashion on a known schedule. In these environments it is common to "snapshot" the particular phase for deployment in some fashion, via something like a "branch" or "tag" as discussed in step zero. For example, a Q3 phased release for a system might have "prod-Q3-2011" as its source branch/tag. The state of the system so denoted might then further be used as the base for issue-remediation hotfixes that must go live in between regular phased releases. For an automated deployment environment (see next section), the system would need to be aware of the correct current branch/tag to use as its deployment source (or perhaps offer the user making the deployment the currently available sources that match a given pattern).
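For instance, an automated deployment tool might restrict the sources it offers to tags matching the release convention. A minimal sketch, assuming the hypothetical "prod-Q<quarter>-<year>" naming from the example above:

```python
import re

# Offer only tags matching the release-naming convention as deployment sources.
RELEASE_TAG = re.compile(r"^prod-Q([1-4])-(\d{4})$")

def candidate_sources(all_tags):
    """Filter tags to those matching the convention, oldest to newest."""
    matches = [t for t in all_tags if RELEASE_TAG.match(t)]
    # Sort by (year, quarter) so the most recent phase appears last.
    return sorted(matches, key=lambda t: (RELEASE_TAG.match(t).group(2),
                                          RELEASE_TAG.match(t).group(1)))

tags = ["prod-Q2-2011", "prod-Q3-2011", "experiment-foo", "prod-Q1-2011"]
print(candidate_sources(tags))  # ['prod-Q1-2011', 'prod-Q2-2011', 'prod-Q3-2011']
```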
The second, and more recent development, is continuous deployment. With continuous deployment, new features or bug fixes may go live at any time. Some environments push live to every user at the same time, and others use a "feature flag" approach where a given user must have a given flag or set of flags in their active session or profile to be exposed to the new code. Care must be taken for "feature flag" setups to ensure that tests of the system (see monitoring and verification section below) are using the correct flag or sets of flags to accurately capture the state of the system as the end users see it.
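A feature-flag check itself can be very simple; the sketch below assumes a hypothetical flag name and profile shape. The same helper can then be used by production tests to exercise exactly the flag combinations your end users see:

```python
# Minimal feature-flag sketch; flag names and profile layout are hypothetical.
def is_enabled(flag, user_profile, default=False):
    """Expose new code only to users whose profile carries the flag."""
    return user_profile.get("flags", {}).get(flag, default)

profile = {"id": 42, "flags": {"new-checkout-flow": True}}

if is_enabled("new-checkout-flow", profile):
    print("render the new checkout flow")
else:
    print("fall back to the existing behavior")
```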
What I will say next has proven to be one of the more contentious parts of this essay in internal discussion, so I freely admit that this is a point worthy of further thought, particularly as time provides more evidence on the ways that continuous deployment works or fails. I do not believe that continuous deployment systems should be configured such that the source for pushes to production machines (e.g., a branch or trunk, or whichever nomenclature is appropriate for your environment) is the same space that developers initially check code into. I am willing to stipulate that things like developer mindset and discipline, in concert with automated checking scripts within the commit process, may eliminate many sources of error that could be introduced in such an "insta-live" system, but I'm also a big believer in the power of Murphy. Having some separation here, however low-friction, should help prevent many errors (perhaps something like a code review queue or holding pen that changes go through before reaching production, or development in branches with deploys drawn from trunk, favoring many small merges over fewer large ones). In my mind, continuous deployment is more about release automation, scope of work per released quantum, and democratization of the release process combining to empower individuals to release quickly than about any particular SCM configuration litmus test.
Which of these approaches works best for your team may be dictated by the business/regulatory needs your application must satisfy, or may be limited only by the consensus of personal preferences involved. Generally speaking more conservative environments will trend toward phased deployment out of necessity.
Step Three: Deployment Automation
The easier you can make it to do the "right things" for your environment, the more likely people are to do them. What these steps might be will of course vary widely, but common examples include moving files into place (static assets, interpreted code), compilation and movement/packaging of resultant binaries (for compiled-language environments), application of database changes, and so forth. Almost all of these steps may be automated via some mechanism (e.g., scripting languages used directly on a command line, or, in more sophisticated environments, an actual standalone deployment manager application). Particularly large, multi-server systems may choose to roll deployments out in stages to increasingly larger subsets of their total infrastructure, effectively using A/B testing with progressively larger portions of their active user base to check for problems or negative user-experience feedback as scale increases; this is obviously much better done in as automated a fashion as possible, as the chance of simple errors increases dramatically with the number of manual interactions.
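To make the staged-rollout idea concrete, here is a minimal sketch; deploy_to() and healthy() are placeholders for your actual push mechanism (rsync, packages, etc.) and your monitoring checks:

```python
import time

# Push to progressively larger slices of the fleet, verifying health between
# stages; hostnames and stage sizes are hypothetical.
SERVERS = ["web01", "web02", "web03", "web04", "web05", "web06", "web07", "web08"]
STAGES = [1, 2, 5, len(SERVERS)]  # cumulative counts: canary first, then wider

def deploy_to(host):
    print(f"deploying to {host}")  # placeholder for the real push mechanism

def healthy():
    return True  # placeholder: consult error rates/latency before widening

deployed = 0
for target in STAGES:
    for host in SERVERS[deployed:target]:
        deploy_to(host)
    deployed = target
    time.sleep(1)  # in practice: long enough to collect meaningful metrics
    if not healthy():
        raise SystemExit("metrics degraded; halt the rollout and roll back")
```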
As a quick example of a simple implementation of this kind of setup, several years ago I worked in a PHP-based environment which used the "qa" and "prod" CVS tags to mark particular revisions of various files as suitable for deployment to a particular environment. When a developer (with the right access privileges) accessed the deployment manager web application, they could select which tag to deploy; the system would do a CVS checkout of that tag to a scratch area and then run an rsync command to move all of the code (and associated static assets) to the appropriate server(s). This radically reduced the overhead of deploying code, although it was not perfect in the sense that database and other system config changes still required the involvement of the relevant systems teams. A variant of this in use at another organization similarly depended on conventional branches for "qa" and "prod", but instead of using rsync would use ssh to invoke svn up commands directly on each affected machine (Apache was configured to deny access to .svn directories).
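The core of such a deployment manager can be quite small. A rough Python analogue of the checkout-and-rsync flow, using Subversion for the sake of the sketch (the repository URL, scratch directory, document root, and host list are all hypothetical):

```python
import subprocess

REPO = "https://svn.example.com/repo/tags"   # hypothetical repository
SCRATCH = "/tmp/deploy-scratch"              # hypothetical scratch area
SERVERS = ["web01.example.com", "web02.example.com"]

def deploy(tag):
    # Export (a checkout without SCM metadata) the chosen tag to scratch...
    subprocess.run(["svn", "export", "--force", f"{REPO}/{tag}", SCRATCH],
                   check=True)
    # ...then sync the result to the document root on each affected server.
    for host in SERVERS:
        subprocess.run(
            ["rsync", "-az", "--delete", f"{SCRATCH}/", f"{host}:/var/www/app/"],
            check=True,
        )

deploy("prod-Q3-2011")
```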
One tool that often seems overlooked in this area is the use of OS-native software packaging mechanisms to distribute content and execute the scripts required for a given change set. These scripts may be either tailored to the particular release, or general standard scripts that by convention draw data from named portions of the source being deployed (e.g., a "db/001.sql ... N.sql" file set might be iteratively applied in order if present, or an "etc/001.patch ... N.patch" set of patch files might be applied in similar fashion). Use of this sort of packaging system makes it much quicker to verify that a given app is installed, what files are associated with it, and whether any of those files have been modified, and also makes installation/upgrade/removal far more automated. Another example, for a Java-based system, might be an OS-level package that contains the compiled WAR file and pre-/post-install scripts to invoke the correct application server steps to install or update the application.
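A sketch of the numbered-migration convention described above, with apply_sql() standing in for whatever database client your environment actually uses:

```python
from pathlib import Path

def apply_sql(script):
    """Placeholder: hand the file to psql/mysql/etc. in your environment."""
    print(f"applying {script}")

def run_migrations(package_root):
    # Zero-padded names mean lexicographic sort equals numeric order.
    scripts = sorted(Path(package_root, "db").glob("[0-9][0-9][0-9].sql"))
    for script in scripts:
        apply_sql(script)

run_migrations("/opt/myapp")  # hypothetical install root
```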
Step Four: Monitoring and Verification
Being able to keep a real-time watch on your system's performance and user behavior is extremely important during and after a deployment. If server errors surge after a push, clearly the deployment will need to be rolled back, but other, more subtle failure modes may also be important (a change that leads to increased latency on the site might hurt conversion/activity rates of users, for example). Having a system in place to collect and monitor these technical and business metrics will go a long way toward increasing your assurance that a given deployment has not introduced any issues.
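A minimal sketch of such a post-push check follows; the metric source and tolerance are hypothetical stand-ins for whatever monitoring system you have in place:

```python
# Compare the post-push error rate to a pre-push baseline and flag regressions.
def error_rate(window_minutes):
    """Placeholder: query your metrics store for errors per request."""
    return 0.004

BASELINE = error_rate(window_minutes=60)  # sampled before the push
TOLERANCE = 2.0                           # allow up to 2x baseline (hypothetical)

def deployment_looks_healthy():
    current = error_rate(window_minutes=5)  # sampled after the push
    return current <= BASELINE * TOLERANCE

if not deployment_looks_healthy():
    print("error rate surged after push; consider rolling back")
```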
In a related vein, having a suite of integration tests that you can run against production to quickly verify that all expected functionality is working at any given point in time can be extremely handy (so that you don't have to wait for a user to stumble on the one out-of-the-way use case that happens to now throw an error). This becomes particularly powerful in systems large enough that manual testing of the entire API/UI is inefficient. These integration tests must be distinguished from unit tests, which are likely also part of your testing and deployment strategy, albeit at a more granular, source-code level. In all cases, designing for modularity and testability will make your life much easier when it comes to verifying the behavior of your software, but that is a matter for another article.
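A production smoke test need not be elaborate to be useful. The sketch below uses only the Python standard library to verify that a handful of known-good URLs (hypothetical here) still answer successfully:

```python
import urllib.request

CHECKS = [
    "https://www.example.com/",
    "https://www.example.com/login",
    "https://www.example.com/api/health",
]

def smoke_test():
    """Return a list of (url, problem) pairs for any failing check."""
    failures = []
    for url in CHECKS:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status != 200:
                    failures.append((url, resp.status))
        except Exception as exc:  # HTTP errors, timeouts, DNS failures
            failures.append((url, exc))
    return failures

for url, problem in smoke_test():
    print(f"FAIL {url}: {problem}")
```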
The resources and further reading section below has links to a few different tools for both areas listed above. There are many other options, of course, so finding the best fit for your environment would be a matter of further research.
Conclusion
I hope this article has given you some insight into how to improve your deployment processes, with the goal being reduction in complexity and uncertainty related to making your system evolve to fit ever-changing business needs. The steps outlined above may be adopted/adapted to your organization in stages, but the more fully you adopt them the more synergistic benefits you will see. In all cases, the guiding principle should be to make it easier to do the right things for your environment and minimize end-user complexity. No matter what technology stack you are using, and no matter what type of application you are writing, getting deployment right can make the difference between going crazy from stress and having a happy, productive work day.
Resources and Further Reading
Source Configuration Management Systems
- Subversion -- A common centralized SCM used by many organizations; free software. Quality books covering svn are available from several publishers, and some are freely available online as well, e.g. the red-bean svn book
- Git -- An increasingly popular distributed SCM, used by large projects such as the Linux kernel; free software. As with svn above, git has several good texts in print, and some are available online, e.g. Pro Git
- trac -- A web interface to several common SCMs (svn, git, etc.); integrates a ticket management system and wiki as well as source browser, free.
- mtrack -- Similar to trac but with several enhancements, e.g. native ability to handle multiple projects per single install (trac as shipped is intended to have one instance per managed project)
Deployment Planning
- DokuWiki -- A common and full-featured wiki; free. PHP-based, so anything supporting PHP (Apache or similar on Unix, IIS on Windows, etc.) should at least have a good chance of running it.
Deployment Automation
- Scripting Languages -- This will greatly depend on your environment, but almost any enterprise computing platform these days will have some sort of scripting mechanism, e.g. Perl, Python, Ruby, etc. (Windows versions of Perl and Python in particular may be obtained from ActiveState, both freely and with support contracts.)
- rsync -- an intelligent method of syncing files between two computers, free software.
- Vlad the Deployer -- a free, Ruby-based deployment automation system. I've seen this used in-house, in concert with additional Ruby development, to produce Solaris and CentOS packages automatically as well as to roll them out to the target systems.
- Ruby on Rails as a system deserves credit for thinking about deployment automation more than many other frameworks, e.g. database migrations and deployment tools like Capistrano (often paired with Bundler for dependency management).
- Your chosen operating system's package management documentation; generally speaking any enterprise grade server operating system will have some sort of package management and documentation/guides will exist for how to make/maintain packages for that system.
- Cloud-based deployments are another special case, as many "cloud" infrastructures are themselves scriptable to allocate/deallocate additional resources, adding another level of potential automation beyond simply managing the deployment of code and config changes. An example of this is Nimbul from the New York Times (centered around Amazon's set of elastic/cloud services).
Monitoring and Verification
- Selenium -- Selenium is a way to record and then play back web application interactions via browser, and is useful when constructing behavioral/integration tests to verify a site's functioning.
- Nagios -- Commonly used infrastructure monitoring tool, can be a bit of a bear to set up the first time; free.
- Cacti -- A graphing and trending application, free.
- Circonus -- Circonus takes the setup and maintenance hassles out of monitoring and trending, available as a service.