Version control systems are nothing new to the world of software development. I'll take the time now to unapologetically call you an idiot if you don't already have all your code and configurations in a version control system. Once you start using version control, there are several approaches available and, interestingly, working on online applications turns out to be profoundly different from working on shrink-wrapped software.

Traditional Software Development

With shrink-wrapped software, you have features and fixes that are integrated into the product and effectively queued up into what is called a release. Development of features is performed in version control on what is commonly called a "branch," which allows isolation of the developed feature until it is in an acceptable state to be "integrated" back into a main line of development (also a branch) that is used for integration testing. Eventually, the features are merged into a release branch and find their way to clients. Bug fixes and security-related issues are addressed in a similar fashion (sometimes skipping steps in the process to fast-track their release to consumers of the product). This is a fly-by, over-simplified description of the typical software development life-cycle.

A lot of people believe that how one manages the path to a release can profoundly affect the product; two common strategies are "agile" and "waterfall." I'll argue that both are valid, both have their place, and both work in traditional software development. The end goal is the same: ship a quality product within the bounds of expectations set by product management with the clients. Typically, product releases are made available to clients at regular intervals. I've commonly seen three-, six-, twelve- and even eighteen-month release cycles. Bug fixes, security updates, patches, hot-fixes (they have many names) are released more frequently (monthly, or for problematic products, weekly). The client is responsible for upgrading their systems and, if either feature or fix releases happen too frequently, the process can become overly burdensome.

I'll come out and make a rather unconventional claim: the approach described thus far only works well when the number of clients using the software is larger than one. The larger the user-base, the better this model works. It might seem at first that large web sites that have millions of users would be able to use this model to develop their service, but now we've just exposed the crux of the issue. Millions of users use their service, not their product. In fact, in most cases, the only user of the actual software product is that single web site. This alone shakes the foundation of traditional development paradigms. Online environments have many parameters that make this approach untenable.

Get it on(line)

Developing software for an online service and developing traditional software have some fundamental differences.

In the online world, a software product drives a service to which users have access. There is, most often, a single copy of the actual software product in use. There is one consumer of the software: you. The users are consumers of the service built atop the software.

Most online services have thousands, if not millions, of users, and as such the tolerance for disruptive upgrades is reduced (often eliminated). You are forced into an environment where each production upgrade happens only once: there are no practice runs, and it simply has to work.

In a traditional software model, new features can be distributed to clients that are less risk-averse as a part of an early-adopter program (a.k.a. beta program or tech preview program). This approach allows varied real-world tests of the new features so that when they are made generally available in the product, the confidence in their correctness and performance is sufficiently high. This simply doesn't work when the software you write for your service is only used by one client.

Perhaps most challenging is the pace at which competition moves. In the online world, I can have an idea this morning, an implementation this afternoon, and every user of my service who shows up tomorrow will see it. In fact, things can and do happen much faster than that. You might think that rapid concept-to-availability push is reckless. You might be right. But your competition is doing it.

The question is, how do you maintain a competitive pace and conquer all these challenges when the odds are stacked against you? The real problem here is that the traditional software model bundles many changes into a release, and even the tiniest mistake can result in a failure of the entire release (one mistake can break the whole product). Each change should always be accompanied by a reversion plan. Sometimes those plans are as easy as redeploying the product sans the change; sometimes they are more involved. When hundreds of changes are combined into a single release, reverting to the previous release becomes an intricate mess of hundreds of change-reversion procedures. When posed in these terms, the answer becomes a bit clearer.

Each change could contain a mistake that could cripple the product. However, if we make each change its own release, then the failure is isolated to a micro-release that can be reverted with much less disruption.

This leads to the very controversial technique of "deploying from trunk." Trunk (or HEAD, or tip) is a version control term describing the bleeding edge of the product. As people fix regressions and other bugs, as well as add new features to the product, they are adding, modifying and removing code and configuration in version control. If these changes are applied continuously and micro-releases are performed continuously, then when the inevitable mistake occurs, the reversion process is isolated and rollback casualties are prevented.
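To make this concrete, here is a minimal sketch in Python of what a micro-release loop might look like. The deploy.sh and smoke-test.sh commands (and the use of git's master as trunk) are illustrative assumptions, stand-ins for whatever deployment tooling and health checks your service already has; this shows the shape of the loop, not a prescribed implementation.

    # A minimal sketch of a micro-release loop: each new trunk revision
    # becomes its own release, so a failure rolls back exactly one change.
    # deploy.sh and smoke-test.sh are hypothetical stand-ins for your own
    # deployment tooling and production health checks.
    import subprocess

    def trunk_revision():
        # Ask version control for the newest revision on trunk (git shown
        # here; substitute your own VCS and branch naming).
        out = subprocess.run(["git", "rev-parse", "master"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()

    def deploy(rev):
        # Hypothetical: push exactly one revision to production.
        subprocess.run(["./deploy.sh", rev], check=True)

    def healthy():
        # Hypothetical smoke test; exit status 0 means the service is sane.
        return subprocess.run(["./smoke-test.sh"]).returncode == 0

    def micro_release(live_rev):
        rev = trunk_revision()
        if rev == live_rev:
            return live_rev             # nothing new on trunk, no release
        deploy(rev)
        if healthy():
            return rev                  # this single change is now live
        deploy(live_rev)                # the mistake is isolated: reverting
        return live_rev                 # it disturbs no one else's work

The important property lives in the failure branch: because each release contains exactly one change, reverting it drags nothing else down with it.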

What's a rollback casualty? If I make change A and you make change B, and they make their way into a single release, we have a casualty if either (but not both) change has a bug requiring reversion. Due to my mistake in A, we need to downgrade the product to the previous release, inducing a rollback of your perfectly functioning work on B. What's worse is that you could have put a lot of work into B, ensuring it was done perfectly, because you knew that rolling it back would be painful; meanwhile, I knew that rolling back A would not be disruptive, so I was much less careful. This is just a nasty mess all around.

Big changes are scary: there's a lot to test and a lot to plan. By making micro-releases, you amortize the risk, investing in deployment effort in a highly granular fashion.

So the real question is: how do you make this safe? Online applications are not just a piece of code being run. They consist of many moving parts that each change (often independently) but all depend upon one another for correct operation; this is what makes rolling back certain failed deployments so challenging. It might be challenging, but success is sweet: witness eBay, Etsy, and Flickr. It's a tricky balance that combines various philosophies:

  • "devops": engineering and operations are married and need to collaborate
  • micro-releases: releases must never get too large, instead amortize risk with small, controlled releases
  • dark launching features: building the feature out over time in a deployed and operational form to be simply "turned on" when properly qualified
  • wired off: the approach that features should have on/off switches to provide an alternative to rolling back deployments (see the sketch after this list)
  • fail forward: when things go wrong, have a solid plan to work forward to success (within your SLAs) instead of rolling back and trying again later.
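
To give the last three items some shape, here is a minimal Python sketch of a "wired off" switch guarding a dark-launched feature. The flags file location, the flag name, and the two search functions are illustrative assumptions, not a prescribed design.

    # A minimal sketch of a "wired off" feature switch: the code for a new
    # feature ships dark (deployed but disabled) and is turned on, or back
    # off, by flipping configuration rather than by rolling back a deploy.
    import json

    FLAGS_PATH = "/etc/myservice/flags.json"   # hypothetical flag store

    def feature_enabled(name):
        # Re-read the flags on every check so operations can flip a switch
        # at runtime without redeploying. Default to "off": a dark-launched
        # feature should be invisible until it is deliberately enabled.
        try:
            with open(FLAGS_PATH) as f:
                return bool(json.load(f).get(name, False))
        except (OSError, ValueError):
            return False

    def legacy_search(query):
        return "legacy results for %r" % query    # the proven code path

    def new_ranking_search(query):
        # The dark-launched code path: deployed and operational behind the
        # switch, dark until it is properly qualified.
        return "new-ranking results for %r" % query

    def render_search_results(query):
        if feature_enabled("new_ranking"):
            return new_ranking_search(query)
        return legacy_search(query)

If the new code path misbehaves, flipping "new_ranking" back to false wires it off immediately: the deployment stays in place, you fail forward to the proven path, and no one else's changes get rolled back along with yours.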

Each of these techniques requires its own in-depth description, so we'll leave that for future Seeds articles. For now, just consider that a traditional software engineering mindset can put you at a desperate disadvantage in the world of online software engineering.