On the Engineering of SaaS
By: Theo Schlossnagle 1 Mar '11
Software has been around for a long time in various forms: open and closed, commercial and non-commercial. The one thing that holds true about software products is that you, as a consumer, have to acquire them, install them and operate them. For the past several years, there has been an industry movement away from providing software in this traditional sense and instead providing the use of the software as a service (SaaS). SaaS has been around in many forms. Many companies (and investors) have recognized the opportunities that SaaS provides as a business model, but transitioning to it from a standard software development model requires a lot more than an executive decision. Herein I’ll try to lend some insight into what’s in store for you as you transition from a software company into a SaaS company.
1. A customer of one.
Typical software engineering processes are well-evolved and quite rigorous. They are designed to ensure that the product you release and ship around the world will boast minimal defects and incur as little as possible in the way of defect handling via patching or upgrading. While it may not be extraordinarily difficult to package the next version of your product, you must make the installation/upgrade process as foolproof as possible or you risk leaving customers stranded mid-upgrade. Getting the entire customer base to upgrade to the latest version in a reasonable fashion is intractable, and the more rapidly you release your product, the more frustrated customers become and the more unique versions you have to support “in the wild”.
SaaS engineering couldn’t be more different. Why? The typical software product driving a SaaS architecture has exactly one customer: you. You have one version of the product in production and it has to work all the time. An upgrade process, for example, is an entirely different beast. Making it robust and repeatable is far less important than making it quick and reversible. This is because the upgrade only ever happens once: on your install. Also, it only ever has to work right in one exact variant of the environment: yours. And while typical customers of software can schedule an outage to perform an upgrade, scheduling downtime in SaaS is nearly impossible. So, you must be able to deploy new releases quickly, if not entirely seamlessly, and in the event of failure, roll back just as rapidly.
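To make "quick and reversible" concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the ReleaseManager class, its method names and the health_check callback are assumptions, not a description of any particular deployment tool. The idea it shows is that switching releases is a single cheap step, and that a failed check restores the previous release immediately.

```python
from typing import Callable, List, Optional


class ReleaseManager:
    """Hypothetical tracker for the live release, with fast rollback."""

    def __init__(self) -> None:
        self.history: List[str] = []  # previously live releases, oldest first
        self.current: Optional[str] = None

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Switch to `version`; revert at once if the health check fails."""
        previous = self.current
        self.current = version  # the switch itself is one cheap assignment
        if health_check(version):
            if previous is not None:
                self.history.append(previous)
            return True
        self.current = previous  # automatic rollback to the prior release
        return False

    def rollback(self) -> Optional[str]:
        """Manually revert to the most recent earlier release, if any."""
        if self.history:
            self.current = self.history.pop()
        return self.current


if __name__ == "__main__":
    manager = ReleaseManager()
    manager.deploy("v1", lambda v: True)   # healthy release goes live
    manager.deploy("v2", lambda v: False)  # unhealthy release is reverted
    print(manager.current)
```

In a real service the assignment would stand in for something like repointing a load balancer or a symlink, and the health check would probe the running system. The sketch only captures the shape of the process: one version live at a time, and a return path that is as short as the deploy path.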
You will find that your needs in operating the product will have a tremendous impact on the engineering roadmap. Interestingly, you will likely find that the features incorporated into the product should have been on the roadmap in the first place, but you lacked the insight or foresight, because you were not responsible for operating the product in a production setting. From here on out, while you build the service for your users, you build the underlying software products for a customer of one.
2. You aren’t a software company anymore.
You aren’t a software company anymore; you are an operations company. Software as a Service is much more about service than software. In fact, the users of your service will be just as satisfied thinking that magic pixies power the service they use as some complex software system. With this change comes a rather intimidating shift in expectations. Users expect software to have bugs, and they expect to schedule downtime to upgrade, install, back up or otherwise manage the software product they are operating. With a service, however, there is a strong predisposition of users to expect things to be “always on.” As a simple analogy, if you sell a user a diesel generator, they will expect it to need maintenance and refueling and to have the occasional service issue. Sell them electrical service and watch them come with pitchforks demanding refunds if you have an outage of any sort.
While this may seem silly at first, the expectation isn’t out of line. It’s a simple bit of economies of scale. Your job as a SaaS company is to operate the software, so logically you should do a better job than they would. Additionally, you are operating it for a large set of users, so it is a reasonable expectation that you have refined your operational techniques. Lastly, they pay you for one thing: to operate the service — so you had better get it right.
Working as an engineering company with an operations focus rather than a product focus can be a significant challenge for traditional software engineering companies. You should expect to see roles removed, roles introduced and organizational structure changed to add accountability for operating your service as your users expect.
3. Continuous Deployment
One of the greatest advantages of being a customer of one for your software is that you don’t have to worry about the oddball deployment or “that guy” that refuses to upgrade. It means that once you’ve deployed the latest version of code into production, you have no legacy copies, no troubleshooting of version differences and a definitively less complicated error reporting process. This, however, can cause a paradigm shift in development and deployment processes. It means that you can have a bug report at 8 a.m., a fix by 8:15 a.m. and a deployment by 8:20 a.m. Traditional software engineering companies have no other word to describe this but “insane.” It might seem reasonable to simply elect not to subscribe to that pattern of behavior due to the risks involved, but there is weakness in that stance.
In the era of SaaS, companies have engineered processes to successfully manage the risks of rapid deployment schedules (OmniTI, IMVU). What was once a patch release every two weeks can now be managed as hundreds of patch releases per day (in the extreme case). By carefully engineering risk out of the deployment process, a SaaS company gains agility to launch fixes, improvements and features into production at any time. If your competition can do this and you cannot, you are disadvantaged.
While it may take considerable effort to redefine your engineering processes to adequately limit risk and allow for continuous deployments, the advantages are significant. Due to the velocity of deployments that must be supported, the process of deploying itself must be engineered to be non-disruptive to services. This alone has the side effect of enabling feature launches, upgrades and triage without consequential downtime. It is the first step toward an “always on” architecture.
4. Quality Assurance is now a continuous process.
Quality assurance has a strong role in software engineering. While much effort is expended automating QA, an automated QA process is sufficient only if your service is used solely by automated systems. If humans consume your service, you must also have human-driven QA processes. So, while much of the QA process can be automated and performed rapidly, automation will never replace human usage of the application to detect both errors and perceived errors. In SaaS systems, the velocity of user-facing change is (at least) an order of magnitude higher than in traditional software engineering. It is inevitable that bugs will not only appear, but that they will reappear. Performing a full QA regression prior to each release is often infeasible.
Your users are just as much members of your QA team as your employees. By making your users aware of that, by treating their feedback, complaints, bug reports and feature requests as first-class items, you enable them to improve your QA process and, more importantly, increase their tolerance for your mistakes. John Martin has a short, but enlightening, diatribe about the quintessential difference between QA in traditional environments and SaaS.
Perhaps the single most significant change to embrace is that of QA’s place. What was once an engineering phase, a deliverable, or a series of bars on a project manager’s Gantt chart (ultimately leading to a celebratory day of shipping a product release) is now a continuous and critical operational role within a continually delivered and continually used service.
5. Multi-tenancy design.
So far, we’ve discussed mostly process changes that enable transforming from a builder of software to an operator of software. The last paradigm shift is perhaps the hardest as it relates to design philosophy and design goals rather than design processes.
When traditional software is designed, it runs on a system or set of systems for a single user. While a “user” in this sense can be an individual, or a business unit or perhaps even a whole organization, it is clearly not “all users.” It is the difference between engineering a car and engineering a complete metropolitan transit system. It is an issue of designing at scale.
Not only does this mean designing and building software that can handle thousands of times the load that your previous design enabled, but also engineering the solution to malfunction elegantly. Malfunction elegantly? Yes. All human-engineered products will malfunction; it is a simple fact of life. In a SaaS, it is essential that when this happens, the malfunction is isolated to the smallest possible component of the service or to a specific customer. Back to our transit metaphor: the failure of a single bus, subway train or taxi must adversely affect as few users as possible; ideally, only those physically on the failed unit. This consideration is simply (and obviously) not present in the design of a single car.
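One way to picture isolating a malfunction to a specific customer is a per-tenant failure counter, loosely in the style of a circuit breaker. The Python sketch below is an illustration under stated assumptions: TenantIsolator, TenantUnavailable and the max_failures threshold are hypothetical names, and recovery logic (reopening a tenant after a cool-down) is deliberately left out to keep it short.

```python
from collections import defaultdict
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TenantUnavailable(Exception):
    """Raised when one tenant has been cut off after repeated failures."""


class TenantIsolator:
    """Hypothetical guard that confines repeated failures to one tenant."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        # Consecutive failure count, tracked separately for each tenant.
        self.failures: Dict[str, int] = defaultdict(int)

    def call(self, tenant: str, operation: Callable[[], T]) -> T:
        """Run `operation` for `tenant`, rejecting tenants that keep failing."""
        if self.failures[tenant] >= self.max_failures:
            raise TenantUnavailable(tenant)
        try:
            result = operation()
        except Exception:
            self.failures[tenant] += 1  # only this tenant's count changes
            raise
        self.failures[tenant] = 0  # a success resets the tenant's count
        return result


if __name__ == "__main__":
    isolator = TenantIsolator(max_failures=2)

    def broken() -> int:
        raise RuntimeError("tenant-specific fault")

    for _ in range(2):
        try:
            isolator.call("tenant-a", broken)
        except RuntimeError:
            pass

    # tenant-a is now rejected up front; tenant-b is unaffected.
    print(isolator.call("tenant-b", lambda: 42))
```

In the transit metaphor, tenant-a is the failed bus: its passengers are stuck, but the rest of the system keeps running. A production design would also need a way to bring the tenant back into service, which this sketch omits.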
The engineering paradigm shift from a single-user product to a multi-tenancy product is the most challenging metamorphosis required of a software company that intends to adapt and survive in the SaaS era. Two books that talk about the underlying mechanics of these challenges are Scalable Internet Architectures (written by me) and The Art of Scalability by Abbott and Fisher.
Making good on a promise
While you may not have made a promise about what your SaaS offering will provide, the industry has set some undeniable expectations about what SaaS generally delivers. At a minimum, you must meet these expectations or your users will abandon you. These expectations are naturally derived from the key drivers for adopting SaaS: no maintenance, no upgrades, always current, always available, no commitment (or the desire for operational expenses over capital expenses). It is imperative that you understand where the bar is set for those wishing to shift into a SaaS delivery model.
With the exception of software companies that produce software that powers SaaS, most traditional software companies must evolve into a SaaS delivery model or suffer death at the hands of competition. Evolving into a SaaS delivery model without addressing the above key points will lead to substandard service, artificially high operating costs, user attrition and eventual collapse. You have to do it. You have to do it right. Are you ready?