Our Experiences with Chef: Adoption Challenges
By: Clinton Wolfe 9 Jan '13
The Chef system deployment and configuration tool from Opscode is powerful and flexible. While Opscode’s documentation and marketing focus on its ease of use, they provide little guidance on real-world usage patterns. Many organizations are experimenting with ways of using Chef. While it’s still a bit early for best practices to emerge, collecting experiences across a variety of business types can help reveal what works and what doesn’t.
This is not a Chef tutorial, nor a journal of one person’s first month with Chef. You will not learn what a resource is, or how much Ruby you need for Chef. Instead, we assume you have used Chef enough to know what it is, what it does, and at least one way to accomplish what you need. You may have authored several cookbooks for your organization’s internal use, taking slightly different approaches as you learned what works and what doesn’t.
Opscode offers several ways of using Chef. Most of them revolve around a central store of cookbooks and configuration information, known as the Chef server. The Chef server includes powerful cross-node search features, so any node can query the server for information about any other node, such as installed software and (depending on the storage mechanism) credentials for various services.
Because OmniTI is a consulting company offering a range of development, managed services and hosting, we must be able to isolate client assets from each other, so every tool we use must have excellent multi-tenancy support. At the same time, many of our tools and much of our configuration data are specific to our environment and must be shared across clients. We want to share not only cookbooks but also roles and some data bags, with fine-grained control over which nodes may access each.
There are essentially two options for running a Chef server. The first, Hosted Chef, is a service operated by Opscode and includes access control mechanisms. Cookbooks and node information are bundled into Organizations, and cross-organization access is not permitted. Cookbooks can be re-used via cookbook dependency resolution, but roles and data bags cannot be re-used across Organizations.
The second, the open-source version of Chef Server, provides no access control mechanisms, which makes it unacceptable as a multi-tenant solution for security reasons.
The alternatives were:
- Invest in developing an ACL system for the open-source release of Chef Server. Unlike the Hosted Chef ACL system, the proposed functionality would allow cross-organization access to data bags, cookbooks and roles, with ACL enforcement. However, such a patch would carry a high risk of incompatibility as the chef-server codebase evolves.
- Run multiple single-tenant chef-server instances. We would still have to keep copies of shared roles, data bags and cookbooks on each Chef server, and maintain all of those copies.
- Use chef-solo. Without a Chef server to distribute cookbooks, node definitions, data bags and roles, we must invent a mechanism to distribute that data securely. Because this material already lived in git repos with appropriate ACL mechanisms, this was a fairly low-cost option. We do pay a price in complexity and learning curve on new projects, because our custom bootstrapping code is peculiar to our organization and is only used at project startup.
For the time being we are using the third option, but we may push that complexity back into the Chef server and run single-tenant Chef servers with some additional magic (without going so far as to patch chef-server itself). You can get the chef-solo-helper toolkit here.
For more on this topic, see threads here.
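Under the chef-solo option, a node’s Chef configuration is just a set of git checkouts, so bootstrapping mostly amounts to pointing chef-solo at them. The Ruby sketch below shows one way a bootstrap script might generate a `solo.rb` spanning a shared repo and a project-private repo. It is not the chef-solo-helper code: the `/var/chef-solo/repos` layout, the repo names, the `solo_rb` helper, and the choice to take roles and data bags from the project repo alone are all assumptions. `file_cache_path`, `cookbook_path` (which accepts a list of directories), `role_path` and `data_bag_path` are standard chef-solo settings.

```ruby
# Hypothetical sketch: build a chef-solo config (solo.rb) that spans a shared
# repo and a project-private repo. Layout and names are assumptions, not
# chef-solo-helper's actual behavior.
REPO_ROOT = "/var/chef-solo/repos"

# repos: checkout names, shared repo(s) first, project-specific repo last.
def solo_rb(repos, root: REPO_ROOT)
  cookbook_dirs = repos.map { |name| File.join(root, name, "cookbooks") }
  project = File.join(root, repos.last)
  # Roles and data bags are taken from the project repo only (assumption).
  <<~CONFIG
    file_cache_path "/var/chef-solo/cache"
    cookbook_path #{cookbook_dirs.inspect}
    role_path #{File.join(project, "roles").inspect}
    data_bag_path #{File.join(project, "data_bags").inspect}
  CONFIG
end

puts solo_rb(["shared", "client-project"])
```

Because each checkout is governed by its own repo’s ACLs, a node only ever receives the project repos its bootstrapper was granted access to, which is what provides tenant isolation in this setup.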
Deployment touches many stakeholders, and each group has different skill sets and experience levels. You’ll need an approach that supports a variety of uses and presents a smooth, shallow learning curve.
We’ve had good results thinking of our Chef users as falling into four general camps:
- Black-box users. They don’t care how Chef works internally; they want to work entirely in attributes. The default behavior of the recipes is good enough, and any tweaking should be done through attribute settings. They may need help deciding how to decompose roles and understanding how attribute precedence works.
- Bespoke Recipe Authors. They need to do peculiar things that no other project requires. They use attributes, but also cookbook defaults, and they think in terms of resources. They are familiar with the broad spectrum of available resources, know how to write templates, and fall back to bash resources where needed. They need support to make sure that functionality common across projects gets properly formalized into a shared cookbook, and they need access to project-private repos for their cookbooks.
- Shared Cookbook Authors. They implement functionality around a specific software system (for example, Apache or Postgres), and are often topic experts on that system or work closely with someone who is. With intermediate Ruby knowledge, they define an attribute space, then create LWRPs and libraries that provide multiple layers of reusable, generalized functionality.
- Bootstrappers. In our environment, the Ops team is responsible for provisioning the computing environments (be they Solaris-style Zones, VMs or cloud instances). These people handle bootstrapping, in which the chef-solo (or chef-client) software is installed and the machine’s Chef configuration is checked out of version control using our multi-repo chef-solo-helper tool. Their perspective is broad and shallow: they have limited familiarity with any one project’s custom cookbooks, but deep familiarity with shared services like LDAP and network configuration. They want VCS access control mechanisms to be straightforward.
We need to store the Chef configuration data in version control, because it is important work that must be tracked and versioned. Some of this data is shared (because it is common to our hosting environment, regardless of client or project), while some of it is proprietary (the particulars of how a certain client runs their web stack, or credentials to access the source code of their custom-developed web application). Thus, access to the various repos must be flexible but secure.
As a development services company, we pride ourselves on our flexibility. Some clients have an internally hosted VCS; some want us to host the VCS but dictate which one; others leave it entirely up to us. Authentication and authorization are similar: we try to keep as much as possible in our internal LDAP system, but must still accommodate other solutions for historical and political reasons.
Flexibility is the enemy of standardization, and automation thrives in highly standardized environments. These disparate VCS and authentication systems are friction points, especially for the users who bootstrap systems. Anyone who works across projects may find themselves dealing with a missing SSH key one day, or a bad htpasswd entry the next. Realizing that “oh yeah, this project uses authentication system X, which I have to debug using procedure Y” is a drag on the process, and increases ramp-up time each time people switch projects. We have had some success fighting the good fight: re-examining our reasons for using an oddball approach, automating where possible, and documenting everywhere. We’ve also established standard operating procedures for new projects, deviating from them only at client request and communicating the increased cost back to the client.
Another major hurdle to adopting configuration management is the fear of losing control over configuration files. Of course, control over config files isn’t lost, but shifted. Prior to Chef, most projects managed config files using version control, and two fiefdoms were established: “system” config files, like sudoers and resolv.conf; and “application” config files, like httpd.conf, postgresql.conf and my-custom-application-properties.conf. The system config files were typically less contentious, because the Ops team was keen to have Chef manage these files directly.
It can be a nasty surprise to find that a file you just edited got overwritten by Chef on its next run. We addressed this by placing, in every template file, a large warning at the top:
```erb
# This file is maintained by chef - LOCAL CHANGES WILL BE LOST
# This project’s chef repo is at: <%= node[:motd][:chef_repo] %>
# Use the project chef repo to set attributes which affect
# the rendering of this template.
# This template is from the dev-support chef repo, at
# <%= node[:motd][:chef_repo] %>
# in the file cookbooks/jenkins/templates/default/bash-profile.erb
```
The warning includes pointers to the correct repo and file in which to make edits properly.
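Chef templates are ERB, so a header like this can be previewed outside Chef with Ruby’s standard `erb` library. In the minimal sketch below, the `node` hash and the repo URL are made-up stand-ins; inside Chef, the template resource supplies the real `node` object to the template.

```ruby
require "erb"

# Hypothetical stand-in for the Chef node object: a nested hash keyed by symbols.
node = { motd: { chef_repo: "git@git.example.com:client/chef.git" } }

# First lines of the warning header, as ERB source.
HEADER = <<~'ERB'
  # This file is maintained by chef - LOCAL CHANGES WILL BE LOST
  # This project's chef repo is at: <%= node[:motd][:chef_repo] %>
  # Use the project chef repo to set attributes which affect
  # the rendering of this template.
ERB

# Render with the current binding, which makes the local `node` visible.
rendered = ERB.new(HEADER).result(binding)
puts rendered
```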
A related objection comes from the classic scenario: “it’s 3 in the morning and the client is losing money; I just need this to WORK.” So you make a manual edit to a config file, even though you know Chef will overwrite it within 30 minutes (we run chef-solo from cron).
First, we accommodated this by adding a killswitch option to our chef-solo-helper: if /var/chef-solo/killswitch exists, the run is aborted. We then back that up with monitoring, so we know when Chef has been manually disabled.
But in general, this is a weak complaint that gets weaker as Chef is adopted more widely in an organization. It presumes that making changes properly in Chef is difficult. That is true at first, when someone is new to Chef or unfamiliar with the project or its processes. As a company focused on a devops approach, we expect basic Chef competency from most staff, certainly from anyone senior enough to be responding to a late-night emergency.
Additionally, this fear is only realistic on very small projects. If there are more than a handful of machines, making a manual edit is impractical and error-prone. On most of our larger projects, with dozens or hundreds of nodes, “oh no I have to make the change in Chef” becomes “thank goodness I can just make that change via Chef.”
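A minimal Ruby sketch of the killswitch check and a matching monitoring probe follows. Only the `/var/chef-solo/killswitch` path comes from our setup; the method names, the `chef-solo -c /var/chef-solo/solo.rb` invocation and the four-hour alert threshold are hypothetical.

```ruby
# Hypothetical killswitch-aware chef-solo wrapper plus a monitoring probe.
KILLSWITCH = "/var/chef-solo/killswitch"

# The run may proceed only while the killswitch file is absent.
def chef_run_allowed?(killswitch = KILLSWITCH)
  !File.exist?(killswitch)
end

# Called from cron; aborts early when the killswitch is present.
def run_chef_solo(killswitch = KILLSWITCH)
  unless chef_run_allowed?(killswitch)
    warn "killswitch #{killswitch} present; skipping chef-solo run"
    return :skipped
  end
  # system returns true on exit status 0, false/nil otherwise.
  system("chef-solo", "-c", "/var/chef-solo/solo.rb") ? :ok : :failed
end

# Monitoring side: warn while Chef is disabled, go critical if it has been
# disabled longer than max_age seconds (assumed threshold: 4 hours).
def killswitch_status(killswitch = KILLSWITCH, max_age: 4 * 3600, now: Time.now)
  return [:ok, "chef enabled"] unless File.exist?(killswitch)
  age = now - File.mtime(killswitch)
  level = age > max_age ? :critical : :warning
  [level, format("chef disabled for %d minutes", age / 60)]
end
```

Alerting on the killswitch’s age, rather than only its presence, lets a 3 a.m. fix stand for a while but keeps nagging until someone moves the change into Chef and removes the file.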
The desire to have Chef manage only part of a file comes up frequently. Chef’s template resource supports two modes: create the file only if it is missing, or overwrite it whenever its content differs. Often, though, we want some other process (typically a VCS checkout, a web service or an OS component) to be able to edit part of the file.
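To make the limitation concrete, here is a simplified plain-Ruby model of the template resource’s `:create` and `:create_if_missing` actions. `apply_template` is a hypothetical helper, and it ignores ownership, permissions, backups and notifications, which the real resource also handles.

```ruby
# Simplified model (not Chef's code) of the two template actions:
#   :create            -> write when the file is missing or its content differs
#   :create_if_missing -> write only when the file does not exist yet
def apply_template(path, content, action: :create)
  exists = File.exist?(path)
  case action
  when :create
    return :up_to_date if exists && File.read(path) == content
  when :create_if_missing
    return :up_to_date if exists
  else
    raise ArgumentError, "unknown action #{action}"
  end
  File.write(path, content)
  :updated
end
```

Neither action lets another process own just part of the file: under `:create` its edits are reverted on the next run, and under `:create_if_missing` Chef stops managing the file once it exists.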
There are a number of ways to handle this.
- Make sure there isn’t already a cookbook that solves your problem. Editing hosts, resolv.conf, interfaces and crontabs is handled by excellent publicly available cookbooks.
- Depending on the file format, you may be able to use an Include directive (or similar) to pull in additional files. You can then have one file under Chef control, another under web service control, and so on; the master file itself may or may not be managed by Chef.
- Use a partial editing resource. There is a file_edit resource out there, and some people have tried various approaches using tools like xmlstarlet. We treat this as a last resort, because it is fragile: small changes to the file’s layout can break the edits.
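The Include approach might look like the following sketch: Chef owns exactly one fragment in a `conf.d` directory, another process writes its own fragment, and neither touches the other’s file. The directory name, the `50-chef.conf` file name and both helper methods are hypothetical; `effective_config` is a simplified stand-in for a server expanding `Include conf.d/*.conf`, which Apache httpd processes in alphabetical order.

```ruby
require "fileutils"

# Hypothetical layout: the main file includes conf.d/*.conf, and Chef owns
# only 50-chef.conf. Settings are written sorted by key for stable output.
def write_chef_fragment(confd, settings)
  FileUtils.mkdir_p(confd)
  body = +"# This file is maintained by chef - LOCAL CHANGES WILL BE LOST\n"
  settings.sort.each { |key, value| body << "#{key} #{value}\n" }
  path = File.join(confd, "50-chef.conf")
  File.write(path, body)
  path
end

# Simplified stand-in for the server's Include expansion:
# concatenate every fragment in alphabetical order of file name.
def effective_config(confd)
  Dir.glob(File.join(confd, "*.conf")).sort.map { |f| File.read(f) }.join
end
```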
Some projects will have a branch for dev, a branch for prod, etc. A configuration file will then exist in each branch, with the environmental differences present only in each branch.
While this may have made some sense before tools like Chef were available, we feel it is a bad practice once Chef is in use. Environments should be defined in Chef, not in VCS branches. Chef is a much better tool for the job, because environmental differences can be captured and applied throughout the stack, not just at the application configuration level. In addition, storing environments in VCS branches means that merges must always be cherry-picked: moving dev’s edits to the conf file into the prod branch must be done carefully and manually.
Instead, we found a one-time merge more useful: create a template from the merge of the various environment branches, then parameterize each conflicting value. Environment roles are then created to store those values. This makes the differences between environments very clear, and makes it easy to provision new environments that are a mix (for example, a UAT environment).
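A minimal sketch of the one-time-merge approach, using Ruby’s ERB (`result_with_hash` requires Ruby 2.5 or later). The template keeps values that were identical across branches as literals and turns each conflicting value into an attribute; the attribute names and values are made up, and in Chef the per-environment hashes would live in each environment role’s attributes.

```ruby
require "erb"

# Template from the one-time merge: shared values stay literal,
# values that differed between the env branches become attributes.
TEMPLATE = <<~'ERB'
  log_format = combined
  db_host = <%= attrs["db_host"] %>
  cache_ttl = <%= attrs["cache_ttl"] %>
  debug = <%= attrs["debug"] %>
ERB

# Hypothetical environment roles holding only the values that differed.
ENV_ROLES = {
  "dev"  => { "db_host" => "db.dev.internal",  "cache_ttl" => 5,   "debug" => true },
  "prod" => { "db_host" => "db.prod.internal", "cache_ttl" => 600, "debug" => false },
}.freeze

def render(attrs)
  ERB.new(TEMPLATE).result_with_hash(attrs: attrs)
end

# A UAT environment mixes prod data with dev-style debugging.
uat = ENV_ROLES["prod"].merge("debug" => true)
puts render(uat)
```

A UAT environment then needs no branch of its own: it is just another hash, here the prod values with debugging turned on.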
One particular challenge with Chef is that when you make a change, the work product is a complete system (or a group of systems), so there is no small unit to try a change on. We had good results using Vagrant, a tool that integrates Chef with VirtualBox. It allowed individuals to make changes to their local Chef repos, test them in a Vagrant VM, and then push. Because no real servers are involved, individuals can make mistakes without consequence, which enables learning.
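For reference, a minimal Vagrantfile for this workflow might look like the sketch below, written against Vagrant’s version 2 configuration format and its `chef_solo` provisioner. The box name, role name and JSON attributes are placeholders, and the paths assume a conventional chef-repo layout.

```ruby
# Hypothetical Vagrantfile: boot a throwaway VirtualBox VM and converge it
# with chef-solo from the local checkout. Names below are placeholders.
Vagrant.configure("2") do |config|
  config.vm.box = "example/base-box"   # placeholder box name
  config.vm.provider "virtualbox"

  config.vm.provision "chef_solo" do |chef|
    chef.cookbooks_path = ["cookbooks"]
    chef.roles_path     = "roles"
    chef.data_bags_path = "data_bags"
    chef.add_role "webserver"          # hypothetical role
    chef.json = { "motd" => { "chef_repo" => "local checkout" } }
  end
end
```

`vagrant up` creates and converges the VM, `vagrant provision` re-runs chef-solo after each edit, and `vagrant destroy` throws the VM away once the experiment is over.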
Two other major points about learning Chef are worth mentioning, though they will be covered in detail in later articles.
- When you make an attribute change in Chef, you have no inherent context about where that change will have an impact. Each organization must develop practices (naming conventions, role decomposition guidelines and good comments) that make clear the scope of a change. We’ll explore attribute tree best practices in detail in part two of this series.
- Chef tries to turn your system configuration into code. That means you now inherit all the woes of software engineering: coordinating changes and ensuring that they integrate well become even greater concerns. In part three of this series, we’ll look at applying software quality assurance and release management practices to Chef cookbooks and roles.
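One cheap aid for judging the scope of a change is a tool that reports which precedence layer supplied an attribute’s final value: a change at a lower layer has no effect wherever a higher layer sets the same key. The sketch below models only four of Chef’s precedence levels, in what we understand to be Chef’s relative order (cookbook default, role default, role override, environment override); the layer names, helper methods and sample attributes are hypothetical.

```ruby
# Simplified model: four of Chef's precedence levels, lowest first.
LAYERS = %i[cookbook_default role_default role_override environment_override].freeze

# Recursively merge `over` into `base`; values in `over` win unless both are hashes.
def deep_merge(base, over)
  base.merge(over) do |_key, a, b|
    a.is_a?(Hash) && b.is_a?(Hash) ? deep_merge(a, b) : b
  end
end

# Safe nested lookup: nil if any step along the path is missing.
def lookup(hash, path)
  path.reduce(hash) { |h, key| h.is_a?(Hash) ? h[key] : nil }
end

# Merge the layers in precedence order and report which layer
# supplied the final value at `path`.
def resolve(layers, path)
  merged = {}
  source = nil
  LAYERS.each do |name|
    layer = layers.fetch(name, {})
    merged = deep_merge(merged, layer)
    source = name unless lookup(layer, path).nil?
  end
  [merged, source]
end

layers = {
  cookbook_default:     { "apache" => { "port" => 80, "timeout" => 300 } },
  role_default:         { "apache" => { "timeout" => 60 } },
  environment_override: { "apache" => { "port" => 8080 } },
}
merged, source = resolve(layers, %w[apache port])
puts "apache.port = #{lookup(merged, %w[apache port])} (from #{source})"
# prints "apache.port = 8080 (from environment_override)"
```

Here, editing the role’s `timeout` changes every node that uses the role, while editing the cookbook default `port` changes nothing on nodes in this environment.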
Our experiences with Chef have been at times exhilarating (“I can’t believe that change was so easy to make!”) and at times terrifying (ditto). Nearly all of the technical challenges were easily overcome with a bit of ingenuity, but the changes to processes and mindsets have been more disruptive, requiring perseverance and a variety of approaches. In some cases, like the multi-tenant bootstrapper, a purely technological solution was appropriate; in others, a mix of technical and process adaptations was needed. As best practices emerge in the Chef community, we’ll continue to adapt and grow.