
Winter 2013 an OmniTI publication



Preserving Node.js Packages and Sanity

Software development is often an exercise in standing on the shoulders of giants. This is particularly true with Node.js, which is very much geared toward modular code. Managing those dependencies in a sane, predictable fashion is critical for smooth deploys and restful nights. There isn't a perfect solution, and no approach will work in every situation, but npm's package.json in conjunction with npm's shrinkwrap feature will get the job done cleanly in most cases.

example
|-- config
|-- lib
|   `-- foo.js
|-- node_modules
|   |-- A
|   |-- B
|   `-- Z
|-- test
|-- LICENSE.md
|-- package.json
`-- README.md

Node.js modules are self-contained, reusable pieces of code written either in pure JavaScript or in a mix of C++ and JavaScript, the latter of which must be compiled before use. They reside in the node_modules directory and are included elsewhere by using require("foo"). The require() function walks up the directory structure from the calling file, checking each level for a node_modules directory and then for the module inside it. This should be kept in mind: it is what allows each module to have its own dependencies, but it can also cause unexpected behavior if more than one node_modules directory exists in the main project.
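The lookup order described above can be sketched in a few lines. This is a simplified illustration, not Node's actual resolver: the helper name candidateModuleDirs and the example path are hypothetical, and the sketch ignores core modules, global folders, and file extensions.

```javascript
const path = require("path");

// Hypothetical helper: list the node_modules directories that would be
// checked, in order, for a require("foo") made from a file in `fromDir`.
function candidateModuleDirs(fromDir) {
  const dirs = [];
  let current = path.resolve(fromDir);
  while (true) {
    // Skip directories that are themselves named node_modules, so the
    // list never contains node_modules/node_modules.
    if (path.basename(current) !== "node_modules") {
      dirs.push(path.join(current, "node_modules"));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

// Assumed example location: a file living in /home/user/example/lib
console.log(candidateModuleDirs("/home/user/example/lib"));
```

Inside a running module, Node exposes the list it really uses as module.paths, which is a quick way to check where a stray second node_modules directory might be picked up.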

The vast majority of public modules can be found in one of two places: GitHub and npm. GitHub is a popular hosting service for code projects using Git revision control. Using a module from there would require a git clone of the repo and an npm build if compilation is necessary. The other option, npm, is a package manager for Node.js and is installed automatically with the core. Simply running npm install with a module name will create a node_modules directory in the current working directory if one does not exist, download the module, and then compile it automatically. Modules are often hosted and maintained on GitHub and published to npm, but not always.

Manual installation is fine and dandy when developing, but eventually the project is going to have to be deployed or distributed, and some way to replicate all of that work will be needed. One's first instinct might be to check node_modules into the project repository. The upside is absolute consistency, but the downsides include a larger repo with more noise, as well as issues with compiled modules being reused across systems; such modules will generally need to be rebuilt on each system anyway.

{
  "name": "example",
  "version": "1.0.0",
  "author": "John Doe",
  "description": "An example project",
  "main": "./lib/foo.js",
  "repository": {
    "type": "git",
    "url": "https://github.com/some github user/example.git"
  },
  "keywords": [
    "example",
    "foo",
    "bar"
  ],
  "dependencies": {
    "A": "2.5.0",
    "B": "0.1.2",
    "Z": "git://github.com/some github user/Z.git"
  },
  "license": "MIT"
}

A much better approach, though not perfect on its own, is to use npm's built-in features. In a Node project, configuration options, including dependencies, can be specified in a package.json file, as shown above. Dependencies can be saved as npm module names, npm module names and version numbers, or as git links. If npm install is run with no arguments, it will examine the package.json file and attempt to install any listed dependencies that are not already present in the node_modules directory. In this case, it will install modules A, B, and Z. Additionally, npm rebuild can be run to recompile any native modules, which is useful (and necessary) after a Node version update.
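To make the three kinds of dependency entry concrete, here is a rough sketch that sorts the values of a "dependencies" object into categories. The function name classifyDependency, the regular expressions, and the sample git URL are assumptions made for illustration; npm's own parsing handles many more forms than this.

```javascript
// Hypothetical classifier for the value side of a "dependencies" entry.
function classifyDependency(spec) {
  // git links and other URLs, e.g. git://host/user/repo.git
  if (/^(git|git\+ssh|git\+https?|https?):\/\//.test(spec)) return "git or URL";
  // a bare module name with no version constraint is written as "*" or ""
  if (spec === "" || spec === "*") return "any version";
  // three dot-separated numbers pin a single release
  if (/^\d+\.\d+\.\d+$/.test(spec)) return "exact version";
  // anything else (">= 1.0.0", "~1.2.0", ...) is treated as a range here
  return "version range";
}

// Sample entries; the git URL is a made-up placeholder.
const dependencies = {
  A: "2.5.0",
  C: ">= 1.0.0",
  D: "*",
  Z: "git://github.com/some-user/Z.git",
};

for (const [name, spec] of Object.entries(dependencies)) {
  console.log(`${name}: ${classifyDependency(spec)}`);
}
```

Only the "exact version" entries are fully predictable from package.json alone, which is worth keeping in mind when reading a dependency list.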

There are, however, two major drawbacks to just using npm and package.json. The first is that the system it is run on will need access to whatever servers are hosting the modules, whether that is the npm registry, GitHub, or somewhere else. This may not be possible in locked-down environments, such as hosts with no outbound network access.

The second is that, while the dependency versions can be specified in package.json, and npm install will install those versions, this does not guarantee consistent subdependencies, as shown in the diagram below.

Installed on Jan 1

node_modules
|-- A @ 2.5.0 (installs 2.5.0)
|   `-- C @ >= 1.0.0 (installs 1.0.1)
`-- B @ 0.1.2 (installs 0.1.2)
    |-- D @ * (installs 2.4.1)
    `-- E @ 1.5.1 (installs 1.5.1)

Installed on Dec 1

node_modules
|-- A @ 2.5.0 (installs 2.5.0)
|   `-- C @ >= 1.0.0 (installs 2.1.1)
`-- B @ 0.1.2 (installs 0.1.2)
    |-- D @ * (installs 3.0.7)
    `-- E @ 1.5.1 (installs 1.5.1)

Each dependency module will have its own package.json, which often does not pin exact versions for all of its dependencies, meaning the latest version that satisfies each range will be installed. One can control the versions of A and B, but not of C, D, and E in the example. The subdependencies installed may be new enough that they are no longer compatible with the dependency versions specified in the main package.json, or they may simply behave slightly differently. This can lead to some nasty surprises. Enter shrinkwrap.
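The drift in module C can be reproduced with a toy version matcher. This is a deliberately small sketch that understands only "*", ">= x.y.z", and exact versions; it is not the semver library that npm uses, and the lists of published versions are invented for the example.

```javascript
// Parse "1.2.3" into [1, 2, 3]; pre-release tags are not handled here.
function parseVersion(v) {
  return v.split(".").map(Number);
}

// Returns -1, 0, or 1, like a sort comparator.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// Simplified matcher: supports "*", ">= x.y.z", and exact "x.y.z" only.
function satisfies(version, range) {
  const r = range.trim();
  if (r === "*") return true;
  if (r.startsWith(">=")) return compareVersions(version, r.slice(2).trim()) >= 0;
  return compareVersions(version, r) === 0;
}

// Pick the highest published version that the range allows.
function pickVersion(published, range) {
  const matches = published.filter((v) => satisfies(v, range));
  matches.sort(compareVersions);
  return matches.length ? matches[matches.length - 1] : null;
}

// Hypothetical release history for module C.
const publishedInJanuary = ["1.0.0", "1.0.1"];
const publishedInDecember = ["1.0.0", "1.0.1", "2.0.0", "2.1.1"];

console.log(pickVersion(publishedInJanuary, ">= 1.0.0"));  // 1.0.1
console.log(pickVersion(publishedInDecember, ">= 1.0.0")); // 2.1.1
console.log(pickVersion(publishedInDecember, "1.0.1"));    // 1.0.1
```

The range never changed between the two installs; only the set of published versions did, which is why the same package.json can produce different trees months apart.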

Shrinkwrap is a way to lock down the entire dependency tree. After the base package.json is built and modules installed, running npm shrinkwrap will create a file called npm-shrinkwrap.json. This file will contain the exact version of every dependency and subdependency in the tree. When npm install is run again, it will check both package.json and npm-shrinkwrap.json, using the latter to install the specific versions of everything. The shrinkwrap file can be checked in to the repository, along with package.json, and one can be confident that modules will be consistent across systems.
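As a rough picture of what gets recorded, the sketch below models the January tree as a nested object and prints every pinned version. The object is hand-written and only assumes the nested "dependencies"/"version" layout; the real npm-shrinkwrap.json carries additional fields, and its exact shape depends on the npm version, so check the file npm generates rather than relying on this outline. The helper name listPinned is hypothetical.

```javascript
// Hand-written stand-in for a shrinkwrap file (assumed, simplified layout).
const shrinkwrap = {
  name: "example",
  version: "1.0.0",
  dependencies: {
    A: { version: "2.5.0", dependencies: { C: { version: "1.0.1" } } },
    B: {
      version: "0.1.2",
      dependencies: { D: { version: "2.4.1" }, E: { version: "1.5.1" } },
    },
  },
};

// Walk the tree depth-first and collect "path@version" for every node.
function listPinned(node, prefix = "") {
  const lines = [];
  for (const [name, dep] of Object.entries(node.dependencies || {})) {
    const label = prefix + name;
    lines.push(`${label}@${dep.version}`);
    lines.push(...listPinned(dep, `${label} > `));
  }
  return lines;
}

console.log(listPinned(shrinkwrap).join("\n"));
```

Every level of the tree carries an exact version, including C, D, and E, which package.json alone could not control.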

There are still some caveats, however. Shrinkwrap does not solve the issue of requiring access to the dependency hosting. The package.json, npm-shrinkwrap.json, and installed modules (node_modules) must also be kept in sync. If a module is present in node_modules but not listed in package.json, or a listed module is missing from node_modules, or the installed and listed versions differ, then shrinkwrapping will fail. This means that all modules in node_modules must be handled through package.json; any modules to be loaded outside that process need to be kept in a separate location, which must be remembered when requiring them in the code. Whenever versions are changed in package.json and node_modules, a new shrinkwrap must be generated. None of these are particularly problematic, but they are extra considerations when working with the codebase.
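A small pre-deploy check can catch the most common form of drift. The following is a sketch under stated assumptions: the function findDrift is hypothetical, it compares only top-level entries, it checks versions only for exact "x.y.z" specs, and it assumes a shrinkwrap object with a nested "dependencies"/"version" layout. The sample data, including module Q, is made up.

```javascript
// Hypothetical check: compare package.json dependencies with the
// top level of a shrinkwrap object and describe any mismatches.
function findDrift(pkg, shrinkwrap) {
  const declared = pkg.dependencies || {};
  const locked = shrinkwrap.dependencies || {};
  const problems = [];

  for (const [name, spec] of Object.entries(declared)) {
    if (!(name in locked)) {
      problems.push(`${name} is in package.json but not in the shrinkwrap`);
    } else if (/^\d+\.\d+\.\d+$/.test(spec) && locked[name].version !== spec) {
      // Only exact specs are compared; ranges and git links are skipped.
      problems.push(`${name} is declared as ${spec} but locked at ${locked[name].version}`);
    }
  }

  for (const name of Object.keys(locked)) {
    if (!(name in declared)) {
      problems.push(`${name} is locked but not declared in package.json`);
    }
  }
  return problems;
}

// Made-up data: B was bumped without re-running shrinkwrap, Q was never declared.
const pkg = { dependencies: { A: "2.5.0", B: "0.1.3" } };
const lock = {
  dependencies: {
    A: { version: "2.5.0" },
    B: { version: "0.1.2" },
    Q: { version: "1.0.0" },
  },
};

console.log(findDrift(pkg, lock));
```

In practice the two objects would be loaded from the project's package.json and npm-shrinkwrap.json, and a non-empty result would be a reason to reinstall and regenerate the shrinkwrap before deploying.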

Using the outlined methods to combine npm's package.json dependencies with shrinkwrap's version lockdown, we have found that our deploys have become smoother, and less hair has been lost when dealing with Node.js.