29 March, 2023•5 minute read

pnpm is blazing fast: real world benchmarks

One of the major projects I’ve overseen while working at Crimson Education was a complete rewrite of Crimson Global Academy’s main web application, CGA Home. CGA Home is the portal which students and staff members use to join online lessons, mark attendance, view assignments, and otherwise run their school life. It’s a major touch point for our customers.

Over CGA’s two years of existence, CGA Home had grown to be a mess. The launch of CGA was done under incredible time pressure, and the team had to build out a lot of infrastructure very quickly. Launching a global online high school is extremely difficult due to all the moving pieces at play.

Technical debt accrued during the early days hadn’t been managed effectively, and the decision was made shortly after I joined to do a complete rewrite of the app. This is usually a pretty terrifying prospect, but it all worked out pretty well for us.

There were a number of technical and architectural improvements made during this rewrite that I’ll talk about at another time, but in this post I want to focus in on our choice of package manager.

Node.js package management

For better or for worse, the Node ecosystem loves small modules. Lots of small modules. The left-pad incident from 2016 is perhaps the best example of just how small Node packages can get, and how widely these tiny packages can end up being used. The node_modules folder (where your dependencies get installed) for any nontrivial JavaScript project tends to become huge really fast as these tiny packages rapidly accumulate.

If you haven’t caught it yet, Jordan Scales’ satirical Medium post on the topic is well worth the read.

Having lots of small packages is not necessarily a bad thing, but reconciling, downloading, and writing thousands of packages to disk isn't ideal for performance. The "standard" Node package manager, npm, has long been riddled with performance issues, and while things have improved (the release of yarn forced npm to get better), it's still rather slow and inefficient.

One of the tooling changes I made to CGA Home was to swap from npm to pnpm. In doing so, we significantly cut down on package installation times and greatly improved safety thanks to pnpm’s unique way of laying out packages inside the node_modules directory.

Methodology

Our build system at Crimson is based on Buildkite, and I added a caching system similar to the one in GitLab CI. You specify a file, the hash of which is used as a cache key, and then specify a directory that should be persisted across builds until the cache key changes.

Using this system we can specify our lockfile as the cache key, and persist our package manager’s “store” across builds. Doing this means that so long as the lockfile doesn’t change, we don’t need to download packages from the npm registry. This achieves a lot of speedup: our CGA Home web app pulls in 2,969 modules overall, and downloading one large gzip archive (the cached store) is substantially faster than downloading 2,969 gzip archives individually.
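As a rough illustration of how a lockfile-derived cache key behaves, here is a minimal TypeScript sketch. It is not the actual code behind our Buildkite caching: the cacheKey helper, the "pnpm-store" prefix, and the choice of SHA-256 are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a cache key from the lockfile's contents.
// The prefix and the hash algorithm are illustrative choices, not our real setup.
function cacheKey(lockfileContents: string, prefix: string = "pnpm-store"): string {
  const digest = createHash("sha256").update(lockfileContents).digest("hex");
  return `${prefix}-${digest}`;
}

// Same lockfile contents give the same key, so the persisted store is reused.
// Any change to the lockfile gives a new key, so the store is rebuilt.
const before = cacheKey("lockfileVersion: '6.0'\n");
const after = cacheKey("lockfileVersion: '6.0'\n# a dependency changed\n");
console.log(before === after); // false
```

In a real pipeline the contents would come from reading pnpm-lock.yaml off disk, and the key would decide whether the persisted store directory gets restored or regenerated.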

Builds get triggered by GitHub webhooks which end up running our "scheduler lambda." This lambda function is responsible for generating and applying a Kubernetes Job which results in the build actually running on our cluster. Using agent targeting, pipeline steps are able to specify the CPU and memory needed.

For CGA Home, we run frontend builds with specs equivalent to half of a t4g.xlarge EC2 instance. Because we don't take up the entire EC2 instance there’s a little bit of variance introduced build-to-build when there are noisy neighbors. We haven’t yet done the work to add gp3 disk support to our Kubernetes cluster, so we’re still on the slower gp2 disks.

Benchmarks: npm vs pnpm

Here are the statistics. They really do speak for themselves:

Package manager	Cache?	Time
npm	no	4m30s
npm	yes	2m30s
pnpm	no	1m40s
pnpm	yes	20s

The top-line summary here is that a pnpm install with a cached store is 13.5x faster than an npm install with an empty cache (4m30s versus 20s). That degree of speedup from swapping a tool is very rare in the DevOps world.
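For anyone who wants to check the ratios, here is a small TypeScript sketch that derives them from the timings in the table. The toSeconds and speedup helpers are names I've made up for the example; only the four durations come from the benchmark.

```typescript
// Parse durations written like "4m30s" or "20s" into seconds.
function toSeconds(duration: string): number {
  const match = /^(?:(\d+)m)?(?:(\d+)s)?$/.exec(duration);
  if (!match || duration === "") {
    throw new Error(`bad duration: ${duration}`);
  }
  const minutes = Number(match[1] || 0);
  const seconds = Number(match[2] || 0);
  return minutes * 60 + seconds;
}

// How many times faster the "fast" run is than the "slow" run.
function speedup(slow: string, fast: string): number {
  return toSeconds(slow) / toSeconds(fast);
}

// Timings copied from the table above.
const timings = {
  npmNoCache: "4m30s",
  npmCache: "2m30s",
  pnpmNoCache: "1m40s",
  pnpmCache: "20s",
};

console.log(speedup(timings.npmNoCache, timings.pnpmCache)); // 13.5
console.log(speedup(timings.npmCache, timings.pnpmCache)); // 7.5
console.log(speedup(timings.npmNoCache, timings.pnpmNoCache)); // 2.7
```

Even comparing like for like, pnpm comes out 2.7x faster without a cache and 7.5x faster with one.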

As a point of comparison, I’ve seen tsc to swc migrations result in a maximum 2x speedup for TypeScript transpilation, and that comes with some safety tradeoffs¹. The pnpm migration, on the other hand, improves safety, as pnpm is a stricter package manager than npm².

If you’re working on a JavaScript codebase, swapping to pnpm is likely the biggest and easiest improvement you can make to your tooling right now.

Something else which feels worth mentioning is that the pnpm lockfile format is substantially more space efficient than the npm lockfile is. When I migrated CGA Home to pnpm, our new pnpm-lock.yaml was 75k lines smaller than the package-lock.json file it replaced, and we saw a size reduction from 3.98MB to 0.91MB.

Lockfile size reductions aren't too significant for a single application in a single repository. Monorepo setups with lots of different applications, on the other hand, can greatly benefit from the slimmer lockfiles offered by pnpm. Saving ~3MB multiplied by however many applications or services you have will make a material difference to your git clone times, and further speed up your pipeline execution³.

How can you switch?

There’s a nifty pnpm import command which will get you most of the way. pnpm import reads your old package manager’s lockfile and translates it into a pnpm lockfile.

pnpm lays out node_modules in a unique way where only direct dependencies (those specified in package.json) are placed at the top level of your project’s node_modules. The benefit of this approach is that your code can’t import a transitive dependency it never declared, which is a recurring source of bugs in real-world systems. The downside is that if your codebase is large enough, it’s likely already relying on this unsafe behavior somewhere.

Simply running your app or attempting to transpile its TypeScript is enough to surface missing direct dependencies. If you see errors related to missing packages, then check package.json to make sure that package is actually listed as a dependency. If adding missing direct dependencies doesn't get your application into a runnable state, the pnpm docs have a few other solutions you can try.
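If you'd rather find these before the build fails, the check can be sketched in a few lines of TypeScript. This is only an illustration under some assumptions: findUndeclared and packageName are hypothetical helpers, the package names are made up, and the list of import specifiers is assumed to have been collected already (for example by grepping your source). It also only skips Node built-ins written with the node: prefix.

```typescript
interface Manifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Reduce an import specifier to its package name.
// Relative paths, absolute paths, and "node:" built-ins return null.
function packageName(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
    return null;
  }
  const parts = specifier.split("/");
  // Scoped packages ("@scope/name") use the first two segments.
  return parts.slice(0, specifier.startsWith("@") ? 2 : 1).join("/");
}

// Return the packages that are imported but not listed in package.json.
function findUndeclared(specifiers: string[], manifest: Manifest): string[] {
  const declared = new Set<string>([
    ...Object.keys(manifest.dependencies || {}),
    ...Object.keys(manifest.devDependencies || {}),
  ]);
  const missing = new Set<string>();
  for (const specifier of specifiers) {
    const name = packageName(specifier);
    if (name !== null && !declared.has(name)) {
      missing.add(name);
    }
  }
  return Array.from(missing).sort();
}

// Hypothetical project: only "react" is declared as a direct dependency.
const manifest: Manifest = { dependencies: { react: "^18.0.0" } };
const imports = ["react", "lodash/debounce", "@example/ui/button", "./local", "node:fs"];
console.log(findUndeclared(imports, manifest)); // [ '@example/ui', 'lodash' ]
```

Anything this reports is a package that happened to work under npm's flat layout and needs adding to package.json before the move to pnpm.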

After that you’ll need to update your CI/CD to use pnpm commands⁴. If you’re coming from npm then this is reasonably straightforward as the pnpm CLI is largely a drop-in replacement for npm. The one command which might trip you up is npm ci, which should be replaced with the following invocation: pnpm install --frozen-lockfile.

After that short process, you should be able to enjoy significantly faster pipeline runs.

¹ Most notably, swc doesn't perform any typechecking. You can, of course, run tsc with --noEmit but then you start eating into the performance gains of a clean swap.
² There's a good write up on the topic here: "pnpm's strictness helps to avoid silly bugs"
³ Relative to other pieces of work in a pipeline, the time spent cloning your repository is usually a minor consideration. Shaving off seconds does matter, however, when your pipeline is otherwise very optimized.
⁴ If you are using a recent PaaS provider to deploy your application, then you may not need to do anything here! Vercel, for instance, is able to detect which package manager is in use and use the correct commands automatically.
