pnpm is blazing fast: real world benchmarks
One of the major projects I’ve overseen while working at Crimson Education was a complete rewrite of Crimson Global Academy’s main web application, CGA Home. CGA Home is the portal which students and staff members use to join online lessons, mark attendance, view assignments, and otherwise run their school life. It’s a major touch point for our customers.
Over CGA’s two years of existence, CGA Home had grown to be a mess. The launch of CGA was done under incredible time pressure, and the team had to build out a lot of infrastructure very quickly. Launching a global online high school is extremely difficult due to all the moving pieces at play.
Technical debt accrued during the early days hadn’t been managed effectively, and the decision was made shortly after I joined to do a complete rewrite of the app. This is usually a terrifying prospect, but it worked out well for us.
There were a number of technical and architectural improvements made during this rewrite that I’ll talk about at another time, but in this post I want to focus on our choice of package manager.
Node.js package management
For better or for worse, the Node ecosystem loves small modules. Lots of small modules. The left-pad incident from 2016 is perhaps the best example of just how small Node packages can get, and how widely these tiny packages can end up being used.
If you haven’t caught it yet, Jordan Scales’ satirical Medium post on the topic is well worth the read.
Having lots of small packages is not necessarily a bad thing, but reconciling, downloading, and writing thousands of packages to disk isn't ideal for performance. The "standard" Node package manager, npm, has long been riddled with performance issues. Things have improved (the release of yarn forced npm to get better), but it's still rather slow and inefficient.
One of the tooling changes I made to CGA Home was to swap from npm to pnpm. In doing so, we significantly cut down on package installation times and greatly improved safety thanks to pnpm’s unique way of laying out packages inside the node_modules directory.
Our build system at Crimson is based on Buildkite, and I added a caching system similar to the one in GitLab CI. You specify a file, the hash of which is used as the cache key, and a directory that should be persisted across builds until the cache key changes.
Using this system we can specify our lockfile as the cache key, and persist our package manager’s “store” across builds. Doing this means that so long as the lockfile doesn’t change, we don’t need to download packages from the npm registry. This gives us a substantial speedup: our CGA Home web app pulls in 2,969 modules overall, and downloading one large gzip archive is substantially faster than downloading 2,969 gzip archives individually.
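As a rough sketch of the cache-key idea, the snippet below hashes the lockfile contents and uses the digest as the key. The function name `cacheKeyForLockfile` and the `pnpm-store` prefix are assumptions made for illustration; they are not part of Buildkite, GitLab CI, or the caching system described here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a cache key from the lockfile's contents.
// Identical contents always produce the same key, so the cached store is
// reused until a dependency change alters the lockfile.
function cacheKeyForLockfile(lockfileContents: string, prefix = "pnpm-store"): string {
  const digest = createHash("sha256").update(lockfileContents).digest("hex");
  return `${prefix}-${digest}`;
}
```

In a pipeline you would pass in the contents of pnpm-lock.yaml (for example via `readFileSync` from `node:fs`) and restore the persisted directory whenever an archive with that key already exists.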
Builds get triggered by GitHub webhooks which end up running our "scheduler lambda." This lambda function is responsible for generating and applying a Kubernetes Job which results in the build actually running on our cluster. Using agent targeting, pipeline steps are able to specify the CPU and memory they need.
For CGA Home, we run frontend builds with specs equivalent to half of a t4g.xlarge EC2 instance. Because we don't take up the entire EC2 instance, there’s a little bit of variance introduced build-to-build when there are noisy neighbors. We haven’t yet done the work to add gp3 disk support to our Kubernetes cluster, so we’re still on slower disks.
Benchmarks: npm vs pnpm
Here are the statistics. They really do speak for themselves:
The top-line summary here is that a pnpm install with a cached store is 13.5x faster than an npm install with an empty cache. That degree of speedup from swapping a tool is very rare in the DevOps world.
As a point of comparison, I’ve seen swc migrations result in a maximum 2x speedup for TypeScript transpilation, and that comes with some safety tradeoffs [1], whereas the pnpm migration improves safety because pnpm is a stricter package manager than npm [2].
Something else worth mentioning is that the pnpm lockfile format is substantially more space efficient than npm's. When I migrated CGA Home to pnpm, our new pnpm-lock.yaml was 75k lines shorter than the package-lock.json file it replaced, and we saw a size reduction from 3.98MB to 0.91MB.
Lockfile size reductions aren't too significant for a single application in a single repository. Monorepo setups with lots of different applications, on the other hand, can greatly benefit from the slimmer lockfiles offered by pnpm. Saving ~3MB multiplied by however many applications or services you have will make a material difference to your git clone times, and further speed up your pipeline execution [3].
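To put numbers on that, here is a small worked calculation using the two lockfile sizes quoted above. The count of 20 applications is purely hypothetical, and the sketch assumes each application carries its own lockfile and sees a similar reduction.

```typescript
// Lockfile sizes quoted above for CGA Home, in megabytes.
const npmLockfileMb = 3.98;
const pnpmLockfileMb = 0.91;

// Absolute saving for one lockfile.
function savedMb(before: number, after: number): number {
  return before - after;
}

// Saving relative to the original lockfile size, as a percentage.
function reductionPercent(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Hypothetical monorepo: total saving if `appCount` applications each
// saw the same reduction as CGA Home.
function monorepoSavingMb(appCount: number): number {
  return appCount * savedMb(npmLockfileMb, pnpmLockfileMb);
}

console.log(savedMb(npmLockfileMb, pnpmLockfileMb).toFixed(2)); // "3.07"
console.log(reductionPercent(npmLockfileMb, pnpmLockfileMb).toFixed(1)); // "77.1"
console.log(monorepoSavingMb(20).toFixed(1)); // "61.4"
```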
How can you switch?
There’s a nifty pnpm import command which will get you most of the way. pnpm import reads your old package manager’s lockfile and translates it into a pnpm lockfile.
pnpm lays out node_modules in a unique way where only direct dependencies (those specified in package.json) are placed at the top level of your project’s node_modules. The benefit of this approach is that your code can no longer import from a transitive dependency, which is a recurring source of bugs in real-world systems. The downside is that if your codebase is large enough, you’re likely relying on this unsafe behavior somewhere.
Simply running your app or attempting to transpile its TypeScript is enough to surface missing direct dependencies. If you see errors related to missing packages, check package.json to make sure that package is actually listed as a dependency. If adding missing direct dependencies doesn't get your application into a runnable state, the pnpm docs have a few other solutions you can try.
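If you would rather hunt for these undeclared imports before switching, a rough check can be scripted. The sketch below is a regex-based heuristic, not a real parser, and `findUndeclaredImports` and `packageNameOf` are hypothetical names invented for this example. It reports packages that a source file imports but that are not in the set of declared dependencies.

```typescript
import { builtinModules } from "node:module";

// Reduce an import specifier to its package name:
// "lodash/get" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg".
function packageNameOf(specifier: string): string {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Hypothetical helper: list packages that `source` imports but that are
// missing from `declared` (the dependency names taken from package.json).
function findUndeclaredImports(source: string, declared: ReadonlySet<string>): string[] {
  // Matches `from "x"`, `require("x")`, `import("x")` and `import "x"`.
  const pattern = /(?:from\s*|require\(\s*|import\(\s*|import\s+)["']([^"']+)["']/g;
  const missing = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1];
    // Skip relative and absolute paths.
    if (specifier.startsWith(".") || specifier.startsWith("/")) continue;
    // Skip Node built-ins such as "path" or "node:fs".
    if (specifier.startsWith("node:") || builtinModules.includes(specifier)) continue;
    const name = packageNameOf(specifier);
    if (!declared.has(name)) missing.add(name);
  }
  return Array.from(missing).sort();
}
```

Anything this reports is a candidate for adding to package.json. Because it only pattern-matches text, it can produce false positives (for example, import-like strings inside comments), so treat the output as a starting point.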
After that you’ll need to update your CI/CD to use pnpm commands [4]. If you’re coming from npm then this is reasonably straightforward as the pnpm CLI is largely a drop-in replacement for npm. The one command which might trip you up is npm ci, which should be replaced with the following invocation: pnpm install --frozen-lockfile.
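If you have many pipeline steps to update, the rewrite can be expressed as a small function. This is only a sketch: `toPnpmCommand` is a hypothetical helper, and apart from the npm ci case it just swaps the binary name, which assumes pnpm accepts the same subcommand and flags. Check each rewritten step against the pnpm docs.

```typescript
// Hypothetical helper for rewriting simple CI steps from npm to pnpm.
function toPnpmCommand(npmCommand: string): string {
  const trimmed = npmCommand.trim();
  // `npm ci` has no same-named pnpm subcommand; use a frozen-lockfile install.
  if (/^npm\s+ci\b/.test(trimmed)) {
    return trimmed.replace(/^npm\s+ci\b/, "pnpm install --frozen-lockfile");
  }
  // Everything else: swap the binary and keep the arguments unchanged.
  return trimmed.replace(/^npm\b/, "pnpm");
}

console.log(toPnpmCommand("npm ci")); // "pnpm install --frozen-lockfile"
console.log(toPnpmCommand("npm run build")); // "pnpm run build"
```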
After that short process, you should be able to enjoy significantly faster pipeline runs.
1. Most notably, swc doesn't perform any typechecking. You can, of course, run the TypeScript compiler with --noEmit, but then you start eating into the performance gains of a clean swap.
2. There's a good write-up on the topic here: "pnpm's strictness helps to avoid silly bugs"
3. Relative to other pieces of work in a pipeline, the time spent cloning your repository is usually a minor consideration. Shaving off seconds does matter, however, when your pipeline is otherwise very optimized.
4. If you are using a recent PaaS provider to deploy your application, then you may not need to do anything here! Vercel, for instance, is able to detect which package manager is in use and use the correct commands automatically.