23 February, 2023 · 8 minute read

Code doesn’t explain itself. Write more.

There’s a golden rule to fostering a great engineering culture, and it is to aggressively move information out of your head and onto paper. Thoughts are ephemeral and private while paper is eternal and public. Writing often keeps stakeholders aligned, builds institutional knowledge, and forces you to be explicit and deliberate in your process.

As Bryan Cantrill so beautifully describes in his talk at Monktoberfest 2016, software engineering knowledge has always been passed from generation to generation via oral tradition in much the same way that schoolyard nursery rhymes are. While the spoken word does have obvious value and has served our species well over the millennia, it should be clear to us all that human memory has its limitations. As society grows more complicated and the information we need to communicate richer, we've built libraries and published books to ensure our knowledge transfer scales with our civilization.

Your software is no different.

Software documentation systems

In public, there has never been a better time to become a software engineer. In fact there are so many blog articles and video tutorials out there, says Bryan, that one of the most popular topics amongst software engineers is the meta question of what content to consume. We’re spoiled for choice, and every engineer on the planet will attribute some of their skillset to content they’ve found online. While the quality of online software engineering content can be quite variable, sifting through the noise can be done efficiently with search engines.

In private, things are far more uneven. Some companies -- Basecamp being a great example -- treat documentation and internal communications seriously, knowing that maintaining their organizational knowledge base is key to moving quickly and increasing their bus factor. Most other companies don't approach this with the same degree of care, and they wind up losing velocity.

Oftentimes the pushback against documentation comes from software engineers. "Just read the code, bro" is a refrain I've heard a lot in my career, and at first glance it seems fair enough: the only guaranteed explanation for the code is the code itself, and any derivative work--like a documentation file--is at risk of being wrong, or becoming wrong as the code it relates to evolves.

As true as that might be, it's not an argument against writing documentation in the first place--it's an argument against writing excessive documentation. If reading the code were truly all one had to do to understand it, then software engineers would rarely ever need to ask questions of their colleagues. Code itself provides a perfect description of its functionality, but it lacks a narrative to help guide unfamiliar engineers.

Code on its own struggles to answer why things were implemented the way they were, and oftentimes it struggles to provide a succinct explanation of what it is accomplishing. If you have a scheduled worker that performs an analysis on the past 6 days of data in your system, then you'll need some sort of documentation--ideally an inline comment--explaining the business requirement around that magical number six. If your job is part of some larger orchestrated workload, then you'll likely need some sort of documentation explaining the overall architecture and ownership model of the different pieces of data flowing through it.
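As a minimal sketch of what such an inline comment might look like, here is a hypothetical Python worker helper. The function name, the constant, and above all the business reason in the comment are invented for illustration; the point is only that the "why" lives right next to the number.

```python
from datetime import datetime, timedelta, timezone

# Why 6 days? (Hypothetical reason, invented for this example.)
# Suppose finance closes the books weekly and wants one day of buffer
# before the close, so the analysis only covers the 6 days before "now".
# If that requirement changes, update this constant and comment together.
ANALYSIS_WINDOW_DAYS = 6


def analysis_window(now: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) range of data the worker should analyse."""
    return now - timedelta(days=ANALYSIS_WINDOW_DAYS), now


start, end = analysis_window(datetime(2023, 2, 23, tzinfo=timezone.utc))
print(start.date(), "->", end.date())  # 2023-02-17 -> 2023-02-23
```

Naming the constant already helps, but only the comment tells a future reader whether the six is safe to change.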

The goal isn't to write exhaustive documentation that becomes out of date the moment an engineer renames a variable; it's to write just enough documentation. To start with, that could just be inline comments and a docs/ folder filled with Markdown files.

This approach doesn't scale, but it will serve a small team well.

What makes a good documentation system?

While a docs/ folder in a Git repository is a good start, it's far from an ideal solution. The key metric documentation systems need to optimize for is discoverability. The quality of your documentation simply doesn't matter if no one can find where it's stored, or if the information is laid out in a confusing and user-hostile manner.

A single docs/ folder rapidly runs into its limitations once you either spin up your first infrastructure team, or find yourself in a situation where product teams frequently need to communicate with each other. Your infrastructure team's job is to ship a platform that product teams can leverage to rapidly ship valuable outputs, and to do that the product teams need to understand how the platform works. And while some degree of developer workflow standardization is to be expected, the reality is that disparate product teams often run very differently from each other.

The problem with a docs/ folder is that if you're trying to get at something quickly, you only have two options:

  1. Scanning through the list of documentation file names.
  2. Performing an exact text search using your editor's "find all" feature. [1]

Both of these methods are really bad at surfacing relevant information quickly. The first strategy falls apart when working with either a large number of files, or files which cover a great deal of content. The second strategy falls apart when you fail to search for exactly what you're looking for -- in other words it essentially fails every time, especially when you consider how formatting interferes with exact text search.
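To make the formatting problem concrete, here is a small Python sketch. The Markdown snippet and the `normalize` helper are hypothetical, written only for this illustration: the phrase being searched for is present in the document, but emphasis markers and a line wrap hide it from an exact match.

```python
import re

# Hypothetical Markdown snippet: the phrase we want is split by
# emphasis markers and a line wrap.
doc = "Keeping docs current raises your **bus\nfactor** considerably."

query = "bus factor"

# Exact text search, roughly what an editor's "find all" does.
exact_hit = query in doc


def normalize(text: str) -> str:
    """Strip common Markdown punctuation, collapse whitespace, lowercase."""
    text = re.sub(r"[*_`#>\[\]]", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()


# The same search after normalizing both the query and the document.
normalized_hit = normalize(query) in normalize(doc)

print(exact_hit)       # False
print(normalized_hit)  # True
```

Normalizing like this only papers over formatting. It does nothing for synonyms or misremembered wording, which is where a real search system earns its keep.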

A good documentation system imposes some sort of predictable structure to the documentation so that humans are able to pattern match their way to the information they're looking for. In situations where that doesn't end up working, there should be another method for finding the information such as a search system.

I think the things you really need in a documentation system are:

  1. Hierarchy. It should be possible to collocate related bits of documentation under an umbrella that's separate from other, unrelated bits of documentation. Most documentation tools will give you this--Docusaurus, for instance, implements categories well.
  2. Scannability. Where plain Markdown files really start to break down is the variance in reader experience between all the various viewer apps. Someone looking at a Markdown file in VSCode doesn't get niceties like a table of contents without going out of their way for it.
  3. Searchability. In cases where a human can't find their way to the right doc the first time, being able to search--and have that search not just be an exact text search--is invaluable. Notion's search is worth a mention here, as it is surprisingly good.
  4. Editability. It should be dead easy for your team members to make contributions to your documentation system. Any friction at all will dramatically reduce folks' willingness to patch incorrect information, or keep information up to date as the system evolves. Once your documentation gets stale, it's game over--it's hard to get people to care about old, out of date documents.
  5. Analytics. All documentation carries a maintenance burden, and it's often better to just delete content that isn't being consumed. Analytics doesn't necessarily have to be built directly into your documentation system, but there should be some way of layering it on so you can monitor your documentation's value.
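As a rough sketch of the analytics point, assume you can export per-page view counts from whatever analytics tool sits in front of your docs. The page paths, the counts, and the `MIN_VIEWS` threshold below are all made up for illustration; the idea is simply to surface pages that nobody reads so someone can decide whether to delete them.

```python
# Hypothetical analytics export: page path -> views over some recent period.
page_views = {
    "onboarding/local-setup": 412,
    "architecture/billing-overview": 57,
    "guides/legacy-ftp-deploys": 0,
    "guides/rotating-api-keys": 3,
}

MIN_VIEWS = 5  # Assumed threshold; tune it to your team's size and traffic.


def deletion_candidates(views, min_views=MIN_VIEWS):
    """Return pages viewed fewer than `min_views` times, least viewed first."""
    unread = [(count, path) for path, count in views.items() if count < min_views]
    return [path for count, path in sorted(unread)]


print(deletion_candidates(page_views))
# ['guides/legacy-ftp-deploys', 'guides/rotating-api-keys']
```

A list like this is a prompt for a human decision rather than an automatic delete: a rarely read incident runbook may still be worth keeping.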

What documentation systems are good to use?

If you have a dedicated developer experience team, then something like Docusaurus is actually very good. While it doesn't come with search out of the box, adding a search plugin is easy and static site generators like Docusaurus mean you get to retain ownership of your documentation. Instead of being held at the mercy of Notion or Google Drive, you can keep .md files stored in a Git repository collocated with the code they relate to and are free to change provider if your requirements shift.

MkDocs is reasonable too, and for teams that have adopted a polyrepo setup there's a multirepo plugin available which you can use to stitch together documentation files scattered throughout your Git provider. Being able to provide a unified interface into your organizational knowledge is valuable because it means engineers only need to learn one system.

Otherwise, Notion is not the worst idea in the world for holding documentation. It's a turnkey solution requiring little in the way of setup, and it delivers a serviceable documentation experience. While you can't source control your Notion pages--necessitating that documentation is updated separately from code--there's an official mechanism for exporting your content to Markdown, it has a decent built in search feature, and the editor is quite expressive.

How do you author good documentation?

This is where things get complicated, and there are a lot of different opinions. I think there are two key processes to writing documentation:

  1. Identifying topics which need documentation
  2. Actually writing the documentation for those topics

Doing both of these well is difficult, and you'll probably make mistakes along the way. Documentation that isn't working well should be treated like an underperforming feature: jettison it to avoid paying the ongoing maintenance burden.

Identifying documentation topics

A few good ways of identifying topics worthy of documentation are:

  1. Questions you had during your first few weeks, which had no written answers.
  2. Questions you've had to answer more than two times.
  3. Major architectural decisions, and their impact.
  4. Technical debt, its history, and possible solutions.

The first point here is particularly valuable. Software engineers are usually fairly unproductive during their first month or two at a company due to the time needed to get up to speed with the organization's codebase. Keeping an eye out for sharp edges during the onboarding process and expanding the company's documentation is a remarkably good way to make contributions early on in your tenure, and if your company is undergoing active expansion it's easy to prove the value of that work when incoming new hires refer to it.

The second point is, in my experience, undervalued. As a senior+ engineer on a team, you will always be fielding questions from your more junior team members and a lot of those questions will be duplicates--oftentimes, the same junior engineer will be the one asking the duplicate question! Every time you need to answer a question you've answered before, you are burning valuable time regardless of whether you sit down to write a doc file. You can cut your losses by spending a bit of time upfront putting the answer to ink once you identify a recurring question. Often, the answer is simple and fits in the form of a short step-by-step guide.

Writing quality documentation

All engineering organizations will have their own measure for what constitutes quality documentation so it's challenging to provide universal guidelines which aren't overly vague. It's also important to note that while the high quality of something like the new React.js documentation site is self-evident, there is a very real cost-benefit tradeoff that needs to be made when authoring your internal documentation. An infrastructure team can likely afford to spend much more time on documentation than product teams can.

If I had to suggest a few rules of thumb, they'd probably look like this:

  1. Don't neglect prerequisite knowledge. Link to supplemental information which readers must know about in order to understand the document they're on.
    • It's worth emphasizing the value of the network effect here: every link you add makes the rest of your documentation easier to discover.
  2. Keep the language simple. If someone's reading your documentation it's because they're confused and want to learn how something works. There's no need to make that process any more difficult than it inherently is.
  3. Make sure instructions for building, running, or testing the thing you're documenting are included.
  4. For any sort of guide, include a troubleshooting section with common issues and their remedy.
  5. Explain any relevant architectural decisions and their tradeoffs.
  6. Try to keep the documentation as collocated to the system it's describing as possible. Inline comments are better than separate doc files for small things.

Realistically, there is no one way here. The kind of documentation you write to describe the procedure for adding a domain to some allowlist will be fundamentally different from the kind of documentation you'll write to explain an architectural footgun inside your legacy Java monolith. That's okay.

The key thing is to measure and identify what's performing, and what's useless. Dump the useless stuff that no one's referring to, and make more of the good stuff that everyone's looking at.

  [1] Sure, you could use something like grep to get more expressive search, but your junior engineers aren't going to actually do that.
