Intentionally breaking types to guide scary refactors
At my last job we sold a quoting SaaS to tradespeople, and naturally a big component of our software was the quote builder itself. Tradies would get customers to jump on their portal, and from there the customer would input all the details of their job into our quote builder. After a bit of backend magic, the customer and tradie alike would receive an accurate price for the job to be done.
There were two main inputs we needed to handle. For a job to install new tiling in a kitchen and bathroom the customer would input two “areas” (one kitchen, one bathroom), and then for each area they would select “services” such as underfloor heating or wall tiling.
Originally we’d built the quote builder such that you could only have one of each area included in your job. That didn’t work quite so well for cases where someone wanted to get two different bathrooms tiled as part of the same job, so we needed to refactor.
To be really specific here, we needed to add an id (or key) field to the interface below. Up until this point we had been using areaId to uniquely identify areas which the customer had added in the quote builder, but this breaks down when you have two distinct “Bathroom” areas and want to be able to add/remove services to each one independently.
interface QuoteArea {
  // the ID of an area table row
  // e.g. "d2f1e61f-483d-464a-98dc-6d36d3eb0bf4"
  areaId: string;

  // list of services the customer wants
  // for this area
  services: QuoteService[];

  // e.g. "Ready", "Needs Demolition", ...
  workState: WorkState;
}
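To make the problem concrete, here is a minimal sketch of two “Bathroom” areas colliding on areaId. The trimmed-down shapes, the sample IDs, and the helper names findByAreaId and findById are all made up for illustration; they are not the original codebase.

```typescript
// Trimmed-down, hypothetical shapes for illustration only.
interface QuoteService {
  serviceId: string;
}

interface QuoteArea {
  // new: unique per area the customer added to the quote
  id: string;
  // the ID of an area table row, shared by every "Bathroom"
  areaId: string;
  services: QuoteService[];
}

const BATHROOM_AREA_ID = "d2f1e61f-483d-464a-98dc-6d36d3eb0bf4";

const areas: QuoteArea[] = [
  { id: "quote-area-1", areaId: BATHROOM_AREA_ID, services: [] },
  { id: "quote-area-2", areaId: BATHROOM_AREA_ID, services: [] },
];

// Old lookup: both bathrooms match, so only the first one is ever found.
function findByAreaId(areaId: string): QuoteArea | undefined {
  return areas.find((area) => area.areaId === areaId);
}

// New lookup: each bathroom can be addressed independently.
function findById(id: string): QuoteArea | undefined {
  return areas.find((area) => area.id === id);
}
```

With areaId as the key there is no way to reach the second bathroom, which is exactly the bug the new id field is meant to fix.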
How would you go about performing this refactor? The challenge here is to ensure that all codepaths currently using areaId as the unique identifier are correctly updated to use our new id field.
LSPs are fantastic for mechanical tasks like renaming a function or shifting an export from one file to another, but they don’t offer much for semantic changes such as this one.
Tests can be a helpful guardrail in some cases, but there’s no guarantee they’ll exercise the semantics we are changing, and we’ll likely need to add new ones, which will push out our timeline.
At the time I made this refactor the company was still in its infancy, and our software was constantly changing. In this situation, tests don’t tend to stay relevant long enough for them to pay dividends. “Ship first, ask questions later” is a good mantra to live by in an early-stage startup.
Fortunately, it’s actually entirely possible to both have our cake and eat it. We can have a strong guarantee that we updated everything we need to while also not eating any additional overhead costs. We can achieve this by inverting our usage of the type system.
In other words: we typically use type systems to keep our program working. In this case, we’re going to use the type system to intentionally break our program.
Intentionally breaking the types
Here’s what this technique concretely looks like. Imagine we have the following Redux action creator:
function addService(
  areaId: string,
  serviceId: string,
) {
  // ...
}
Updating the callsites of this function is pretty easy on paper. All we need to do is list out all uses of this function and then swap them over from area.areaId to area.id. But in practice it’s really easy to miss one and wind up with bugs.
To guarantee we’ve correctly updated every callsite, we’ll make the following change:
function addService(
  // Note the type change!
  quoteAreaId: number,
  serviceId: string,
) {
  // ...
}
After we make this change, every single usage of this function will fail to typecheck. If we tried to PR the code in this state, our CI checks would fail and we’d be prevented from merging it in.
Our list of compiler errors now serves as a “to do” list that we can refer to. We can work through this list one item at a time, and when we’ve fixed all our errors we will know with 100% certainty that everything has been updated. Here’s what updating a callsite looks like when using this technique:
dispatch(addService(
  quoteArea.id as unknown as number,
  serviceId,
));
We forcibly cast quoteArea.id to number here because this type change is temporary. We’ve only made the quoteAreaId parameter a number for the purposes of getting a nice to-do list from the compiler. When we’re finished updating all our code to use the new field we’ll revert the type back to string and drop all the casts. There’s no possibility for human error to creep in when dropping the casts because the compiler will give us a new list of type errors when we swap quoteAreaId back over to string.
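Putting the pieces together, the temporary state of the code might look something like the sketch below. The action shape, the dispatch stand-in, and the sample IDs are assumptions made so the example runs on its own; only addService and the cast come from the examples above.

```typescript
// Hypothetical action shape; the real Redux action is not shown in this post.
interface AddServiceAction {
  type: "addService";
  quoteAreaId: string;
  serviceId: string;
}

// TEMPORARY: `quoteAreaId` is typed as number so that every existing
// callsite passing `area.areaId` (a string) fails to typecheck.
function addService(quoteAreaId: number, serviceId: string): AddServiceAction {
  return {
    type: "addService",
    // The value is still a string at runtime, so cast it back here.
    quoteAreaId: quoteAreaId as unknown as string,
    serviceId,
  };
}

// Stand-in for the store's dispatch, just so the sketch is self-contained.
const dispatched: AddServiceAction[] = [];
const dispatch = (action: AddServiceAction): void => {
  dispatched.push(action);
};

const quoteArea = { id: "quote-area-1", areaId: "area-row-1" };

// A callsite that has not been updated yet would be a compile error:
// dispatch(addService(quoteArea.areaId, "underfloor-heating"));

// An updated callsite, marked with the temporary cast:
dispatch(addService(quoteArea.id as unknown as number, "underfloor-heating"));
```

Because types are erased at build time, the app keeps working with string IDs at runtime while the compiler treats them as numbers. Changing the parameter back to string then turns every leftover `as unknown as number` cast into a fresh type error.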
Simply work through the type errors, run the app in your browser to test everything’s working as expected, and then swap back to the correct type. After that work is done, you’re finished. And you know for sure that you haven’t missed anything.
What if you don’t have nice abstractions?
The thing about this technique is that it isn’t specific to functions. You can apply it anywhere types are involved.
Let’s say that your architecture isn’t so great and the business logic of interacting with these QuoteArea objects is littered throughout your codebase. While you won’t be able to tweak the signature of a function like addService, you absolutely can just change the definition of QuoteArea:
interface QuoteArea {
  id: string;
  // areaId: string;
  // ...
}
If you simply comment out the field then every single location in the code which uses that field will produce a compiler error. We could also choose to stick with the approach we took in the previous example, and instead swap areaId: string over to areaId: number. Both options work just fine.
Strictly speaking there are some subtle differences between the two options. For instance, a log message which outputs areaId won’t complain about the type change, but will complain about the deletion of the property. Sometimes this difference can matter!
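The difference is easiest to see side by side. In this sketch the two interface variants and the describe helpers are hypothetical names used purely to contrast the options.

```typescript
// Option 1: keep the field but give it a deliberately wrong type.
interface QuoteAreaRetyped {
  id: string;
  areaId: number;
}

// Option 2: remove the field entirely.
interface QuoteAreaRemoved {
  id: string;
  // areaId: string;
}

// Template literals accept numbers, so this log line compiles under
// option 1 and never shows up in the to-do list.
function describeRetyped(area: QuoteAreaRetyped): string {
  return `area ${area.areaId}`;
}

// Under option 2 the same line is flagged because the property is gone.
function describeRemoved(area: QuoteAreaRemoved): string {
  // @ts-expect-error Property 'areaId' does not exist on type 'QuoteAreaRemoved'.
  return `area ${area.areaId}`;
}
```

If you want the compiler to make you look at every read of the field, including ones that only format or log it, removing the property is the stricter of the two.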
The downside of this approach is that everything touching that field is going to cause a compiler error, even code that’s supposed to refer to the old value. Consider the case of some code responsible for creating a new area and adding it to the state of the quote. We still need to store the areaId regardless of the existence of our new id field, but because we’ve adjusted the underlying QuoteArea type we’ll get a compiler error. That’s noise.
The added noise you take on by adjusting the type is obviously going to depend on how widespread the usage of the type is. The messier and larger your codebase, the more noise you’ll have. In TypeScript it’s at least easy to quickly sift through these extraneous errors by using compiler comments:
quote.areas.push({
  id: createUuid(),
  // @ts-expect-error
  areaId: area.id,
  // ...
});
The @ts-expect-error comment will silence TypeScript’s error complaining about how QuoteArea#areaId either doesn’t exist or has the wrong type while we work through our list of errors, and unlike a @ts-ignore it will noisily cause an error when we roll back to using the correct types instead of sitting there silently.
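As a rough illustration of that last point, here is what a leftover directive looks like once the types have been rolled back. The restoredArea name and the sample IDs are assumptions for the sake of the example.

```typescript
// The final, correct types, after the temporary change has been rolled back.
interface QuoteArea {
  id: string;
  areaId: string;
}

const restoredArea: QuoteArea = {
  id: "quote-area-1",
  // @ts-ignore (stale: nothing left to suppress, yet the compiler stays quiet)
  areaId: "area-row-1",
};

// Had the directive above been `@ts-expect-error`, tsc would now report
// "Unused '@ts-expect-error' directive." and point us at the leftover comment.
```

The stale @ts-ignore compiles cleanly and could sit in the codebase forever, hiding any future error on that line.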
Which brings me to my next point: this technique actually works with any language that sports a static type system, but it’s most useful when writing TypeScript. Other languages—broadly speaking—just don’t have the same degree of flexibility in their type system.
In C++, for instance, while you can change the type of QuoteArea#areaId and use casts to work through type errors, you don’t benefit from TypeScript’s separation of runtime and build time. Swapping areaId’s type in C++ results in an observable difference in runtime behavior which limits your ability to run and test the code while you work through the compiler errors. It’s just not as ergonomic.
Branded types
Another option that’s available in TypeScript, which may or may not be available in other languages, is “branded types”. Instead of swapping out string for number, we could instead have performed a refactor like this:
type Id<Type> = string & { __type: Type };

interface QuoteArea {
  id: Id<'QuoteArea'>;
  areaId: Id<'Area'>;
  // ...
}
The advantage of this strategy is that after you do the forward pass through the initial set of compiler errors you are finished. You don’t need to swap your Id<'Area'> type back over to a string because Id<'Area'> is, in fact, now the correct type for areaId.
The downside of this approach is that generally it will increase complexity elsewhere. A stock-standard API client generated from an OpenAPI spec is going to type your API’s Area#id field as a string, so you’ll need to manually cast the value over to the branded type somewhere anyway. Branded types make a lot of sense when you have complete control over your tooling, but outside of that they tend to add a bit of friction.
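Here is a minimal sketch of what that boundary cast might look like, along with the mix-up the brand is there to catch. The ApiArea shape and the toAreaId, toQuoteAreaId and createQuoteArea helpers are hypothetical; a real generated client will look different.

```typescript
type Id<Type> = string & { __type: Type };

interface QuoteArea {
  id: Id<'QuoteArea'>;
  areaId: Id<'Area'>;
}

// Hypothetical shape of what a generated API client might return.
interface ApiArea {
  id: string;
  name: string;
}

// The one place where a plain string is cast over to the branded type.
function toAreaId(raw: string): Id<'Area'> {
  return raw as Id<'Area'>;
}

function toQuoteAreaId(raw: string): Id<'QuoteArea'> {
  return raw as Id<'QuoteArea'>;
}

function createQuoteArea(area: ApiArea, newId: string): QuoteArea {
  return { id: toQuoteAreaId(newId), areaId: toAreaId(area.id) };
}

function addService(quoteAreaId: Id<'QuoteArea'>, serviceId: string): string {
  return `${quoteAreaId}:${serviceId}`;
}

const quoteArea = createQuoteArea(
  { id: "area-row-1", name: "Bathroom" },
  "quote-area-1",
);

// Passing the right kind of ID compiles.
const serviceKey: string = addService(quoteArea.id, "wall-tiling");

// @ts-expect-error Id<'Area'> is not assignable to Id<'QuoteArea'>.
addService(quoteArea.areaId, "wall-tiling");
```

Keeping the casts inside a couple of small helpers at the API boundary limits the friction to one spot, and the brand only exists at build time, so the runtime values are still plain strings.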
Types are a superpower
Static type systems are pretty awesome. They’re an indispensable asset for avoiding bugs, and a good type system can completely remove the need for entire classes of unit tests. Going back to dynamic languages after you’ve gotten over the initial learning curve of static typing is painful.
But those use cases only scratch the surface of what’s possible with a static type system. Static types give us the ability to confidently break our program, knowing that once we’ve solved all the compiler errors it’ll work again.
The areaId refactor I’ve talked about in this post would have been a massive undertaking in a dynamically typed language and likely would have taken close to a week, if not longer.
But in this case I was able to complete the refactor in a little over an afternoon. Breaking my program and fully leaning in to the type system saved me a huge amount of time.
When I was first onboarded to that company, one of the very first things I did was institute a rule that all new code had to be written in TypeScript for this very reason. Static type systems remove risk from the business and very rapidly pay for themselves.
Big scary refactors are a lot less scary when you have the right tools and know how to use them correctly.