11 May, 2024 • 6 minute read

Why you shouldn’t namespace GraphQL mutations

One of GraphQL’s limitations is that it has no native concept of namespaces. All types defined in your schema exist in a shared global scope.

Generally we mirror this flat structure in the definition of our Mutation and Query object types. That is, we generally write schemas like this:

type Mutation {
  postCreate(post: PostInput!): PostCreateResult!
  postUpdate(id: ID!, post: PostInput!): PostUpdateResult!
}

And not like this:

type PostMutation {
  create(post: PostInput!): PostCreateResult!
  update(id: ID!, post: PostInput!): PostUpdateResult!
}

type Mutation {
  post: PostMutation!
}

That isn’t to say you won’t see people try to namespace their mutations in this manner. There are a few blogs floating around which talk about this strategy, and the GraphQL .NET docs site makes this design seem rather benign:

You can group sets of functionality by adding a top level group. You can apply this same trick to mutations and subscriptions.
— GraphQL .NET, "Query Organization"

But in reality you don’t want to do this, because it is an antipattern that invites subtle bugs into your system. In this post I will explain why.

The problem with namespaced GraphQL mutations

Let’s work through the problem with some code. Here I am going to make an edit to a draft blog post, and then publish it to the world:

mutation {
  postUpdate(id: "post_xxx", post: ...) {
    # ...
  }
  postPublish(id: "post_xxx") {
    # ...
  }
}

Recall that when you start a GraphQL operation with the mutation keyword you are simply running a query against your schema’s Mutation type. The Mutation type works just like any other object type in your schema; object types can’t be leaf nodes, which means you are required to specify a selection set.

In the specimen code snippet, the selection set on Mutation is {postUpdate, postPublish}. Requesting the postUpdate field here is not fundamentally any different from requesting the id field off the Post type.

Having said this, there is one special exception to this that can cause problems. When GraphQL executes a selection set, the execution can happen in one of two modes:

Normal execution mode allows fields in the selection set to resolve in any arbitrary order, or even in parallel. The GraphQL server is free to resolve fields however it decides is most optimal. Most of the time, your GraphQL selection sets are executed in this mode.
Serial execution mode mandates that fields in the selection set are resolved one at a time, from top to bottom according to their textual order. Serial execution happens in exactly one circumstance: when resolving top-level fields on the Mutation type.

This is important to understand. GraphQL bakes in an assumption that all fields in your schema—except for top-level fields on the Mutation type—are side effect-free, idempotent, and not reliant on execution order. These three assumptions make it safe to resolve fields in any order.

Mutations obviously break all three assumptions, so there must be some kind of rule in the specification to provide deterministic execution. The decision made by the spec authors is to resolve them serially, from top to bottom of the operation.

To be concrete: in our specimen code block the postUpdate field is guaranteed to completely finish execution before the GraphQL server starts executing postPublish.
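To make the two modes tangible, here is a minimal Python simulation. It is only a sketch under stated assumptions: the resolve helper, the latencies, and the use of asyncio are mine, not part of any real GraphQL server, and running fields concurrently is just one of the schedules that normal mode permits.

```python
import asyncio


async def resolve(name: str, delay: float, log: list[str]) -> None:
    """Hypothetical resolver: logs when it starts and when it finishes."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)  # stand-in for a database write
    log.append(f"finish {name}")


async def execute_serially(fields: dict[str, float]) -> list[str]:
    """Serial mode: one field at a time, in textual order."""
    log: list[str] = []
    for name, delay in fields.items():
        await resolve(name, delay, log)
    return log


async def execute_normally(fields: dict[str, float]) -> list[str]:
    """Normal mode: any order is allowed; this sketch runs fields concurrently."""
    log: list[str] = []
    await asyncio.gather(*(resolve(n, d, log) for n, d in fields.items()))
    return log


# Assumed latencies: postUpdate is deliberately slower than postPublish.
FIELDS = {"postUpdate": 0.05, "postPublish": 0.01}

if __name__ == "__main__":
    print("serial:", asyncio.run(execute_serially(FIELDS)))
    print("normal:", asyncio.run(execute_normally(FIELDS)))
```

In the serial run, postPublish never starts until postUpdate has finished. In the concurrent run, the faster postPublish finishes first, which is exactly the kind of reordering the serial rule exists to prevent.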

Here’s the exact language from Section 6.2.2 of the spec:

If the operation is a mutation, the result of the operation is the result of executing the mutation’s top level selection set on the mutation root object type. This selection set should be executed serially.
It is expected that the top level fields in a mutation operation perform side‐effects on the underlying data system. Serial execution of the provided mutations ensures against race conditions during these side‐effects.

On the other hand, in the next code block we have completely thrown away this guarantee:

mutation {
  post {
    update(id: "post_xxx") {
      # ...
    }
    publish(id: "post_xxx") {
      # ...
    }
  }
}

With nested fields like update and publish, we are right back to the normal mode of executing selection sets. Being able to resolve fields in parallel is huge for performance, and the selection sets on a mutation field can be quite large when you are working with a complicated schema. It just wouldn’t be practical for the GraphQL spec to mandate that every field in the operation be resolved serially, which is why this carve-out only applies to the top-level fields.

So in this new example the GraphQL spec will guarantee that our top-level {post} selection set executes serially. After post resolves, the server will then start work executing our nested {update, publish} selection set. At this level of the query we are no longer executing serially, which means it is possible for us to end up publishing our post before our update gets saved.
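Here is a rough sketch of how that can go wrong, using a plain dict as a stand-in for the data store. Everything in it is an assumption made for illustration: the update and publish helpers, the latencies, and the choice to run the nested fields concurrently (which the spec permits but does not require).

```python
import asyncio


async def update(post: dict, body: str) -> None:
    """Hypothetical slow write of the new draft text."""
    await asyncio.sleep(0.05)
    post["draft"] = body


async def publish(post: dict) -> None:
    """Hypothetical fast publish: copies whatever the draft is right now."""
    await asyncio.sleep(0.01)
    post["published"] = post["draft"]


async def flat_mutation() -> dict:
    """Models mutation { postUpdate postPublish }: top-level, so serial."""
    post = {"draft": "old text", "published": None}
    await update(post, "new text")
    await publish(post)
    return post


async def namespaced_mutation() -> dict:
    """Models mutation { post { update publish } }: nested, so normal mode."""
    post = {"draft": "old text", "published": None}
    await asyncio.gather(update(post, "new text"), publish(post))
    return post


if __name__ == "__main__":
    print("flat:      ", asyncio.run(flat_mutation()))
    print("namespaced:", asyncio.run(namespaced_mutation()))
```

Under these assumed latencies the flat version publishes "new text", while the namespaced version publishes the stale "old text" and only then saves the edit.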

In the case of a node failure, it’s even possible for us to publish the post and completely lose the update if the failure occurs at the right time. This nondeterminism is the reason why I do not recommend namespacing of mutations in this way. Organization through alphabetization (postCreate, postUpdate instead of createPost, updatePost) achieves most of the organizational benefits you’d get from namespacing with none of the downsides.

Abusing aliases for fun and profit

Of course there is a workaround to this. I’m documenting it in this section for completeness’ sake, but I do urge you to never actually use this in a real codebase because it’s weird and esoteric.

Let’s think about this from first principles. The top-level mutation fields are guaranteed to be executed serially by GraphQL, so you might think that by doubling up on your top-level fields you might be able to maintain serial execution while also benefitting from namespacing:

mutation {
  post {
    update(id: "post_xxx", post: ...) {
      # ...
    }
  }
  post {
    publish(id: "post_xxx") {
      # ...
    }
  }
}

The thinking goes that each top-level post field is only finished resolving once all of its nested fields have been resolved. Our GraphQL server should, therefore, work like this:

1. The first post field starts resolving.
2. The nested update field starts and finishes resolving.
3. The first post field finishes resolving.
4. The second post field starts resolving.
5. The nested publish field starts and finishes resolving.
6. The second post field finishes resolving.

This sounds very reasonable, but in reality you will observe different runtime behavior than this if you run this operation on a real server.

The intuition is completely correct, and this is how things would work if not for another part of the GraphQL spec. In Section 5.3.2, an algorithm is defined which allows GraphQL servers to merge selection sets.

Recall that earlier I noted that GraphQL assumes all fields aside from top-level mutation fields to be idempotent. Given this assumption the server is free to optimize cases where an individual field is requested multiple times—and selection set merging is how that optimization is implemented. There are some technical details involved here, but in general an accurate intuition is that fields which share the same name and input arguments are eligible for merging.

In our latest code snippet, our two top-level post fields do share the same name and input arguments (none), which means our GraphQL server will happily merge them together. After selection set merging, the GraphQL operation that our server will actually run looks like this:

mutation {
  post {
    update(id: "post_xxx", post: ...) {
      # ...
    }
    publish(id: "post_xxx") {
      # ...
    }
  }
}

In other words, we are right back to where we started!
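The merge itself can be sketched in a few lines of Python. This is a heavily simplified model loosely based on the field collection step in the spec: the Field class and both function names are my own, and arguments, fragments, and directives are ignored entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Field:
    """Simplified field node: no arguments, fragments, or directives."""
    name: str
    alias: Optional[str] = None
    selections: list["Field"] = field(default_factory=list)

    @property
    def response_key(self) -> str:
        # The key the field appears under in the response.
        return self.alias or self.name


def collect_fields(selections: list[Field]) -> dict[str, list[Field]]:
    """Group the fields of one selection set by response key, in order."""
    grouped: dict[str, list[Field]] = {}
    for f in selections:
        grouped.setdefault(f.response_key, []).append(f)
    return grouped


def merge_selection_sets(fields: list[Field]) -> list[Field]:
    """Concatenate the sub-selections of fields that landed in one group."""
    merged: list[Field] = []
    for f in fields:
        merged.extend(f.selections)
    return merged


if __name__ == "__main__":
    operation = [
        Field("post", selections=[Field("update")]),
        Field("post", selections=[Field("publish")]),
    ]
    for key, group in collect_fields(operation).items():
        print(key, [f.name for f in merge_selection_sets(group)])
```

Both post fields land in a single group, so in this model post is executed once with the combined {update, publish} selection set.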

There’s a post on freeCodeCamp which also highlights this issue, although the author’s diagnosis of the problem is inaccurate. In his post he contends that the root cause of the problem lies in post being resolved as soon as the post resolver returns an empty object—the update and publish nested fields, he says, are then free to execute in any order the GraphQL server decides optimal.

Here’s the gross part. We can disprove his theory and force serial execution of our mutations by abusing field aliases:

mutation {
  a: post {
    update(id: "post_xxx", post: ...) {
      # ...
    }
  }
  b: post {
    publish(id: "post_xxx") {
      # ...
    }
  }
}

This version of our operation, dear reader, does work in the way we thought the other one would. Field aliases opt us out of selection set merging because the two fields no longer share a response key (a and b instead of post), which means that our GraphQL server will actually execute this exact operation as we have written it.
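To see the difference in the execution timeline, here is a toy executor in Python. It is a sketch under assumptions, not a real server: top-level selections are plain tuples, the latencies are invented, and nested fields are simply run concurrently to stand in for normal mode.

```python
import asyncio

# (response_key, field_name, nested fields mapped to an assumed latency)
Selection = tuple[str, str, dict[str, float]]


async def resolve_nested(name: str, delay: float, log: list[str]) -> None:
    log.append(f"start {name}")
    await asyncio.sleep(delay)
    log.append(f"finish {name}")


async def execute_mutation(selections: list[Selection]) -> list[str]:
    log: list[str] = []
    # Merge top-level selections that share a response key.
    grouped: dict[str, dict[str, float]] = {}
    for key, _field_name, nested in selections:
        grouped.setdefault(key, {}).update(nested)
    # Top-level fields run serially; each one's nested fields run concurrently.
    for key, nested in grouped.items():
        log.append(f"start {key}")
        await asyncio.gather(
            *(resolve_nested(n, d, log) for n, d in nested.items())
        )
        log.append(f"finish {key}")
    return log


UNALIASED: list[Selection] = [
    ("post", "post", {"update": 0.05}),
    ("post", "post", {"publish": 0.01}),
]
ALIASED: list[Selection] = [
    ("a", "post", {"update": 0.05}),
    ("b", "post", {"publish": 0.01}),
]

if __name__ == "__main__":
    print("unaliased:", asyncio.run(execute_mutation(UNALIASED)))
    print("aliased:  ", asyncio.run(execute_mutation(ALIASED)))
```

With the aliases in place, the log shows a starting and finishing (including its nested update) before b starts at all. Without them, update and publish share one merged post field and publish finishes first.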

The Apollo team like namespacing mutations, and this solution is the one they propose.

I truly cannot express how much I dislike this. To know the purpose of these field aliases, an engineer needs to have an extremely deep understanding of the GraphQL spec. The vast majority of engineers simply do not have this, and frankly they shouldn’t need to. This GraphQL operation is too smart for its own good, and I don’t like overly smart code.

There’s another oddity about field aliases that’s worth touching on here. While selection set merging does care about field aliases, both Relay and Apollo’s normalized caches do not. This can result in local cache consistency problems when you have a nondeterministic query field requested under two aliases!
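A toy model shows how that cache problem can arise. To be clear about the assumptions: this is not Relay’s or Apollo’s actual code, the storage_key format is invented, and randomNumber is a hypothetical nondeterministic field. The only behavior it borrows is the one described above, namely that the cache key is derived from the field name and arguments while the alias is ignored.

```python
import json

# (alias, field_name, arguments)
Selection = tuple[str, str, dict]


def storage_key(field_name: str, args: dict) -> str:
    """Cache key from field name and arguments only; the alias plays no part."""
    if not args:
        return field_name
    return f"{field_name}({json.dumps(args, sort_keys=True)})"


def write_response(cache: dict, selections: list[Selection], response: dict) -> None:
    """Store each response value under its alias-free storage key."""
    for alias, field_name, args in selections:
        cache[storage_key(field_name, args)] = response[alias]


def read_response(cache: dict, selections: list[Selection]) -> dict:
    """Rebuild the response shape for the same selections from the cache."""
    return {
        alias: cache[storage_key(field_name, args)]
        for alias, field_name, args in selections
    }


if __name__ == "__main__":
    # Models: query { first: randomNumber  second: randomNumber }
    selections = [("first", "randomNumber", {}), ("second", "randomNumber", {})]
    cache: dict = {}
    write_response(cache, selections, {"first": 4, "second": 9})
    print(read_response(cache, selections))
```

The server answered with two different values, but both aliases map to the same cache entry, so the last write wins and a read from the cache returns 9 for both.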

I especially dislike overly smart code when it is self-defeating. The whole point of namespacing mutations is to improve organization and readability, and these field aliases completely sabotage those goals. KISS!
