24 October, 2023•5 minute read

How to implement Global Object Identification

Global object identification is a GraphQL specification which describes the Node interface and its associated node(id: ID!) query. Node is a very simple interface with only one required id field, and the spec mandates that the id for any object of a type implementing Node is globally unique across your graph.

Node is a valuable convention to have because it enables libraries to efficiently manage caches and refetch data without needing intimate knowledge of your particular GraphQL schema. Given an ID and a type name, libraries like Relay can, for example, automatically generate pagination queries using the node query, which makes for an extremely ergonomic developer experience.

There are also performance wins. If you need to refetch a deeply nested part of your graph, then without global object identification you’d be forced to refetch the entire original query and resolve every level of that query on the server. If your server correctly implements global object identification, on the other hand, then you can directly query node and immediately resolve the latest version of that data.

The global object identification spec provides a good description of the mechanics, but is light on guidance with respect to how you should go about implementing the node query. How would you go about taking in a UUID and fetching its associated record? It’s not immediately obvious if you’re a newcomer to GraphQL.

This post describes three techniques for implementing global object identification, each relevant at different stages in the product lifecycle.

Database-native type-prefixed ID fields

This strategy is perfect for greenfield projects where you don’t have any legacy to contend with. You prefix all of your ID values with a type name à la Stripe or Akahu. I’ve written about this ID format before, and it’s my preferred method for implementing global object identification because it requires zero additional engineering effort while also providing nice quality-of-life improvements.

If you use my resource-id package then a user ID will look like user_2WEtZFgethnfiLIqnXlmxygsyOs. It’s easy to parse out the type from such an ID, and from there it’s trivial to fetch the associated record from whatever datastore is responsible for holding users.
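As a rough illustration of that lookup, here is a minimal Python sketch. It is not the resource-id package’s API: the parse_type and resolve_node functions, the LOADERS registry, and the in-memory USERS and GROUPS stores are all hypothetical names invented for this example. It also assumes the random part of an ID never contains an underscore.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory datastores standing in for real tables or services.
USERS: Dict[str, dict] = {
    "user_2WEtZFgethnfiLIqnXlmxygsyOs": {"name": "Grace Hopper"},
}
GROUPS: Dict[str, dict] = {}

# Maps an ID prefix to the function that loads a record of that type.
LOADERS: Dict[str, Callable[[str], Optional[dict]]] = {
    "user": USERS.get,
    "group": GROUPS.get,
}


def parse_type(global_id: str) -> str:
    """Return the type prefix of an ID shaped like `<type>_<random part>`."""
    # Split on the LAST underscore so multi-word prefixes such as
    # `user_group` still parse (assumes the random part has no underscore).
    prefix, sep, suffix = global_id.rpartition("_")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a type-prefixed ID: {global_id!r}")
    return prefix


def resolve_node(global_id: str) -> Optional[dict]:
    """Sketch of a `node(id:)` resolver that dispatches on the type prefix."""
    loader = LOADERS.get(parse_type(global_id))
    if loader is None:
        return None  # unknown type: treat it as "no such node"
    record = loader(global_id)
    if record is None:
        return None  # known type, but no record with this ID
    return {"id": global_id, **record}
```

Because the stored ID and the global ID are the same string, the resolver never has to translate between the two; it only needs the prefix to pick a datastore.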

If you’re starting a new project, I urge you to follow this strategy. The benefits go far above and beyond making global object identification simple to support.

Unfortunately, most software projects aren’t using this ID format, and it can be difficult to retrofit onto a years-old codebase. If you’re trying to add global object identification to a brownfield project, you’ll need to take a different approach.

The “Node ID” pattern

Under this strategy, you treat your actual ID value as an implementation detail and expose a different global ID value in your GraphQL service.

Conceptually, we subtly change our mode of thinking: a GraphQL object isn’t simply a view into a database table, so there’s no reason why id must mirror the id field inside our datastore. Instead, we reframe the id field as representing the ID of the GraphQL node we’re looking at. It’s the ID of a node in a graph—not a row in a database table.

This is the solution used by GitLab in their GraphQL schema. It’s ideal for when you’re adding GraphQL to a service which doesn’t already speak GraphQL.

A user object might look like the following using this strategy:

{
  "id": "gid://your-company/User/1",
  "userId": 1,
  "name": "Grace Hopper",
  // ...
}

The exact format of the external ID doesn’t really matter. You could use GitLab’s format (shown in the example), a URN, a type-prefixed ID format, or anything else. The key thing is that you should be consistent, and the format you choose should embed the type of the resource inside it.

💡 The userId field is included in that JSON object primarily to make it clear that the 1 inside the external ID is referring to a database ID. In practice there’s not really much reason to include this redundant information in your GraphQL schema.

The key difference between this pattern and the previous one is that our external ID doesn’t exist inside our database. Instead of only needing to parse out the type name, we also need to parse out the underlying database ID in order to look up the thing the external ID refers to.
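A minimal Python sketch of that two-way translation might look like the following. It assumes the gid://your-company/&lt;Type&gt;/&lt;id&gt; shape from the example object and integer database IDs. The NodeId, encode_node_id, and decode_node_id names are hypothetical and are not taken from GitLab’s implementation.

```python
from typing import NamedTuple

# Namespace taken from the example object; swap in your own.
GID_PREFIX = "gid://your-company/"


class NodeId(NamedTuple):
    type_name: str    # GraphQL type, e.g. "User"
    database_id: int  # primary key inside the datastore


def encode_node_id(type_name: str, database_id: int) -> str:
    """Build the external ID exposed through the GraphQL `id` field."""
    return f"{GID_PREFIX}{type_name}/{database_id}"


def decode_node_id(node_id: str) -> NodeId:
    """Recover the type name and database ID from an external ID."""
    if not node_id.startswith(GID_PREFIX):
        raise ValueError(f"unrecognised node ID: {node_id!r}")
    parts = node_id[len(GID_PREFIX):].split("/")
    # Expect exactly "<Type>/<digits>" after the prefix.
    if len(parts) != 2 or not parts[0]:
        raise ValueError(f"malformed node ID: {node_id!r}")
    if not (parts[1].isascii() and parts[1].isdigit()):
        raise ValueError(f"malformed node ID: {node_id!r}")
    return NodeId(type_name=parts[0], database_id=int(parts[1]))
```

A node resolver would call decode_node_id first, then use type_name to choose a datastore and database_id to fetch the row.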

Because these external IDs don’t exist inside our database, this pattern introduces some room for human error: everyone on your team needs to remember to implement a custom resolver for the id field. GitLab makes things more ergonomic through a middleware which automatically converts anything called id into their external ID format, but whether you can do this will depend on your specific tech stack.

While I’ve specifically been talking about resources which already have an ID, you might actually have some familiarity with this pattern in another context: resources with composite keys. Consider a user_group bridging table which stores group membership info. If we don’t care about tracking historic group membership, then we could design such a table like so:

create table "user_group" (
  group_id text not null,
  user_id text not null,

  constraint "pk_user_group" primary key ("group_id", "user_id"),
  constraint "fk_user_group_group" foreign key ("group_id") references "group" ("id"),
  constraint "fk_user_group_user" foreign key ("user_id") references "user" ("id")
);

Here we don’t have a single ID value, which means we can’t implement the GraphQL Node interface. You can work around this by either adding an id resolver which munges the composite ID together (the simplest implementation might be something like group_id + ":" + user_id), or opting out of global object identification for this data type entirely.
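The munging approach could be sketched in Python roughly as follows. The function names are hypothetical, and the sketch assumes neither ID ever contains the ":" separator; if yours can, pick a different separator or escape it. A real node ID would also still need to carry the type name, for example via one of the formats discussed earlier.

```python
from typing import Tuple

SEPARATOR = ":"


def encode_membership_id(group_id: str, user_id: str) -> str:
    """Join the two halves of the composite key into a single ID string."""
    for part in (group_id, user_id):
        # Reject values that would make decoding ambiguous.
        if not part or SEPARATOR in part:
            raise ValueError(f"cannot encode key part: {part!r}")
    return f"{group_id}{SEPARATOR}{user_id}"


def decode_membership_id(membership_id: str) -> Tuple[str, str]:
    """Split a munged ID back into (group_id, user_id)."""
    parts = membership_id.split(SEPARATOR)
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"malformed membership ID: {membership_id!r}")
    return parts[0], parts[1]
```

The decoded pair maps directly onto the group_id and user_id columns of the composite primary key, so the lookup is still a single primary-key query.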

The ID service

Let’s imagine you have a really old API service with a lot of third-party consumers, using an inconvenient ID format like UUID. Backwards compatibility requirements force you to maintain the current ID format for objects which already exist.

Depending on how you advertised your API’s constraints it may not even be possible to change ID format in a forwards-compatible way, either! If you’ve always exposed 32-character UUIDs up until now and never advertised a different maximum length, it is extremely likely that some consumer is storing your UUIDs inside either a uuid or varchar(32) database column that will break if you swap to something else. Hyrum’s Law is a painful reality.

💡 Good API design really boils down to two things: setting clear quotas to limit load, and documenting API constraints and guarantees to give yourself room for evolution down the line. Stripe docs say their object IDs can be up to 255 characters long, even though none of their object IDs are close to that long in practice. Give yourself breathing room.

The only way to sanely implement global object identification in this context is to add in some kind of “ID metadata service.” If you’re ~~Twitter~~ X this could be bolted on to something like Snowflake—an actual, dedicated ID generation service—or if you’re worried about bottlenecks / failure points you could have a background worker running which tails a Postgres replication log, DynamoDB stream, or similar.

Regardless, the goal is to produce a datastore which looks something like this:

create table "idstore" (
  "id" text primary key,
  "type" text not null
);

This ultimately allows you to look up the type of an opaque UUID reasonably efficiently. Of course, this strategy requires that all of your ID values are unique across every type. Serial IDs aren’t going to work here, because there’s no way to determine whether 1 refers to the user with ID 1 or the group with ID 1.
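To make the lookup concrete, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for the real datastore. The open_idstore, register_id, and lookup_type helpers are hypothetical names. A production version would populate the table from your ID generation service or a change stream rather than from direct calls like these.

```python
import sqlite3
from typing import Optional


def open_idstore() -> sqlite3.Connection:
    """Create an in-memory stand-in for the ID metadata store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'create table "idstore" ("id" text primary key, "type" text not null)'
    )
    return conn


def register_id(conn: sqlite3.Connection, object_id: str, type_name: str) -> None:
    """Record which type an ID belongs to, e.g. when the object is created."""
    # The primary key rejects duplicates, which enforces the requirement
    # that an ID value is never shared between two types.
    conn.execute(
        'insert into "idstore" ("id", "type") values (?, ?)',
        (object_id, type_name),
    )
    conn.commit()


def lookup_type(conn: sqlite3.Connection, object_id: str) -> Optional[str]:
    """Return the type name for an opaque ID, or None if it is unknown."""
    row = conn.execute(
        'select "type" from "idstore" where "id" = ?', (object_id,)
    ).fetchone()
    return row[0] if row is not None else None
```

A node resolver would call lookup_type first, then fetch the record from whichever datastore owns that type.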

To be honest, this isn’t a great solution, and if you wind up here then things have gone wrong. The crux is that in our hypothetical scenario we have external consumers of our API, which makes breaking changes exceptionally difficult. Your customers aren’t going to care that you carefully communicated the breaking change ahead of time; they’re only going to care that you broke their integration.

Most companies aren’t selling an API as a product, and so in most cases these changes are a lot easier to make. You can ensure that all other teams at your company are prepared for your upcoming breaking change because the number of consumers is smaller and you have a much closer relationship with them. In cases where your API is being consumed by internal stakeholders, I would recommend taking on the breaking change and going for either of the previous solutions—they’re both significantly easier to implement compared to this.
