4 November, 2024•5 minute read

GraphQL collection lookups

Collection lookups are a powerful design pattern for GraphQL schemas. The pattern lets API consumers retrieve specific elements from a collection through a singular field, which improves both usability and performance.

To demonstrate this pattern, let’s imagine we’re building a blogging platform (perhaps for sophiabits.com!). The core data type we’ll focus on is a Post which, among other things, contains a list of arbitrary key-value pairs (its attributes) and a list of images. Bloggers use attributes to store small pieces of information that a plugin system could consume.

Below is a potential schema. Note the paired attribute/attributes and image/images fields:

```graphql
scalar URL

type Attribute {
  key: String!
  value: String!
}

type Image {
  src: URL!
}

type Post {
  attribute(key: String!): Attribute
  "Arbitrary key-value metadata for this Post"
  attributes: [Attribute!]!

  image: Image
  "List of feature images to display in a carousel"
  images: [Image!]!

  # ...
}
```

When the singular field of the pair can be used to retrieve individual pieces of data from its corresponding list field, I call it a collection lookup. Collection lookups are a surprisingly versatile design pattern, and they feature in many GraphQL schemas I’ve built. Let’s explore the different “levels” of collection lookup, and think about how we might apply them to our blogging platform.

Types of collection lookup

“First element” lookups

This is the simplest form of collection lookup, and it usually happens accidentally as a result of changing requirements. Say that we originally only supported a single image per post, to be used as a hero image.

If our requirements change and we decide to support displaying a carousel of images for our hero, then we must change our schema. Introducing a new images field is simple enough, but we must also figure out what we want to do with the pre-existing image field. The path of least resistance is to keep it around, and have it function as an alias for images[0].
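As a rough illustration, the resolver behind such an alias can be tiny. The sketch below is plain TypeScript with no GraphQL library attached; the `PostRecord` shape and the `resolveImage` name are my own assumptions rather than part of any real API.

```typescript
// Hypothetical in-memory shapes mirroring the schema's Image and Post types.
interface Image {
  src: string;
}

interface PostRecord {
  images: Image[];
}

// Resolver for Post.image: a "first element" lookup over Post.images.
// Returns null for a post without images, matching the nullable `image: Image` field.
function resolveImage(post: PostRecord): Image | null {
  return post.images[0] ?? null;
}

const carouselPost: PostRecord = {
  images: [
    { src: "https://example.com/hero.jpg" },
    { src: "https://example.com/second.jpg" },
  ],
};

const hero = resolveImage(carouselPost);
const noImage = resolveImage({ images: [] });
```

Existing clients that only know about `image` keep working, while newer clients can move to `images` at their own pace.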

“By key” lookups

Sometimes a resource contains a collection of items, but as an API consumer you only care about one or two particular elements. The attributes field is a classic example of this. API consumers can stash arbitrary key-value pairs inside this field, and these attributes may be useful for a variety of different purposes. Here are some theoretical attributes we could be interested in storing:

discussOnHnUrl: A link to a Hacker News thread which discusses the post. Useful on the page for the actual post, as we could use it to render a link to the thread.
effort: A subjective measure of how much effort went into writing the post. We’d use this for internal analytics purposes; the extent to which effort correlates with post success might inform our content strategy.
isFeatured: Determines whether the post is “featured.” We’d want to use this on both the blog post index page and the pages for individual blog posts.

If we want to render a post for display, then we don’t care about attributes that are only used for analytical purposes. While we could filter these out ourselves in code, it would be much better if we were able to request only those attributes we require. GraphQL does not give us this capability natively, but we can implement it ourselves with an attribute(key: String!): Attribute field.

On its own this field doesn’t look so useful, but aliasing enables consumers to fetch all the attributes they care about with a single API request and even give them helpful names:

```graphql
query GetPostAndSomeAttributes {
  post(id: "post_xyz") {
    id
    title
    # fetch 2x attributes we care about
    discussOnHnUrl: attribute(key: "discussOnHnUrl") {
      value
    }
    isFeatured: attribute(key: "isFeatured") {
      value
    }
  }
}
```

What I love about this is we are essentially turning these custom attributes into first-class fields on our API object. Instead of the consumer needing to perform a potentially costly linear-time search over the attributes list, they’ll get back an object which looks like the following. It’s really elegant.

```json
{
  "data": {
    "post": {
      "id": "post_xyz",
      "title": "My cool post!",
      "discussOnHnUrl": {
        "value": "https://news.ycombinator.com/..."
      },
      "isFeatured": {
        "value": "false"
      }
    }
  }
}
```
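On the server, one way to back the `attribute(key:)` field is to index each post’s attributes by key once and reuse that index for every aliased selection in the query. This is only a sketch in plain TypeScript: the `PostRecord` shape, the `WeakMap` cache, and the function names are assumptions of mine, and a real server might push the lookup down into its datastore instead.

```typescript
interface Attribute {
  key: string;
  value: string;
}

// Hypothetical shape of a loaded post; only the attributes list matters here.
interface PostRecord {
  attributes: Attribute[];
}

// Cache one key -> Attribute index per post object, so several aliased
// attribute(key: ...) selections share a single pass over the list.
const indexCache = new WeakMap<PostRecord, Map<string, Attribute>>();

function attributeIndex(post: PostRecord): Map<string, Attribute> {
  let index = indexCache.get(post);
  if (index === undefined) {
    index = new Map<string, Attribute>();
    // If a key appears more than once, the last occurrence wins.
    for (const attr of post.attributes) index.set(attr.key, attr);
    indexCache.set(post, index);
  }
  return index;
}

// Resolver for Post.attribute(key: String!): Attribute
function resolveAttribute(post: PostRecord, args: { key: string }): Attribute | null {
  return attributeIndex(post).get(args.key) ?? null;
}

const examplePost: PostRecord = {
  attributes: [
    { key: "discussOnHnUrl", value: "https://example.com/thread" },
    { key: "isFeatured", value: "false" },
    { key: "effort", value: "high" },
  ],
};

const isFeatured = resolveAttribute(examplePost, { key: "isFeatured" });
const missing = resolveAttribute(examplePost, { key: "doesNotExist" });
```

Returning `null` for an unknown key lines up with the nullable `Attribute` return type in the schema, so consumers can treat a missing attribute as an ordinary case rather than an error.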

“Search” lookups

Search lookups extend key lookups: why not support looking up collection elements by fields other than the primary key? We could turn our image field into a miniature search engine by allowing consumers to select images by their aspect ratio:

```graphql
enum ImageLayoutType {
  LANDSCAPE
  PORTRAIT
}

type Post {
  image(layout: ImageLayoutType): Image
  images: [Image!]!
  # ...
}
```

You can add as many filters as you’d like. Perhaps you want to support looking up images based on OCRed text content. You can find a real-world example of this pattern in action inside Shopify’s Storefront API. A Shopify Product contains many different Variants, and the Product.variantBySelectedOptions field allows API consumers to find product variants based on their options like size, color, etc.
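To make the `image(layout:)` lookup concrete, here is a minimal resolver sketch in plain TypeScript. It assumes each stored image carries `width` and `height` values, which the schema above does not expose, and it treats square images as landscape purely for simplicity. All names here are hypothetical.

```typescript
type ImageLayoutType = "LANDSCAPE" | "PORTRAIT";

// Assumed storage shape: width and height are NOT part of the schema's Image type.
interface StoredImage {
  src: string;
  width: number;
  height: number;
}

// Classify an image by aspect ratio; square images count as landscape (an arbitrary choice).
function layoutOf(image: StoredImage): ImageLayoutType {
  return image.width >= image.height ? "LANDSCAPE" : "PORTRAIT";
}

// Resolver for Post.image(layout: ImageLayoutType): Image
// Without a layout argument it falls back to the first image in the list.
function resolveImage(
  post: { images: StoredImage[] },
  args: { layout?: ImageLayoutType | null },
): StoredImage | null {
  if (args.layout == null) return post.images[0] ?? null;
  return post.images.find((image) => layoutOf(image) === args.layout) ?? null;
}

const galleryPost = {
  images: [
    { src: "https://example.com/tall.jpg", width: 600, height: 900 },
    { src: "https://example.com/wide.jpg", width: 1200, height: 600 },
  ],
};

const landscape = resolveImage(galleryPost, { layout: "LANDSCAPE" });
const firstImage = resolveImage(galleryPost, {});
const noPortrait = resolveImage({ images: [galleryPost.images[1]] }, { layout: "PORTRAIT" });
```

Each additional filter becomes another optional argument and another predicate, so the resolver stays small as the search surface grows.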

Search lookups are a great example of how we can push business logic into our GraphQL schema to keep consumer code lean. Once you start incorporating this into your schemas, you’ll find opportunities to apply it almost everywhere. In a previous article, I considered the possibility of a User.inGroup(id: ID!) field, for instance.

The true value of this pattern reveals itself when you begin working with large lists. Imagine if users were able to join thousands of groups: fetching a list of thousands of IDs would be disastrously inefficient if all you cared about was whether the user belonged to one group in particular. In REST we might have been tempted to add another endpoint—and take the hit of an extra network round trip—but in GraphQL we’re able to encode this functionality into the object type itself. As you only pay for what the client explicitly requests, there is no performance hit for your other API consumers.
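A sketch of how a `User.inGroup(id:)` resolver might avoid loading the whole list is shown below. I’m assuming the field returns a boolean and that memberships live in some store with an indexed lookup. The in-memory `MembershipStore` stands in for that store and is entirely hypothetical.

```typescript
// Hypothetical membership store. A real system would more likely run an
// indexed database query than keep every membership in memory.
class MembershipStore {
  private readonly groupsByUser = new Map<string, Set<string>>();

  add(userId: string, groupId: string): void {
    let groups = this.groupsByUser.get(userId);
    if (groups === undefined) {
      groups = new Set<string>();
      this.groupsByUser.set(userId, groups);
    }
    groups.add(groupId);
  }

  // Membership check that never materialises the user's full group list.
  has(userId: string, groupId: string): boolean {
    return this.groupsByUser.get(userId)?.has(groupId) ?? false;
  }
}

// Resolver for User.inGroup(id: ID!), assumed here to return Boolean!
function resolveInGroup(
  user: { id: string },
  args: { id: string },
  store: MembershipStore,
): boolean {
  return store.has(user.id, args.id);
}

const store = new MembershipStore();
store.add("user_1", "group_42");

const isMember = resolveInGroup({ id: "user_1" }, { id: "group_42" }, store);
const isNotMember = resolveInGroup({ id: "user_1" }, { id: "group_7" }, store);
```

The response carries a single boolean regardless of how many groups the user has joined, which is the whole point of the lookup.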

“Transform” lookups

Now for the final example of this pattern, we’ll make a small tweak to the previous schema:

```graphql
scalar Percent

enum ImageFilterType {
  BLUR
  GRAYSCALE
}

# Arguments must be input types, so the filter is declared with `input`
input ImageFilter {
  type: ImageFilterType!
  intensity: Percent
}

type Post {
  image(layout: ImageLayoutType, filter: ImageFilter): Image
  images: [Image!]!
  # ...
}
```

The schema still supports looking up an image by its aspect ratio, but I’ve also added support for the consumer to apply some kind of transformation to the image they looked up.

Supporting an image transformation on the images list itself is difficult, because the compute required could be very large if we support large page sizes. It’s far more realistic to support this functionality when it’s scoped to only a single image at a time. It’s still possible for a consumer to retrieve multiple transformed images in a single round trip through aliasing, but it’s also easier for us to guard against expensive queries through query cost limits. It’s hard to apply cost limits to the list field because you can’t know how many elements are inside the list until your server actually starts resolving the field.
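Because every transformed image shows up as its own aliased selection, the cost of a query can be estimated before any resolver runs. Below is a minimal sketch of that idea in plain TypeScript. The `ImageSelection` shape, the cost weights, and the budget values are all invented for illustration, and a real server would extract the selections from the parsed query document.

```typescript
type ImageFilterType = "BLUR" | "GRAYSCALE";

// One entry per aliased image(...) selection found in the incoming query.
interface ImageSelection {
  alias: string;
  filter?: { type: ImageFilterType; intensity?: number };
}

// Hypothetical cost weights; real values would come from measuring your own backend.
const BASE_COST = 1;
const FILTER_COST: Record<ImageFilterType, number> = { BLUR: 10, GRAYSCALE: 5 };

// Static estimate: each selection costs a base amount plus a surcharge for its filter.
function estimateCost(selections: ImageSelection[]): number {
  return selections.reduce(
    (total, selection) =>
      total + BASE_COST + (selection.filter ? FILTER_COST[selection.filter.type] : 0),
    0,
  );
}

// Reject the query up front when its estimated cost exceeds the budget.
function assertWithinBudget(selections: ImageSelection[], maxCost: number): void {
  const cost = estimateCost(selections);
  if (cost > maxCost) {
    throw new Error(`Query cost ${cost} exceeds limit ${maxCost}`);
  }
}

const selections: ImageSelection[] = [
  { alias: "blurred", filter: { type: "BLUR" } },
  { alias: "gray", filter: { type: "GRAYSCALE", intensity: 50 } },
  { alias: "plain" },
];

const estimatedCost = estimateCost(selections); // (1 + 10) + (1 + 5) + 1 = 18
```

The number of selections is known from the query text alone, which is exactly what the list field cannot offer.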

Again, this pattern is not limited to the specific example demonstrated here. In addition to image filters, it might make sense to let API consumers resize the image per their requirements, or to let them request images with different quality levels.

The advantage here is that you can shift expensive operations off low-powered clients and onto your backend. Moving the compute requirements to your own system enables you to power higher-fidelity experiences that would otherwise be impossible to run smoothly on the user’s own device.

Naming considerations

The collection lookup pattern informs our naming decisions elsewhere in the schema. When naming list fields, prefer plural names with an obvious singular form; when naming singular fields, prefer names with an obvious plural. The attribute/attributes and image/images pairs feel good to use because going from the plural to the singular form is very intuitive.

Contrast the term attributes with metadata, which is the name Stripe uses for the equivalent field in its REST API. That works well enough in REST, where there is no way to fetch only a subset of metadata entries, but in a context where we want to apply the collection lookup pattern the name metadata is of limited use. The singular form of “metadata” is “metadatum,” which isn’t as obvious as simply dropping an “s.”

Avoid terms like metadata which don’t have obvious singular/plural pairs. Names like attributes, fields, properties, or tags are all vastly superior choices.
