Enhance documentation on error handling in GraphQL #2208

hitherejoe · 2025-10-29T14:05:54Z

Description

Added section on modeling errors as data in GraphQL APIs, detailing recoverable and unrecoverable errors, and how to structure mutations and queries to handle errors effectively.

This has been added based off some conversations with @Urigo and some of the team at The Guild.

A couple of notes:

I added this as a new part of the Errors section. While the "Errors as Data" section is quite long, this felt like the best fit.
I've adjusted the examples to match the existing operations that are used in the page, happy to tweak things more if need-be!

Added section on modeling errors as data in GraphQL APIs, detailing recoverable and unrecoverable errors, and how to structure mutations and queries to handle errors effectively.

vercel · 2025-10-29T14:05:58Z

@hitherejoe is attempting to deploy a commit to the The GraphQL Foundation Team on Vercel.

A member of the Team first needs to authorize it.

linux-foundation-easycla · 2025-10-29T14:06:01Z

The committers listed above are authorized under a signed CLA.

✅ login: hitherejoe / name: Joe Birch (b7cded6)

martinbonnin

Super excited by this, thanks for looking into it.

A bunch of questions. Plus I'd love to see a section about caching considerations. What is cached vs non-cached.

martinbonnin · 2025-10-29T14:50:00Z

src/pages/learn/response.mdx

+  These are errors that are not the users fault (developer errors), which are generally things that the user can’t recover from. For example, the user not being authenticated or a resource not being found. This will also include scenarios such as   
+  server crashes, unhandled exceptions and exhausted resources (for example, memory or CPU)


There is also a sub-dimension: is "exceptional" vs "persistent".

exceptional: if the client retries, it will most likely not get an error.

Out of memory error: irrecoverable && exceptional

Missing resource: irrecoverable && persistent

The persistent errors, you might want to model as data as well because then they become cacheable. So you don't do the same request over and over again.

I'm curious about the "missing resource" case.

This sounds like "I query for Hero:123, and that record is missing". In this case, I'd normally return null.

Maybe I'm not thinking of the right scenario?

@martinbonnin good point, do you mean that even in the case of a missing resource error, we would want this returned as modelled data? I can see the benefit here from caching - we previously had a NotFoundError that can be thrown in cases where a resource is not available, but I think the differentiator here would be, is it the developers fault or the users fault? We have been using typed errors for user recoverable scenarios where you need to communicate something to the user - I'm not saying that is necessarily how everyone should do it, but we were trying to find a balance between having many typed errors and keeping things simple

@eddeee888 yea in the case of queries I can see that being valuable. We have been mainly modelling errors as data for mutations because we've been approaching typed errors as 'user recoverable errors'. We have had a few cases where recoverable errors have been needed for mutations, but it's not often. I think the case you have described will be valid for a lot of queries - in the case of our API users do not have direct control over the information that is being used for queries (so errors are likely our fault), so we tend to mark things as non-null and use the errors array if something goes wrong. Maybe there is a better approach to that though 😁

In this case, I'd normally return null.

Right 👍 See also my comment below. I think we need both null and union.

we tend to mark things as non-null and use the errors array if something goes wrong

Yes! Also, you might be interested in graphql/graphql-spec#1163 to disable error propagation in those cases.

I think the differentiator here would be, is it the developers fault or the users fault?

Excellent question. I would say it's the user fault. They gave a wrong id. They can recover by navigating to another Track/Product/etc... If they keep going to the same missing product page, I don't want to do a network request every time. So probably model as null in this case or an union if evolution is a concern (see also still that same other discussion)

martinbonnin · 2025-10-29T15:00:13Z

src/pages/learn/response.mdx

+
+For example, the user could be querying for data that requires them to upgrade their plan, or to update their app. These are user-recoverable errors and utilising errors as data can improve both the Developer and User experience in these scenarios.
+
+While this is a likely to not be common when implementing queries, this approach allows us to return user recoverable errors when required.


What do we think of this:

type Query { hero(id: ID!): Hero # error if not found hero(id: ID!): HeroPayload! # typed error if not found }

I think (without any data to back it up) that hero(id: ID!): Hero is widely more used. But this is technically recoverable so hero(id: ID!): HeroPayload! might make more sense?

I usually go with the Payload approach in queries as well. As it gives clients the ability to handle partial error.

For example, let's say we want to query for two sets of heroes to compare side by side on a client, I'd write the operation like this:

query HeroCompare { heroSetOne: heroes(ids: ["1","2", "3", "4", "5"]) { id name } heroSetTwo: heroes(ids: "6","7", "8", "9", "10") { id name } }

In this case:

If there's an error in either heroSetOne or heroSetTwo, the whole query will error and we'd get data: null

This could be disruptive for clients because we'd lose the ability to display the successful call

If we used the Payload approach though, if one fails (but properly handled in try/catch), the other may still complete and allows clients to handle partial failure

For example, this'd be the operation

query { heroSetOne: heroesPayload(input: ["1","2","3","4","5"]) { ... on HeroesPayloadOk { result { id name } } ... on ResultError { __typename } } heroSetTwo: heroesPayload(input: ["6","7","8","9","10"]) { ... on HeroesPayloadOk { result { id name } } ... on ResultError { __typename } } }

And if heroSetOne fails we'd get something like this:

{ "data": { "heroPayloadSetOne": { "__typename": "ResultError" }, "heroPayloadSetTwo": { "result": [ { "id": "6", "name": "Hero:6" }, { "id": "7", "name": "Hero:7" }, { "id": "8", "name": "Hero:8" }, { "id": "9", "name": "Hero:9" }, { "id": "10", "name": "Hero:10" } ] } } }

This is also applicable to nested fields too.

Let's say we want to build a Hero details page. A Hero has Friends and Starships. Each may trigger requests to appropriate datasources:

type Hero { friends: HeroFriendsPayload! starships: HeroStarshipsPayload! }

If either friends or starships fail, clients may still have the option to display the successful call, rather than seeing a full page error.

I would also lean on the Payload approach here - I personally prefer this in most cases anyway as it keeps things extensible for the future. It adds a small amount of overhead for the client (having to resolve the payload type and then the result, as opposed to directly getting back the result), but feels like a better long-term approach to allow for change

I agree that hero(id: ID!): Hero is more widely used and something came up as a want from our FE team to keep things simpler. While it could be a preference, I think we could communicate here the benefits of using the payload approach and give reasons for leaning it it

What about modeling the absence of a resource as a (semantic null):

type Query { """ returns the track for this id or null if none is found. """f track(id: ID!): Track }

This is the simplest.

Payload on the other hand gives more room for evolution. The day a track gets a geographical restriction or a copyright strike, it's easy to add more types:

union TrackPayload = TrackSuccess | TrackNotFound | TrackIsNotAvailableInYourCountry | TrackIsCopyrighted type Query { track(id: ID!): TrackPayload }

I can see room for both cases TBH. It's a conciseness vs evolvability tradeoff and different use cases may want both.

martinbonnin · 2025-10-29T15:02:00Z

src/pages/learn/response.mdx

+
+#### Mutations
+##### Modelling Errors
+Every mutation defined in the schema returns a Payload union - this allows mutations to return their success state, along with any user facing errors that may occur. This should follow the format of the mutation name, suffixed with Payload - e.g `{MutationName}Payload`.


I'd root for {MutationName}Result personally but I have no idea what is used in the wild so I'd be happy to follow any existing pattern.

Whatever we decide, we should document it and make it a lint rule. See also graphql/graphiql#4000

Small technical nit: it's probably {FieldName}Result more than {MutationName}Result because MutationName usually refers to the name of the mutation in the executable document, not in the schema.

@martinbonnin for an example of our case, we have a "channelCreate" mutation, so we have a corresponding "ChannelCreatePayload". The reason for using payload instead of result was because our payload only ever contains the success state, and I felt like "Result" implies either a success or failure state (maybe I am being pedantic there though 😅)

Just checking, {FieldName}Result would still imply "ChannelCreateResult"? Just want to make sure I am following 😁

The reason for using payload instead of result was because our payload only ever contains the success state,

I'm confused by the lines just below then. That seems the payload can also be an error?

union CreateHeroPayload = CreateHeroSuccess | HeroLimitReached | HeroValidationFailed

Just checking, {FieldName}Result would still imply "ChannelCreateResult"

Right. The mutation root field name is createChannel:

union ChannelResult = ChannelSuccess | BadId | NotAllowed type Mutation { createChannel(id: ID!): ChannelResult }

A "mutation name" would be there:

mutation CreateChannel {...}

It's technically the same value here but in the language, "mutation name" and "field definition name" are very different AST nodes

eddeee888 · 2025-10-30T12:52:33Z

src/pages/learn/response.mdx

+
+```graphql
+interface MutationError {
+  message: String!


I'm usually on the fence about servers returning errors as string:

If there were multiple clients (web, mobile), there's no guarantee this message would work for all of them e.g. some devices could be narrower so a long message won't work

A client could have more specific requirement e.g. timezone, translations, locale, etc.

So, usually I find my team bypassing the message and using the error type (either __typename or enum). This gives more power to the client to handle errors how they want, which can be more flexible (there's pros/cons of course)

Yea agreed on this, on our side clients are using the typed error to build the error message themselves. We need to enforce some type in this interface though to avoid an empty interface. I think in the actual typed errors contextual information will be provided, but we would still need to declare something in the MutationError 🤔

eddeee888 · 2025-10-30T13:24:34Z

src/pages/learn/response.mdx

+##### Consuming Errors
+
+When it comes to consuming typed errors, clients can use the ... on pattern to consume specific errors being returned in the response. In some cases, clients will want to know exactly what error has occurred and then use this to communicate some information to the user, as well as possibly show a specific user-path to recover from that error. When this is the case, clients can consume the typed error directly within the mutation.
+Clients only need to consume the specific typed errors that they need to handle. For errors that fall outside of this required, the catch-all `... on MutationError` can be used to consume remaining errors in a generic fashion to communicate the given message to the user.


I'm curious if we should go more specific how to consume different types of errors?
For example, I usually use __typename, but would mentioning something like that be too granular here?

I think there's no harm in extra clarity here, I can add some extra examples here - __typename is a common approach so would likely be helpful!

Enhance documentation on error handling in GraphQL

b7cded6

Added section on modeling errors as data in GraphQL APIs, detailing recoverable and unrecoverable errors, and how to structure mutations and queries to handle errors effectively.

martinbonnin reviewed Oct 29, 2025

View reviewed changes

Urigo requested review from benjie and enisdenjo October 29, 2025 15:04

eddeee888 reviewed Oct 30, 2025

View reviewed changes

		These are errors that are not the users fault (developer errors), which are generally things that the user can’t recover from. For example, the user not being authenticated or a resource not being found. This will also include scenarios such as
		server crashes, unhandled exceptions and exhausted resources (for example, memory or CPU)


		For example, the user could be querying for data that requires them to upgrade their plan, or to update their app. These are user-recoverable errors and utilising errors as data can improve both the Developer and User experience in these scenarios.

		While this is a likely to not be common when implementing queries, this approach allows us to return user recoverable errors when required.

Enhance documentation on error handling in GraphQL #2208

Are you sure you want to change the base?

Enhance documentation on error handling in GraphQL #2208

Uh oh!

Conversation

hitherejoe commented Oct 29, 2025

Description

Uh oh!

vercel bot commented Oct 29, 2025

Uh oh!

linux-foundation-easycla bot commented Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

martinbonnin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eddeee888 Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eddeee888 Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eddeee888 Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eddeee888 Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

linux-foundation-easycla bot commented Oct 29, 2025 •

edited

Loading

eddeee888 Oct 30, 2025 •

edited

Loading

eddeee888 Oct 30, 2025 •

edited

Loading

eddeee888 Oct 30, 2025 •

edited

Loading

eddeee888 Oct 30, 2025 •

edited

Loading