-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Enhance documentation on error handling in GraphQL #2208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: source
Are you sure you want to change the base?
Conversation
Added section on modeling errors as data in GraphQL APIs, detailing recoverable and unrecoverable errors, and how to structure mutations and queries to handle errors effectively.
|
@hitherejoe is attempting to deploy a commit to the The GraphQL Foundation Team on Vercel. A member of the Team first needs to authorize it. |
|
|
martinbonnin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super excited by this, thanks for looking into it.
A bunch of questions. Plus I'd love to see a section about caching considerations. What is cached vs non-cached.
| These are errors that are not the users fault (developer errors), which are generally things that the user can’t recover from. For example, the user not being authenticated or a resource not being found. This will also include scenarios such as | ||
| server crashes, unhandled exceptions and exhausted resources (for example, memory or CPU) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also a sub-dimension: is "exceptional" vs "persistent".
exceptional: if the client retries, it will most likely not get an error.
- Out of memory error:
irrecoverable && exceptional - Missing resource:
irrecoverable && persistent
The persistent errors, you might want to model as data as well because then they become cacheable. So you don't do the same request over and over again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the "missing resource" case.
This sounds like "I query for Hero:123, and that record is missing". In this case, I'd normally return null.
Maybe I'm not thinking of the right scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martinbonnin good point, do you mean that even in the case of a missing resource error, we would want this returned as modelled data? I can see the benefit here from caching - we previously had a NotFoundError that can be thrown in cases where a resource is not available, but I think the differentiator here would be, is it the developers fault or the users fault? We have been using typed errors for user recoverable scenarios where you need to communicate something to the user - I'm not saying that is necessarily how everyone should do it, but we were trying to find a balance between having many typed errors and keeping things simple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eddeee888 yea in the case of queries I can see that being valuable. We have been mainly modelling errors as data for mutations because we've been approaching typed errors as 'user recoverable errors'. We have had a few cases where recoverable errors have been needed for mutations, but it's not often. I think the case you have described will be valid for a lot of queries - in the case of our API users do not have direct control over the information that is being used for queries (so errors are likely our fault), so we tend to mark things as non-null and use the errors array if something goes wrong. Maybe there is a better approach to that though 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I'd normally return null.
Right 👍 See also my comment below. I think we need both null and union.
we tend to mark things as non-null and use the errors array if something goes wrong
Yes! Also, you might be interested in graphql/graphql-spec#1163 to disable error propagation in those cases.
I think the differentiator here would be, is it the developers fault or the users fault?
Excellent question. I would say it's the user fault. They gave a wrong id. They can recover by navigating to another Track/Product/etc... If they keep going to the same missing product page, I don't want to do a network request every time. So probably model as null in this case or an union if evolution is a concern (see also still that same other discussion)
|
|
||
| For example, the user could be querying for data that requires them to upgrade their plan, or to update their app. These are user-recoverable errors and utilising errors as data can improve both the Developer and User experience in these scenarios. | ||
|
|
||
| While this is a likely to not be common when implementing queries, this approach allows us to return user recoverable errors when required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we think of this:
type Query {
hero(id: ID!): Hero # error if not found
hero(id: ID!): HeroPayload! # typed error if not found
}
I think (without any data to back it up) that hero(id: ID!): Hero is widely more used. But this is technically recoverable so hero(id: ID!): HeroPayload! might make more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually go with the Payload approach in queries as well. As it gives clients the ability to handle partial error.
For example, let's say we want to query for two sets of heroes to compare side by side on a client, I'd write the operation like this:
query HeroCompare {
heroSetOne: heroes(ids: ["1","2", "3", "4", "5"]) {
id
name
}
heroSetTwo: heroes(ids: "6","7", "8", "9", "10") {
id
name
}
}In this case:
- If there's an error in either
heroSetOneorheroSetTwo, the whole query will error and we'd getdata: null - This could be disruptive for clients because we'd lose the ability to display the successful call
If we used the Payload approach though, if one fails (but properly handled in try/catch), the other may still complete and allows clients to handle partial failure
For example, this'd be the operation
query {
heroSetOne: heroesPayload(input: ["1","2","3","4","5"]) {
... on HeroesPayloadOk {
result {
id
name
}
}
... on ResultError {
__typename
}
}
heroSetTwo: heroesPayload(input: ["6","7","8","9","10"]) {
... on HeroesPayloadOk {
result {
id
name
}
}
... on ResultError {
__typename
}
}
}And if heroSetOne fails we'd get something like this:
{
"data": {
"heroPayloadSetOne": {
"__typename": "ResultError"
},
"heroPayloadSetTwo": {
"result": [
{
"id": "6",
"name": "Hero:6"
},
{
"id": "7",
"name": "Hero:7"
},
{
"id": "8",
"name": "Hero:8"
},
{
"id": "9",
"name": "Hero:9"
},
{
"id": "10",
"name": "Hero:10"
}
]
}
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also applicable to nested fields too.
Let's say we want to build a Hero details page. A Hero has Friends and Starships. Each may trigger requests to appropriate datasources:
type Hero {
friends: HeroFriendsPayload!
starships: HeroStarshipsPayload!
}If either friends or starships fail, clients may still have the option to display the successful call, rather than seeing a full page error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also lean on the Payload approach here - I personally prefer this in most cases anyway as it keeps things extensible for the future. It adds a small amount of overhead for the client (having to resolve the payload type and then the result, as opposed to directly getting back the result), but feels like a better long-term approach to allow for change
I agree that hero(id: ID!): Hero is more widely used and something came up as a want from our FE team to keep things simpler. While it could be a preference, I think we could communicate here the benefits of using the payload approach and give reasons for leaning it it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about modeling the absence of a resource as a (semantic null):
type Query {
"""
returns the track for this id or null if none is found.
"""f
track(id: ID!): Track
}This is the simplest.
Payload on the other hand gives more room for evolution. The day a track gets a geographical restriction or a copyright strike, it's easy to add more types:
union TrackPayload = TrackSuccess | TrackNotFound | TrackIsNotAvailableInYourCountry | TrackIsCopyrighted
type Query {
track(id: ID!): TrackPayload
}I can see room for both cases TBH. It's a conciseness vs evolvability tradeoff and different use cases may want both.
|
|
||
| #### Mutations | ||
| ##### Modelling Errors | ||
| Every mutation defined in the schema returns a Payload union - this allows mutations to return their success state, along with any user facing errors that may occur. This should follow the format of the mutation name, suffixed with Payload - e.g `{MutationName}Payload`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd root for {MutationName}Result personally but I have no idea what is used in the wild so I'd be happy to follow any existing pattern.
Whatever we decide, we should document it and make it a lint rule. See also graphql/graphiql#4000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small technical nit: it's probably {FieldName}Result more than {MutationName}Result because MutationName usually refers to the name of the mutation in the executable document, not in the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martinbonnin for an example of our case, we have a "channelCreate" mutation, so we have a corresponding "ChannelCreatePayload". The reason for using payload instead of result was because our payload only ever contains the success state, and I felt like "Result" implies either a success or failure state (maybe I am being pedantic there though 😅)
Just checking, {FieldName}Result would still imply "ChannelCreateResult"? Just want to make sure I am following 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for using payload instead of result was because our payload only ever contains the success state,
I'm confused by the lines just below then. That seems the payload can also be an error?
union CreateHeroPayload = CreateHeroSuccess | HeroLimitReached | HeroValidationFailedJust checking,
{FieldName}Resultwould still imply "ChannelCreateResult"
Right. The mutation root field name is createChannel:
union ChannelResult = ChannelSuccess | BadId | NotAllowed
type Mutation {
createChannel(id: ID!): ChannelResult
}A "mutation name" would be there:
mutation CreateChannel {...}It's technically the same value here but in the language, "mutation name" and "field definition name" are very different AST nodes
|
|
||
| ```graphql | ||
| interface MutationError { | ||
| message: String! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually on the fence about servers returning errors as string:
- If there were multiple clients (web, mobile), there's no guarantee this message would work for all of them e.g. some devices could be narrower so a long message won't work
- A client could have more specific requirement e.g. timezone, translations, locale, etc.
So, usually I find my team bypassing the message and using the error type (either __typename or enum). This gives more power to the client to handle errors how they want, which can be more flexible (there's pros/cons of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea agreed on this, on our side clients are using the typed error to build the error message themselves. We need to enforce some type in this interface though to avoid an empty interface. I think in the actual typed errors contextual information will be provided, but we would still need to declare something in the MutationError 🤔
| ##### Consuming Errors | ||
|
|
||
| When it comes to consuming typed errors, clients can use the ... on pattern to consume specific errors being returned in the response. In some cases, clients will want to know exactly what error has occurred and then use this to communicate some information to the user, as well as possibly show a specific user-path to recover from that error. When this is the case, clients can consume the typed error directly within the mutation. | ||
| Clients only need to consume the specific typed errors that they need to handle. For errors that fall outside of this required, the catch-all `... on MutationError` can be used to consume remaining errors in a generic fashion to communicate the given message to the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if we should go more specific how to consume different types of errors?
For example, I usually use __typename, but would mentioning something like that be too granular here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's no harm in extra clarity here, I can add some extra examples here - __typename is a common approach so would likely be helpful!
Description
Added section on modeling errors as data in GraphQL APIs, detailing recoverable and unrecoverable errors, and how to structure mutations and queries to handle errors effectively.
This has been added based off some conversations with @Urigo and some of the team at The Guild.
A couple of notes: