Skip to content

Conversation

@michaelkaplan13
Copy link
Contributor

@michaelkaplan13 michaelkaplan13 commented Nov 6, 2025

Why this should be merged

Avoids GOAWAY errors from various API servers by avoiding client implementations that send extra unnecessary RST_STREAM and PING frames. The workaround is adopted from the Cloudflare blog linked below, and in comments.

How this works

See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive

How this was tested

Using a minimal reproduction script here. On that branch, running the script with the changes in json.go should yield no errors, but removing the changes in json.go and running the script consistently results in ~125 GOAWAY errors.

Open to suggestions for how to better test it in an automated fashion. I could not think of a good way to do so myself since it depends on the external server setup.

Need to be documented in RELEASES.md?

No

Copilot AI review requested due to automatic review settings November 6, 2025 23:33
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR prevents GOAWAY errors from API servers by ensuring HTTP response bodies are fully consumed before closing connections. The changes implement a workaround for HTTP/2 connection management issues where closing a partially-read response body triggers unnecessary RST_STREAM and PING frames that can cause servers to send GOAWAY messages.

Key changes:

  • Added io.Copy(io.Discard, resp.Body) calls before closing response bodies in error paths and after successful parsing
  • Modified deferred resp.Body.Close() calls to deferred functions that first drain the body
  • Added the io import to files where it was previously not needed

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
utils/rpc/json.go Added body draining in error paths and after successful JSON decoding
utils/dynamicip/ifconfig_resolver.go Changed defer statement to drain body before closing
tests/fixture/tmpnet/node.go Changed defer statement to drain body before closing
tests/fixture/tmpnet/monitor_processes.go Changed defer statement to drain body before closing
tests/fixture/tmpnet/check_monitoring.go Changed defer statement to drain body before closing
api/metrics/client.go Added body draining in error paths and after successful metric parsing

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 17 to 23
// CleanlyCloseBody avoids sending unnecessary RST_STREAM and PING frames by ensuring
// the whole body is read before being closed.
// See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive
func CleanlyCloseBody(body io.ReadCloser) error {
_, _ = io.Copy(io.Discard, body)
return body.Close()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember when first adding the error returning logic to the prod clients that it was kind of "we should probably return the error if one happens"... But honestly, if the client was able to get the response... I feel like dropping the error isn't unreasonable.

Should we just ignore this everywhere and make this function not return an error? We could ignore the error once here, and then everywhere just defer CleanlyCloseBody(resp.Body)

Suggested change
// CleanlyCloseBody avoids sending unnecessary RST_STREAM and PING frames by ensuring
// the whole body is read before being closed.
// See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive
func CleanlyCloseBody(body io.ReadCloser) error {
_, _ = io.Copy(io.Discard, body)
return body.Close()
}
// CleanlyCloseBody avoids sending unnecessary RST_STREAM and PING frames by
// ensuring the whole body is read before being closed.
// See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive
func CleanlyCloseBody(body io.ReadCloser) {
_, _ = io.Copy(io.Discard, body)
_ = body.Close()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very reasonable and I'm happy to make this change because it cleans up the code, though I do think it's a potential behavior change outside the scope of the intention of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my testing, but io.Copy can return a cancelation error since it respects the context - so we might want to bubble up that error instead.

_ = resp.Body.Close()
}()

ipBytes, err := io.ReadAll(resp.Body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already read until EOF here - do we still need to drain the response body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have the CleanlyCloseBody helper, I think it's most defensive to use it uniformly across the board rather than depending on the fact that the body is always fully read otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment in this diff in this package w.r.t io.ReadAll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, now that we have the CleanlyCloseBody helper, I think it's most defensive to use it uniformly across the board rather than depending on the fact that the body is always fully read otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this client behavior update in this file? This is only used in e2e tests so I feel like this is over-kill for an environment that is just a smoke test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes a difference either way practically speaking, but I think applying best practices everywhere is better than not, particularly now that we're just uniformly deferring the call to the helper everywhere (which I realize is a change from when this comment was left). Thoughts?

@joshua-kim joshua-kim moved this to In Progress 🏗️ in avalanchego Nov 7, 2025
Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
Copy link
Contributor

@StephenButtolph StephenButtolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Hopefully this will be avoided upstream. But until then this seems to be a reasonable fix.

@StephenButtolph StephenButtolph added this pull request to the merge queue Nov 10, 2025
Merged via the queue into master with commit 5565231 Nov 10, 2025
35 checks passed
@StephenButtolph StephenButtolph deleted the michaelkaplan13/goaway-workaround branch November 10, 2025 20:30
@github-project-automation github-project-automation bot moved this from In Progress 🏗️ to Done 🎉 in avalanchego Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants