-
Notifications
You must be signed in to change notification settings - Fork 4k
roachprod-microbench: post GitHub issues for performance regressions #157045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
roachprod-microbench: post GitHub issues for performance regressions #157045
Conversation
Previously, performance regressions detected during the weekly microbenchmark comparison were only reported via Slack notifications. This made it difficult to track and ensure timely follow-up on regressions, as they were often discussed informally without formal issue tracking. This change extends the existing `--post-issues` flag to work with the compare command. When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes: - Package name and list of regressed benchmarks - Regression percentages and formatted deltas - Link to the Google Sheet with detailed comparison data - Labels: O-microbench and C-performance for easy filtering The implementation reuses the same GitHub posting infrastructure and environment variables (GITHUB_BRANCH, GITHUB_SHA, GITHUB_BINARY) as the existing benchmark failure reporting. Issues are created per package to avoid spam, with up to 10 regressions listed in each issue summary. This change also renames `postBenchmarkIssue` to `postIssuesToGitHub` for consistency, as it now handles both execution failures and performance regressions. Epic: None Release note: None
dbaabf1 to
f71f5f2
Compare
rishabh7m
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How was this tested? Can you paste a screenshot or the link of any generated issue?
- Does this change needs to be backported?
| --sheet-desc="$sheet_description" \ | ||
| ${influx_token:+--influx-token="$influx_token"} \ | ||
| ${influx_host:+--influx-host="$influx_host"} \ | ||
| ${TRIGGERED_BUILD:+--post-issues} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is required and cannot be the default setting? Is there any similar flag available for the slack notification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag already exists, I am just using it. See here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we'd only want this to happen on a triggered build. Since people are allowed to run custom comparisons which should not post (although it's not done often).
| artifactsDir := fmt.Sprintf("%s/%s", e.outputDir, benchmarkResponse.key) | ||
| formatter, req := createBenchmarkPostRequest(artifactsDir, response, timeout) | ||
| err = postBenchmarkIssue(context.Background(), e.log, formatter, req) | ||
| err = postIssuesToGitHub(context.Background(), e.log, formatter, req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this renaming is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a generic, and can be used to post any issues to github.
In the beginning, I thought we are posting issue for two different events, benchmark and regression, but now I realise both are benchmark issues, and naming was correct. I will now revert this change.
| } | ||
|
|
||
| // postIssuesToGitHub posts a benchmark issue to github. | ||
| func postIssuesToGitHub( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: postIssueToGitHub. It is taking post request for a single issue.
| formatter, req := createRegressionPostRequest(pkgName, regressions, sheetLink, c.sheetDesc) | ||
| err := postIssuesToGitHub(c.ctx, l, formatter, req) | ||
| if err != nil { | ||
| return errors.Wrapf(err, "failed to post regression issue for package %s", pkgName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also not create the GitHub issue for the remaining regression packages if the earlier one failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. I will log the error and continue.
No, it was not, let me test and update this PR.
No I don't think so. |
I've added the unit test that verifies the format of issue posted during regression. |
The unit test is great, but it would still be nice to test it end-to-end. You can create a dummy issue to serve as an example.
In case of a misconfiguration (or other bug), what if every package results in a regression? We should limit the total (possible) number of created GH issues. |
Fair point, I will create a dummy issue and update the description.
This changes creates one GitHub issue per package with all severe regressions (skips creating issue, incase it there's none in the pkg). Currently there are 23 packages and from the historical slack messages in #perf-ops, I think on an average we get ~8 pkg with atleast one regression, it would be safe to put 10 issues as limit on stop creating. LMK your thoughts. Also, would like to understand how exactly do you want to limit. Stop creating issues after 10 issues or donot even create one if there's more than 10. |
| PackageNameShort: req.PackageName, | ||
| } | ||
| title := formatter.Title(data) | ||
| if !strings.Contains(title, "pkg/sql") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit / Personal preference to some degree. but we use the require package to simplify these assertions a bit.
For instance could replace it with one of these:
require.Contains(t, title, "pkg/sql")
It already generates the message saying it expect X to contain Y. But you can add another arg in require.Contains(..., "should have contained ...") if you want to customize the message.
herkolategan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to understand which team labels will be applied to each package regression issue. Since this adds the release-blocker label, it will become the duty of those team(s) to address or close this issue. Although an issue could end up containing benchmarks owned by multiple teams.
| title := fmt.Sprintf("%s: performance regression", pkgName) | ||
| f := githubpost.MicrobenchmarkFailure( | ||
| pkgName, | ||
| title, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title here replaces the benchmarkName. This becomes the testName when we try to resolve the owning team (See: https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/bazci/githubpost/githubpost.go#L689)
The question here is in which team's bucket does this all end up? Should we be more intelligent here and determine all the regressed benchmark's teams?
Previously, performance regressions detected during the weekly microbenchmark comparison were only reported via Slack notifications. This made it difficult to track and ensure timely follow-up on regressions, as they were often discussed informally without formal issue tracking.
This change extends the existing
--post-issuesflag to work with the compare command. When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes:The implementation reuses the same GitHub posting infrastructure and environment variables (GITHUB_BRANCH, GITHUB_SHA, GITHUB_BINARY) as the existing benchmark failure reporting. Issues are created per package to avoid spam, with up to 10 regressions listed in each issue summary.
Example GitHub Issue screenshot:

Epic: None
Release note: None