Conversation

@ibreakthecloud (Member) commented Nov 7, 2025

Previously, performance regressions detected during the weekly microbenchmark comparison were only reported via Slack notifications. This made it difficult to track and ensure timely follow-up on regressions, as they were often discussed informally without formal issue tracking.

This change extends the existing `--post-issues` flag to work with the compare command. When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes:

  • Package name and list of regressed benchmarks
  • Regression percentages and formatted deltas
  • Link to the Google Sheet with detailed comparison data
  • Labels: O-microbench and C-performance for easy filtering

The implementation reuses the same GitHub posting infrastructure and environment variables (GITHUB_BRANCH, GITHUB_SHA, GITHUB_BINARY) as the existing benchmark failure reporting. Issues are created per package to avoid spam, with up to 10 regressions listed in each issue summary.
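
To make the grouping concrete, here is a minimal sketch of the per-package batching and the 10-regression cap described above; the `Regression` type, `maxListed` constant, and `summarize` function are illustrative stand-ins, not the PR's actual code.

package microbench

import (
	"fmt"
	"sort"
)

// Regression is an illustrative stand-in for the PR's regression record.
type Regression struct {
	Package   string
	Benchmark string
	DeltaPct  float64 // positive = slower
}

// maxListed caps how many regressions are rendered per issue summary.
const maxListed = 10

// summarize groups regressions by package and truncates each package's
// list so a single issue never enumerates more than maxListed entries.
func summarize(regressions []Regression) map[string][]string {
	byPkg := make(map[string][]Regression)
	for _, r := range regressions {
		byPkg[r.Package] = append(byPkg[r.Package], r)
	}
	out := make(map[string][]string)
	for pkg, rs := range byPkg {
		// Sort worst-first so truncation keeps the most severe regressions.
		sort.Slice(rs, func(i, j int) bool { return rs[i].DeltaPct > rs[j].DeltaPct })
		if len(rs) > maxListed {
			rs = rs[:maxListed]
		}
		for _, r := range rs {
			out[pkg] = append(out[pkg], fmt.Sprintf("%s: %+.1f%%", r.Benchmark, r.DeltaPct))
		}
	}
	return out
}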

Example GitHub Issue screenshot: [image]

Epic: None
Release note: None

@cockroach-teamcity (Member) commented:
This change is Reviewable

@ibreakthecloud ibreakthecloud marked this pull request as ready for review November 11, 2025 05:04
@ibreakthecloud ibreakthecloud requested review from a team as code owners November 11, 2025 05:04
@ibreakthecloud ibreakthecloud requested review from golgeek and srosenberg and removed request for a team November 11, 2025 05:04
The commit message additionally notes:

This change also renames `postBenchmarkIssue` to `postIssuesToGitHub` for consistency, as it now handles both execution failures and performance regressions.
@rishabh7m (Contributor) left a comment:

  • How was this tested? Can you paste a screenshot or a link to any generated issue?
  • Does this change need to be backported?

--sheet-desc="$sheet_description" \
${influx_token:+--influx-token="$influx_token"} \
${influx_host:+--influx-host="$influx_host"} \
${TRIGGERED_BUILD:+--post-issues} \
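
(`${TRIGGERED_BUILD:+--post-issues}` expands to `--post-issues` only when `TRIGGERED_BUILD` is set and non-empty, so posting is opt-in per build.)
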
Contributor commented:
Why is this required rather than being the default? Is there a similar flag for the Slack notification?

Member Author replied:
This flag already exists; I am just using it. See here.

Collaborator replied:

Yes, we'd only want this to happen on a triggered build, since people are allowed to run custom comparisons that should not post (although that's not done often).

artifactsDir := fmt.Sprintf("%s/%s", e.outputDir, benchmarkResponse.key)
formatter, req := createBenchmarkPostRequest(artifactsDir, response, timeout)
-err = postBenchmarkIssue(context.Background(), e.log, formatter, req)
+err = postIssuesToGitHub(context.Background(), e.log, formatter, req)
Contributor commented:
Why is this renaming required?

Member Author replied:

This function is generic and can be used to post any issue to GitHub.
Initially I thought we were posting issues for two different events, benchmark failures and regressions, but now I realise both are benchmark issues and the original naming was correct. I will revert this change.

}

// postIssuesToGitHub posts a benchmark issue to github.
func postIssuesToGitHub(
Contributor commented:
nit: postIssueToGitHub. It takes a post request for a single issue.

formatter, req := createRegressionPostRequest(pkgName, regressions, sheetLink, c.sheetDesc)
err := postIssuesToGitHub(c.ctx, l, formatter, req)
if err != nil {
	return errors.Wrapf(err, "failed to post regression issue for package %s", pkgName)
Contributor commented:

This will also skip creating GitHub issues for the remaining regression packages if an earlier one fails.

Member Author replied:

Fair point. I will log the error and continue.
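
Sketched against the snippet above, the log-and-continue version might look like this (the `regressionsByPkg` map and the logger's `Errorf` method are assumptions, not confirmed API):

// Hypothetical sketch: posting failures are logged, not returned,
// so one failed package doesn't block issues for the rest.
for pkgName, regressions := range regressionsByPkg {
	formatter, req := createRegressionPostRequest(pkgName, regressions, sheetLink, c.sheetDesc)
	if err := postIssuesToGitHub(c.ctx, l, formatter, req); err != nil {
		l.Errorf("failed to post regression issue for package %s: %v", pkgName, err)
	}
}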

@ibreakthecloud (Member Author) commented Nov 11, 2025

@rishabh7m

How was this tested? Can you paste a screenshot or a link to any generated issue?

No, it was not; let me test and update this PR.

Does this change need to be backported?

No, I don't think so.

@ibreakthecloud (Member Author) commented Nov 11, 2025

How was this tested? Can you paste a screenshot or a link to any generated issue?

I've added a unit test that verifies the format of the issue posted for a regression.

@herkolategan herkolategan self-requested a review November 12, 2025 15:55
@srosenberg (Member) commented:

How was this tested? Can you paste a screenshot or a link to any generated issue?

I've added a unit test that verifies the format of the issue posted for a regression.

The unit test is great, but it would still be nice to test it end-to-end. You can create a dummy issue to serve as an example.

When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes:

Package name and list of regressed benchmarks

In case of a misconfiguration (or other bug), what if every package results in a regression? We should limit the total (possible) number of created GH issues.

@ibreakthecloud (Member Author) commented Nov 13, 2025

@srosenberg

How was this tested? Can you paste a screenshot or a link to any generated issue?

I've added a unit test that verifies the format of the issue posted for a regression.

The unit test is great, but it would still be nice to test it end-to-end. You can create a dummy issue to serve as an example.

Fair point, I will create a dummy issue and update the description.

In case of a misconfiguration (or other bug), what if every package results in a regression? We should limit the total (possible) number of created GH issues.

I will limit it to 5 issues.

This change creates one GitHub issue per package covering all severe regressions (and skips creating an issue if a package has none). Currently there are 23 packages, and from the historical Slack messages in #perf-ops I think we average ~8 packages with at least one regression, so a limit of 10 issues seems safe as the point to stop creating them. LMK your thoughts.

Also, I would like to understand exactly how you want to limit this: stop creating issues after 10, or do not create any at all if there would be more than 10?
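
A sketch of the first interpretation (stop once the cap is hit), with `maxIssues`, `created`, and the logger's `Printf`/`Errorf` methods as hypothetical names:

// Hypothetical sketch: stop creating new issues once the cap is hit,
// rather than suppressing all issues when the count exceeds the cap.
const maxIssues = 10

created := 0
for pkgName, regressions := range regressionsByPkg {
	if created >= maxIssues {
		l.Printf("issue cap (%d) reached; skipping remaining packages", maxIssues)
		break
	}
	formatter, req := createRegressionPostRequest(pkgName, regressions, sheetLink, c.sheetDesc)
	if err := postIssuesToGitHub(c.ctx, l, formatter, req); err != nil {
		l.Errorf("failed to post regression issue for package %s: %v", pkgName, err)
		continue
	}
	created++
}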

	PackageNameShort: req.PackageName,
}
title := formatter.Title(data)
if !strings.Contains(title, "pkg/sql") {
Collaborator commented:
Nit / personal preference to some degree, but we use the require package to simplify these assertions a bit.

For instance, you could replace it with:

require.Contains(t, title, "pkg/sql")

It already generates a message saying it expected X to contain Y, but you can add another argument, require.Contains(..., "should have contained ..."), if you want to customize the message.
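
As a self-contained illustration of that suggestion (the test name and the hard-coded title are stand-ins for the real formatter output):

package microbench

import (
	"testing"

	"github.com/stretchr/testify/require"
)

// TestRegressionIssueTitle shows the require-based assertion shape; the
// real test derives title from formatter.Title(data).
func TestRegressionIssueTitle(t *testing.T) {
	title := "pkg/sql: performance regression" // stand-in value
	require.Contains(t, title, "pkg/sql")
	require.Contains(t, title, "performance regression",
		"title should mention the regression")
}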

@herkolategan (Collaborator) left a comment:

I would like to understand which team labels will be applied to each package regression issue. Since this adds the release-blocker label, it will become the duty of those team(s) to address or close the issue, even though an issue could end up containing benchmarks owned by multiple teams.

title := fmt.Sprintf("%s: performance regression", pkgName)
f := githubpost.MicrobenchmarkFailure(
	pkgName,
	title,
@herkolategan (Collaborator) commented Nov 25, 2025:

The title here replaces the benchmarkName. This becomes the testName when we try to resolve the owning team (see: https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/bazci/githubpost/githubpost.go#L689).

The question here is: in which team's bucket does this all end up? Should we be more intelligent here and determine all the regressed benchmarks' teams?
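
One possible shape for that (the `resolveOwner` lookup is hypothetical, the real resolution lives in githubpost, and `Regression` reuses the illustrative type sketched earlier):

// Hypothetical sketch: resolve an owning team per regressed benchmark
// and label the issue with every distinct team, instead of deriving a
// single owner from the synthetic issue title.
func teamLabels(regressions []Regression, resolveOwner func(pkg, benchmark string) string) []string {
	seen := make(map[string]struct{})
	var labels []string
	for _, r := range regressions {
		team := resolveOwner(r.Package, r.Benchmark)
		if _, ok := seen[team]; ok {
			continue
		}
		seen[team] = struct{}{}
		labels = append(labels, team)
	}
	return labels
}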
