149479: roachtest: exit with failure on github post errors r=herkolategan,DarrylWong a=williamchoe3
Fixes #147116
### Changes
#### High-level Changes
Added a new failure path:
* Added a new counter to the `testRunner` struct that gets incremented whenever `github.MaybePost()` (called in `testRunner.runWorkers()` and `testRunner.runTests()`) returns an error. When this count is > 0, `testRunner.Run()` returns a new error, `errGithubPostFailed`, and when `main()` sees that error, it exits with a new exit code, `12`, which fails the pipeline (unlike exit codes 10 and 11). See the sketch after the snippet below.
* This is very similar to how provisioning errors are tracked and returned to `main()`.
* This does not trigger the test short-circuiting mechanism, because `testRunner.runWorkers()` doesn't return an error.
```
type testRunner struct {
    ...
    // numGithubPostErrs counts GitHub post errors across all workers.
    numGithubPostErrs int32
    ...
}
...
issue, err := github.MaybePost(t, issueInfo, l, output, params) // TODO add cluster specific args here
if err != nil {
    shout(ctx, l, stdout, "failed to post issue: %s", err)
    atomic.AddInt32(&r.numGithubPostErrs, 1)
}
```
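Here is a minimal, self-contained sketch of how that counter surfaces as a dedicated exit code. `errGithubPostFailed`, `numGithubPostErrs`, and exit code `12` come from the change described above; the simplified `Run()` body and the simulated worker in `main()` are illustrative stand-ins, not the actual roachtest code.
```
// Sketch only: a post-error counter that turns into exit code 12.
package main

import (
    "errors"
    "fmt"
    "os"
    "sync/atomic"
)

var errGithubPostFailed = errors.New("failed to POST to GitHub")

type testRunner struct {
    numGithubPostErrs int32 // incremented whenever github.MaybePost returns an error
}

// Run stands in for testRunner.Run: by the time it returns, workers have
// already bumped the counter via atomic.AddInt32.
func (r *testRunner) Run() error {
    if atomic.LoadInt32(&r.numGithubPostErrs) > 0 {
        return errGithubPostFailed
    }
    return nil
}

func main() {
    r := &testRunner{}
    atomic.AddInt32(&r.numGithubPostErrs, 1) // simulate one failed post
    if err := r.Run(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        if errors.Is(err, errGithubPostFailed) {
            os.Exit(12) // fails the pipeline, unlike exit codes 10 and 11
        }
        os.Exit(1)
    }
}
```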
#### Design
To do verification via unit tests, I'd normally reach for something like Python's MagicMock, but that isn't available in Go, so I opted for a dependency injection approach. (This was the best I could come up with; I wanted to avoid "if unit test, do this" logic. If anyone has other approaches / suggestions, let me know!)
I made a new interface, `GithubPoster`, which the original `githubIssues` implements. I then pass this interface through function signatures all the way from `Run()` to `runTests()`. In the unit tests, I can pass a different implementation of `GithubPoster` whose `MaybePost()` always fails (see the sketch after the interface below).
`github.go`
```
type GithubPoster interface {
    MaybePost(
        t *testImpl, issueInfo *githubIssueInfo, l *logger.Logger, message string,
        params map[string]string,
    ) (*issues.TestFailureIssue, error)
}
```
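To show the seam this interface creates for testing, here is a minimal, self-contained sketch with simplified stand-in types (the real `MaybePost` takes `*testImpl`, `*githubIssueInfo`, a `*logger.Logger`, and params, and returns `*issues.TestFailureIssue`): a failing implementation is injected in place of the real `githubIssues`, and the error counter increments just as in the production path.
```
// Sketch only: injecting a GithubPoster whose MaybePost always fails.
package main

import (
    "errors"
    "fmt"
)

// GithubPoster is the seam the runner depends on (signature simplified).
type GithubPoster interface {
    MaybePost(message string) error
}

// failingPoster is what a unit test injects: MaybePost always errors.
type failingPoster struct{}

func (failingPoster) MaybePost(string) error {
    return errors.New("mocking")
}

// runTests stands in for testRunner.runTests: it takes the interface, so
// production code passes the real githubIssues and tests pass failingPoster.
func runTests(p GithubPoster, numGithubPostErrs *int32) {
    if err := p.MaybePost("test failed"); err != nil {
        fmt.Println("failed to post issue:", err)
        *numGithubPostErrs++
    }
}

func main() {
    var errs int32
    runTests(failingPoster{}, &errs)
    fmt.Println("github post errors:", errs) // github post errors: 1
}
```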
Another wrinkle with this approach: the original `githubIssues` held cluster-specific information, but with dependency injection it is now a struct shared among all the workers, so it no longer makes sense for it to store worker-dependent fields.
For the worker-specific fields, I created a new struct, `githubIssueInfo`, which is created in `runWorkers()`, similar to how `githubIssues` used to be created there (a sketch of its plausible shape follows the comparison below).
Note: I don't love the name `githubIssueInfo`, but I wanted to stick with a naming convention similar to `githubIssues`; open to name suggestions.
```
// Original githubIssues
type githubIssues struct {
    disable      bool
    cluster      *clusterImpl
    vmCreateOpts *vm.CreateOpts
    issuePoster  func(context.Context, issues.Logger, issues.IssueFormatter, issues.PostRequest,
        *issues.Options) (*issues.TestFailureIssue, error)
    teamLoader func() (team.Map, error)
}

// New githubIssues
type githubIssues struct {
    disable     bool
    issuePoster func(context.Context, issues.Logger, issues.IssueFormatter, issues.PostRequest,
        *issues.Options) (*issues.TestFailureIssue, error)
    teamLoader func() (team.Map, error)
}
```
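The definition of `githubIssueInfo` itself isn't shown here, but based on the fields dropped from `githubIssues`, it plausibly looks something like the following (hypothetical reconstruction; the actual struct in the PR may differ):
```
// Hypothetical reconstruction of the per-worker githubIssueInfo, inferred
// from the worker-specific fields removed from githubIssues above.
type githubIssueInfo struct {
    cluster      *clusterImpl   // the cluster the failing test ran on
    vmCreateOpts *vm.CreateOpts // VM options used to create that cluster
}
```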
All of this was fairly verbose, and I didn't love having to change all the function signatures to do it; I'm open to other ways of doing verification.
### Misc
This is also my first time writing Go in about three years, so I'm very open to general Go feedback on semantics, best practices, and design patterns.
### Verification
Diff of the binary I used to manually confirm the behavior, in case you want to see where I hardcoded errors to be returned: 611adcc
#### Manual Test Logs
> ➜ cockroach git:(wchoe/147116-github-err-will-fail-pipeline) ✗ tmp/roachtest run acceptance/build-info --cockroach /Users/wchoe/work/cockroachdb/cockroach/bin_linux/cockroach
> ...
> Running tests which match regex "acceptance/build-info" and are compatible with cloud "gce".
>
> fallback runner logs in: artifacts/roachtest.crdb.log
> 2025/07/09 00:51:48 run.go:386: test runner logs in: artifacts/_runner-logs/test_runner-1752022308.log
> test runner logs in: artifacts/_runner-logs/test_runner-1752022308.log
> HTTP server listening on port 56238 on localhost: http://localhost:56238/
> 2025/07/09 00:51:48 run.go:148: global random seed: 1949199437086051249
> 2025/07/09 00:51:48 test_runner.go:398: test_run_id: will.choe-1752022308
> test_run_id: will.choe-1752022308
> [w0] 2025/07/09 00:51:48 work_pool.go:198: Acquired quota for 16 CPUs
> [w0] 2025/07/09 00:51:48 cluster.go:3204: Using randomly chosen arch="amd64", acceptance/build-info
> [w0] 2025/07/09 00:51:48 test_runner.go:798: Unable to create (or reuse) cluster for test acceptance/build-info due to: mocking.
> Unable to create (or reuse) cluster for test acceptance/build-info due to: mocking.
> 2025/07/09 00:51:48 test_impl.go:478: test failure #1: full stack retained in failure_1.log: (test_runner.go:873).func4: mocking [owner=test-eng]
> 2025/07/09 00:51:48 test_impl.go:200: Runtime assertions disabled
> [w0] 2025/07/09 00:51:48 test_runner.go:883: failed to post issue: mocking
> failed to post issue: mocking
> [w0] 2025/07/09 00:51:48 test_runner.go:1019: test failed: acceptance/build-info (run 1)
> [w0] 2025/07/09 00:51:48 test_runner.go:732: Releasing quota for 16 CPUs
> [w0] 2025/07/09 00:51:48 test_runner.go:744: No work remaining; runWorker is bailing out...
> No work remaining; runWorker is bailing out...
> [w0] 2025/07/09 00:51:48 test_runner.go:643: Worker exiting; no cluster to destroy.
> 2025/07/09 00:51:48 test_runner.go:460: PASS
> PASS
> 2025/07/09 00:51:48 test_runner.go:465: 1 clusters could not be created and 1 errors occurred while posting to github
> 1 clusters could not be created and 1 errors occurred while posting to github
> 2025/07/09 00:51:48 run.go:200: runTests destroying all clusters
> Error: some clusters could not be created
> failed to POST to GitHub
> ➜ cockroach git:(wchoe/147116-github-err-will-fail-pipeline) ✗ echo $?
> 12
149913: crosscluster/physical: persist standby poller progress r=dt a=msbutler
This patch sets the standby poller job's resolved time to the system time up to which standby descriptors have been updated. This allows a reader tenant user to easily check, via SHOW JOB, that the poller job is running smoothly.
Epic: none
Release note: none
Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>