-
Notifications
You must be signed in to change notification settings - Fork 116
feat(perf): Cache changed files during event processing #2317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @aThorp96, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant performance enhancement by implementing a caching mechanism for changed files within the Git provider integrations. Previously, repeated calls to Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces caching for the GetFiles method in the Bitbucket Data Center, GitHub, and GitLab providers. This is a great performance enhancement that will reduce the number of API calls made during event processing. The implementation is consistent across all providers, and the tests have been updated to verify the new caching behavior and related metrics. The refactoring of test helper functions into a shared package is also a welcome improvement for maintainability. I've added a few suggestions to further reduce code duplication within the GetFiles methods, which should improve the code's readability and make it easier to maintain in the future.
| switch runevent.TriggerTarget { | ||
| case triggertype.PullRequest: | ||
| opts := &scm.ListOptions{Page: 1, Size: apiResponseLimit} | ||
| changedFiles := changedfiles.ChangedFiles{} | ||
| for { | ||
| changes, _, err := v.Client().PullRequests.ListChanges(ctx, OrgAndRepo, runevent.PullRequestNumber, opts) | ||
| if err != nil { | ||
| return changedfiles.ChangedFiles{}, fmt.Errorf("failed to list changes for pull request: %w", err) | ||
| } | ||
| if c.Deleted { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, c.Path) | ||
| } | ||
| } | ||
|
|
||
| // In the Jenkins-x/go-scm package, the `isLastPage` field is not available, and the value of | ||
| // `response.Page.Last` is set to `0`. Therefore, to determine if there are more items to fetch, | ||
| // we can check if the length of the currently fetched items is less than the specified limit. | ||
| // If the length is less than the limit, it indicates that there are no more items to retrieve. | ||
| if len(changes) < apiResponseLimit { | ||
| break | ||
| } | ||
| for _, c := range changes { | ||
| changedFiles.All = append(changedFiles.All, c.Path) | ||
| if c.Added { | ||
| changedFiles.Added = append(changedFiles.Added, c.Path) | ||
| } | ||
| if c.Modified { | ||
| changedFiles.Modified = append(changedFiles.Modified, c.Path) | ||
| } | ||
| if c.Renamed { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, c.Path) | ||
| } | ||
| if c.Deleted { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, c.Path) | ||
| } | ||
| } | ||
|
|
||
| opts.Page++ | ||
| } | ||
| return changedFiles, nil | ||
| } | ||
| // In the Jenkins-x/go-scm package, the `isLastPage` field is not available, and the value of | ||
| // `response.Page.Last` is set to `0`. Therefore, to determine if there are more items to fetch, | ||
| // we can check if the length of the currently fetched items is less than the specified limit. | ||
| // If the length is less than the limit, it indicates that there are no more items to retrieve. | ||
| if len(changes) < apiResponseLimit { | ||
| break | ||
| } | ||
|
|
||
| if runevent.TriggerTarget == triggertype.Push { | ||
| opts := &scm.ListOptions{Page: 1, Size: apiResponseLimit} | ||
| changedFiles := changedfiles.ChangedFiles{} | ||
| for { | ||
| changes, _, err := v.Client().Git.ListChanges(ctx, OrgAndRepo, runevent.SHA, opts) | ||
| if err != nil { | ||
| return changedfiles.ChangedFiles{}, fmt.Errorf("failed to list changes for commit %s: %w", runevent.SHA, err) | ||
| opts.Page++ | ||
| } | ||
|
|
||
| for _, c := range changes { | ||
| changedFiles.All = append(changedFiles.All, c.Path) | ||
| if c.Added { | ||
| changedFiles.Added = append(changedFiles.Added, c.Path) | ||
| } | ||
| if c.Modified { | ||
| changedFiles.Modified = append(changedFiles.Modified, c.Path) | ||
| v.changedFiles = &changedFiles | ||
| case triggertype.Push: | ||
| opts := &scm.ListOptions{Page: 1, Size: apiResponseLimit} | ||
| changedFiles := changedfiles.ChangedFiles{} | ||
| for { | ||
| changes, _, err := v.Client().Git.ListChanges(ctx, OrgAndRepo, runevent.SHA, opts) | ||
| if err != nil { | ||
| return changedfiles.ChangedFiles{}, fmt.Errorf("failed to list changes for commit %s: %w", runevent.SHA, err) | ||
| } | ||
| if c.Renamed { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, c.Path) | ||
|
|
||
| for _, c := range changes { | ||
| changedFiles.All = append(changedFiles.All, c.Path) | ||
| if c.Added { | ||
| changedFiles.Added = append(changedFiles.Added, c.Path) | ||
| } | ||
| if c.Modified { | ||
| changedFiles.Modified = append(changedFiles.Modified, c.Path) | ||
| } | ||
| if c.Renamed { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, c.Path) | ||
| } | ||
| if c.Deleted { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, c.Path) | ||
| } | ||
| } | ||
| if c.Deleted { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, c.Path) | ||
|
|
||
| if len(changes) < apiResponseLimit { | ||
| break | ||
| } | ||
| } | ||
|
|
||
| if len(changes) < apiResponseLimit { | ||
| break | ||
| opts.Page++ | ||
| } | ||
|
|
||
| opts.Page++ | ||
| v.changedFiles = &changedFiles | ||
| default: | ||
| v.changedFiles = &changedfiles.ChangedFiles{} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for processing the []*scm.Change slice is duplicated in the PullRequest and Push cases. This could be refactored into a helper function to reduce code duplication and improve maintainability.
For example, you could create a helper function like this:
func processScmChanges(changes []*scm.Change, changedFiles *changedfiles.ChangedFiles) {
for _, c := range changes {
changedFiles.All = append(changedFiles.All, c.Path)
if c.Added {
changedFiles.Added = append(changedFiles.Added, c.Path)
}
if c.Modified {
changedFiles.Modified = append(changedFiles.Modified, c.Path)
}
if c.Renamed {
changedFiles.Renamed = append(changedFiles.Renamed, c.Path)
}
if c.Deleted {
changedFiles.Deleted = append(changedFiles.Deleted, c.Path)
}
}
}And then call it from both case blocks.
| switch runevent.TriggerTarget { | ||
| case triggertype.PullRequest: | ||
| opt := &github.ListOptions{PerPage: v.PaginedNumber} | ||
| changedFiles := changedfiles.ChangedFiles{} | ||
| for { | ||
| repoCommit, resp, err := wrapAPI(v, "list_pull_request_files", func() ([]*github.CommitFile, *github.Response, error) { | ||
| return v.Client().PullRequests.ListFiles(ctx, runevent.Organization, runevent.Repository, runevent.PullRequestNumber, opt) | ||
| }) | ||
| if err != nil { | ||
| return changedfiles.ChangedFiles{}, err | ||
| } | ||
| for j := range repoCommit { | ||
| changedFiles.All = append(changedFiles.All, *repoCommit[j].Filename) | ||
| if *repoCommit[j].Status == "added" { | ||
| changedFiles.Added = append(changedFiles.Added, *repoCommit[j].Filename) | ||
| } | ||
| if *repoCommit[j].Status == "removed" { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, *repoCommit[j].Filename) | ||
| } | ||
| if *repoCommit[j].Status == "modified" { | ||
| changedFiles.Modified = append(changedFiles.Modified, *repoCommit[j].Filename) | ||
| } | ||
| if *repoCommit[j].Status == "renamed" { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, *repoCommit[j].Filename) | ||
| } | ||
| } | ||
| if resp.NextPage == 0 { | ||
| break | ||
| } | ||
| opt.Page = resp.NextPage | ||
| } | ||
| v.changedFiles = &changedFiles | ||
| case triggertype.Push: | ||
| changedFiles := changedfiles.ChangedFiles{} | ||
| rC, _, err := wrapAPI(v, "get_commit_files", func() (*github.RepositoryCommit, *github.Response, error) { | ||
| return v.Client().Repositories.GetCommit(ctx, runevent.Organization, runevent.Repository, runevent.SHA, &github.ListOptions{}) | ||
| }) | ||
| if err != nil { | ||
| return changedfiles.ChangedFiles{}, err | ||
| } | ||
| for j := range repoCommit { | ||
| changedFiles.All = append(changedFiles.All, *repoCommit[j].Filename) | ||
| if *repoCommit[j].Status == "added" { | ||
| changedFiles.Added = append(changedFiles.Added, *repoCommit[j].Filename) | ||
| for i := range rC.Files { | ||
| changedFiles.All = append(changedFiles.All, *rC.Files[i].Filename) | ||
| if *rC.Files[i].Status == "added" { | ||
| changedFiles.Added = append(changedFiles.Added, *rC.Files[i].Filename) | ||
| } | ||
| if *repoCommit[j].Status == "removed" { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, *repoCommit[j].Filename) | ||
| if *rC.Files[i].Status == "removed" { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, *rC.Files[i].Filename) | ||
| } | ||
| if *repoCommit[j].Status == "modified" { | ||
| changedFiles.Modified = append(changedFiles.Modified, *repoCommit[j].Filename) | ||
| if *rC.Files[i].Status == "modified" { | ||
| changedFiles.Modified = append(changedFiles.Modified, *rC.Files[i].Filename) | ||
| } | ||
| if *repoCommit[j].Status == "renamed" { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, *repoCommit[j].Filename) | ||
| if *rC.Files[i].Status == "renamed" { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, *rC.Files[i].Filename) | ||
| } | ||
| } | ||
| if resp.NextPage == 0 { | ||
| break | ||
| } | ||
| opt.Page = resp.NextPage | ||
| } | ||
| return changedFiles, nil | ||
| } | ||
|
|
||
| if runevent.TriggerTarget == "push" { | ||
| changedFiles := changedfiles.ChangedFiles{} | ||
| rC, _, err := wrapAPI(v, "get_commit_files", func() (*github.RepositoryCommit, *github.Response, error) { | ||
| return v.Client().Repositories.GetCommit(ctx, runevent.Organization, runevent.Repository, runevent.SHA, &github.ListOptions{}) | ||
| }) | ||
| if err != nil { | ||
| return changedfiles.ChangedFiles{}, err | ||
| } | ||
| for i := range rC.Files { | ||
| changedFiles.All = append(changedFiles.All, *rC.Files[i].Filename) | ||
| if *rC.Files[i].Status == "added" { | ||
| changedFiles.Added = append(changedFiles.Added, *rC.Files[i].Filename) | ||
| } | ||
| if *rC.Files[i].Status == "removed" { | ||
| changedFiles.Deleted = append(changedFiles.Deleted, *rC.Files[i].Filename) | ||
| } | ||
| if *rC.Files[i].Status == "modified" { | ||
| changedFiles.Modified = append(changedFiles.Modified, *rC.Files[i].Filename) | ||
| } | ||
| if *rC.Files[i].Status == "renamed" { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, *rC.Files[i].Filename) | ||
| } | ||
| v.changedFiles = &changedFiles | ||
| default: | ||
| v.changedFiles = &changedfiles.ChangedFiles{} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for processing []*github.CommitFile is duplicated for PullRequest and Push cases. This can be refactored into a helper function. Additionally, using a switch statement on *repoCommit[j].Status would be more readable and slightly more efficient than multiple if statements.
Example helper function:
func processGithubCommitFiles(files []*github.CommitFile, changedFiles *changedfiles.ChangedFiles) {
for i := range files {
filename := *files[i].Filename
changedFiles.All = append(changedFiles.All, filename)
switch *files[i].Status {
case "added":
changedFiles.Added = append(changedFiles.Added, filename)
case "removed":
changedFiles.Deleted = append(changedFiles.Deleted, filename)
case "modified":
changedFiles.Modified = append(changedFiles.Modified, filename)
case "renamed":
changedFiles.Renamed = append(changedFiles.Renamed, filename)
}
}
}Calls to `vcs.GetFiles()` only result in one or two API calls (paging aside). However the method is called each time a CEL expression references `"path".pathChanged()` or the `on-path-change` annotation is checked for matches. Because of this, because matching is evaluated several times in a row, and because some users have tens of PipelineRuns in their `.tekton` directory, in some cases a single push event may cause PaC to make hundreds of API requests listing the same files repeatedly.
By default the exhaustive CI rule does not count a `default` switch case as exhaustive. Also fixes typo in misspell settings
0a29e4f to
838d438
Compare
🔍 PR Lint Feedback
|
| } | ||
| if c.Renamed { | ||
| changedFiles.Renamed = append(changedFiles.Renamed, c.Path) | ||
| if v.changedFiles == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aThorp96 another approach is instead of making changes in every provider, I see that wherever in a func GetFiles is called, there has event reference. so can you please check in the code for changedFiles where the GetFiles is called, making changedFiles a field in event struct?
e.g.
if event.changedFiles == nil {
event.changedFiles = v.GetFiles(ctx, event)
}
📝 Description of the Change
Calls to
vcs.GetFiles()only result in one or two API calls (pagingaside). However the method is called each time a CEL expression
references
"path".pathChanged()or theon-path-changeannotation ischecked for matches. Because of this, because matching is evaluated
several times in a row, and because some users have tens of PipelineRuns
in their
.tektondirectory, in some cases a single push event may causePaC to make hundreds of API requests listing the same files repeatedly.
Also fixes a bug in the golang-ci settings and enables
exhaustive.default-signifies-exhaustive👨🏻 Linked Jira
https://issues.redhat.com/browse/SRVKP-9228
🔗 Linked GitHub Issue
Fixes #
🚀 Type of Change
fix:)feat:)feat!:,fix!:)docs:)chore:)refactor:)enhance:)deps:)🧪 Testing Strategy
🤖 AI Assistance
If you have used AI assistance, please provide the following details:
Which LLM was used?
Extent of AI Assistance:
Important
If the majority of the code in this PR was generated by an AI, please add a
Co-authored-bytrailer to your commit message.For example:
Co-authored-by: Gemini gemini@google.com
Co-authored-by: ChatGPT noreply@chatgpt.com
Co-authored-by: Claude noreply@anthropic.com
Co-authored-by: Cursor noreply@cursor.com
Co-authored-by: Copilot Copilot@users.noreply.github.com
**💡You can use the script
./hack/add-llm-coauthor.shto automatically addthese co-author trailers to your commits.
✅ Submitter Checklist
fix:,feat:) matches the "Type of Change" I selected above.make testandmake lintlocally to check for and fix anyissues. For an efficient workflow, I have considered installing
pre-commit and running
pre-commit installtoautomate these checks.