Commit 2eb4264

Persist impacted projects (#2418)
* Agent task for persisting drift runs
* Refactor into helper
* Fix table name
1 parent 759745b commit 2eb4264

9 files changed

+470
-2
lines changed

Lines changed: 216 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,216 @@
1+
# Persist Detection Runs (Append‑Only)
2+
3+
This document outlines a minimal, append‑only design to persist every time we compute “impacted projects” for a PR. The goal is auditability and simple, reliable retrieval of the latest detection run, without extra counters or complex coordination.
4+
5+
## Summary
6+
- Add a single append‑only table `digger_detection_runs`.
7+
- Insert one row per detection run (PR and Issue Comment flows) with denormalized JSON payloads.
8+
- Use timestamps (`created_at`) to identify the latest run for a given PR.
9+
- No updates, no deletes.
10+
11+
## Scope
12+
- Persist detection runs for:
13+
- GitHub Pull Request events (`handlePullRequestEvent`).
14+
- GitHub Issue Comment events (`handleIssueCommentEvent`).
15+
- Denormalized JSON for impacted projects and source mappings.
16+
- Minimal model + writer method; errors are logged but do not break main flow.
17+
18+
Out of scope:
19+
- EE and OpenTaco.
20+
- Additional VCS (GitLab/Bitbucket) wiring (can be added later similarly).
21+
- Lock/PR inconsistency detection (future step once data is persisted).
22+
23+
## Schema (Postgres)
24+
Create a single table for append‑only detection runs.
25+
26+
```sql
27+
-- backend/migrations/20251107000100.sql
28+
CREATE TABLE "public"."digger_detection_runs" (
29+
"id" bigserial PRIMARY KEY,
30+
"created_at" timestamptz NOT NULL DEFAULT now(),
31+
"updated_at" timestamptz,
32+
"deleted_at" timestamptz,
33+
34+
"organisation_id" bigint NOT NULL,
35+
"repo_full_name" text NOT NULL,
36+
"pr_number" integer NOT NULL,
37+
38+
-- What triggered this detection
39+
"trigger_type" text NOT NULL, -- 'pull_request' | 'issue_comment'
40+
"trigger_action" text NOT NULL, -- e.g. opened | synchronize | reopened | comment | closed | converted_to_draft
41+
42+
-- Context
43+
"commit_sha" text,
44+
"default_branch" text,
45+
"target_branch" text,
46+
47+
-- Denormalized JSON payloads
48+
"labels_json" jsonb,
49+
"changed_files_json" jsonb,
50+
"impacted_projects_json" jsonb NOT NULL, -- array of projects
51+
"source_mapping_json" jsonb -- project -> impacting_locations[]
52+
);
53+
54+
-- Helpful indexes for lookups and listing latest runs per PR
55+
CREATE INDEX IF NOT EXISTS idx_ddr_org_repo_pr_created_at
56+
ON "public"."digger_detection_runs" ("organisation_id", "repo_full_name", "pr_number", "created_at" DESC);
57+
58+
CREATE INDEX IF NOT EXISTS idx_ddr_repo_pr
59+
ON "public"."digger_detection_runs" ("repo_full_name", "pr_number");
60+
61+
CREATE INDEX IF NOT EXISTS idx_ddr_deleted_at
62+
ON "public"."digger_detection_runs" ("deleted_at");
63+
```
64+
65+
Notes:
66+
- We reuse the standard `gorm.Model` columns (`created_at`, `updated_at`, `deleted_at`), which include GORM’s soft‑delete column. Application code never updates or deletes rows.
67+
- `impacted_projects_json` is required; empty array when zero impacted projects.
68+
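The empty-array requirement above is easy to miss in Go, because a nil slice marshals to JSON `null` rather than `[]`. The sketch below shows one way to guard against that. The helper name `marshalStringArray` is hypothetical and not part of this commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalStringArray is a hypothetical helper (not from the original design).
// A nil Go slice marshals to JSON null, so it is swapped for an empty
// slice first, which guarantees the stored value is the array [].
func marshalStringArray(items []string) ([]byte, error) {
	if items == nil {
		items = []string{}
	}
	return json.Marshal(items)
}

func main() {
	var none []string

	// Marshaling the nil slice directly yields JSON null.
	raw, err := json.Marshal(none)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw)) // null

	// The helper normalizes nil to an empty array.
	fixed, err := marshalStringArray(none)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(fixed)) // []
}
```

Note that a JSON `null` stored in a `jsonb NOT NULL` column still satisfies the constraint, since it is not SQL `NULL`, so the database would not catch this case on its own.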
69+
## JSON Shapes
70+
- impacted_projects_json (array of objects) — subset of project fields we already have in memory:
71+
```json
72+
[
73+
{
74+
"name": "app-us-east-1",
75+
"dir": "infra/app",
76+
"workspace": "default",
77+
"layer": 1,
78+
"workflow": "default",
79+
"terragrunt": false,
80+
"opentofu": false,
81+
"pulumi": false
82+
}
83+
]
84+
```
85+
86+
- source_mapping_json (object keyed by project name; each value holds an `impacting_locations` array):
87+
```json
88+
{
89+
"app-us-east-1": { "impacting_locations": ["infra/app/modules/sg", "infra/app/main.tf"] }
90+
}
91+
```
92+
93+
- labels_json / changed_files_json: arrays of strings. When unavailable (e.g., labels in comment flows), pass null or empty array.
94+
95+
## Model (backend/models)
96+
Add a new model and writer. Keep it simple and append‑only.
97+
98+
```go
99+
// backend/models/detection_runs.go
100+
package models
101+
102+
import (
103+
"encoding/json"
104+
"gorm.io/datatypes"
105+
"gorm.io/gorm"
106+
)
107+
108+
type DetectionRun struct {
109+
gorm.Model
110+
OrganisationID uint
111+
RepoFullName string
112+
PrNumber int
113+
TriggerType string
114+
TriggerAction string
115+
CommitSHA string
116+
DefaultBranch string
117+
TargetBranch string
118+
LabelsJSON datatypes.JSON
119+
ChangedFilesJSON datatypes.JSON
120+
ImpactedProjectsJSON datatypes.JSON // required
121+
SourceMappingJSON datatypes.JSON
122+
}
123+
124+
// CreateDetectionRun inserts an append‑only detection run row.
125+
func (db *Database) CreateDetectionRun(run *DetectionRun) error {
126+
return db.GormDB.Create(run).Error
127+
}
128+
```
129+
130+
Helper mappers (in the same file) to convert from:
131+
- `[]digger_config.Project` → lightweight `[]struct{...}` → `json.Marshal`.
132+
- `map[string]digger_config.ProjectToSourceMapping` → `map[string]struct{ ImpactingLocations []string }` → `json.Marshal`.
133+
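The two mappers described above could look roughly like the following. This is a minimal sketch, not the code from this commit: `Project` and `ProjectToSourceMapping` are simplified stand-ins for the `digger_config` types, and `projectRecord`, `sourceMappingRecord`, `marshalProjects`, and `marshalSourceMapping` are hypothetical names. The JSON keys follow the shapes in the "JSON Shapes" section.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Project and ProjectToSourceMapping are simplified stand-ins for the
// digger_config types (assumption: the real structs carry more fields).
type Project struct {
	Name, Dir, Workspace, Workflow string
	Layer                          uint
	Terragrunt, OpenTofu, Pulumi   bool
}

type ProjectToSourceMapping struct {
	ImpactingLocations []string
}

// projectRecord is the lightweight shape stored in impacted_projects_json.
type projectRecord struct {
	Name       string `json:"name"`
	Dir        string `json:"dir"`
	Workspace  string `json:"workspace"`
	Layer      uint   `json:"layer"`
	Workflow   string `json:"workflow"`
	Terragrunt bool   `json:"terragrunt"`
	OpenTofu   bool   `json:"opentofu"`
	Pulumi     bool   `json:"pulumi"`
}

// sourceMappingRecord is the per-project value in source_mapping_json.
type sourceMappingRecord struct {
	ImpactingLocations []string `json:"impacting_locations"`
}

// marshalProjects copies the needed fields and marshals them. The slice is
// allocated non-nil so zero projects serialize as [] rather than null.
func marshalProjects(projects []Project) ([]byte, error) {
	records := make([]projectRecord, 0, len(projects))
	for _, p := range projects {
		records = append(records, projectRecord{
			Name: p.Name, Dir: p.Dir, Workspace: p.Workspace, Layer: p.Layer,
			Workflow: p.Workflow, Terragrunt: p.Terragrunt,
			OpenTofu: p.OpenTofu, Pulumi: p.Pulumi,
		})
	}
	return json.Marshal(records)
}

// marshalSourceMapping converts the project -> mapping table to its JSON shape.
func marshalSourceMapping(m map[string]ProjectToSourceMapping) ([]byte, error) {
	out := make(map[string]sourceMappingRecord, len(m))
	for name, sm := range m {
		out[name] = sourceMappingRecord{ImpactingLocations: sm.ImpactingLocations}
	}
	return json.Marshal(out)
}

func main() {
	projects := []Project{{Name: "app-us-east-1", Dir: "infra/app", Workspace: "default", Layer: 1, Workflow: "default"}}
	pj, err := marshalProjects(projects)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(pj))

	sm, err := marshalSourceMapping(map[string]ProjectToSourceMapping{
		"app-us-east-1": {ImpactingLocations: []string{"infra/app/modules/sg", "infra/app/main.tf"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(sm))
}
```

Copying into dedicated record structs keeps the stored JSON decoupled from any future changes to the config types, at the cost of listing each field explicitly.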
134+
## Controller Wiring
135+
We add writes at the moment we compute impacted projects successfully — before any early returns — so runs are recorded even if later steps decide to skip work (e.g., draft PRs).
136+
137+
1) Pull Request events
138+
- File: `backend/controllers/github_pull_request.go`
139+
- After:
140+
- `impactedProjects, impactedProjectsSourceMapping, _, err := github2.ProcessGitHubPullRequestEvent(...)`
141+
- And after fetching `changedFiles` (already available)
142+
- Insert:
143+
- Build the `DetectionRun` struct:
144+
- orgId, repoFullName, prNumber
145+
- trigger_type="pull_request", trigger_action=`*payload.Action`
146+
- commit_sha=payload.PullRequest.Head.GetSHA()
147+
- default_branch=`*payload.Repo.DefaultBranch`
148+
- target_branch=payload.PullRequest.Base.GetRef()
149+
- labels_json: PR label names (we already collect `labels` → `prLabelsStr`)
150+
- changed_files_json: from `changedFiles`
151+
- impacted_projects_json: from `impactedProjects`
152+
- source_mapping_json: from `impactedProjectsSourceMapping`
153+
- Call `models.DB.CreateDetectionRun(&run)`
154+
- On error: `slog.Error` and continue (do not fail the PR handler).
155+
156+
2) Issue Comment events
157+
- File: `backend/controllers/github_comment.go`
158+
- After:
159+
- `processEventResult, err := generic.ProcessIssueCommentEvent(...)`
160+
- Use `processEventResult.AllImpactedProjects` and `.ImpactedProjectsSourceMapping` (not the filtered subset)
161+
- We have `changedFiles` captured earlier in the handler
162+
- `prBranchName, _, targetBranch, _, err := ghService.GetBranchName(issueNumber)` → defaultBranch is `*payload.Repo.DefaultBranch`
163+
- `commitSha` available from earlier when loading config
164+
- Insert `CreateDetectionRun(...)` with:
165+
- trigger_type="issue_comment", trigger_action="comment"
166+
- Same fields as PR event with the appropriate sources.
167+
168+
## Error Handling
169+
- Persistence must be best‑effort: log and continue on errors to avoid impacting main workflows.
170+
- Use concise log fields: orgId, repoFullName, prNumber, counts of impacted projects and changed files.
171+
172+
## Queries (examples)
173+
- Latest detection run for a PR:
174+
```sql
175+
SELECT *
176+
FROM public.digger_detection_runs
177+
WHERE organisation_id = $1 AND repo_full_name = $2 AND pr_number = $3
178+
ORDER BY created_at DESC
179+
LIMIT 1;
180+
```
181+
182+
- All runs for a PR:
183+
```sql
184+
SELECT *
185+
FROM public.digger_detection_runs
186+
WHERE organisation_id = $1 AND repo_full_name = $2 AND pr_number = $3
187+
ORDER BY created_at DESC;
188+
```
189+
190+
## Testing
191+
- Unit tests:
192+
- Model round‑trip: marshal minimal and full payloads (empty impacted projects; multiple projects; multiple source locations) and verify that `CreateDetectionRun` succeeds.
193+
- Controller integration tests (lightweight):
194+
- Simulate a PR event with no impacted projects → one row with empty `impacted_projects_json`.
195+
- Simulate a PR event with 2 impacted projects → row with expected JSON arrays.
196+
- Simulate an issue comment event → row with trigger_type="issue_comment".
197+
198+
## Rollout
199+
- Add migration.
200+
- Add model + writer method.
201+
- Wire controllers (PR and Issue Comment) to create detection runs.
202+
- Deploy; no backfill required. Data accrues on subsequent events.
203+
204+
## Risks / Considerations
205+
- Size of JSON fields: on very large PRs, `changed_files_json` can be big; acceptable for audit purposes, can be truncated later if needed.
206+
- Ordering by timestamp: adequate for our needs. If we ever need a strict order despite clock drift or identical timestamps, we can add the ID as a tie‑breaker (`ORDER BY created_at DESC, id DESC`).
207+
- Privacy: Paths and labels are internal to the repo; acceptable within backend storage context.
208+
209+
## Work Items
210+
1) Create migration file `backend/migrations/20251107000100.sql` with schema above.
211+
2) Add `backend/models/detection_runs.go` with `DetectionRun` and `CreateDetectionRun`.
212+
3) Add light mappers for JSON serialization of projects and source mapping.
213+
4) PR controller: write detection run after computing impacts.
214+
5) Comment controller: write detection run after computing impacts.
215+
6) Add basic unit tests for model creation; optional controller tests.
216+

backend/controllers/github_comment.go

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -266,6 +266,26 @@ func handleIssueCommentEvent(gh utils.GithubClientProvider, payload *github.Issu
266266
impactedProjectsSourceMapping := processEventResult.ImpactedProjectsSourceMapping
267267
allImpactedProjects := processEventResult.AllImpactedProjects
268268

269+
// Persist detection run (append-only) for issue comment events using full impacted set
270+
var csha string
271+
if commitSha != nil {
272+
csha = *commitSha
273+
}
274+
recordDetectionRun(
275+
orgId,
276+
repoFullName,
277+
issueNumber,
278+
"issue_comment",
279+
"comment",
280+
csha,
281+
defaultBranch,
282+
targetBranch,
283+
prLabelsStr,
284+
changedFiles,
285+
allImpactedProjects,
286+
impactedProjectsSourceMapping,
287+
)
288+
269289
impactedProjectsForComment, err := generic.FilterOutProjectsFromComment(allImpactedProjects, commentBody)
270290
if err != nil {
271291
slog.Error("Error filtering out projects from comment",

backend/controllers/github_helpers.go

Lines changed: 40 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -244,6 +244,46 @@ func TriggerDiggerJobs(ciBackend ci_backends.CiBackend, repoFullName string, rep
244244
return nil
245245
}
246246

247+
// recordDetectionRun persists a detection run for any trigger (PR or issue comment).
248+
func recordDetectionRun(
249+
organisationId uint,
250+
repoFullName string,
251+
number int,
252+
triggerType string, // e.g. "pull_request" | "issue_comment"
253+
triggerAction string, // e.g. PR action or "comment"
254+
commitSha string,
255+
defaultBranch string,
256+
targetBranch string,
257+
labels []string,
258+
changedFiles []string,
259+
impactedProjects []digger_config.Project,
260+
impactedProjectsSourceMapping map[string]digger_config.ProjectToSourceMapping,
261+
) {
262+
dr, derr := models.NewDetectionRun(
263+
organisationId,
264+
repoFullName,
265+
number,
266+
triggerType,
267+
triggerAction,
268+
commitSha,
269+
defaultBranch,
270+
targetBranch,
271+
labels,
272+
changedFiles,
273+
impactedProjects,
274+
impactedProjectsSourceMapping,
275+
)
276+
if derr != nil {
277+
slog.Error("Failed to build detection run payload", "number", number, "trigger", triggerType, "error", derr)
278+
return
279+
}
280+
if err := models.DB.CreateDetectionRun(dr); err != nil {
281+
slog.Error("Failed to persist detection run", "number", number, "trigger", triggerType, "error", err)
282+
return
283+
}
284+
slog.Debug("Persisted detection run", "number", number, "trigger", triggerType, "projects", len(impactedProjects))
285+
}
286+
247287
func GenerateTerraformFromCode(payload *github.IssueCommentEvent, commentReporterManager utils.CommentReporterManager, config *digger_config.DiggerConfig, defaultBranch string, ghService *github2.GithubService, repoOwner string, repoName string, commitSha *string, issueNumber int, branch *string) error {
248288
if !strings.HasPrefix(*payload.Comment.Body, "digger generate") {
249289
return nil
@@ -934,4 +974,3 @@ generate_projects:
934974
slog.Info("Created Digger repo", "repoId", repo.ID, "diggerRepoName", diggerRepoName)
935975
return repo, org, nil
936976
}
937-

backend/controllers/github_pull_request.go

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -162,6 +162,22 @@ func handlePullRequestEvent(gh utils.GithubClientProvider, payload *github.PullR
162162
return fmt.Errorf("error processing event")
163163
}
164164

165+
// Persist detection run (append-only) right after impact calculation
166+
recordDetectionRun(
167+
organisationId,
168+
repoFullName,
169+
prNumber,
170+
"pull_request",
171+
action,
172+
commitSha,
173+
*payload.Repo.DefaultBranch,
174+
payload.PullRequest.Base.GetRef(),
175+
prLabelsStr,
176+
changedFiles,
177+
impactedProjects,
178+
impactedProjectsSourceMapping,
179+
)
180+
165181
jobsForImpactedProjects, coverAllImpactedProjects, err := github2.ConvertGithubPullRequestEventToJobs(payload, impactedProjects, nil, *config, false)
166182
if err != nil {
167183
slog.Error("Error converting event to jobs",

backend/go.mod

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -329,6 +329,7 @@ require (
329329
gopkg.in/urfave/cli.v1 v1.20.0 // indirect
330330
gopkg.in/yaml.v2 v2.4.0 // indirect
331331
gopkg.in/yaml.v3 v3.0.1 // indirect
332+
gorm.io/datatypes v1.2.7 // indirect
332333
gorm.io/driver/mysql v1.6.0 // indirect
333334
gorm.io/driver/sqlserver v1.6.1 // indirect
334335
k8s.io/klog v1.0.0 // indirect

backend/go.sum

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2824,6 +2824,8 @@ gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
28242824
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
28252825
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
28262826
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
2827+
gorm.io/datatypes v1.2.7 h1:ww9GAhF1aGXZY3EB3cJPJ7//JiuQo7DlQA7NNlVaTdk=
2828+
gorm.io/datatypes v1.2.7/go.mod h1:M2iO+6S3hhi4nAyYe444Pcb0dcIiOMJ7QHaUXxyiNZY=
28272829
gorm.io/driver/mysql v1.4.0/go.mod h1:sSIebwZAVPiT+27jK9HIwvsqOGKx3YMPmrA3mBJR10c=
28282830
gorm.io/driver/mysql v1.6.0 h1:eNbLmNTpPpTOVZi8MMxCi2aaIm0ZpInbORNXDwyLGvg=
28292831
gorm.io/driver/mysql v1.6.0/go.mod h1:D/oCC2GWK3M/dqoLxnOlaNKmXz8WNTfcS9y5ovaSqKo=
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
-- Create "digger_detection_runs" table (append-only)
2+
CREATE TABLE "public"."digger_detection_runs" (
3+
"id" bigserial NOT NULL,
4+
"created_at" timestamptz NOT NULL DEFAULT now(),
5+
"updated_at" timestamptz NULL,
6+
"deleted_at" timestamptz NULL,
7+
8+
"organisation_id" bigint NOT NULL,
9+
"repo_full_name" text NOT NULL,
10+
"pr_number" integer NOT NULL,
11+
12+
-- What triggered this detection
13+
"trigger_type" text NOT NULL, -- 'pull_request' | 'issue_comment'
14+
"trigger_action" text NOT NULL, -- e.g. opened | synchronize | reopened | comment | closed | converted_to_draft
15+
16+
-- Context
17+
"commit_sha" text,
18+
"default_branch" text,
19+
"target_branch" text,
20+
21+
-- Denormalized JSON payloads
22+
"labels_json" jsonb,
23+
"changed_files_json" jsonb,
24+
"impacted_projects_json" jsonb NOT NULL,
25+
"source_mapping_json" jsonb,
26+
27+
PRIMARY KEY ("id")
28+
);
29+
30+
-- Helpful indexes for lookups and listing latest runs per PR
31+
CREATE INDEX IF NOT EXISTS idx_ddr_org_repo_pr_created_at
32+
ON "public"."digger_detection_runs" ("organisation_id", "repo_full_name", "pr_number", "created_at" DESC);
33+
34+
CREATE INDEX IF NOT EXISTS idx_ddr_repo_pr
35+
ON "public"."digger_detection_runs" ("repo_full_name", "pr_number");
36+
37+
CREATE INDEX IF NOT EXISTS idx_ddr_deleted_at
38+
ON "public"."digger_detection_runs" ("deleted_at");
39+

backend/migrations/atlas.sum

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
h1:jWoBs487iG65lDQFsl/k45k6w5yExxjs+1elkqh5xoI=
1+
h1:Bodw0wkkaLj7PUoyJBn2J/C1a0k3CEZzYZVP7A4930g=
22
20231227132525.sql h1:43xn7XC0GoJsCnXIMczGXWis9d504FAWi4F1gViTIcw=
33
20240115170600.sql h1:IW8fF/8vc40+eWqP/xDK+R4K9jHJ9QBSGO6rN9LtfSA=
44
20240116123649.sql h1:R1JlUIgxxF6Cyob9HdtMqiKmx/BfnsctTl5rvOqssQw=
@@ -66,3 +66,4 @@ h1:jWoBs487iG65lDQFsl/k45k6w5yExxjs+1elkqh5xoI=
6666
20250907140955.sql h1:LHINhHgrPwM/Sy1UeIS4Z3iUVp6kv3/UtiGZZ5/SE8k=
6767
20250910102133.sql h1:jBW3PuoCWZPJA8ZaXDAyRuA9LnGDQGxvL+HtjCn33DI=
6868
20251006225238.sql h1:L581xAn5IsYt9Srf1RnJLleLIQVlgLzp7FaAChAlCJw=
69+
20251107000100.sql h1:b3USfhlLulZ+6iL9a66Ddpy6uDcYmmyDGZLYzbEjuRA=
