|
| 1 | +# Persist Detection Runs (Append‑Only) |
| 2 | + |
| 3 | +This document outlines a minimal, append‑only design to persist every time we compute “impacted projects” for a PR. The goal is auditability and simple, reliable retrieval of the latest detection run, without extra counters or complex coordination. |
| 4 | + |
| 5 | +## Summary |
| 6 | +- Add a single append‑only table `digger_detection_runs`. |
| 7 | +- Insert one row per detection run (PR and Issue Comment flows) with denormalized JSON payloads. |
| 8 | +- Use timestamps (`created_at`) to identify the latest run for a given PR. |
| 9 | +- No updates, no deletes. |
| 10 | + |
| 11 | +## Scope |
| 12 | +- Persist detection runs for: |
| 13 | + - GitHub Pull Request events (`handlePullRequestEvent`). |
| 14 | + - GitHub Issue Comment events (`handleIssueCommentEvent`). |
| 15 | +- Denormalized JSON for impacted projects and source mappings. |
| 16 | +- Minimal model + writer method; errors are logged but do not break main flow. |
| 17 | + |
| 18 | +Out of scope: |
| 19 | +- EE and OpenTaco. |
| 20 | +- Additional VCS (GitLab/Bitbucket) wiring (can be added later similarly). |
| 21 | +- Lock/PR inconsistency detection (future step once data is persisted). |
| 22 | + |
| 23 | +## Schema (Postgres) |
| 24 | +Create a single table for append‑only detection runs. |
| 25 | + |
| 26 | +```sql |
| 27 | +-- backend/migrations/20251107000100.sql |
| 28 | +CREATE TABLE "public"."digger_detection_runs" ( |
| 29 | + "id" bigserial PRIMARY KEY, |
| 30 | + "created_at" timestamptz NOT NULL DEFAULT now(), |
| 31 | + "updated_at" timestamptz, |
| 32 | + "deleted_at" timestamptz, |
| 33 | + |
| 34 | + "organisation_id" bigint NOT NULL, |
| 35 | + "repo_full_name" text NOT NULL, |
| 36 | + "pr_number" integer NOT NULL, |
| 37 | + |
| 38 | + -- What triggered this detection |
| 39 | + "trigger_type" text NOT NULL, -- 'pull_request' | 'issue_comment' |
| 40 | + "trigger_action" text NOT NULL, -- e.g. opened | synchronize | reopened | comment | closed | converted_to_draft |
| 41 | + |
| 42 | + -- Context |
| 43 | + "commit_sha" text, |
| 44 | + "default_branch" text, |
| 45 | + "target_branch" text, |
| 46 | + |
| 47 | + -- Denormalized JSON payloads |
| 48 | + "labels_json" jsonb, |
| 49 | + "changed_files_json" jsonb, |
| 50 | + "impacted_projects_json" jsonb NOT NULL, -- array of projects |
| 51 | + "source_mapping_json" jsonb -- project -> impacting_locations[] |
| 52 | +); |
| 53 | + |
| 54 | +-- Helpful indexes for lookups and listing latest runs per PR |
| 55 | +CREATE INDEX IF NOT EXISTS idx_ddr_org_repo_pr_created_at |
| 56 | + ON "public"."digger_detection_runs" ("organisation_id", "repo_full_name", "pr_number", "created_at" DESC); |
| 57 | + |
| 58 | +CREATE INDEX IF NOT EXISTS idx_ddr_repo_pr |
| 59 | + ON "public"."digger_detection_runs" ("repo_full_name", "pr_number"); |
| 60 | + |
| 61 | +CREATE INDEX IF NOT EXISTS idx_ddr_deleted_at |
| 62 | + ON "public"."digger_detection_runs" ("deleted_at"); |
| 63 | +``` |
| 64 | + |
| 65 | +Notes: |
| 66 | +- We reuse GORM’s soft‑delete columns via `gorm.Model` pattern (created_at/updated_at/deleted_at). We will not update or delete rows in code. |
| 67 | +- `impacted_projects_json` is required; empty array when zero impacted projects. |
| 68 | + |
| 69 | +## JSON Shapes |
| 70 | +- impacted_projects_json (array of objects) — subset of project fields we already have in memory: |
| 71 | +```json |
| 72 | +[ |
| 73 | + { |
| 74 | + "name": "app-us-east-1", |
| 75 | + "dir": "infra/app", |
| 76 | + "workspace": "default", |
| 77 | + "layer": 1, |
| 78 | + "workflow": "default", |
| 79 | + "terragrunt": false, |
| 80 | + "opentofu": false, |
| 81 | + "pulumi": false |
| 82 | + } |
| 83 | +] |
| 84 | +``` |
| 85 | + |
| 86 | +- source_mapping_json (object of arrays): |
| 87 | +```json |
| 88 | +{ |
| 89 | + "app-us-east-1": { "impacting_locations": ["infra/app/modules/sg", "infra/app/main.tf"] } |
| 90 | +} |
| 91 | +``` |
| 92 | + |
| 93 | +- labels_json / changed_files_json: arrays of strings. When unavailable (e.g., labels in comment flows), pass null or empty array. |
| 94 | + |
| 95 | +## Model (backend/models) |
| 96 | +Add a new model and writer. Keep it simple and append‑only. |
| 97 | + |
| 98 | +```go |
| 99 | +// backend/models/detection_runs.go |
| 100 | +package models |
| 101 | + |
| 102 | +import ( |
| 103 | + "encoding/json" |
| 104 | + "gorm.io/datatypes" |
| 105 | + "gorm.io/gorm" |
| 106 | +) |
| 107 | + |
| 108 | +type DetectionRun struct { |
| 109 | + gorm.Model |
| 110 | + OrganisationID uint |
| 111 | + RepoFullName string |
| 112 | + PrNumber int |
| 113 | + TriggerType string |
| 114 | + TriggerAction string |
| 115 | + CommitSHA string |
| 116 | + DefaultBranch string |
| 117 | + TargetBranch string |
| 118 | + LabelsJSON datatypes.JSON |
| 119 | + ChangedFilesJSON datatypes.JSON |
| 120 | + ImpactedProjectsJSON datatypes.JSON // required |
| 121 | + SourceMappingJSON datatypes.JSON |
| 122 | +} |
| 123 | + |
| 124 | +// CreateDetectionRun inserts an append‑only detection run row. |
| 125 | +func (db *Database) CreateDetectionRun(run *DetectionRun) error { |
| 126 | + return db.GormDB.Create(run).Error |
| 127 | +} |
| 128 | +``` |
| 129 | + |
| 130 | +Helper mappers (in the same file) to convert from: |
| 131 | +- `[]digger_config.Project` → lightweight `[]struct{...}` → `json.Marshal`. |
| 132 | +- `map[string]digger_config.ProjectToSourceMapping` → `map[string]struct{ ImpactingLocations []string }` → `json.Marshal`. |
| 133 | + |
| 134 | +## Controller Wiring |
| 135 | +We add writes at the moment we compute impacted projects successfully — before any early returns — so runs are recorded even if later steps decide to skip work (e.g., draft PRs). |
| 136 | + |
| 137 | +1) Pull Request events |
| 138 | +- File: `backend/controllers/github_pull_request.go` |
| 139 | +- After: |
| 140 | + - `impactedProjects, impactedProjectsSourceMapping, _, err := github2.ProcessGitHubPullRequestEvent(...)` |
| 141 | + - And after fetching `changedFiles` (already available) |
| 142 | +- Insert: |
| 143 | + - Build the `DetectionRun` struct: |
| 144 | + - orgId, repoFullName, prNumber |
| 145 | + - trigger_type="pull_request", trigger_action=`*payload.Action` |
| 146 | + - commit_sha=payload.PullRequest.Head.GetSHA() |
| 147 | + - default_branch=`*payload.Repo.DefaultBranch` |
| 148 | + - target_branch=payload.PullRequest.Base.GetRef() |
| 149 | + - labels_json: PR label names (we already collect `labels` → `prLabelsStr`) |
| 150 | + - changed_files_json: from `changedFiles` |
| 151 | + - impacted_projects_json: from `impactedProjects` |
| 152 | + - source_mapping_json: from `impactedProjectsSourceMapping` |
| 153 | + - Call `models.DB.CreateDetectionRun(&run)` |
| 154 | + - On error: `slog.Error` and continue (do not fail the PR handler). |
| 155 | + |
| 156 | +2) Issue Comment events |
| 157 | +- File: `backend/controllers/github_comment.go` |
| 158 | +- After: |
| 159 | + - `processEventResult, err := generic.ProcessIssueCommentEvent(...)` |
| 160 | + - Use `processEventResult.AllImpactedProjects` and `.ImpactedProjectsSourceMapping` (not the filtered subset) |
| 161 | + - We have `changedFiles` captured earlier in the handler |
| 162 | + - `prBranchName, _, targetBranch, _, err := ghService.GetBranchName(issueNumber)` → defaultBranch is `*payload.Repo.DefaultBranch` |
| 163 | + - `commitSha` available from earlier when loading config |
| 164 | +- Insert `CreateDetectionRun(...)` with: |
| 165 | + - trigger_type="issue_comment", trigger_action="comment" |
| 166 | + - Same fields as PR event with the appropriate sources. |
| 167 | + |
| 168 | +## Error Handling |
| 169 | +- Persistence must be best‑effort: log and continue on errors to avoid impacting main workflows. |
| 170 | +- Use concise log fields: orgId, repoFullName, prNumber, counts of impacted projects and changed files. |
| 171 | + |
| 172 | +## Queries (examples) |
| 173 | +- Latest detection run for a PR: |
| 174 | +```sql |
| 175 | +SELECT * |
| 176 | +FROM public.digger_detection_runs |
| 177 | +WHERE organisation_id = $1 AND repo_full_name = $2 AND pr_number = $3 |
| 178 | +ORDER BY created_at DESC |
| 179 | +LIMIT 1; |
| 180 | +``` |
| 181 | + |
| 182 | +- All runs for a PR: |
| 183 | +```sql |
| 184 | +SELECT * |
| 185 | +FROM public.digger_detection_runs |
| 186 | +WHERE organisation_id = $1 AND repo_full_name = $2 AND pr_number = $3 |
| 187 | +ORDER BY created_at DESC; |
| 188 | +``` |
| 189 | + |
| 190 | +## Testing |
| 191 | +- Unit tests: |
| 192 | + - Model round‑trip: marshal minimal and full payloads (empty impacted projects; multiple projects; multiple source locations) and `CreateDetectionRun` succeeds. |
| 193 | +- Controller integration tests (lightweight): |
| 194 | + - Simulate a PR event with no impacted projects → one row with empty `impacted_projects_json`. |
| 195 | + - Simulate a PR event with 2 impacted projects → row with expected JSON arrays. |
| 196 | + - Simulate an issue comment event → row with trigger_type="issue_comment". |
| 197 | + |
| 198 | +## Rollout |
| 199 | +- Add migration. |
| 200 | +- Add model + writer method. |
| 201 | +- Wire controllers (PR and Issue Comment) to create detection runs. |
| 202 | +- Deploy; no backfill required. Data accrues on subsequent events. |
| 203 | + |
| 204 | +## Risks / Considerations |
| 205 | +- Size of JSON fields: on very large PRs, `changed_files_json` can be big; acceptable for audit purposes, can be truncated later if needed. |
| 206 | +- Ordering by timestamp: adequate for our needs; if we ever need strict monotonic ordering under rare clock drifts, we could fall back to ID ordering as a tie‑breaker (`ORDER BY created_at DESC, id DESC`). |
| 207 | +- Privacy: Paths and labels are internal to the repo; acceptable within backend storage context. |
| 208 | + |
| 209 | +## Work Items |
| 210 | +1) Create migration file `backend/migrations/20251107000100.sql` with schema above. |
| 211 | +2) Add `backend/models/detection_runs.go` with `DetectionRun` and `CreateDetectionRun`. |
| 212 | +3) Add light mappers for JSON serialization of projects and source mapping. |
| 213 | +4) PR controller: write detection run after computing impacts. |
| 214 | +5) Comment controller: write detection run after computing impacts. |
| 215 | +6) Add basic unit tests for model creation; optional controller tests. |
| 216 | + |
0 commit comments