feat: speedup migration #1173
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: Multiple SQL migrations switch OFFSET/LIMIT pagination to row_number()-based batching, add row_number to views/CTEs, introduce batched/temp-table updates with per-batch commits for moves, adjust batch bounds semantics in some migrations, and convert one migration into an early no-op notice.
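To make the pattern concrete, here is a minimal sketch of the row_number()-based batching described above, assuming PostgreSQL 11+ where COMMIT is allowed inside a DO block run outside an explicit transaction. The table name, filter, and SET expression are placeholders for illustration, not the actual migration logic:

```sql
do $$
declare
    _batch_size integer := 1000;
    _max integer;
begin
    -- Materialize the rows to fix once, numbered by a stable ordering key.
    create temp table batch_view as
    select row_number() over (order by seq) as row_number, seq
    from transactions
    where inserted_at is null;

    -- Index the batching key so each window is a cheap, covering range scan.
    create index batch_view_idx on batch_view(row_number) include (seq);
    analyze batch_view;

    select count(*) into _max from batch_view;

    -- Walk the rows in fixed-size windows instead of re-scanning with OFFSET/LIMIT.
    for i in 1.._max by _batch_size loop
        update transactions
        set inserted_at = now()
        from batch_view
        where batch_view.row_number >= i
          and batch_view.row_number < i + _batch_size
          and transactions.seq = batch_view.seq;

        -- Per-batch commit keeps locks and WAL volume bounded per window.
        commit;
    end loop;
end
$$;
```

Each window is a narrow index range scan on the temp table, so the cost per batch stays flat instead of growing with the offset as it does with OFFSET/LIMIT.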
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)
1-66: Consider adding a brief deprecation comment to clarify the intentional disabling.

Migration 28 (28-fix-pcv-missing-asset) is active and successfully replaces this migration with an improved row_number()-based pagination approach. The core logic is identical: both filter on the same dataset (insertion_date < version_id 12) and update post_commit_volumes using the same transformation. On fresh deployments, migration 27 is a no-op and migration 28 handles all processing. On existing databases, migration 27's previous work remains intact and migration 28 will not reprocess already-populated data unnecessarily. However, adding a brief comment would improve clarity during maintenance:
```diff
 set search_path = '{{ .Schema }}';
+-- Superseded by migration 28 (uses row_number() for better pagination performance)
 -- drop table if exists moves_view;
```
🧹 Nitpick comments (4)
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (1)
25-37: Index should be optimized for row_number filtering.

The WHERE clause at line 37 filters on row_number >= _offset and row_number < _offset + _batch_size, but the index at line 25 is created on (log_id, id), not row_number. This will cause PostgreSQL to perform a sequential scan of txs_view instead of an index-based range query. Consider optimizing the index to support row_number filtering:

```sql
create index txs_view_idx on txs_view(row_number) include (log_id);
```

This allows the range query on row_number to use the index and avoids scanning the entire view for each batch.
internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (1)
28-40: Index should support row_number range queries.

The WHERE clause at line 40 filters on row_number >= _offset and row_number < _offset + _batch_size, but the index at line 28 is on (reversedTransactionID). This forces a sequential scan of txs_view for each batch iteration. Consider refactoring the index to:

```sql
create index txs_view_idx on txs_view(row_number) include (reversedTransactionID);
```

This enables efficient range queries on row_number and avoids scanning the entire temporary table per batch.
internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (1)
26-38: Index should be optimized for row_number range filtering.

The WHERE clause at line 38 filters on row_number >= _offset and row_number < _offset + _batch_size, but the index at line 26 is on (log_id, id). This will cause sequential scans of txs_view during batch iteration. Consider optimizing the index:

```sql
create index txs_view_idx on txs_view(row_number) include (log_id, id);
```

This allows PostgreSQL to use the index for row_number range queries, improving batch iteration performance.
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (1)
20-29: Index should prioritize row_number for efficient batch filtering.

The index at line 20 on seq supports the UPDATE join but does not support the WHERE clause range query on row_number at line 29. PostgreSQL will perform a sequential scan of txs_view for each batch. Consider creating an index on row_number as well:

```sql
create index txs_view_row_number_idx on txs_view(row_number);
```

This enables efficient range queries on row_number during batch iteration, while the existing seq index supports the UPDATE join condition.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
📚 Learning: 2025-11-28T10:06:01.381Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1167
File: internal/storage/bucket/migrations/43-fix-missing-inserted-at-in-log-data/up.sql:23-23
Timestamp: 2025-11-28T10:06:01.381Z
Learning: In PostgreSQL migrations for the Formance ledger, to ensure ISO 8601/RFC3339 date formatting when setting JSONB fields, use the pattern `to_jsonb(date)#>>'{}'` to extract the timestamp as a properly formatted string. This leverages JSONB's built-in ISO 8601 encoding. Using `date::text` would depend on the server's DateStyle setting and doesn't guarantee ISO 8601 format.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Tests
- GitHub Check: Dirty
- GitHub Check: GoReleaser
- GitHub Check: Tests
- GitHub Check: Dirty
🔇 Additional comments (3)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
8-39: Batch iteration using row_number window function is correctly implemented.

The pattern of creating a temporary table with row_number(), indexing on row_number with included columns, and batching via row_number bounds is well-structured. Index optimization and per-batch commits should improve migration performance.
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (1)
10-62: Well-optimized batch processing with efficient indexing.

The index at line 30 on row_number with included columns provides optimal support for both the WHERE clause filtering and the UPDATE join. The batch iteration pattern and if not found exit condition are sound.

internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

21-46: Batch processing correctly refactored with optimal index design.

The migration cleanly replaces date-based temporal filtering with type-based filtering and implements row_number-based pagination with an optimally-designed index (line 26) that includes all columns needed for the UPDATE join. The added inserted_at is null condition (line 42) also ensures idempotency.
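As an illustration of that index-plus-guard combination, a covering index and an idempotent batched UPDATE could look like the following; the column names and join condition are assumptions based on the comment, not the file's exact definitions:

```sql
-- Covering index: row_number drives the range filter, the INCLUDEd columns
-- feed the CTE select and the UPDATE join without extra heap lookups.
create index txs_view_idx on txs_view(row_number)
    include (ledger, date, transaction_id);

-- Idempotent batch: the inserted_at is null guard makes a retried window a no-op.
update transactions
set inserted_at = txs_view.date
from txs_view
where txs_view.row_number > 0
  and txs_view.row_number <= 1000
  and transactions.id = txs_view.transaction_id
  and transactions.ledger = txs_view.ledger
  and transactions.inserted_at is null;
```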
3845306 to 7770a87 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (1)
39-53: Clarify variable shadowing between CTE name and column name.

The UPDATE statement uses data as both the CTE name (FROM data) and the table column being updated (logs.data). PostgreSQL resolves this correctly (CTE-qualified references like data.id resolve to the CTE), but the overlap is confusing for readers. Consider renaming the CTE to something like batch_data to distinguish it from the logs.data column.

```diff
 update logs
-    set data = data || jsonb_build_object(
+    set data = batch_data || jsonb_build_object(
         'revertedTransaction', jsonb_build_object(
-            'id', data.id,
-            'postings', data.postings::jsonb,
-            'metadata', data.metadata,
+            'id', batch_data.id,
+            'postings', batch_data.postings::jsonb,
+            'metadata', batch_data.metadata,
             'reverted', true,
-            'revertedAt', to_json(data.reverted_at)#>>'{}' || 'Z',
-            'insertedAt', to_json(data.inserted_at)#>>'{}' || 'Z',
-            'timestamp', to_json(data.timestamp)#>>'{}' || 'Z',
-            'reference', case when data.reference is not null and data.reference <> '' then data.reference end,
-            'postCommitVolumes', data.post_commit_volumes
+            'revertedAt', to_json(batch_data.reverted_at)#>>'{}' || 'Z',
+            'insertedAt', to_json(batch_data.inserted_at)#>>'{}' || 'Z',
+            'timestamp', to_json(batch_data.timestamp)#>>'{}' || 'Z',
+            'reference', case when batch_data.reference is not null and batch_data.reference <> '' then batch_data.reference end,
+            'postCommitVolumes', batch_data.post_commit_volumes
         ))
-    from data
+    from batch_data
     where logs.id = data.log_id and
-        logs.ledger = data.ledger;
+        logs.ledger = batch_data.ledger;
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Tests
- GitHub Check: Dirty
🔇 Additional comments (4)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (2)
8-23: Temporary table and index setup is sound.

The temporary table correctly captures rows needing updates, and the covering index on row_number with INCLUDE columns is well-designed for the batch range queries that follow. Per-table analysis after population is good practice.
25-39: Loop iteration now correctly starts at 1, resolving off-by-one concern.

The batched update logic is correct. The loop now properly aligns with PostgreSQL's row_number() semantics by starting at 1, which resolves the off-by-one issue flagged in the previous review. The range condition row_number >= i and row_number < i + _batch_size correctly partitions all rows without gaps or overlaps. Per-batch commits reduce lock contention on the moves table, a key optimization for migration performance.

internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (1)
20-56: Row_number batching implementation is sound.

The refactor from OFFSET/LIMIT to row_number-based pagination is implemented cleanly. Deterministic ordering via order by transactions.seq ensures consistent batching, and the per-batch commits reduce lock contention during the migration. The index on reversedTransactionID appropriately supports the join condition.

internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (1)
10-21: Verify temporary table declaration and foreign key necessity.

Line 10 creates a table without the TEMP keyword, unlike migrations 29 and 33 which use create temp table. Confirm whether this should be create temp table for consistency and to ensure the table is session-scoped and automatically dropped.

Additionally, the foreign key constraint on line 21 is unusual for temporary tables. While indexes (line 20) are standard for optimizing joins, foreign keys are primarily for data integrity. Verify whether the foreign key serves a purpose beyond the index or if it can be removed. The comment "speed up hash join" suggests intent for query optimization, which indexes typically provide; the foreign key may be redundant.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #1173   +/-   ##
=======================================
  Coverage   81.43%   81.43%
=======================================
  Files         190      190
  Lines        9334     9334
=======================================
  Hits         7601     7601
- Misses       1228     1229     +1
+ Partials      505      504     -1
```

☔ View full report in Codecov by Sentry.
7770a87 to 3f0c2c3 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (2)
3-4: Remove unused variable declarations.

Since all code using _offset and _batch_size is commented out, these declarations are dead code and should be removed.

```diff
 do $$
 	declare
-		_offset integer := 0;
-		_batch_size integer := 1000;
 	begin
 		set search_path = '{{ .Schema }}';
```
8-64: Replace commented-out implementation with an explicit supersession marker.

The entire migration logic is commented out because migration 28 supersedes this one. Rather than leaving 57 lines of commented code, replace it with a single-line comment clearly documenting the supersession intent. This improves maintainability and signals to future readers why the migration is inert.

```diff
 begin
 	set search_path = '{{ .Schema }}';
-	-- [58 lines of commented-out implementation omitted...]
+	-- Migration 27 is superseded by migration 28 (fix-pcv-missing-asset).
+	-- All post-commit volume fixes are now handled by migration 28 using row_number-based batching.
 end
 $$;
```

This keeps the file intact (in case the migration framework requires it) while making intent explicit and removing the cognitive burden of maintaining large commented-out blocks.
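For reference, a fully explicit no-op along those lines could be as small as the sketch below; this is an illustration, not the file's actual contents:

```sql
do $$
begin
    -- Superseded by migration 28 (28-fix-pcv-missing-asset), which re-applies
    -- the same post-commit-volume fix using row_number()-based batching.
    raise notice 'migration 27 skipped: superseded by migration 28';
end
$$;
```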
internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (1)
28-28: Remove potentially unused index.

The index on reversedTransactionID appears redundant:

- Not used in the WHERE clause (which filters by row_number)
- Not used in the UPDATE join (which filters from an already-sliced temp table)

Temporary table indexes are cleaned up with the table, but removing this index reduces migration overhead.

```diff
- create index txs_view_idx on txs_view(reversedTransactionID);
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
🔇 Additional comments (7)
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3)
11-12: Excellent row_number-based batching refactor.

The migration effectively addresses the performance goal by replacing OFFSET/LIMIT pagination with row_number()-based range filtering. The index design on row_number with included columns (transactions_seq, volumes) is optimal for covering the CTE query (lines 42–46), and per-batch commits (line 61) reduce lock contention during long-running updates.

Also applies to: 30-30, 45-45
32-32: Foreign key and early-exit optimizations are sound.

The foreign key constraint (line 32) supports hash-join acceleration during the UPDATE, and the empty-view check (lines 34–37) avoids unnecessary work. Both align with prior learnings on acceptable lock strategies for migration-scoped temporary tables.
Also applies to: 34-37
42-62: Verify termination logic handles all edge cases.

The loop termination condition (line 52: if not found then) relies on the UPDATE matching zero rows when the row_number range exhausts all data. This is correct, but ensure the logic is resilient:

- First batch: row_number >= 0 and row_number < 1000 processes rows 1–1000; if rows exist, the UPDATE succeeds, found = true, and the loop continues.
- Final batch: row_number >= N and row_number < N + 1000 is empty or depleted; the UPDATE matches 0 rows, found = false, and the loop exits.

Confirm through testing that the cursor position (_offset) advances correctly and no rows are skipped or double-processed across batch boundaries.

internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (1)
19-61: Row_number-based batching implementation is correct and performant.

The migration correctly implements the row_number pagination pattern: stable ordering by transactions.seq ensures deterministic batching across iterations, the range filter (row_number >= _offset and row_number < _offset + _batch_size) avoids expensive OFFSET scanning, and per-batch commits reduce lock contention. The temporary index on txs_view(log_id, id) and early exit for empty results are both optimizations well-suited for a large migration. This aligns with the optimization pattern in the PR.

internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2)
20-20: ✓ Excellent row_number-based batching optimization.

The migration correctly replaces inefficient OFFSET/LIMIT pagination with deterministic row_number range filtering. The window function is ordered by transactions.seq, ensuring consistent batch boundaries and deterministic processing across restarts. The exit condition exit when not found properly detects when all rows have been processed.

Also applies to: 40-40
44-45: Verify intent: overwriting updated_at with revertedAt.

Line 45 sets updated_at = data.revertedAt, which overwrites the actual update timestamp with the revert date. Confirm this is intentional for this migration's semantics and won't cause issues with audit trails or other logic depending on actual modification times.

internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
1-41: Loop logic is correct; past off-by-one concern does not apply.

The loop correctly covers all rows. With for i in 1.._max by _batch_size and row_number >= i AND row_number < i + _batch_size:

- For _max=10,000 and _batch_size=1,000: iterations cover rows 1-1000, 1001-2000, ..., 9001-10000 (all rows).
- For non-divisible totals (e.g., 10,001 rows): the loop extends to i=10001 and processes row 10001 via the range check.

The past Graphite comment suggesting a fix appears to reference an earlier version; the current code starting at i=1 is correct.

Strengths of this refactor:

- Deterministic row_number()-based batching replaces OFFSET/LIMIT, improving performance on large result sets.
- Index on row_number with included columns optimizes batch filtering.
- Per-batch commits reduce lock contention during the long migration.
- The WHERE transactions_id is null filter ensures idempotency (re-runs don't duplicate updates).
1 issue found across 8 files
Prompt for AI agents (all 1 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql">
<violation number="1" location="internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql:37">
P2: The batching predicate skips every `_batch_size`-th row (including the last row when total rows are a multiple of the batch size), leaving some transactions’ inserted_at unset.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
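To see why the flagged predicate loses rows, assume _batch_size = 1000 and batch offsets stepping 0, 1000, 2000, ..., with a bound that is strict at both ends (a hypothetical reconstruction of the reported issue, not the file's final code). The self-contained query below shows the holes:

```sql
-- With 2000 rows and windows starting at i = 0 and i = 1000, a predicate that is
-- strict at both ends (row_number > i and row_number < i + 1000) covers neither
-- row 1000 nor row 2000; this query returns exactly those orphaned rows.
with nums as (
    select generate_series(1, 2000) as row_number
)
select row_number
from nums
where not exists (
    select 1
    from (values (0), (1000)) as batches(i)
    where nums.row_number > batches.i
      and nums.row_number < batches.i + 1000
);
-- Returns 1000 and 2000. Making the upper bound inclusive
-- (row_number <= batches.i + 1000) tiles the windows exactly and returns no rows,
-- matching the half-open form used in the later revision.
```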
2 issues found across 8 files
Prompt for AI agents (all 2 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql">
<violation number="1" location="internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:3">
P3: Typo: 'superseeded' should be 'superseded'.</violation>
</file>
<file name="internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql">
<violation number="1" location="internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql:38">
P2: Index `txs_view_idx` is on `(log_id, id)`, but the batch selection query now filters by `row_number`. For efficient range scans, consider adding an index on `row_number`. Migration 28 in this same PR demonstrates the correct pattern: `create index ... on ...(row_number) include (...)`. The current index is still useful for the UPDATE join, but won't help with batch selection.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
4d1261e to 137d69c (Compare)
137d69c to 77a64f5 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (1)
25-25: Consider whether the temporary index is cost-effective.

The index on txs_view(log_id, id) (line 25) is created on a temporary table. However, the main WHERE filter in the loop (line 37) only conditions on row_number, not on log_id or id. This index may not be directly used by the query planner for row filtering. It could potentially help with the JOIN condition or as a covering index, but inspect whether it meaningfully accelerates the UPDATE or JOIN. For a temp table during migration, a minimal index strategy may suffice.

If profiling shows the index is unused or negligible, consider removing it or narrowing it to only the necessary columns.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1 hunks)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
📚 Learning: 2025-05-20T13:07:54.504Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:50-55
Timestamp: 2025-05-20T13:07:54.504Z
Learning: In the ledger codebase's `handleState` method, when updating ledger state from `StateInitializing` to `StateInUse`, it's intentional to proceed silently when `rowsAffected == 0`. This indicates another parallel transaction has already updated the ledger state and configured the sequences, so no error needs to be returned and no sequence updating is required.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-11-28T10:06:01.381Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1167
File: internal/storage/bucket/migrations/43-fix-missing-inserted-at-in-log-data/up.sql:23-23
Timestamp: 2025-11-28T10:06:01.381Z
Learning: In PostgreSQL migrations for the Formance ledger, to ensure ISO 8601/RFC3339 date formatting when setting JSONB fields, use the pattern `to_jsonb(date)#>>'{}'` to extract the timestamp as a properly formatted string. This leverages JSONB's built-in ISO 8601 encoding. Using `date::text` would depend on the server's DateStyle setting and doesn't guarantee ISO 8601 format.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
🔇 Additional comments (5)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)
3-3: Well-executed migration deprecation.

The typo ("superseeded" → "superseded") has been corrected, and converting this migration to a no-op with a notice is a clean approach to deprecate it. This maintains migration sequence integrity while allowing existing databases (which already executed migration 27) to skip it, and new databases to proceed directly to migration 28.
Please verify that migration 28 contains all necessary data-migration logic that migration 27 previously implemented, ensuring no functionality is lost during the transition.
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2)
19-19: ✓ Row_number-based batching replaces OFFSET/LIMIT efficiently.

The pagination refactor from OFFSET/LIMIT to row_number-based filtering (line 37: where row_number > _offset and row_number <= _offset + _batch_size) is correct and aligns with the PR's performance improvement goals. The batching boundaries are accurate: each iteration processes exactly _batch_size rows without the overhead of OFFSET scanning.

The addition of log_id (line 19) as an alias for reversed.id is used correctly in the UPDATE join condition (line 52), so the projection is well-suited to the downstream logic.

Also applies to: 37-37
33-62: ✓ Batched loop with per-batch commits reduces lock contention.

The loop structure with batched UPDATE (lines 39–53), per-batch commit (line 61), and exit condition (line 55) is sound. Each iteration processes a fixed window of rows, avoiding the risk of prolonged table locks during large migrations. The exit when not found; correctly terminates when no rows are affected by the UPDATE.

internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2)
21-26: Row_number-based batching is well-optimized for performance.

The temporary table now uses row_number-based windowing (line 22) with a corresponding index (line 26) keyed on row_number. This replaces the earlier OFFSET/LIMIT approach and should significantly reduce lock contention during large batch updates.

- Index key on row_number aligns with the WHERE filter on line 37
- INCLUDE columns (ledger, date, transaction_id) support the CTE SELECT and downstream join
- Type-based filtering (line 24) correctly identifies relevant transaction logs per PR objectives
33-47: Batch loop logic is correct; no rows are skipped.

The loop for i in 0.._count-1 by _batch_size combined with where row_number > i and row_number <= i + _batch_size correctly partitions all rows without gaps or overlaps:

- First iteration (i=0): row_number > 0 AND row_number <= 1000 → rows 1–1000
- Second iteration (i=1000): row_number > 1000 AND row_number <= 2000 → rows 1001–2000
- And so on…

The defensive inserted_at is null check on line 42 ensures idempotency and prevents re-updating rows if the migration is retried.
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (1)
28-28: Index on reversedTransactionID may be underutilized.

An index on txs_view(reversedTransactionID) is created, but the loop's WHERE clause (line 40) filters by row_number, not reversedTransactionID. The index is unlikely to be used for the SELECT in the data CTE or the subsequent UPDATE join.

Consider whether a row_number index would be more beneficial (if indexing is needed at all), or remove the index if it's unnecessary for temporary table scans.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1 hunks)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Dirty
- GitHub Check: Tests
🔇 Additional comments (7)
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (3)
11-11: ✓ Row_number-based pagination correctly replaces OFFSET/LIMIT.

The batch boundary logic is sound: with _offset = 0 and _batch_size = 1000, the filter row_number > _offset and row_number <= _offset + _batch_size correctly partitions rows (1–1000, 1001–2000, etc.) without gaps or overlaps. The loop exits cleanly when the data CTE is empty.

Also applies to: 29-29
20-21: Index and foreign key enhance both performance and data integrity.

The index on seq accelerates the hash join during the UPDATE statement (lines 31–34), and the foreign key constraint ensures that all rows in the temporary table reference valid transactions. These are sound optimizations for a migration-specific temporary table.
15-18: Batched processing with per-batch commits is well-structured.

Empty-table handling (early return), loop exit on exhaustion (if not found), and per-batch commits all contribute to efficient lock management and graceful completion. The strategy aligns with the broader migration speedup pattern in this PR.

Also applies to: 36-39, 45-45
internal/storage/bucket/migrations/34-fix-memento-format/up.sql (1)
18-18: No issue found. The seq column is a bigserial primary key that starts at 1 and cannot contain values ≤ 0.

The boundary change from seq >= _offset and seq < _offset + _batch_size to seq > _offset and seq <= _offset + _batch_size is valid and does not cause data loss. The seq column in the logs table is defined as bigserial primary key in migration 0-init-schema, which guarantees all values are >= 1. Therefore, the concern about rows with seq ≤ 0 being skipped is not applicable; no such rows can exist.

Likely an incorrect or invalid review comment.
internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2)
20-20: Good migration to row_number-based batching for performance.

The row_number() window function replaces offset/limit pagination, which is more efficient for large datasets: each iteration avoids the cost of scanning and discarding offset rows.
36-56: Verify row_number batch boundary logic in loop.

The batching follows the pattern where row_number > _offset and row_number <= _offset + _batch_size. With _offset = 0:

- Iteration 1: rows 1–1000
- Iteration 2: rows 1001–2000
- etc.

This is correct and non-overlapping. The exit when not found at line 50 triggers when the UPDATE affects 0 rows (i.e., the data CTE returns no rows), exiting after the final batch.

internal/storage/bucket/migrations/19-transactions-fill-pcv/up.sql (1)
3-3: The batch boundary logic is correct and poses no data completeness risk. The transactions.seq column is defined as bigserial primary key in the initial schema, which means it auto-increments starting from 1, never 0. Therefore, with _offset = 0, the WHERE clause transactions_seq > 0 AND transactions_seq <= 1000 correctly selects all rows in the first batch (1–1000), and no rows are skipped. Subsequent batches process correctly without overlap. The loop exits when no rows match the WHERE clause, which is the appropriate termination condition. No changes are needed.

Likely an incorrect or invalid review comment.
Summary by cubic
Speed up several database migrations by switching to row_number-based batching and adding temporary indexes, reducing runtime and lock contention during upgrades.
Written for commit f3611ec. Summary will update automatically on new commits.