feat: improve migration performances #1175
Force-pushed from 69d9a0f to e1c4e8d.
> Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Multiple SQL migrations were rewritten to use precomputed staging tables and row_number()-based batching instead of OFFSET/LIMIT; one migration (27) was converted to a no-op with an explicit notice; batching predicates and indexes were adjusted, and a final NOT NULL step was removed from migration 17.
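As a sketch of that pattern, with hypothetical names (`txs_view`, a `transactions` table with a nullable `inserted_at`) rather than the actual migration code:

```sql
-- Sketch: precompute a staging table with a stable row_number once, then
-- update in fixed-size batches instead of rescanning with OFFSET/LIMIT
-- (OFFSET n re-reads and discards n rows on every iteration).
create table txs_view as
select row_number() over (order by seq) as row_number, seq, date
from transactions
where inserted_at is null;

create index on txs_view (row_number) include (seq, date);

do $$
declare
    _batch_size constant integer := 1000;
    _offset integer := 0;
begin
    loop
        update transactions t
        set inserted_at = v.date
        from txs_view v
        where v.seq = t.seq
          and v.row_number > _offset
          and v.row_number <= _offset + _batch_size;

        exit when not found;
        _offset := _offset + _batch_size;
        commit; -- allowed here: these DO blocks run at top level (PostgreSQL 11+)
    end loop;
end $$;

drop table txs_view;
```

Because `row_number` is materialized once, each batch is an indexed range scan rather than a progressively more expensive OFFSET skip.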
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
3 issues found across 8 files
Prompt for AI agents (all 3 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql">
<violation number="1" location="internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql:37">
P2: The range predicate skips the last batch element whenever the total number of rows is a multiple of `_batch_size`, leaving the final transaction’s `inserted_at` null.</violation>
</file>
<file name="internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql">
<violation number="1" location="internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql:36">
P1: `COMMIT` cannot be issued inside a `DO`/PLpgSQL block; this migration will raise `invalid transaction termination` before processing any rows.</violation>
</file>
<file name="internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql">
<violation number="1" location="internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql:37">
P2: The row_number window is 1-based but `_offset` is 0-based, so this predicate processes only `_batch_size-1` rows in the first batch and forces an extra single-row batch whenever the total count is a multiple of `_batch_size`. Use bounds that mimic OFFSET semantics (rows `_offset + 1` through `_offset + _batch_size`).</violation>
</file>
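To make the boundary arithmetic in these findings concrete, here is a self-contained demo (hypothetical staging table, synthetic rows) verifying that OFFSET-style bounds — rows `_offset + 1` through `_offset + _batch_size` — cover every row exactly once, with no short first batch and no skipped final row:

```sql
-- Runnable demo of the boundary fix: number 3000 rows, then check that
-- three batches of 1000 cover rows 1..3000 exactly once.
do $$
declare
    _batch_size constant integer := 1000;
    _offset integer := 0;
    _n integer;
    _total integer := 0;
begin
    create temporary table staging on commit drop as
    select row_number() over (order by g) as row_number
    from generate_series(1, 3000) g;

    loop
        select count(*) into _n
        from staging
        where row_number > _offset                  -- 0-based offset, exclusive lower bound
          and row_number <= _offset + _batch_size;  -- inclusive upper bound
        exit when _n = 0;
        _total := _total + _n;
        _offset := _offset + _batch_size;
    end loop;

    raise notice 'processed % rows', _total;  -- prints 3000: no skips, no extra single-row batch
end $$;
```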
Review threads (outdated, resolved):
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
Actionable comments posted: 4
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunk)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1 hunk)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-11-28T10:06:01.381Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1167
File: internal/storage/bucket/migrations/43-fix-missing-inserted-at-in-log-data/up.sql:23-23
Timestamp: 2025-11-28T10:06:01.381Z
Learning: In PostgreSQL migrations for the Formance ledger, to ensure ISO 8601/RFC3339 date formatting when setting JSONB fields, use the pattern `to_jsonb(date)#>>'{}'` to extract the timestamp as a properly formatted string. This leverages JSONB's built-in ISO 8601 encoding. Using `date::text` would depend on the server's DateStyle setting and doesn't guarantee ISO 8601 format.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
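A quick runnable illustration of that `to_jsonb(date)#>>'{}'` pattern, using `now()` in place of an actual column:

```sql
-- to_jsonb() always encodes timestamps as ISO 8601, while ::text follows
-- the session's DateStyle setting and is therefore not portable.
set datestyle = 'SQL, DMY';

select now()::text              as datestyle_dependent,  -- e.g. '03/12/2025 14:01:02.266 UTC'
       to_jsonb(now()) #>> '{}' as iso_8601;             -- e.g. '2025-12-03T14:01:02.266+00:00'
```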
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Tests
🔇 Additional comments (4)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)
1-5: Approved: Intentional no-op migration with clear messaging. The migration is correctly simplified to skip the heavy PCV backfill as intended. The RAISE NOTICE provides clear feedback that the migration is superseded.
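For reference, a superseded migration of this kind can be reduced to a sketch like the following (the notice text is illustrative, not the actual file contents):

```sql
-- Hypothetical shape of a migration kept as a no-op so that
-- migration numbering stays contiguous.
do $$
begin
    raise notice 'migration 27 is a no-op: the PCV backfill is superseded by a later migration';
end $$;
```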
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (1)
37-37: Approved: Row_number boundary is correctly fixed. The predicate `where row_number > _offset and row_number <= _offset + _batch_size` properly handles the 1-based row_number with the 0-based offset, processing exactly `_batch_size` rows per batch.

internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (1)
38-38: Approved: Row_number boundary correctly handles 1-based indexing. The predicate `where row_number > _offset and row_number <= _offset + _batch_size` properly processes exactly `_batch_size` rows per batch.

internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
37-37: Approved: Row_number boundary correctly fixed to avoid skipping rows. The predicate `where row_number > i and row_number <= i + _batch_size` correctly processes exactly `_batch_size` rows per iteration, accounting for the 1-based row_number and the 0-based loop start.
Review threads:
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (outdated, resolved)
- internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (resolved)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (outdated, resolved)
Force-pushed from fec0e16 to 099e612.
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (1)
34-37: Exit condition logic is sound; consider explicit logging for visibility. The loop exits when `NOT FOUND` (no rows updated), and the empty-table check prevents unnecessary processing. However, the early return at line 35 prevents the final `drop table moves_view` at line 64 from executing in that code path. To ensure cleanup consistency, consider restructuring the empty check to use the same cleanup path:

```diff
 if (select count(*) from moves_view) = 0 then
-    drop table moves_view;
-    return;
+    perform pg_notify('migrations-{{ .Schema }}', 'init: no moves to process');
+    drop table if exists moves_view;
+    return;
 end if;
```

This ensures the drop happens consistently regardless of the early exit path.
Also applies to: 52-55
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1 hunk)
- internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-12-03T14:01:02.266Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1175
File: internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql:36-36
Timestamp: 2025-12-03T14:01:02.266Z
Learning: In the Formance ledger migration system, DO blocks are executed at top level without an explicit transaction wrapper. This means that COMMIT and ROLLBACK statements inside PLpgSQL DO blocks work correctly (PostgreSQL 11+), allowing batch processing with commits between batches as the PLpgSQL engine automatically starts a new transaction after each COMMIT.
Applied to files:
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql
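A minimal sketch of the behavior this learning describes, assuming the DO block really is executed at top level rather than inside an explicit transaction block:

```sql
-- PostgreSQL 11+: COMMIT inside a top-level DO block ends the current
-- transaction and PL/pgSQL implicitly starts a new one, so each batch
-- is durable even if a later batch fails.
do $$
begin
    for i in 1..3 loop
        raise notice 'batch % ran in transaction %', i, txid_current();
        commit;  -- raises "invalid transaction termination" only when the
                 -- DO block runs inside an outer transaction block
    end loop;
end $$;
```

Each iteration reports a different transaction ID, confirming that a fresh transaction begins after every COMMIT.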
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
📚 Learning: 2025-05-20T13:07:54.504Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:50-55
Timestamp: 2025-05-20T13:07:54.504Z
Learning: In the ledger codebase's `handleState` method, when updating ledger state from `StateInitializing` to `StateInUse`, it's intentional to proceed silently when `rowsAffected == 0`. This indicates another parallel transaction has already updated the ledger state and configured the sequences, so no error needs to be returned and no sequence updating is required.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-11-28T10:06:01.381Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1167
File: internal/storage/bucket/migrations/43-fix-missing-inserted-at-in-log-data/up.sql:23-23
Timestamp: 2025-11-28T10:06:01.381Z
Learning: In PostgreSQL migrations for the Formance ledger, to ensure ISO 8601/RFC3339 date formatting when setting JSONB fields, use the pattern `to_jsonb(date)#>>'{}'` to extract the timestamp as a properly formatted string. This leverages JSONB's built-in ISO 8601 encoding. Using `date::text` would depend on the server's DateStyle setting and doesn't guarantee ISO 8601 format.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Tests
🔇 Additional comments (8)
internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (2)
11-12: Row_number batching logic is now correct; verify index strategy. The previous off-by-one boundary condition has been fixed: the predicate `row_number > _offset AND row_number <= _offset + _batch_size` now correctly processes exactly 1000 rows per batch (rows 1–1000, then 1001–2000, etc.). The index has been optimized to filter on `row_number` with INCLUDE columns for the join, which should accelerate batch retrieval.

Verify that the index on `row_number` is being used for batch filtering and that the join between `moves_view` and `transactions` remains efficient. Please run an EXPLAIN ANALYZE on a representative batch query to confirm the query plan:

```sql
explain analyze
select transactions_seq, volumes
from moves_view
where row_number > 0 and row_number <= 1000;
```

Also applies to: 30-30, 45-45
32-50: Verify foreign key constraint doesn't cause lock escalation or deadlock during batched updates. The foreign key on `transactions_seq` optimizes the hash join but may cause unexpected lock behaviors during concurrent updates. Confirm that the ACCESS SHARE lock acquired on the `transactions` table doesn't escalate during the batched UPDATE, and that no deadlock risk exists between the temporary table and `transactions`. Based on learnings, temporary table locks are acceptable, but please verify:

- Run a test migration on a staging environment and monitor lock waits: `SELECT * FROM pg_locks WHERE relation = 'transactions'::regclass`.
- Confirm no lock escalation from ACCESS SHARE to stronger modes during the UPDATE loop.
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
22-44: Row-number batching logic is correct and efficiently structured. The migration correctly uses row_number()-based pagination with batch commits. Line 37's range predicate (`row_number > i and row_number <= i + _batch_size`) accurately covers all rows without gaps or duplicates; the past concern about skipping the final batch has been properly fixed. The 1-based row numbering from row_number() aligns correctly with the loop iteration (starting at i=0), ensuring the first batch captures rows 1–1000, the second batch rows 1001–2000, and so on.

The index on (row_number) with included columns supports efficient range scans and index-only retrieval of all required fields. The tight update predicate (`and inserted_at is null`) prevents re-updating and avoids unnecessary row locking. Explicit COMMITs within the DO block (per PLpgSQL semantics) enable proper batch processing without extended locks.

The filtering to only NEW_TRANSACTION and REVERTED_TRANSACTION log types aligns with PR objectives.
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (3)
19-19: Row_number bounds correctly fixed. The batching logic now correctly processes full batches: the first iteration fetches rows 1–1000, the second fetches 1001–2000, etc. The predicate `row_number > _offset and row_number <= _offset + _batch_size` properly aligns the 1-based row_number with the 0-based _offset. The past concern appears resolved.

Also applies to: 37-37
25-25: Clarify index strategy for row_number-based filtering. The index is defined on `(log_id, id)`, but the data CTE is filtered by `row_number` (line 37). Since the temporary table is small after row_number filtering, and the log_id lookup is only used in the subsequent JOIN, verify that this index choice aligns with query performance expectations. Consider whether an index on `(row_number)` would be more beneficial for the initial filtering step, consistent with the row_number-based batching pattern.
45-48: Verify timestamp NULL handling and format consistency. Timestamps are formatted as `to_json(field)#>>'{}' || 'Z'`, which appends the ISO 8601 UTC marker. However, if a timestamp is NULL, the expression evaluates to NULL rather than a formatted string. Additionally, migration 32 (file 2, lines 46–48) uses raw field values without this JSON conversion. Clarify whether:
- The 'Z' suffix is required for this migration's use case, and NULL values are acceptable as-is.
- Migration 32 should adopt the same formatting for consistency, or vice versa.
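A small runnable check of the NULL-propagation point raised above, simulating the column with literals:

```sql
-- String concatenation propagates NULL, so a NULL timestamp yields NULL
-- rather than a formatted string:
select (to_json(now()::timestamp) #>> '{}') || 'Z' as formatted,    -- e.g. '2025-12-03T14:01:02.266Z'
       (to_json(null::timestamp)  #>> '{}') || 'Z' as null_result;  -- NULL
```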
internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2)
26-26: Verify index strategy aligns with migration 33. This migration indexes on `(row_number)` to support the batching filter (line 38), whereas migration 33 indexes on `(log_id, id)`. Both are filtering by row_number and joining on log_id. Confirm whether the different index strategies are intentional (e.g., based on cardinality differences) or if they should be unified for consistency across the batch-based migration pattern introduced in this PR.
40-51: Timestamp formatting differs from migration 33; verify intended behavior. Migration 32 (lines 46–48) uses raw field values for timestamps (`data.reverted_at`, `data.inserted_at`, `data.timestamp`), whereas migration 33 converts them to JSON strings with a 'Z' ISO 8601 suffix. Both migrations perform similar operations on reverted transactions. Confirm whether the different timestamp formats are intentional or if one should be standardized to match the other.
Summary by cubic
Speed up v2.2 migrations by switching to row_number-based batching with indexed temp tables. This reduces large OFFSET/LIMIT scans, migration time, and locking.
Written for commit 69b5f3f.