Skip to content

Commit ce9169f

Browse files
committed
Fix: Improve pre-migration check clarity and accuracy (#243)
## Summary Updates the pre-migration check script (`PRE_MIGRATION_CHECK_20251006073122.sql`) to provide clearer, more accurate messaging about data migrations. Also updates the changeset documentation to match. ## Problem The original pre-migration check output was misleading: - Used vague labels like `ISSUE_1_AUTO_FIXED` and `ISSUE_2_AUTO_FIXED` - Didn't clearly communicate that these are **expected** data migrations - Missing reporting for completed steps that need backfilling - Could give false confidence that everything is "auto-fixed" without explaining the criticality ## Changes ### Pre-migration Check Script - ✅ Renamed output types from `ISSUE_*_AUTO_FIXED` to descriptive `DATA_*` labels - ✅ Added `DATA_BACKFILL_COMPLETED` row to report completed steps needing backfill - ✅ Improved detail messages to clearly state what will happen during migration - ✅ Added critical warning about packaged migration version bug - ✅ Better documentation explaining how to interpret results **Before:** ``` type | identifier | details ----------------------|-------------------------|--------------------------- ISSUE_2_AUTO_FIXED | run=8805ef09 step=embed | status=started → will set initial_tasks=1 INFO_SUMMARY | total_step_states=114 | created=0 started=1 completed=113 failed=0 ``` **After:** ``` type | identifier | details ---------------------------|---------------------------|------------------------------------------ DATA_BACKFILL_STARTED | run=8805ef09 step=embed | initial_tasks will be set to 1 (inferred from remaining_tasks=1) DATA_BACKFILL_COMPLETED | Found 113 completed steps | initial_tasks will be set to 1 (old schema enforced single-task) INFO_SUMMARY | total_step_states=114 | created=0 started=1 completed=113 failed=0 ``` ### Changeset Documentation - ✅ Updated example output to match new check format - ✅ Replaced markdown callouts with emojis (⚠️ 💡 📝) - ✅ Clarified interpretation guidance ## Testing Verified against real production-like data: 1. ✅ Created test database with pgmq extension 2. ✅ Imported schema and data dump (114 step_states: 1 started, 113 completed) 3. ✅ Ran pre-migration check - output is clear and accurate 4. ✅ Ran full migration - completed successfully 5. ✅ Verified all data properly migrated (initial_tasks backfilled correctly) 6. ✅ Verified constraints enforce correctness post-migration ## Migration Verification Note During testing, we confirmed: - **Local migration file** (`20251006073122_pgflow_add_map_step_type.sql`) is **CORRECT** ✅ - Properly splits ALTER TABLE statements - Backfills data between schema changes - Handles started steps correctly - **Packaged migration** (from published npm package) is **BROKEN** ❌ - Combines column addition and constraint addition in single ALTER TABLE - Will FAIL on databases with started steps - Package needs rebuild before next release ## Files Changed - `pkgs/core/queries/PRE_MIGRATION_CHECK_20251006073122.sql` - Improved clarity and accuracy - `.changeset/add-map-step-type-infrastructure.md` - Updated documentation and examples ## Related - Original migration PR: #234 - Issue: Misleading pre-migration check output
1 parent af10272 commit ce9169f

File tree

2 files changed

+70
-49
lines changed

2 files changed

+70
-49
lines changed

.changeset/add-map-step-type-infrastructure.md

Lines changed: 35 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -4,37 +4,39 @@
44

55
Add map step type infrastructure in SQL core
66

7-
> [!WARNING]
8-
> **This migration includes automatic data migration**
9-
>
10-
> The migration will automatically update existing `step_states` rows to satisfy new constraints. This should complete without issues due to strict check constraints enforced in previous versions.
11-
12-
> [!TIP]
13-
> **Optional: Verify before deploying to production**
14-
>
15-
> If you have existing production data and want to verify the migration will succeed cleanly, run this **read-only check query** (does not modify data) in **Supabase Studio** against your **production database**:
16-
>
17-
> 1. Open Supabase Studio → SQL Editor
18-
> 2. Copy contents of `pkgs/core/queries/PRE_MIGRATION_CHECK_20251006073122.sql`
19-
> 3. Execute against your production database (not local dev!)
20-
> 4. Review results
21-
>
22-
> **Expected output for successful migration:**
23-
> ```
24-
> type | identifier | details
25-
> ----------------------|---------------------------|---------------------------
26-
> ISSUE_1_AUTO_FIXED | run=abc12345 step=load | status=created remaining_tasks=1 → will set to NULL
27-
> ISSUE_2_AUTO_FIXED | run=def67890 step=process | status=started → will set initial_tasks=1
28-
> INFO_SUMMARY | total_step_states=42 | created=5 started=2 completed=30 failed=5
29-
> ```
30-
>
31-
> **Interpretation:**
32-
> - ✅ Only `ISSUE_1_AUTO_FIXED`, `ISSUE_2_AUTO_FIXED`, and `INFO_SUMMARY` rows? **Safe to migrate**
33-
> - 🆘 Unexpected issues or errors? Copy output and share on Discord for help
34-
>
35-
> **Note:** This check only returns results indicating correctness or problems - it does not modify any data. Only useful for production databases with existing runs.
7+
⚠️ **This migration includes automatic data migration**
8+
9+
The migration will automatically update existing `step_states` rows to satisfy new constraints. This should complete without issues due to strict check constraints enforced in previous versions.
10+
11+
💡 **Recommended: Verify before deploying to production**
12+
13+
If you have existing production data and want to verify the migration will succeed cleanly, run this **read-only check query** (does not modify data) in **Supabase Studio** against your **production database**:
14+
15+
1. Open Supabase Studio → SQL Editor
16+
2. Copy contents of `pkgs/core/queries/PRE_MIGRATION_CHECK_20251006073122.sql`
17+
3. Execute against your production database (not local dev!)
18+
4. Review results
19+
20+
**Expected output for successful migration:**
21+
22+
```
23+
type | identifier | details
24+
---------------------------|---------------------------|------------------------------------------
25+
DATA_BACKFILL_STARTED | run=def67890 step=process | initial_tasks will be set to 1 (...)
26+
DATA_BACKFILL_COMPLETED | Found 100 completed steps | initial_tasks will be set to 1 (...)
27+
INFO_SUMMARY | total_step_states=114 | created=0 started=1 completed=113 failed=0
28+
```
29+
30+
**Interpretation:**
31+
32+
- ✅ Only `DATA_BACKFILL_*` and `INFO_SUMMARY` rows? **Safe to migrate**
33+
- ⚠️ These are expected data migrations handled automatically by the migration
34+
- 🆘 Unexpected rows or errors? Copy output and share on Discord for help
35+
36+
📝 **Note:** This check identifies data that needs migration but does not modify anything. Only useful for production databases with existing runs.
3637

3738
**Automatic data updates:**
39+
3840
- Sets `initial_tasks = 1` for all existing steps (correct for pre-map-step schema)
3941
- Sets `remaining_tasks = NULL` for 'created' status steps (new semantics)
4042

@@ -47,20 +49,23 @@ No manual intervention required.
4749
This patch introduces the foundation for map step functionality in the SQL core layer:
4850

4951
### Schema Changes
52+
5053
- Added `step_type` column to `steps` table with constraint allowing 'single' or 'map' values
5154
- Added `initial_tasks` column to `step_states` table (defaults to 1, stores planned task count)
5255
- Modified `remaining_tasks` column to be nullable (NULL = not started, >0 = active countdown)
5356
- Added constraint `remaining_tasks_state_consistency` to ensure `remaining_tasks` is only set when step has started
5457
- Removed `only_single_task_per_step` constraint from `step_tasks` table to allow multiple tasks per step
5558

5659
### Function Updates
60+
5761
- **`add_step()`**: Now accepts `step_type` parameter (defaults to 'single') with validation that map steps can have at most 1 dependency
5862
- **`start_flow()`**: Sets `initial_tasks = 1` for all steps (map step array handling will come in future phases)
5963
- **`start_ready_steps()`**: Copies `initial_tasks` to `remaining_tasks` when starting a step, maintaining proper task counting semantics
6064

6165
### Testing
66+
6267
- Added comprehensive test coverage for map step creation and validation
6368
- All existing tests pass with the new schema changes
6469
- Tests validate the new step_type parameter and dependency constraints for map steps
6570

66-
This is Phase 2a of the map step implementation, establishing the SQL infrastructure needed for parallel task execution in future phases.
71+
This is Phase 2a of the map step implementation, establishing the SQL infrastructure needed for parallel task execution in future phases.
Lines changed: 35 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,50 +1,63 @@
11
-- ================================================================================
22
-- PRE-MIGRATION CHECK for 20251006073122_pgflow_add_map_step_type.sql
33
-- ================================================================================
4-
-- Purpose: Verify the migration will succeed cleanly on your database
4+
-- Purpose: Identify data that requires migration before schema changes
55
-- When to run: BEFORE applying the migration
66
-- What to do with output:
77
-- - ✅ Only INFO rows? You're good to migrate
8-
-- - ⚠️ ISSUE_1/ISSUE_2? Safe - migration auto-fixes these
9-
-- - 🆘 Unexpected issues? Copy/paste output to Discord for help
8+
-- - ⚠️ DATA_* rows? Expected - migration handles these automatically
9+
-- - 🆘 Unexpected rows? Copy/paste output for help
1010
-- ================================================================================
1111

1212
WITH issues AS (
13-
-- Issue 1: Created steps with remaining_tasks (will be set to NULL)
13+
-- Created steps with remaining_tasks set (should be NULL in new schema)
1414
SELECT
1515
1 AS priority,
16-
'ISSUE_1_AUTO_FIXED' AS type,
16+
'DATA_CLEANUP_CREATED' AS type,
1717
format('run=%s step=%s',
1818
LEFT(run_id::text, 8),
1919
step_slug
2020
) AS identifier,
21-
format('status=%s remaining_tasks=%s → will set to NULL',
22-
status,
21+
format('remaining_tasks=%s will be set to NULL (created steps not started yet)',
2322
remaining_tasks
2423
) AS details
2524
FROM pgflow.step_states
2625
WHERE status = 'created' AND remaining_tasks IS NOT NULL
2726

2827
UNION ALL
2928

30-
-- Issue 2: Started steps (will backfill initial_tasks)
29+
-- Started steps that need initial_tasks backfilled
3130
SELECT
3231
2 AS priority,
33-
'ISSUE_2_AUTO_FIXED' AS type,
32+
'DATA_BACKFILL_STARTED' AS type,
3433
format('run=%s step=%s',
3534
LEFT(run_id::text, 8),
3635
step_slug
3736
) AS identifier,
38-
format('status=%s → will set initial_tasks=%s',
39-
status,
40-
COALESCE(remaining_tasks::text, '1')
37+
format('initial_tasks will be set to %s (inferred from remaining_tasks=%s)',
38+
COALESCE(remaining_tasks::text, '1'),
39+
COALESCE(remaining_tasks::text, 'NULL')
4140
) AS details
4241
FROM pgflow.step_states
4342
WHERE status = 'started'
4443

4544
UNION ALL
4645

47-
-- Info: Summary stats
46+
-- Completed steps that need initial_tasks backfilled
47+
SELECT
48+
3 AS priority,
49+
'DATA_BACKFILL_COMPLETED' AS type,
50+
format('Found %s completed steps', COUNT(*)) AS identifier,
51+
format('initial_tasks will be set to 1 (old schema enforced single-task)',
52+
''
53+
) AS details
54+
FROM pgflow.step_states
55+
WHERE status = 'completed'
56+
HAVING COUNT(*) > 0
57+
58+
UNION ALL
59+
60+
-- Summary stats
4861
SELECT
4962
999 AS priority,
5063
'INFO_SUMMARY' AS type,
@@ -67,11 +80,14 @@ ORDER BY priority, identifier;
6780
-- ================================================================================
6881
-- HOW TO READ THE OUTPUT:
6982
-- ================================================================================
70-
-- type | identifier | details
71-
-- ----------------------|-------------------------|---------------------------
72-
-- ISSUE_1_AUTO_FIXED | run=abc12345 step=foo | Will be cleaned up
73-
-- ISSUE_2_AUTO_FIXED | run=def67890 step=bar | Will be backfilled
74-
-- INFO_SUMMARY | total_step_states=42 | Overall stats
83+
-- type | identifier | details
84+
-- ---------------------------|-------------------------|-----------------------------
85+
-- DATA_CLEANUP_CREATED | run=abc12345 step=foo | Cleanup needed
86+
-- DATA_BACKFILL_STARTED | run=def67890 step=bar | Backfill needed
87+
-- DATA_BACKFILL_COMPLETED | Found 100 completed... | Backfill needed
88+
-- INFO_SUMMARY | total_step_states=114 | Overall stats
7589
--
76-
-- ✅ Safe to migrate if you only see ISSUE_1/ISSUE_2 + INFO_SUMMARY
90+
-- ✅ Safe to migrate - these data issues are handled automatically by the migration
91+
-- ⚠️ CRITICAL: This migration MUST use the correct version that splits ALTER TABLE
92+
-- statements. The packaged version combines them and will FAIL on started steps.
7793
-- ================================================================================

0 commit comments

Comments
 (0)