Draft
22 commits
3f14ed6
Added initial extensions for Database, Schema, Table, Column schema c…
oskardudycz Sep 27, 2025
c8634a6
Added a first naive implementation of SchemaComponentMigrator
oskardudycz Sep 27, 2025
9718612
Added drafts of schema components definitions
oskardudycz Oct 16, 2025
1de4a0d
Added DatabaseSchemaComponent factory
oskardudycz Oct 16, 2025
651a623
Added additional options to SchemaComponent to make easier extending it
oskardudycz Oct 16, 2025
acb210c
Added factories for SchemaComponents
oskardudycz Oct 21, 2025
c846b5a
Removed overengineered kind in schema components and schema component…
oskardudycz Oct 26, 2025
b90e422
Refactored Schema Component to use urn key instead of just component …
oskardudycz Oct 28, 2025
06e7d04
Added mapping for the specific schema component nested components to …
oskardudycz Oct 28, 2025
64660af
Moved schema components to dedicated files
oskardudycz Oct 28, 2025
46bbdcf
Refactored the structure to remove cyclic dependency between migratio…
oskardudycz Oct 28, 2025
a15e854
Added first version of dumbo schema
oskardudycz Oct 28, 2025
f87afe4
Added samples for dumbo schema
oskardudycz Oct 28, 2025
20a2d92
Simplified database schema setup
oskardudycz Oct 29, 2025
8ff99b3
Addef dumbo schema functions overloads
oskardudycz Oct 30, 2025
03fb40a
Refactored Schema component to use exclude instead of omit to avoid a…
oskardudycz Oct 30, 2025
16fde91
Added explicit methods for adding nested components
oskardudycz Oct 31, 2025
70aafb1
Added design and implementation plan
oskardudycz Oct 31, 2025
d247020
Added docs for design of feature schema component
oskardudycz Oct 31, 2025
560748b
Refactored column schema component to take all SQLToken parameters li…
oskardudycz Nov 13, 2025
c1fb729
Adjusted column schema definition to include mandatory type
oskardudycz Nov 13, 2025
e48c1dd
Renamed SQLColumnTokens to SQLColumnTypeTokens to make it explict
oskardudycz Nov 13, 2025
461 changes: 461 additions & 0 deletions plan.md

Large diffs are not rendered by default.

1,731 changes: 1,731 additions & 0 deletions src/packages/dumbo/src/core/schema/MIGRATION_UNIFICATION_PLAN.md

Large diffs are not rendered by default.

201 changes: 201 additions & 0 deletions src/packages/dumbo/src/core/schema/REVIEW.MD
@@ -0,0 +1,201 @@
Deep Analysis of the Architecture
After thoroughly rereading the plan, here's my multi-perspective analysis:
From a Regular User's Perspective (Storage Library Consumer)
Strengths:
Progressive Disclosure of Complexity - Users can start simple (SQL-only or TypeScript-only) and gradually adopt more features. This is excellent API design.
Familiar Patterns - The API resembles popular ORMs (Prisma, TypeORM), making adoption easier.
Clear Error Messages - The immutability validation provides actionable feedback rather than cryptic errors.
Type Safety Benefits - The generated types for query builders provide immediate value that developers can feel in their IDE.
Potential Pain Points:
Hidden Complexity in Snapshots - The dual-purpose snapshots (schema state + migration tracking) might be confusing initially. Users might not understand why snapshots contain their migrations.
Directory Proliferation - The .dumbo/ folder with nested snapshots could feel overwhelming for small projects.
Mental Model Shift - Understanding when TypeScript is the source of truth and when SQL is requires careful documentation and good examples.
Migration Immutability - While safer, this might frustrate users coming from Rails/Django where you can edit migrations before deploying.
From a Software Architect's Perspective
Architectural Strengths:
Separation of Concerns - Each component generating its own SQL is textbook single responsibility principle. This makes the system highly testable and maintainable.
Command-Query Separation - The clear distinction between:
Commands (processMigrations, runMigrations)
Queries (loadSnapshot, validateMigrations)
Pure functions (generateMigrations, diffTableWithSnapshot)
Temporal Decoupling - Snapshots act as a temporal boundary, allowing the system to reason about "then vs now" without querying the database.
Composability - The tree structure with proper traversal order handles complex dependencies elegantly.
Architectural Concerns:
Snapshot as Dual Source of Truth - Using snapshots for BOTH schema state AND migration tracking violates single responsibility. Consider splitting into:
Schema snapshots (current state for diffing)
Migration manifest (tracking what's been processed)
File System as Database - The .dumbo/migrations/ directory becomes a critical data store. What happens with:
Concurrent writes?
File system permissions?
CI/CD environments?
Type Generation Timing - Generating types from snapshots rather than from the database means types might not reflect reality if migrations fail partially.
Validation vs Enforcement - The system validates but doesn't enforce schema consistency. A team member could still manually edit the database.
Architecture as Code Analysis
Elegant Design Patterns:
Builder Pattern in dumboSchema API:
dumboSchema.database('myapp', {
  public: dumboSchema.schema('public', {
    users: dumboSchema.table('users', {...})
  })
})
Template Method Pattern in migration generation - each component implements its own generateMigrations() but follows the same structure.
Visitor Pattern in tree traversal - the orchestrator visits each component in dependency order.
Memento Pattern - Snapshots act as mementos capturing component state at a point in time.
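The Visitor-style traversal can be pictured as a pre-order walk of the component tree. A parent (database, schema) is always handled before its children (tables, columns), so for example `CREATE SCHEMA` lands before `CREATE TABLE`. This is a minimal sketch. The `ComponentNode` shape and both function names are hypothetical simplifications, not the types used in this PR:

```typescript
// Hypothetical, simplified component shape; the real SchemaComponent in this
// PR carries more (URN helpers, addComponent, typed children, etc.).
type ComponentNode = {
  urn: string;
  migrations: string[];
  components: ComponentNode[];
};

// Pre-order walk: visit the parent first, then its children in declaration
// order, so a parent is always visited before the components nested in it.
export const visitInDependencyOrder = (
  node: ComponentNode,
  visit: (node: ComponentNode, depth: number) => void,
  depth = 0,
): void => {
  visit(node, depth);
  for (const child of node.components) {
    visitInDependencyOrder(child, visit, depth + 1);
  }
};

// Example visitor: flatten every component's migrations into one ordered list.
export const collectMigrations = (root: ComponentNode): string[] => {
  const ordered: string[] = [];
  visitInDependencyOrder(root, (node) => ordered.push(...node.migrations));
  return ordered;
};
```

Pre-order covers only parent/child dependencies. Cross-table dependencies such as foreign keys would need an extra ordering step.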
Code Smells to Address:
Large Parameter Lists - The orchestrator constructor and several functions have many parameters that could be grouped into configuration objects.
String Manipulation for SQL - Building SQL through string concatenation (even with template literals) is risky. Consider a SQL builder abstraction.
Implicit Ordering Dependencies - The migration grouping relies on naming conventions (includes('schema'), includes('create_table')). This is fragile.
Mixed Abstraction Levels - The orchestrator handles both high-level workflow and low-level file I/O. Consider extracting a MigrationWriter service.
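One way to replace the fragile `includes('schema')` / `includes('create_table')` grouping is to rank migrations by the URN type of the component that owns them. This is a minimal sketch. It assumes each generated migration records its owning component's URN. The `PendingMigration` shape, the rank table, and the `sc:dumbo:index` prefix (the index component is not shown here) are assumptions:

```typescript
// Hypothetical shape: each generated migration records the URN of the
// component that produced it.
type PendingMigration = {
  componentURN: string; // e.g. 'sc:dumbo:table:users'
  name: string;
  sql: string;
};

// Explicit rank per component type instead of substring checks on names.
// The index prefix is assumed by analogy with the other URN types.
const rankByURNType = new Map<string, number>([
  ['sc:dumbo:database', 0],
  ['sc:dumbo:database_schema', 1],
  ['sc:dumbo:table', 2],
  ['sc:dumbo:column', 3],
  ['sc:dumbo:index', 4],
]);

// The URN type is the first three colon-separated segments.
const rankOf = (urn: string): number => {
  const rank = rankByURNType.get(urn.split(':').slice(0, 3).join(':'));
  if (rank === undefined) {
    throw new Error(`Unknown component type in URN: ${urn}`);
  }
  return rank;
};

// Sorts by component type. The original position breaks ties, so migrations
// of the same type keep their declared order.
export const orderMigrations = (
  migrations: readonly PendingMigration[],
): PendingMigration[] =>
  migrations
    .map((migration, index) => ({
      migration,
      index,
      rank: rankOf(migration.componentURN),
    }))
    .sort((a, b) => a.rank - b.rank || a.index - b.index)
    .map(({ migration }) => migration);
```

An unknown component type fails loudly instead of silently falling into the wrong group.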
Philosophical Considerations
The Snapshot Paradox: The snapshots serve as both historical record (what migrations were defined) and current state (what the schema looks like). This dual nature is philosophically interesting but practically complex. It's like Git storing both diffs AND the current state in the same structure.
Migration Immutability Trade-off: Enforcing immutability is philosophically sound (history should be append-only) but practically challenging. Real-world schema evolution is messy, and sometimes you need to fix a broken migration.
TypeScript as Truth: The decision to make TypeScript the source of truth when present is bold. It prioritizes developer experience over database reality, which could lead to drift.
Recommendations for Improvement
Split Snapshot Responsibilities:
Keep schema snapshots for state
Add a separate .dumbo/manifest.json for tracking processed migrations
Add Schema Verification:
After running migrations, query information_schema to verify the result matches expectations
Generate types from actual database state, not snapshots
Implement Conflict Resolution:
Add a dumbo schema:merge command for snapshot conflicts
Use three-way merge logic (base, mine, theirs)
Consider Event Sourcing:
Treat migrations as events in an event store
Snapshots become projections of those events
Enables time-travel debugging
Add Dry-Run Capability:
Let users preview what migrations would be generated/applied
Essential for production deployments
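The three-way merge recommended for `dumbo schema:merge` could work per key. Take whichever side changed relative to the common base, and report a conflict only when both sides changed the same key differently. This is a sketch over a table's column map. `threeWayMerge` and its types are hypothetical. Comparing values with `JSON.stringify` assumes the snapshots are written with a consistent key order:

```typescript
// Hypothetical shapes for this sketch: a column definition is plain JSON data.
type ColumnDef = Readonly<Record<string, unknown>>;
type Columns = Readonly<Record<string, ColumnDef>>;

export type MergeResult =
  | { ok: true; merged: Record<string, ColumnDef> }
  | { ok: false; conflicts: string[] };

// Structural equality via serialization; assumes consistent key order.
// JSON.stringify(undefined) is undefined, so "absent" equals "absent".
const same = (a: ColumnDef | undefined, b: ColumnDef | undefined): boolean =>
  JSON.stringify(a) === JSON.stringify(b);

export const threeWayMerge = (
  base: Columns,
  mine: Columns,
  theirs: Columns,
): MergeResult => {
  const merged: Record<string, ColumnDef> = {};
  const conflicts: string[] = [];
  const names = new Set([
    ...Object.keys(base),
    ...Object.keys(mine),
    ...Object.keys(theirs),
  ]);

  for (const name of names) {
    const b: ColumnDef | undefined = base[name];
    const m: ColumnDef | undefined = mine[name];
    const t: ColumnDef | undefined = theirs[name];

    let result: ColumnDef | undefined;
    if (same(m, t)) result = m; // both sides agree (including both deleted)
    else if (same(b, m)) result = t; // only "theirs" changed it
    else if (same(b, t)) result = m; // only "mine" changed it
    else {
      conflicts.push(name); // both changed it in different ways
      continue;
    }
    if (result !== undefined) merged[name] = result; // undefined means deleted
  }

  return conflicts.length > 0 ? { ok: false, conflicts } : { ok: true, merged };
};
```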
Overall Assessment
This is a sophisticated and well-thought-out architecture that successfully bridges the gap between SQL-first and TypeScript-first workflows. The use of snapshots for migration tracking is innovative, though it adds complexity. The system would benefit from:
Clearer separation between state tracking and migration manifest
Runtime verification that generated types match database reality
Better tooling for snapshot conflict resolution
More explicit handling of edge cases (partial migration failure, concurrent modifications)
The architecture shows maturity in its backwards compatibility and progressive enhancement approach. It's clearly designed by someone who understands both the technical challenges and the human factors of schema management.
Grade: B+. Strong foundation with room for refinement in the snapshot dual-responsibility and runtime verification aspects.

#########

What if snapshots were stored as JSON? Would that help in understanding the split?
Excellent question! Let me think through how JSON snapshots would affect the architecture, particularly around the split of responsibilities.
JSON Snapshots: Impact Analysis
How JSON Would Help with the Split
Current Issue with TypeScript Snapshots:
// .dumbo/snapshots/schemas/public/users.snapshot.ts
export const usersSnapshot = {
  tableName: 'users',
  columns: { /* schema state */ },
  migrations: [ /* migration tracking */ ] // Mixed responsibility!
} as const;
With JSON + Split Responsibilities:
// .dumbo/snapshots/schemas/public/users.schema.json
{
  "tableName": "users",
  "columns": {
    "id": { "type": "serial", "primaryKey": true },
    "email": { "type": "varchar", "length": 255 }
  },
  "indexes": {}
}

// .dumbo/manifests/schemas/public/users.manifest.json
{
  "component": "sc:dumbo:table:users",
  "processedMigrations": [
    {
      "name": "001_custom_create",
      "hash": "sha256:abcd1234...",
      "processedAt": "2024-01-15T10:30:00Z",
      "source": "provided"
    }
  ]
}
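The manifest entry above records a `sha256:` hash for each processed migration. Below is a minimal sketch of how such an entry might be produced and checked. The review does not say what exactly is hashed, so this assumes the hash covers the migration's SQL text. The type and function names are hypothetical:

```typescript
import { createHash } from 'node:crypto';

// Shapes mirroring the manifest JSON above (hypothetical names).
type ProcessedMigration = {
  name: string;
  hash: string; // 'sha256:<hex>'
  processedAt: string; // ISO-8601 timestamp
  source: string; // e.g. 'provided'
};

type ComponentManifest = {
  component: string; // component URN, e.g. 'sc:dumbo:table:users'
  processedMigrations: ProcessedMigration[];
};

// Assumption: the hash is taken over the migration's SQL text.
export const hashMigration = (sql: string): string =>
  `sha256:${createHash('sha256').update(sql, 'utf8').digest('hex')}`;

// 'new' if the migration was never processed, 'unchanged' if the stored hash
// matches, 'modified' if a processed migration's SQL has since changed
// (the case an immutability check would reject).
export const checkMigration = (
  manifest: ComponentManifest,
  name: string,
  sql: string,
): 'new' | 'unchanged' | 'modified' => {
  const entry = manifest.processedMigrations.find((m) => m.name === name);
  if (!entry) return 'new';
  return entry.hash === hashMigration(sql) ? 'unchanged' : 'modified';
};
```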
Benefits of JSON Format
Clear Separation of Concerns
*.schema.json = Current state (for diffing)
*.manifest.json = Migration history (for tracking)
No temptation to mix responsibilities in a single file
Better Git Diffs
{
  "columns": {
    "id": { "type": "serial" },
-   "email": { "type": "varchar", "length": 255 }
  }
}
Much cleaner than TypeScript object diffs with imports and type annotations.
Language Agnostic
Other tools (Python scripts, CI/CD, database admin tools) can read and write the files
No TypeScript compilation needed to access snapshot data
Simpler Merge Resolution
JSON merge tools are mature and widespread
Can use jq or similar for command-line manipulation
Easier to write custom merge strategies
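Clean diffs and merges depend on the files being written deterministically. Otherwise key order can shift between runs and produce noise. Below is a sketch of a serializer that sorts object keys recursively and ends the file with a newline. `stableStringify` is a hypothetical helper, not part of the plan:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively rebuild objects with keys in sorted order so that the same
// snapshot always serializes to the same text. Array order is preserved.
const sortKeys = (value: Json): Json => {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === 'object') {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
};

// Two-space indentation and a trailing newline keep line-based diffs tidy.
export const stableStringify = (value: Json): string =>
  `${JSON.stringify(sortKeys(value), null, 2)}\n`;
```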
Drawbacks of JSON
Loss of Type Safety
No compile-time checking of snapshot structure
Need runtime validation (though you need this anyway)
No Imports/Composition
Can't use TypeScript's import system for nested structure
Would need to manually resolve file references
Verbosity for Complex Structures
No as const assertions
More verbose without TypeScript's type inference
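Because JSON gives up compile-time checking, snapshots need validating when they are read. A hand-written type guard is the smallest option, and a schema library or JSON Schema could replace it. This sketch follows the `users.schema.json` example above. The exact `TableSnapshot` shape is otherwise an assumption:

```typescript
// Shape of the table snapshot JSON shown earlier (assumed, not normative).
type ColumnSnapshot = {
  type: string;
  primaryKey?: boolean;
  length?: number;
};

type TableSnapshot = {
  tableName: string;
  columns: Record<string, ColumnSnapshot>;
  indexes: Record<string, unknown>;
};

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

const isColumn = (v: unknown): v is ColumnSnapshot =>
  isRecord(v) &&
  typeof v.type === 'string' &&
  (v.primaryKey === undefined || typeof v.primaryKey === 'boolean') &&
  (v.length === undefined || typeof v.length === 'number');

// Parses raw file contents and throws a descriptive error when the structure
// does not match, instead of failing later during diffing.
export const parseTableSnapshot = (text: string): TableSnapshot => {
  const data: unknown = JSON.parse(text);
  if (!isRecord(data) || typeof data.tableName !== 'string') {
    throw new Error('Snapshot must be an object with a string "tableName"');
  }
  if (!isRecord(data.columns) || !isRecord(data.indexes)) {
    throw new Error(`Snapshot "${data.tableName}" needs "columns" and "indexes" objects`);
  }
  for (const [name, column] of Object.entries(data.columns)) {
    if (!isColumn(column)) {
      throw new Error(`Invalid column "${name}" in snapshot "${data.tableName}"`);
    }
  }
  return data as TableSnapshot;
};
```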
Hybrid Approach: Best of Both Worlds
What if we used JSON for storage but TypeScript for types?
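// .dumbo/snapshots/types.ts (generated)
// Requires "resolveJsonModule" in tsconfig.json (and, depending on the module
// setting, "esModuleInterop" for default imports). JSON imports are typed with
// widened literals, e.g. "serial" becomes string.
import usersSchema from './schemas/public/users.schema.json';
import postsSchema from './schemas/public/posts.schema.json';

export type DatabaseSchema = {
  public: {
    users: typeof usersSchema;
    posts: typeof postsSchema;
  }
};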
Recommended Structure with JSON
.dumbo/
  schemas/                # Current state (JSON)
    database.json
    public/
      schema.json
      tables/
        users.json
        posts.json
  manifests/              # Migration tracking (JSON)
    database.json
    public/
      schema.json
      tables/
        users.json
        posts.json
  migrations/             # SQL files
    2024-01-15-001-create.sql
  generated/              # Generated TypeScript
    types.ts              # Generated from schemas/
    manifest-types.ts     # Types for manifest structure
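`schemas/` and `manifests/` mirror each other, so both paths can be derived from the same component coordinates. This is a sketch that assumes the layout above. `snapshotPaths` and `Coordinates` are hypothetical names:

```typescript
import { join } from 'node:path';

type Coordinates =
  | { kind: 'database' }
  | { kind: 'schema'; schema: string }
  | { kind: 'table'; schema: string; table: string };

// Location of a component relative to either mirrored tree.
const relativePath = (c: Coordinates): string => {
  switch (c.kind) {
    case 'database':
      return 'database.json';
    case 'schema':
      return join(c.schema, 'schema.json');
    case 'table':
      return join(c.schema, 'tables', `${c.table}.json`);
  }
};

// The same relative path is used under schemas/ and manifests/.
export const snapshotPaths = (root: string, c: Coordinates) => ({
  schema: join(root, 'schemas', relativePath(c)),
  manifest: join(root, 'manifests', relativePath(c)),
});
```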
Impact on the Architecture
Cleaner Responsibilities:
Schema JSON: Pure state representation
Manifest JSON: Pure migration history
Generated TS: Type safety without mixing concerns
Better Tooling Opportunities:
dumbo schema:diff - Compare JSON files
dumbo manifest:verify - Check manifest against database
dumbo schema:validate - JSON schema validation
Simpler Mental Model:
"Schemas describe what IS"
"Manifests describe what WAS DONE"
"Migrations describe HOW TO DO IT"
My Recommendation
Yes, JSON would significantly help with understanding and implementing the split. Specifically:
Use JSON for snapshots (both schema and manifest)
Keep them in separate directories
Generate TypeScript types from the JSON for type safety
Store the actual SQL in migrations as a third, distinct concern
This gives you:
Clear separation of concerns
Better git diffs and merge resolution
Language-agnostic data files
Type safety through code generation
Simpler mental model
The only real loss is the elegance of TypeScript imports for composition, but that's a worthwhile trade-off for the clarity gained. Would you like me to revise the plan to use this JSON-based approach with properly separated concerns?
@@ -0,0 +1,37 @@
import type { SQLColumnToken } from '../../sql';
import {
  schemaComponent,
  type SchemaComponent,
  type SchemaComponentOptions,
} from '../schemaComponent';

export type ColumnURNType = 'sc:dumbo:column';
export type ColumnURN = `${ColumnURNType}:${string}`;

export const ColumnURNType: ColumnURNType = 'sc:dumbo:column';
export const ColumnURN = ({ name }: { name: string }): ColumnURN =>
  `${ColumnURNType}:${name}`;

export type ColumnSchemaComponent = SchemaComponent<
  ColumnURN,
  Readonly<{
    columnName: string;
  }>
>;
export const columnSchemaComponent = ({
  columnName,
  ...migrationsOrComponents
}: {
  columnName: string;
} & SchemaComponentOptions &
  Omit<SQLColumnToken['value'], 'name'>): ColumnSchemaComponent => {
  const sc = schemaComponent(
    ColumnURN({ name: columnName }),
    migrationsOrComponents,
  );

  return {
    ...sc,
    columnName,
  };
};
@@ -0,0 +1,66 @@
import {
  mapSchemaComponentsOfType,
  schemaComponent,
  type SchemaComponent,
  type SchemaComponentOptions,
} from '../schemaComponent';
import {
  DatabaseSchemaURNType,
  databaseSchemaSchemaComponent,
  type DatabaseSchemaSchemaComponent,
} from './databaseSchemaSchemaComponent';

export type DatabaseURNType = 'sc:dumbo:database';
export type DatabaseURN = `${DatabaseURNType}:${string}`;

export const DatabaseURNType: DatabaseURNType = 'sc:dumbo:database';
export const DatabaseURN = ({ name }: { name: string }): DatabaseURN =>
  `${DatabaseURNType}:${name}`;

export type DatabaseSchemaComponent = SchemaComponent<
  DatabaseURN,
  Readonly<{
    databaseName: string;
    schemas: ReadonlyMap<string, DatabaseSchemaSchemaComponent>;
    addSchema: (
      schema: string | DatabaseSchemaSchemaComponent,
    ) => DatabaseSchemaSchemaComponent;
  }>
>;

export const databaseSchemaComponent = ({
  databaseName,
  schemaNames,
  ...migrationsOrComponents
}: {
  databaseName: string;
  schemaNames?: string[];
} & SchemaComponentOptions): DatabaseSchemaComponent => {
  const schemas =
    schemaNames?.map((schemaName) =>
      databaseSchemaSchemaComponent({ schemaName }),
    ) ?? [];

  const base = schemaComponent(DatabaseURN({ name: databaseName }), {
    migrations: migrationsOrComponents.migrations ?? [],
    components: [...(migrationsOrComponents.components ?? []), ...schemas],
  });

  return {
    ...base,
    databaseName,
    get schemas() {
      return mapSchemaComponentsOfType<DatabaseSchemaSchemaComponent>(
        base.components,
        DatabaseSchemaURNType,
        (c) => c.schemaName,
      );
    },
    addSchema: (schema: string | DatabaseSchemaSchemaComponent) =>
      base.addComponent(
        typeof schema === 'string'
          ? databaseSchemaSchemaComponent({ schemaName: schema })
          : schema,
      ),
  };
};
@@ -0,0 +1,66 @@
import {
  mapSchemaComponentsOfType,
  schemaComponent,
  type SchemaComponent,
  type SchemaComponentOptions,
} from '../schemaComponent';
import {
  TableURNType,
  tableSchemaComponent,
  type TableSchemaComponent,
} from './tableSchemaComponent';

export type DatabaseSchemaURNType = 'sc:dumbo:database_schema';
export type DatabaseSchemaURN = `${DatabaseSchemaURNType}:${string}`;

export const DatabaseSchemaURNType: DatabaseSchemaURNType =
  'sc:dumbo:database_schema';
export const DatabaseSchemaURN = ({
  name,
}: {
  name: string;
}): DatabaseSchemaURN => `${DatabaseSchemaURNType}:${name}`;

export type DatabaseSchemaSchemaComponent = SchemaComponent<
  DatabaseSchemaURN,
  Readonly<{
    schemaName: string;
    tables: ReadonlyMap<string, TableSchemaComponent>;
    addTable: (table: string | TableSchemaComponent) => TableSchemaComponent;
  }>
>;

export const databaseSchemaSchemaComponent = ({
  schemaName,
  tableNames,
  ...migrationsOrComponents
}: {
  schemaName: string;
  tableNames?: string[];
} & SchemaComponentOptions): DatabaseSchemaSchemaComponent => {
  const tables =
    tableNames?.map((tableName) => tableSchemaComponent({ tableName })) ?? [];

  const base = schemaComponent(DatabaseSchemaURN({ name: schemaName }), {
    migrations: migrationsOrComponents.migrations ?? [],
    components: [...(migrationsOrComponents.components ?? []), ...tables],
  });

  return {
    ...base,
    schemaName,
    get tables() {
      return mapSchemaComponentsOfType<TableSchemaComponent>(
        base.components,
        TableURNType,
        (c) => c.tableName,
      );
    },
    addTable: (table: string | TableSchemaComponent) =>
      base.addComponent(
        typeof table === 'string'
          ? tableSchemaComponent({ tableName: table })
          : table,
      ),
  };
};
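`mapSchemaComponentsOfType` is imported from `../schemaComponent`, which is not part of this excerpt. Its call sites pass a component list, a URN type, and a key selector, and expect a `ReadonlyMap` back. Judging from that, it may behave roughly like the standalone sketch below. The `Component` shape (in particular a `urn` field) and the prefix-based filtering are assumptions. One detail worth checking in the real helper: a bare `startsWith('sc:dumbo:database')` would also match `sc:dumbo:database_schema:...`, so the sketch matches on the type followed by `:`.

```typescript
// Hypothetical minimal component: the real SchemaComponent's URN field name
// is not visible in this excerpt.
type Component = { urn: string };

// Keeps components whose URN starts with `${urnType}:` and indexes them by the
// key the caller extracts (e.g. schemaName or tableName).
export const mapComponentsOfType = <T extends Component>(
  components: ReadonlyArray<Component>,
  urnType: string,
  keyOf: (component: T) => string,
): ReadonlyMap<string, T> => {
  const result = new Map<string, T>();
  for (const component of components) {
    // The trailing ':' stops 'sc:dumbo:database' matching 'sc:dumbo:database_schema'.
    if (component.urn.startsWith(`${urnType}:`)) {
      result.set(keyOf(component as T), component as T);
    }
  }
  return result;
};
```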
23 changes: 23 additions & 0 deletions src/packages/dumbo/src/core/schema/components/index.ts
@@ -0,0 +1,23 @@
import { ColumnURN } from './columnSchemaComponent';
import { DatabaseURN } from './databaseSchemaComponent';
import { DatabaseSchemaURN } from './databaseSchemaSchemaComponent';
import { IndexURN } from './indexSchemaComponent';
import { TableURN } from './tableSchemaComponent';

export * from './columnSchemaComponent';
export * from './databaseSchemaComponent';
export * from './databaseSchemaSchemaComponent';
export * from './indexSchemaComponent';
export * from './tableSchemaComponent';

export const schemaComponentURN = {
  database: DatabaseURN,
  schema: DatabaseSchemaURN,
  table: TableURN,
  column: ColumnURN,
  index: IndexURN,
  extractName: (urn: string): string => {
    const parts = urn.split(':');
    return parts[parts.length - 1] || '';
  },
} as const;
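`extractName` returns the text after the last `:`, so a component name that itself contains a colon would be truncated. If that matters, a parser that treats the first three segments as the type may be safer. This is a sketch, assuming every URN type keeps the three-segment `sc:dumbo:<kind>` form used above. `parseSchemaComponentURN` is a hypothetical name:

```typescript
// Assumption: every URN type has the three-segment form 'sc:dumbo:<kind>'.
export type ParsedSchemaComponentURN = {
  type: string; // e.g. 'sc:dumbo:table'
  kind: string; // e.g. 'table'
  name: string; // everything after the type, colons included
};

export const parseSchemaComponentURN = (
  urn: string,
): ParsedSchemaComponentURN => {
  const parts = urn.split(':');
  const kind = parts[2] ?? '';
  const name = parts.slice(3).join(':'); // rejoin so 'a:b' stays intact
  if (parts[0] !== 'sc' || parts[1] !== 'dumbo' || kind === '' || name === '') {
    throw new Error(`Not a schema component URN: ${urn}`);
  }
  return { type: `sc:dumbo:${kind}`, kind, name };
};
```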