event-driven-io · oskardudycz · Sep 27, 2025 · Sep 27, 2025 · Oct 16, 2025 · Oct 16, 2025
diff --git a/plan.md b/plan.md
diff --git a/src/packages/dumbo/src/core/schema/MIGRATION_UNIFICATION_PLAN.md b/src/packages/dumbo/src/core/schema/MIGRATION_UNIFICATION_PLAN.md
diff --git a/src/packages/dumbo/src/core/schema/REVIEW.MD b/src/packages/dumbo/src/core/schema/REVIEW.MD
@@ -0,0 +1,201 @@
+Deep Analysis of the Architecture
+After thoroughly rereading the plan, here's my multi-perspective analysis:
+From a Regular User's Perspective (Storage Library Consumer)
+Strengths:
+Progressive Disclosure of Complexity - Users can start simple (SQL-only or TypeScript-only) and gradually adopt more features. This is excellent API design.
+Familiar Patterns - The API resembles popular ORMs (Prisma, TypeORM) making adoption easier.
+Clear Error Messages - The immutability validation provides actionable feedback rather than cryptic errors.
+Type Safety Benefits - The generated types for query builders provide immediate value that developers can feel in their IDE.
+Potential Pain Points:
+Hidden Complexity in Snapshots - The dual-purpose snapshots (schema state + migration tracking) might be confusing initially. Users might not understand why snapshots contain their migrations.
+Directory Proliferation - The .dumbo/ folder with nested snapshots could feel overwhelming for small projects.
+Mental Model Shift - Understanding when TypeScript is truth vs when SQL is truth requires careful documentation and good examples.
+Migration Immutability - While safer, this might frustrate users coming from Rails/Django where you can edit migrations before deploying.
+From a Software Architect's Perspective
+Architectural Strengths:
+Separation of Concerns - Each component generating its own SQL is textbook single responsibility principle. This makes the system highly testable and maintainable.
+Command-Query Separation - The clear distinction between:
+Commands (processMigrations, runMigrations)
+Queries (loadSnapshot, validateMigrations)
+Pure functions (generateMigrations, diffTableWithSnapshot)
+Temporal Decoupling - Snapshots act as a temporal boundary, allowing the system to reason about "then vs now" without querying the database.
+Composability - The tree structure with proper traversal order handles complex dependencies elegantly.
+Architectural Concerns:
+Snapshot as Dual Source of Truth - Using snapshots for BOTH schema state AND migration tracking violates single responsibility. Consider splitting into:
+Schema snapshots (current state for diffing)
+Migration manifest (tracking what's been processed)
+File System as Database - The .dumbo/migrations/ directory becomes a critical data store. What happens with:
+Concurrent writes?
+File system permissions?
+CI/CD environments?
+Type Generation Timing - Generating types from snapshots rather than from the database means types might not reflect reality if migrations fail partially.
+Validation vs Enforcement - The system validates but doesn't enforce schema consistency. A team member could still manually edit the database.
+Architecture as Code Analysis
+Elegant Design Patterns:
+Builder Pattern in dumboSchema API:
+dumboSchema.database('myapp', {
+  public: dumboSchema.schema('public', {
+    users: dumboSchema.table('users', {...})
+  })
+})
+Template Method Pattern in migration generation - each component implements its own generateMigrations() but follows the same structure.
+Visitor Pattern in tree traversal - the orchestrator visits each component in dependency order.
+Memento Pattern - Snapshots act as mementos capturing component state at a point in time.
+Code Smells to Address:
+Large Parameter Lists - The orchestrator constructor and several functions have many parameters that could be grouped into configuration objects.
+String Manipulation for SQL - Building SQL through string concatenation (even with template literals) is risky. Consider a SQL builder abstraction.
+Implicit Ordering Dependencies - The migration grouping relies on naming conventions (includes('schema'), includes('create_table')). This is fragile.
+Mixed Abstraction Levels - The orchestrator handles both high-level workflow and low-level file I/O. Consider extracting a MigrationWriter service.
+Philosophical Considerations
+The Snapshot Paradox: The snapshots serve as both historical record (what migrations were defined) and current state (what the schema looks like). This dual nature is philosophically interesting but practically complex. It's like Git storing both diffs AND the current state in the same structure.
+Migration Immutability Trade-off: Enforcing immutability is philosophically sound (history should be append-only) but practically challenging. Real-world schema evolution is messy, and sometimes you need to fix a broken migration.
+TypeScript as Truth: The decision to make TypeScript the source of truth when present is bold. It prioritizes developer experience over database reality, which could lead to drift.
+Recommendations for Improvement
+Split Snapshot Responsibilities:
+Keep schema snapshots for state
+Add a separate .dumbo/manifest.json for tracking processed migrations
+Add Schema Verification:
+After running migrations, query information_schema to verify the result matches expectations
+Generate types from actual database state, not snapshots
+Implement Conflict Resolution:
+Add a dumbo schema:merge command for snapshot conflicts
+Use three-way merge logic (base, mine, theirs)
+Consider Event Sourcing:
+Treat migrations as events in an event store
+Snapshots become projections of those events
+Enables time-travel debugging
+Add Dry-Run Capability:
+Let users preview what migrations would be generated/applied
+Essential for production deployments
+Overall Assessment
+This is a sophisticated and well-thought-out architecture that successfully bridges the gap between SQL-first and TypeScript-first workflows. The use of snapshots for migration tracking is innovative, though it adds complexity. The system would benefit from:
+Clearer separation between state tracking and migration manifest
+Runtime verification that generated types match database reality
+Better tooling for snapshot conflict resolution
+More explicit handling of edge cases (partial migration failure, concurrent modifications)
+The architecture shows maturity in its backwards compatibility and progressive enhancement approach. It's clearly designed by someone who understands both the technical challenges and the human factors of schema management.
+Grade: B+
+Strong foundation with room for refinement in the snapshot dual-responsibility and runtime verification aspects.
+
+#########
+
+What if snapshots were stored as JSON? Would that help in understanding the split?
+Excellent question! Let me think through how JSON snapshots would affect the architecture, particularly around the split of responsibilities.
+JSON Snapshots: Impact Analysis
+How JSON Would Help with the Split
+Current Issue with TypeScript Snapshots:
+// .dumbo/snapshots/schemas/public/users.snapshot.ts
+export const usersSnapshot = {
+  tableName: 'users',
+  columns: { /* schema state */ },
+  migrations: [ /* migration tracking */ ] // Mixed responsibility!
+} as const;
+With JSON + Split Responsibilities:
+// .dumbo/snapshots/schemas/public/users.schema.json
+{
+  "tableName": "users",
+  "columns": {
+    "id": { "type": "serial", "primaryKey": true },
+    "email": { "type": "varchar", "length": 255 }
+  },
+  "indexes": {}
+}
+
+// .dumbo/manifests/schemas/public/users.manifest.json
+{
+  "component": "sc:dumbo:table:users",
+  "processedMigrations": [
+    {
+      "name": "001_custom_create",
+      "hash": "sha256:abcd1234...",
+      "processedAt": "2024-01-15T10:30:00Z",
+      "source": "provided"
+    }
+  ]
+}
+Benefits of JSON Format
+Clear Separation of Concerns
+*.schema.json = Current state (for diffing)
+*.manifest.json = Migration history (for tracking)
+No temptation to mix responsibilities in a single file
+Better Git Diffs
+{
+  "columns": {
+    "id": { "type": "serial" },
+-   "email": { "type": "varchar", "length": 255 }
+  }
+}
+Much cleaner than TypeScript object diffs with imports and type annotations.
+Language Agnostic
+Other tools (Python scripts, CI/CD, database admin tools) can read/write
+No TypeScript compilation needed to access snapshot data
+Simpler Merge Resolution
+JSON merge tools are mature and widespread
+Can use jq or similar for command-line manipulation
+Easier to write custom merge strategies
+Drawbacks of JSON
+Loss of Type Safety
+No compile-time checking of snapshot structure
+Need runtime validation (though you need this anyway)
+No Imports/Composition
+Can't use TypeScript's import system for nested structure
+Would need to manually resolve file references
+Verbosity for Complex Structures
+No as const assertions
+More verbose without TypeScript's type inference
+Hybrid Approach: Best of Both Worlds
+What if we used JSON for storage but TypeScript for types?
+// .dumbo/snapshots/types.ts (generated)
+import usersSchema from './schemas/public/users.schema.json';
+import postsSchema from './schemas/public/posts.schema.json';
+
+export type DatabaseSchema = {
+  public: {
+    users: typeof usersSchema;
+    posts: typeof postsSchema;
+  }
+};
+Recommended Structure with JSON
+.dumbo/
+  schemas/                  # Current state (JSON)
+    database.json
+    public/
+      schema.json
+      tables/
+        users.json
+        posts.json
+  manifests/                # Migration tracking (JSON)
+    database.json
+    public/
+      schema.json
+      tables/
+        users.json
+        posts.json
+  migrations/               # SQL files
+    2024-01-15-001-create.sql
+  generated/                # Generated TypeScript
+    types.ts                # Generated from schemas/
+    manifest-types.ts       # Types for manifest structure
+Impact on the Architecture
+Cleaner Responsibilities:
+Schema JSON: Pure state representation
+Manifest JSON: Pure migration history
+Generated TS: Type safety without mixing concerns
+Better Tooling Opportunities:
+dumbo schema:diff - Compare JSON files
+dumbo manifest:verify - Check manifest against database
+dumbo schema:validate - JSON schema validation
+Simpler Mental Model:
+"Schemas describe what IS"
+"Manifests describe what WAS DONE"
+"Migrations describe HOW TO DO IT"
+My Recommendation
+Yes, JSON would significantly help with understanding and implementing the split. Specifically:
+Use JSON for snapshots (both schema and manifest)
+Keep them in separate directories
+Generate TypeScript types from the JSON for type safety
+Store the actual SQL in migrations as a third, distinct concern
+This gives you:
+Clear separation of concerns
+Better git diffs and merge resolution
+Language-agnostic data files
+Type safety through code generation
+Simpler mental model
+The only real loss is the elegance of TypeScript imports for composition, but that's a worthwhile trade-off for the clarity gained. Would you like me to revise the plan to use this JSON-based approach with properly separated concerns?
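Note on the schema/manifest split discussed in REVIEW.MD: once schema state lives in its own JSON file, the "then vs now" comparison can be a pure function over two column maps, with no database or file access. The sketch below is only an illustration of that idea. `ColumnDefinition`, `ColumnsDiff`, and `diffColumns` are hypothetical names (this is not the plan's `diffTableWithSnapshot`, whose signature is not shown here), and the column shape is assumed from the `users.schema.json` example.

```typescript
// Hypothetical column shape, assumed from the users.schema.json example in REVIEW.MD.
type ColumnDefinition = {
  type: string;
  length?: number;
  primaryKey?: boolean;
};

type Columns = Record<string, ColumnDefinition>;

type ColumnsDiff = {
  added: string[];
  removed: string[];
  changed: string[];
};

// The assumed shape is flat, so a field-by-field comparison is sufficient.
const sameDefinition = (a: ColumnDefinition, b: ColumnDefinition): boolean =>
  a.type === b.type && a.length === b.length && a.primaryKey === b.primaryKey;

// Pure function: compares the stored snapshot with the current definition.
const diffColumns = (snapshot: Columns, current: Columns): ColumnsDiff => {
  const diff: ColumnsDiff = { added: [], removed: [], changed: [] };

  for (const [name, after] of Object.entries(current)) {
    const before = snapshot[name];
    if (before === undefined) diff.added.push(name);
    else if (!sameDefinition(before, after)) diff.changed.push(name);
  }
  for (const name of Object.keys(snapshot)) {
    if (!(name in current)) diff.removed.push(name);
  }
  return diff;
};

const storedColumns: Columns = {
  id: { type: 'serial', primaryKey: true },
  email: { type: 'varchar', length: 255 },
};
const currentColumns: Columns = {
  id: { type: 'serial', primaryKey: true },
  email: { type: 'varchar', length: 500 },
  createdAt: { type: 'timestamp' },
};

const columnsDiff = diffColumns(storedColumns, currentColumns);
console.log(columnsDiff);
```

Here `createdAt` is reported as added and `email` as changed. Migration history does not appear anywhere in the input, which is the point of keeping it in a separate manifest.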
diff --git a/src/packages/dumbo/src/core/schema/components/columnSchemaComponent.ts b/src/packages/dumbo/src/core/schema/components/columnSchemaComponent.ts
@@ -0,0 +1,37 @@
+import type { SQLColumnToken } from '../../sql';
+import {
+  schemaComponent,
+  type SchemaComponent,
+  type SchemaComponentOptions,
+} from '../schemaComponent';
+
+export type ColumnURNType = 'sc:dumbo:column';
+export type ColumnURN = `${ColumnURNType}:${string}`;
+
+export const ColumnURNType: ColumnURNType = 'sc:dumbo:column';
+export const ColumnURN = ({ name }: { name: string }): ColumnURN =>
+  `${ColumnURNType}:${name}`;
+
+export type ColumnSchemaComponent = SchemaComponent<
+  ColumnURN,
+  Readonly<{
+    columnName: string;
+  }>
+>;
+export const columnSchemaComponent = ({
+  columnName,
+  ...migrationsOrComponents
+}: {
+  columnName: string;
+} & SchemaComponentOptions &
+  Omit<SQLColumnToken['value'], 'name'>): ColumnSchemaComponent => {
+  const sc = schemaComponent(
+    ColumnURN({ name: columnName }),
+    migrationsOrComponents,
+  );
+
+  return {
+    ...sc,
+    columnName,
+  };
+};
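Note on the URN pattern in columnSchemaComponent.ts: the same identifier is declared once as a type and once as a value, and the template literal type constrains the prefix at compile time. A minimal standalone sketch follows. The declarations are re-stated locally so it runs on its own, and `isColumnURN` is a hypothetical helper that is not part of this diff.

```typescript
// Re-declared locally so the snippet is self-contained; mirrors columnSchemaComponent.ts.
type ColumnURNType = 'sc:dumbo:column';
type ColumnURN = `${ColumnURNType}:${string}`;

// Types and values live in separate namespaces, so the names can be reused.
const ColumnURNType: ColumnURNType = 'sc:dumbo:column';
const ColumnURN = ({ name }: { name: string }): ColumnURN =>
  `${ColumnURNType}:${name}`;

const emailUrn: ColumnURN = ColumnURN({ name: 'email' });
console.log(emailUrn); // sc:dumbo:column:email

// A string literal with a different prefix is rejected by the type checker:
// const wrong: ColumnURN = 'sc:dumbo:table:users'; // type error

// Hypothetical runtime guard (an assumption, not in the diff) for URNs
// that arrive as plain strings, e.g. read back from a JSON manifest.
const isColumnURN = (value: string): value is ColumnURN =>
  value.startsWith(`${ColumnURNType}:`);

console.log(isColumnURN('sc:dumbo:table:users')); // false
```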
diff --git a/src/packages/dumbo/src/core/schema/components/databaseSchemaComponent.ts b/src/packages/dumbo/src/core/schema/components/databaseSchemaComponent.ts
@@ -0,0 +1,66 @@
+import {
+  mapSchemaComponentsOfType,
+  schemaComponent,
+  type SchemaComponent,
+  type SchemaComponentOptions,
+} from '../schemaComponent';
+import {
+  DatabaseSchemaURNType,
+  databaseSchemaSchemaComponent,
+  type DatabaseSchemaSchemaComponent,
+} from './databaseSchemaSchemaComponent';
+
+export type DatabaseURNType = 'sc:dumbo:database';
+export type DatabaseURN = `${DatabaseURNType}:${string}`;
+
+export const DatabaseURNType: DatabaseURNType = 'sc:dumbo:database';
+export const DatabaseURN = ({ name }: { name: string }): DatabaseURN =>
+  `${DatabaseURNType}:${name}`;
+
+export type DatabaseSchemaComponent = SchemaComponent<
+  DatabaseURN,
+  Readonly<{
+    databaseName: string;
+    schemas: ReadonlyMap<string, DatabaseSchemaSchemaComponent>;
+    addSchema: (
+      schema: string | DatabaseSchemaSchemaComponent,
+    ) => DatabaseSchemaSchemaComponent;
+  }>
+>;
+
+export const databaseSchemaComponent = ({
+  databaseName,
+  schemaNames,
+  ...migrationsOrComponents
+}: {
+  databaseName: string;
+  schemaNames?: string[];
+} & SchemaComponentOptions): DatabaseSchemaComponent => {
+  const schemas =
+    schemaNames?.map((schemaName) =>
+      databaseSchemaSchemaComponent({ schemaName }),
+    ) ?? [];
+
+  const base = schemaComponent(DatabaseURN({ name: databaseName }), {
+    migrations: migrationsOrComponents.migrations ?? [],
+    components: [...(migrationsOrComponents.components ?? []), ...schemas],
+  });
+
+  return {
+    ...base,
+    databaseName,
+    get schemas() {
+      return mapSchemaComponentsOfType<DatabaseSchemaSchemaComponent>(
+        base.components,
+        DatabaseSchemaURNType,
+        (c) => c.schemaName,
+      );
+    },
+    addSchema: (schema: string | DatabaseSchemaSchemaComponent) =>
+      base.addComponent(
+        typeof schema === 'string'
+          ? databaseSchemaSchemaComponent({ schemaName: schema })
+          : schema,
+      ),
+  };
+};
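Note on `schemas` being a getter in databaseSchemaComponent.ts: computing the map on each read means schemas added later through `addSchema` are visible without rebuilding the component. The sketch below shows that pattern in isolation. It is a simplified stand-in, not the real implementation: `schemaComponent` and `mapSchemaComponentsOfType` are not shown in this diff, so `createDatabase`, `SchemaEntry`, and `toEntry` are hypothetical names chosen for illustration.

```typescript
// Hypothetical, simplified stand-in for a schema component entry.
type SchemaEntry = { urn: string; schemaName: string };

const toEntry = (schemaName: string): SchemaEntry => ({
  urn: `sc:dumbo:database_schema:${schemaName}`,
  schemaName,
});

const createDatabase = (databaseName: string, schemaNames: string[] = []) => {
  // Mutable list held in the closure; plays the role of base.components.
  const components: SchemaEntry[] = schemaNames.map(toEntry);

  return {
    databaseName,
    // Getter: the map is rebuilt from the component list on every read.
    get schemas(): ReadonlyMap<string, SchemaEntry> {
      return new Map(components.map((c) => [c.schemaName, c] as const));
    },
    addSchema: (schemaName: string): SchemaEntry => {
      const entry = toEntry(schemaName);
      components.push(entry);
      return entry;
    },
  };
};

const db = createDatabase('myapp', ['public']);
db.addSchema('audit');

const schemaKeys = Array.from(db.schemas.keys());
console.log(schemaKeys); // [ 'public', 'audit' ]
```

The trade-off is that every read allocates a new `Map`. For schema definitions of typical size that cost is likely negligible, but callers should not rely on the returned map's identity between reads.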
diff --git a/src/packages/dumbo/src/core/schema/components/databaseSchemaSchemaComponent.ts b/src/packages/dumbo/src/core/schema/components/databaseSchemaSchemaComponent.ts
@@ -0,0 +1,66 @@
+import {
+  mapSchemaComponentsOfType,
+  schemaComponent,
+  type SchemaComponent,
+  type SchemaComponentOptions,
+} from '../schemaComponent';
+import {
+  TableURNType,
+  tableSchemaComponent,
+  type TableSchemaComponent,
+} from './tableSchemaComponent';
+
+export type DatabaseSchemaURNType = 'sc:dumbo:database_schema';
+export type DatabaseSchemaURN = `${DatabaseSchemaURNType}:${string}`;
+
+export const DatabaseSchemaURNType: DatabaseSchemaURNType =
+  'sc:dumbo:database_schema';
+export const DatabaseSchemaURN = ({
+  name,
+}: {
+  name: string;
+}): DatabaseSchemaURN => `${DatabaseSchemaURNType}:${name}`;
+
+export type DatabaseSchemaSchemaComponent = SchemaComponent<
+  DatabaseSchemaURN,
+  Readonly<{
+    schemaName: string;
+    tables: ReadonlyMap<string, TableSchemaComponent>;
+    addTable: (table: string | TableSchemaComponent) => TableSchemaComponent;
+  }>
+>;
+
+export const databaseSchemaSchemaComponent = ({
+  schemaName,
+  tableNames,
+  ...migrationsOrComponents
+}: {
+  schemaName: string;
+  tableNames?: string[];
+} & SchemaComponentOptions): DatabaseSchemaSchemaComponent => {
+  const tables =
+    tableNames?.map((tableName) => tableSchemaComponent({ tableName })) ?? [];
+
+  const base = schemaComponent(DatabaseSchemaURN({ name: schemaName }), {
+    migrations: migrationsOrComponents.migrations ?? [],
+    components: [...(migrationsOrComponents.components ?? []), ...tables],
+  });
+
+  return {
+    ...base,
+    schemaName,
+    get tables() {
+      return mapSchemaComponentsOfType<TableSchemaComponent>(
+        base.components,
+        TableURNType,
+        (c) => c.tableName,
+      );
+    },
+    addTable: (table: string | TableSchemaComponent) =>
+      base.addComponent(
+        typeof table === 'string'
+          ? tableSchemaComponent({ tableName: table })
+          : table,
+      ),
+  };
+};
diff --git a/src/packages/dumbo/src/core/schema/components/index.ts b/src/packages/dumbo/src/core/schema/components/index.ts
@@ -0,0 +1,23 @@
+import { ColumnURN } from './columnSchemaComponent';
+import { DatabaseURN } from './databaseSchemaComponent';
+import { DatabaseSchemaURN } from './databaseSchemaSchemaComponent';
+import { IndexURN } from './indexSchemaComponent';
+import { TableURN } from './tableSchemaComponent';
+
+export * from './columnSchemaComponent';
+export * from './databaseSchemaComponent';
+export * from './databaseSchemaSchemaComponent';
+export * from './indexSchemaComponent';
+export * from './tableSchemaComponent';
+
+export const schemaComponentURN = {
+  database: DatabaseURN,
+  schema: DatabaseSchemaURN,
+  table: TableURN,
+  column: ColumnURN,
+  index: IndexURN,
+  extractName: (urn: string): string => {
+    const parts = urn.split(':');
+    return parts[parts.length - 1] || '';
+  },
+} as const;
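Note on `extractName` in index.ts: it returns the segment after the last `:`, so a name that itself contains a colon would be truncated. Whether such names can occur depends on how components are named elsewhere, which this diff does not show. The sketch below copies `extractName` so it runs standalone and adds `extractNameAfterPrefix`, a hypothetical alternative that is not part of the diff.

```typescript
// Copied from schemaComponentURN.extractName so the snippet is self-contained.
const extractName = (urn: string): string => {
  const parts = urn.split(':');
  return parts[parts.length - 1] || '';
};

console.log(extractName('sc:dumbo:table:users')); // users
console.log(extractName('sc:dumbo:table:') === ''); // true
console.log(extractName('sc:dumbo:column:a:b')); // b (the "a:" part is lost)

// Hypothetical alternative (an assumption, not in the diff): strip a known
// URN type prefix, which keeps names containing ':' intact.
const extractNameAfterPrefix = (urn: string, urnType: string): string =>
  urn.startsWith(`${urnType}:`) ? urn.slice(urnType.length + 1) : '';

console.log(extractNameAfterPrefix('sc:dumbo:column:a:b', 'sc:dumbo:column')); // a:b
```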