| 1 | +# @hpcc-js/dataflow Copilot Instructions |
| 2 | + |
| 3 | +## Architecture Overview |
| 4 | + |
| 5 | +This is a **functional data flow library** using JavaScript generators and iterators for lazy evaluation. Think of it as a streaming data pipeline where data flows through activities and is observed by sensors. |
| 6 | + |
| 7 | +**Core Concepts:** |
| 8 | +- **Source<T>**: Either `T[]` or `IterableIterator<T>` - the input data |
| 9 | +- **Activities**: Transform data as it flows (`map`, `filter`, `sort`) - return iterators |
| 10 | +- **Observers/Sensors**: Monitor data without modifying it (`count`, `max`, `mean`) - accumulate state |
| 11 | +- **Pipe**: Chains activities together into reusable pipelines with full type safety |
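
As a rough illustration of how these concepts fit together, here is a minimal, self-contained sketch. The helper names (`filterGen`, `mapGen`, `pipe2`) and the simplified type aliases are assumptions made for this example; the library's real `pipe()` accepts more activities and has far more elaborate typing.

```typescript
// Simplified stand-ins for the library's core types (assumed shapes).
type Source<T> = T[] | IterableIterator<T>;
type IterableActivity<T, U = T> = (source: Source<T>) => IterableIterator<U>;

// Hypothetical curried activities: each returns a function from source to iterator.
function filterGen<T>(predicate: (item: T, index: number) => boolean): IterableActivity<T> {
    return function* (source: Source<T>): IterableIterator<T> {
        let i = -1;
        for (const item of source) {
            if (predicate(item, ++i)) yield item;
        }
    };
}

function mapGen<T, U>(callbackFn: (item: T, index: number) => U): IterableActivity<T, U> {
    return function* (source: Source<T>): IterableIterator<U> {
        let i = -1;
        for (const item of source) {
            yield callbackFn(item, ++i);
        }
    };
}

// Hypothetical two-stage pipe: the output type of `first` must match the input type of `second`.
function pipe2<A, B, C>(first: IterableActivity<A, B>, second: IterableActivity<B, C>): IterableActivity<A, C> {
    return (source) => second(first(source));
}

const evensDoubled = pipe2(
    filterGen((n: number) => n % 2 === 0),
    mapGen((n: number) => n * 2)
);
const piped = [...evensDoubled([1, 2, 3, 4])];   // [4, 8]
const pipedAgain = [...evensDoubled([10, 11])];  // [20] (the pipeline is reusable)
```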
| 12 | + |
| 13 | +**Key Files:** |
| 14 | +- `src/activities/activity.ts` - Core type definitions for the entire system |
| 15 | +- `src/utils/pipe.ts` - Complex TypeScript type magic for type-safe activity chaining |
| 16 | +- `src/observers/observer.ts` - Observer pattern with `observe()` and `peek()` methods |
| 17 | + |
| 18 | +## Critical Patterns |
| 19 | + |
| 20 | +### Dual Signature Pattern (Performance Optimization) |
| 21 | + |
| 22 | +Activities use TypeScript overloads to support both immediate execution and curried usage: |
| 23 | + |
| 24 | +```typescript |
| 25 | +// Immediate execution |
| 26 | +export function map<T, U>(source: Source<T>, callbackFn: MapCallback<T, U>): IterableIterator<U>; |
| 27 | +// Curried (returns reusable activity) |
| 28 | +export function map<T, U>(callbackFn: MapCallback<T, U>): IterableActivity<T, U>; |
| 29 | + |
| 30 | +export function map<T, U>(s_or_cb: Source<T> | MapCallback<T, U>, callbackFn?: MapCallback<T, U>) { |
| 31 | +    return isSource(s_or_cb) ? mapGen(callbackFn!)(s_or_cb) : mapGen(s_or_cb);
| 32 | +} |
| 33 | +``` |
| 34 | + |
| 35 | +**Performance optimization (in progress):** Activities are being migrated from `isSource()` runtime checks to `arguments.length` checks. See `sort.ts` for the optimized pattern, which replaces runtime type inspection of the first argument with a cheap argument count.
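
The argument-counting dispatch could look roughly like the sketch below. This is not copied from `sort.ts`: the `sortGen` helper, the `CompareFn` alias, and the exact overload signatures are assumptions for illustration. Note that counting arguments cannot tell `sort(source)` from `sort(compareFn)`, so in this sketch an immediate call must always pass two arguments.

```typescript
// Simplified stand-ins for the library's types (assumed shapes).
type Source<T> = T[] | IterableIterator<T>;
type CompareFn<T> = (a: T, b: T) => number;
type IterableActivity<T, U = T> = (source: Source<T>) => IterableIterator<U>;

// Hypothetical helper: builds the generator for a given comparator.
function sortGen<T>(compareFn?: CompareFn<T>): IterableActivity<T> {
    return function* (source: Source<T>): IterableIterator<T> {
        // Copy first so the caller's array is never mutated.
        const arr = Array.isArray(source) ? source.slice() : [...source];
        yield* arr.sort(compareFn);
    };
}

// Immediate execution: two arguments, the second may be an explicit undefined.
function sort<T>(source: Source<T>, compareFn: CompareFn<T> | undefined): IterableIterator<T>;
// Curried: zero or one argument.
function sort<T>(compareFn?: CompareFn<T>): IterableActivity<T>;
function sort<T>(s_or_cb?: Source<T> | CompareFn<T>, compareFn?: CompareFn<T>): IterableIterator<T> | IterableActivity<T> {
    // Count arguments instead of inspecting the type of the first one.
    if (arguments.length === 2) {
        return sortGen(compareFn)(s_or_cb as Source<T>);
    }
    return sortGen(s_or_cb as CompareFn<T> | undefined);
}

const original = [3, 1, 2];
const immediate = [...sort(original, undefined)];            // default sort, evaluated here
const descending = sort((a: number, b: number) => b - a);    // curried, reusable activity
const reused = [...descending(original)];
```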
| 36 | + |
| 37 | +### Generator Functions for Lazy Evaluation |
| 38 | + |
| 39 | +All activities use generator functions (`function*`) to enable lazy evaluation: |
| 40 | + |
| 41 | +```typescript |
| 42 | +const mapGen = <T, U>(callbackFn: MapCallback<T, U>) => function* (source: Source<T>) {
| 43 | +    let i = -1;
| 44 | +    for (const item of source) {
| 45 | +        yield callbackFn(item, ++i);
| 46 | +    }
| 47 | +};
| 48 | +``` |
| 49 | + |
| 50 | +This ensures data only flows when consumed (e.g., via `[...iterator]` or `for...of`). |
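
To make the laziness concrete, the following sketch records when the callback actually runs. The `mapGen` here is a simplified stand-in written for this example (an assumption, not the library source), and the `calls` array exists only to observe call timing.

```typescript
// Simplified stand-ins for the library's types (assumed shapes).
type Source<T> = T[] | IterableIterator<T>;
type MapCallback<T, U> = (item: T, index: number) => U;

function mapGen<T, U>(callbackFn: MapCallback<T, U>) {
    return function* (source: Source<T>): IterableIterator<U> {
        let i = -1;
        for (const item of source) {
            yield callbackFn(item, ++i);
        }
    };
}

const calls: number[] = [];
const doubled = mapGen((n: number) => { calls.push(n); return n * 2; })([1, 2, 3]);

const callsBeforeIteration = calls.length;   // 0: creating the iterator runs nothing
const first = doubled.next().value;          // 2: pulling one item runs the callback once
const callsAfterFirst = calls.length;        // 1
const rest = [...doubled];                   // [4, 6]: spreading drains the remaining items
```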
| 51 | + |
| 52 | +### Observers Accumulate State |
| 53 | + |
| 54 | +Observers have two methods: |
| 55 | +- `observe(value, index)` - Called for each item as it flows through |
| 56 | +- `peek()` - Returns accumulated result without consuming the iterator |
| 57 | + |
| 58 | +Observers can be inserted into pipes using `sensor()` or converted to activities using `scalar()` or `activity()`. |
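
A minimal sketch of the observer idea, assuming an `Observer` interface with just the two methods described above. `maxObserver` and `passThroughSensor` are hypothetical stand-ins written for this example; they are not the library's `max` or `sensor()` implementations, whose signatures may differ.

```typescript
// Simplified stand-in for the library's source type (assumed shape).
type Source<T> = T[] | IterableIterator<T>;

// Assumed observer shape, based on the two methods described above.
interface Observer<T, R> {
    observe(value: T, index: number): void;
    peek(): R;
}

// Hypothetical observer that tracks the largest value seen so far.
function maxObserver(): Observer<number, number | undefined> {
    let best: number | undefined;
    return {
        observe(value: number) {
            if (best === undefined || value > best) best = value;
        },
        peek() {
            return best;
        }
    };
}

// Hypothetical pass-through: reports each item to the observer, then yields it unchanged.
function passThroughSensor<T, R>(observer: Observer<T, R>) {
    return function* (source: Source<T>): IterableIterator<T> {
        let i = -1;
        for (const item of source) {
            observer.observe(item, ++i);
            yield item;
        }
    };
}

const tracker = maxObserver();
const watched = passThroughSensor(tracker)([3, 9, 4]);
const maxBefore = tracker.peek();   // undefined: no data has flowed yet
const items = [...watched];         // [3, 9, 4]: data passes through unmodified
const maxAfter = tracker.peek();    // 9
```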
| 59 | + |
| 60 | +### Array Mutation Prevention |
| 61 | + |
| 62 | +**Always use `.slice()` before `.sort()` to avoid mutating input arrays:** |
| 63 | + |
| 64 | +```typescript |
| 65 | +const arr = Array.isArray(source) ? source.slice() : [...source]; |
| 66 | +yield* arr.sort(compareFn); |
| 67 | +``` |
| 68 | + |
| 69 | +This pattern appears in `sort.ts`, `median.ts`, `quartile.ts`. |
| 70 | + |
| 71 | +## Build & Test Workflow |
| 72 | + |
| 73 | +**Build Commands:** |
| 74 | +- `npm run build` - Parallel TypeScript compilation + Vite bundling (`run-p gen-types bundle`) |
| 75 | +- `npm run gen-types` - Generate `.d.ts` files in `types/` directory |
| 76 | +- `npm run bundle` - Vite builds UMD + ES modules to `dist/` |
| 77 | + |
| 78 | +**Testing:** |
| 79 | +- `npm test` - Runs type checking + vitest (both node & browser environments) |
| 80 | +- `npm run test-vitest` - Vitest only (dual environment: node + chromium) |
| 81 | +- `npm run bench` - Performance benchmarks (see `tests/pipe.bench.ts`) |
| 82 | + |
| 83 | +**Test Structure:** |
| 84 | +- Each activity/observer has a matching `.spec.ts` file in `tests/` |
| 85 | +- `tests/pipe.spec.ts` and `tests/pipe.bench.ts` test pipeline composition |
| 86 | +- Tests verify both immediate execution and curried usage patterns |
| 87 | + |
| 88 | +## TypeScript Configuration |
| 89 | + |
| 90 | +- Uses `"allowImportingTsExtensions": true` - **always use `.ts` extensions in imports** |
| 91 | +- `"module": "NodeNext"` - ES modules with Node.js compatibility |
| 92 | +- Type definitions generated to `types/` directory (not inline with source) |
| 93 | + |
| 94 | +## Common Gotchas |
| 95 | + |
| 96 | +1. **Index tracking:** Most activities use the `let i = -1; for (const item of source) { ++i; }` pattern, which maintains the correct index through transformations
| 97 | + |
| 98 | +2. **Optional parameters with undefined:** With the `arguments.length` optimization, an explicit `undefined` still counts as an argument, so handle it deliberately (e.g., `sort(source, undefined)` for the default sort)
| 99 | + |
| 100 | +3. **Type inference in pipe():** The `pipe()` function uses sophisticated TypeScript to infer return types - if types break, check that activity input/output types align correctly |
| 101 | + |
| 102 | +4. **Histogram edge cases:** `histogram` has special handling for empty sources: it yields empty buckets with NaN bounds for the `buckets` option and returns nothing for the `min/range` option
| 103 | + |
| 104 | +5. **Generator initialization:** Generators don't execute until iterated - sensors remain `undefined` until data flows through |
| 105 | + |
| 106 | +## Code Style |
| 107 | + |
| 108 | +- Use generator functions for all iterable activities |
| 109 | +- Prefer `for...of` over manual iterator manipulation |
| 110 | +- Use `yield*` to delegate to another generator |
| 111 | +- Type parameters: `<T = any>` allows inference while providing a fallback
| 112 | +- Function naming: `<activity>Gen` helper functions (e.g., `mapGen`) create the generator; the exported function handles overload dispatch