
web-csv-toolbox@0.14.0

Released by github-actions on 27 Nov 14:48 · commit 01dbd68

Minor Changes

  • #608 24f04d7 Thanks @kamiazya! - feat!: rename binary stream APIs for consistency and add BufferSource support

    Summary

    This release standardizes the naming of binary stream parsing APIs to match the existing parseBinary* family, and extends support to accept any BufferSource type (ArrayBuffer, Uint8Array, and other TypedArray views).

    Breaking Changes

    API Renaming for Consistency

    All parseUint8Array* functions have been renamed to parseBinary* to maintain consistency with existing binary parsing APIs:

    Function Names:

    • parseUint8ArrayStream() → parseBinaryStream()
    • parseUint8ArrayStreamToStream() → parseBinaryStreamToStream()

    Type Names:

    • ParseUint8ArrayStreamOptions → ParseBinaryStreamOptions

    Internal Functions (for reference):

    • parseUint8ArrayStreamInMain() → parseBinaryStreamInMain()
    • parseUint8ArrayStreamInWorker() → parseBinaryStreamInWorker()
    • parseUint8ArrayStreamInWorkerWASM() → parseBinaryStreamInWorkerWASM()

    Rationale:
    The previous naming was inconsistent with the rest of the binary API family (parseBinary, parseBinaryToArraySync, parseBinaryToIterableIterator, parseBinaryToStream). The new naming provides:

    • Perfect consistency across all binary parsing APIs
    • Clear indication that these functions accept any binary data format
    • Better predictability for API discovery

    BufferSource Support

    FlexibleBinaryCSVParser and BinaryCSVParserStream now accept BufferSource (= ArrayBuffer | ArrayBufferView) instead of just Uint8Array:

    Before:

    const parser = new FlexibleBinaryCSVParser({ header: ['name', 'age'] });
    const data = new Uint8Array([...]); // Only Uint8Array
    const records = parser.parse(data);

    After:

    const parser = new FlexibleBinaryCSVParser({ header: ['name', 'age'] });
    
    // Uint8Array still works
    const uint8Data = new Uint8Array([...]);
    const records1 = parser.parse(uint8Data);
    
    // ArrayBuffer now works directly
    const buffer = await fetch('data.csv').then(r => r.arrayBuffer());
    const records2 = parser.parse(buffer);
    
    // Other TypedArray views also work
    const int8Data = new Int8Array([...]);
    const records3 = parser.parse(int8Data);

    Benefits:

    • Direct use of fetch().then(r => r.arrayBuffer()) without conversion
    • Flexibility to work with any TypedArray view
    • Alignment with Web API standards (BufferSource is widely used)

    Migration Guide

    Automatic Migration

    Use find-and-replace in your codebase:

    # Function calls
    parseUint8ArrayStream → parseBinaryStream
    parseUint8ArrayStreamToStream → parseBinaryStreamToStream
    
    # Type references
    ParseUint8ArrayStreamOptions → ParseBinaryStreamOptions

    TypeScript Users

    If you were explicitly typing with Uint8Array, you can now use the more general BufferSource:

    // Before
    function processCSV(data: Uint8Array) {
      return parseBinaryStream(data);
    }
    
    // After (more flexible)
    function processCSV(data: BufferSource) {
      return parseBinaryStream(data);
    }

    Updated API Consistency

    All binary parsing APIs now follow a consistent naming pattern:

    // Single-value binary data
    parseBinary(); // Binary → AsyncIterableIterator<Record>
    parseBinaryToArraySync(); // Binary → Array<Record> (sync)
    parseBinaryToIterableIterator(); // Binary → IterableIterator<Record>
    parseBinaryToStream(); // Binary → ReadableStream<Record>
    
    // Streaming binary data
    parseBinaryStream(); // ReadableStream<Uint8Array> → AsyncIterableIterator<Record>
    parseBinaryStreamToStream(); // ReadableStream<Uint8Array> → ReadableStream<Record>

    Note: While the stream input type remains ReadableStream<Uint8Array> (Web Streams API standard), the internal parsers now accept BufferSource for individual chunks.

    Documentation Updates

    README.md

    • Updated Low-level APIs section to reflect parseBinaryStream* naming
    • Added flush procedure documentation for streaming mode
    • Added BufferSource examples

    API Reference (docs/reference/package-exports.md)

    • Added comprehensive Low-level API Reference section
    • Documented all Parser Models (Tier 1) and Lexer + Assembler (Tier 2)
    • Included usage examples and code snippets

    Architecture Guide (docs/explanation/parsing-architecture.md)

    • Updated Binary CSV Parser section to document BufferSource support
    • Added detailed streaming mode examples with flush procedures
    • Clarified multi-byte character handling across chunk boundaries

    Flush Procedure Clarification

    Documentation now explicitly covers the requirement to call parse() without arguments when using streaming mode:

    const parser = createBinaryCSVParser({ header: ["name", "age"] });
    const encoder = new TextEncoder();
    
    // Process chunks
    const records1 = parser.parse(encoder.encode("Alice,30\nBob,"), {
      stream: true,
    });
    const records2 = parser.parse(encoder.encode("25\n"), { stream: true });
    
    // IMPORTANT: Flush remaining data (required!)
    const records3 = parser.parse();

    This prevents data loss from incomplete records or multi-byte character buffers.

    Type Safety

    All changes maintain full TypeScript strict mode compliance with proper type inference and generic constraints.

  • #608 24f04d7 Thanks @kamiazya! - Add arrayBufferThreshold option to Engine configuration for automatic Blob reading strategy selection

    New Feature

    Added engine.arrayBufferThreshold option that automatically selects the optimal Blob reading strategy based on file size:

    • Files smaller than threshold: Use blob.arrayBuffer() + parseBinary() (6-8x faster, confirmed by benchmarks)
    • Files equal to or larger than threshold: Use blob.stream() + parseBinaryStream() (memory-efficient)

    Default: 1MB (1,048,576 bytes), determined by comprehensive benchmarks

    Applies to: parseBlob() and parseFile() only

    Benchmark Results

    | File Size | Binary (ops/sec) | Stream (ops/sec) | Performance Gain |
    | --------- | ---------------- | ---------------- | ---------------- |
    | 1KB       | 21,691           | 2,685            | 8.08x faster     |
    | 10KB      | 2,187            | 311              | 7.03x faster     |
    | 100KB     | 219              | 32               | 6.84x faster     |
    | 1MB       | 20               | 3                | 6.67x faster     |

    Usage

    import { parseBlob, EnginePresets } from "web-csv-toolbox";
    
    // Use default (1MB threshold)
    for await (const record of parseBlob(file)) {
      console.log(record);
    }
    
    // Always use streaming (memory-efficient)
    for await (const record of parseBlob(largeFile, {
      engine: { arrayBufferThreshold: 0 },
    })) {
      console.log(record);
    }
    
    // Custom threshold (512KB)
    for await (const record of parseBlob(file, {
      engine: { arrayBufferThreshold: 512 * 1024 },
    })) {
      console.log(record);
    }
    
    // With preset
    for await (const record of parseBlob(file, {
      engine: EnginePresets.fastest({
        arrayBufferThreshold: 2 * 1024 * 1024, // 2MB
      }),
    })) {
      console.log(record);
    }

    Special Values

    • 0 - Always use streaming (maximum memory efficiency)
    • Infinity - Always use arrayBuffer (maximum performance for small files)

    Security Note

    When using arrayBufferThreshold > 0, files must stay below maxBufferSize (default 10MB) to prevent excessive memory allocation. Files exceeding this limit will throw a RangeError.
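
    As a minimal sketch of the special values and the size limit described above (the exact point at which the RangeError surfaces is an assumption):

    import { parseBlob } from "web-csv-toolbox";

    declare const file: File; // e.g. from an <input type="file"> element

    try {
      for await (const record of parseBlob(file, {
        engine: { arrayBufferThreshold: Infinity }, // always read via arrayBuffer()
      })) {
        console.log(record);
      }
    } catch (error) {
      if (error instanceof RangeError) {
        console.error("File exceeds maxBufferSize:", error.message);
      }
    }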

    Design Philosophy

    This option belongs to engine configuration because it affects performance and behavior only, not the parsing result specification. This follows the design principle:

    • Top-level options: Affect specification (result changes)
    • Engine options: Affect performance/behavior (same result, different execution)
  • #608 24f04d7 Thanks @kamiazya! - Add support for Blob, File, and Request objects

    This release adds native support for parsing CSV data from Web Standard Blob, File, and Request objects, making the library more versatile across different environments.

    New Functions:

    • parseBlob(blob, options) - Parse CSV from Blob or File objects

      • Automatic charset detection from blob.type property
      • Supports compression via decompression option
      • Returns AsyncIterableIterator<CSVRecord>
      • Includes .toArray() and .toStream() namespace methods
    • parseFile(file, options) - Enhanced File parsing with automatic error source tracking

      • Built on top of parseBlob with additional functionality
      • Automatically sets file.name as error source for better error reporting
      • Provides clearer intent when working specifically with File objects
      • Useful for file inputs and drag-and-drop scenarios
      • Includes .toArray() and .toStream() namespace methods
    • parseRequest(request, options) - Server-side Request parsing

      • Automatic Content-Type validation and charset extraction
      • Automatic Content-Encoding detection and decompression
      • Designed for Cloudflare Workers, Service Workers, and edge platforms
      • Includes .toArray() and .toStream() namespace methods
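
    A minimal sketch of the namespace methods, assuming they mirror the existing parse.toArray() / parse.toStream() pattern:

    import { parseBlob } from "web-csv-toolbox";

    declare const file: File; // e.g. from a file input or drag-and-drop

    const records = await parseBlob.toArray(file); // all records at once
    const stream = parseBlob.toStream(file); // ReadableStream of records (exact return shape assumed)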

    High-level API Integration:

    The parse() function now automatically detects and handles these new input types:

    import { parse } from "web-csv-toolbox";
    
    // Blob/File (browser file uploads)
    // File objects automatically include filename in error messages
    const file = input.files[0];
    for await (const record of parse(file)) {
      console.log(record);
    }
    
    // Request (server-side)
    export default {
      async fetch(request: Request) {
        for await (const record of parse(request)) {
          console.log(record);
        }
      },
    };

    Type System Updates:

    • Updated CSVBinary type to include Blob and Request
    • Added proper type overloads to parse() function
    • Full TypeScript support with generic header types
    • New source field in CommonOptions, CSVRecordAssemblerOptions, and ParseError
      • Allows custom error source identification (e.g., filename, description)
      • Automatically populated for File objects
      • Improves error messages with contextual information
    • Improved internal type naming for better clarity
      • Join → JoinCSVFields - More descriptive CSV field joining utility type
      • Split → SplitCSVFields - More descriptive CSV field splitting utility type
      • These are internal utility types used for CSV type-level string manipulation
    • Enhanced terminology in type definitions
      • TokenLocation.rowNumber - Logical CSV row number (includes header)
      • Clear distinction between physical line numbers (line) and logical row numbers (rowNumber)

    Compression Support:

    All binary input types support compressed data:

    • Blob/File: Manual specification via decompression option

      parseBlob(file, { decompression: "gzip" });
    • Request: Automatic detection from Content-Encoding header

      // No configuration needed - automatic
      parseRequest(request);
    • Supported formats: gzip, deflate, deflate-raw (environment-dependent)

    Helper Functions:

    • getOptionsFromBlob() - Extracts charset from Blob MIME type
    • getOptionsFromFile() - Extracts options from File (charset + automatic source naming)
    • getOptionsFromRequest() - Processes Request headers (Content-Type, Content-Encoding)
    • parseBlobToStream() - Stream conversion helper
    • parseFileToArray() - Parse File to array of records
    • parseFileToStream() - Parse File to ReadableStream
    • parseRequestToStream() - Stream conversion helper
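
    A minimal sketch of one of the helpers, assuming parseFileToArray resolves to an array of records:

    import { parseFileToArray } from "web-csv-toolbox";

    declare const file: File;

    const records = await parseFileToArray(file);
    console.log(records.length);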

    Documentation:

    Comprehensive documentation following the Diátaxis framework:

    • API Reference:

      • parseBlob.md - Complete API reference with examples
      • parseFile.md - Alias documentation
      • parseRequest.md - Server-side API reference with examples
      • Updated parse.md to include new input types
    • How-to Guides:

      • NEW: platform-usage/ - Environment-specific usage patterns organized by platform
        • Each topic has its own dedicated guide for easy navigation
        • Browser: File input, drag-and-drop, FormData, Fetch API
        • Node.js: Buffer, fs.ReadStream, HTTP requests, stdin/stdout
        • Deno: Deno.readFile, Deno.open, fetch API
      • Organized in {environment}/{topic}.md structure for maintainability
    • Examples:

      • File input elements with HTML samples
      • Drag-and-drop file uploads
      • Compressed file handling (.csv.gz)
      • Validation and error handling patterns
      • NEW: Node.js Buffer usage (supported via BufferSource compatibility)
      • NEW: FormData integration patterns
      • NEW: Node.js stream conversion (fs.ReadStream → Web Streams)
    • Updated:

      • README.md - Added usage examples and API listings
      • choosing-the-right-api.md - Updated decision tree

    Enhanced Error Reporting:

    The source field provides better error context when parsing multiple files:

    import { parseFile } from "web-csv-toolbox";
    
    // Automatic source tracking
    try {
      for await (const record of parseFile(file)) {
        // ...
      }
    } catch (error) {
      console.error(error.message);
      // "Field count (100001) exceeded maximum allowed count of 100000 at row 5 in "data.csv""
      console.error(error.source); // "data.csv"
    }
    
    // Manual source specification
    parseString(csv, { source: "API-Export-2024" });
    // Error: "... at row 5 in "API-Export-2024""

    Security Note: The source field should not contain sensitive information (API keys, tokens, URLs with credentials) as it may be exposed in error messages and logs.

    Use Cases:

    ✅ Browser File Uploads:

    • File input elements (<input type="file">)
    • Drag-and-drop interfaces
    • Compressed file support (.csv.gz)

    ✅ Server-Side Processing:

    • Node.js servers
    • Deno applications
    • Service Workers

    ✅ Automatic Header Processing:

    • Content-Type validation
    • Charset detection
    • Content-Encoding decompression

    Platform Support:

    All new APIs work across:

    • Modern browsers (Chrome, Firefox, Edge, Safari)
    • Node.js 18+ (via undici Request/Blob)
    • Deno
    • Service Workers

    Breaking Changes:

    None - this is a purely additive feature. All existing APIs remain unchanged.

    Migration:

    No migration needed. New functions are available immediately:

    // Before (still works)
    import { parse } from "web-csv-toolbox";
    const response = await fetch("data.csv");
    for await (const record of parse(response)) {
    }
    
    // After (new capabilities)
    import { parseBlob, parseFile, parseRequest } from "web-csv-toolbox";
    
    // Blob support
    for await (const record of parseBlob(blob)) {
    }
    
    // File support with automatic error source
    const file = input.files[0];
    for await (const record of parseFile(file)) {
    }
    // Errors will include: 'in "data.csv"'
    
    // Server-side Request support
    for await (const record of parseRequest(request)) {
    }
    
    // Custom error source for any parser
    import { parseString } from "web-csv-toolbox";
    for await (const record of parseString(csv, { source: "user-import.csv" })) {
    }
  • #608 24f04d7 Thanks @kamiazya! - Implement discriminated union pattern for EngineConfig to improve type safety

    Breaking Changes

    1. EngineConfig Type Structure

    EngineConfig is now a discriminated union based on the worker property:

    Before:

    interface EngineConfig {
      worker?: boolean;
      workerURL?: string | URL;
      workerPool?: WorkerPool;
      workerStrategy?: WorkerCommunicationStrategy;
      strict?: boolean;
      onFallback?: (info: EngineFallbackInfo) => void;
      wasm?: boolean;
      // ... other properties
    }

    After:

    // Base configuration shared by all modes
    interface BaseEngineConfig {
      wasm?: boolean;
      arrayBufferThreshold?: number;
      backpressureCheckInterval?: BackpressureCheckInterval;
      queuingStrategy?: QueuingStrategyConfig;
    }
    
    // Main thread configuration (worker is false or undefined)
    interface MainThreadEngineConfig extends BaseEngineConfig {
      worker?: false;
    }
    
    // Worker configuration (worker must be true)
    interface WorkerEngineConfig extends BaseEngineConfig {
      worker: true;
      workerURL?: string | URL;
      workerPool?: WorkerPool;
      workerStrategy?: WorkerCommunicationStrategy;
      strict?: boolean;
      onFallback?: (info: EngineFallbackInfo) => void;
    }
    
    // Union type
    type EngineConfig = MainThreadEngineConfig | WorkerEngineConfig;

    2. Type Safety Improvements

    Worker-specific properties are now only available when worker: true:

    // ✅ Valid - worker: true allows worker-specific properties
    const config1: EngineConfig = {
      worker: true,
      workerURL: "./worker.js", // ✅ Type-safe
      workerStrategy: "stream-transfer",
      strict: true,
    };

    // ✅ Valid - worker: false doesn't require worker properties
    const config2: EngineConfig = {
      worker: false,
      wasm: true,
    };
    
    // ❌ Type Error - worker: false cannot have workerURL
    const config3: EngineConfig = {
      worker: false,
      workerURL: "./worker.js", // ❌ Type error!
    };

    3. EnginePresets Options Split

    EnginePresetOptions is now split into two interfaces for better type safety:

    Before:

    interface EnginePresetOptions {
      workerPool?: WorkerPool;
      workerURL?: string | URL;
      onFallback?: (info: EngineFallbackInfo) => void;
      arrayBufferThreshold?: number;
      // ...
    }
    
    EnginePresets.mainThread(options?: EnginePresetOptions)
    EnginePresets.fastest(options?: EnginePresetOptions)

    After:

    // For main thread presets (mainThread, wasm)
    interface MainThreadPresetOptions extends BasePresetOptions {
      // No worker-related options
    }
    
    // For worker-based presets (worker, fastest, balanced, etc.)
    interface WorkerPresetOptions extends BasePresetOptions {
      workerPool?: WorkerPool;
      workerURL?: string | URL;
      onFallback?: (info: EngineFallbackInfo) => void;
    }
    
    EnginePresets.mainThread(options?: MainThreadPresetOptions)
    EnginePresets.fastest(options?: WorkerPresetOptions)

    Migration:

    // Before: No type error, but logically incorrect
    EnginePresets.mainThread({ workerURL: "./worker.js" }); // Accepted but ignored
    
    // After: Type error prevents mistakes
    EnginePresets.mainThread({ workerURL: "./worker.js" }); // ❌ Type error!

    4. Transformer Constructor Changes

    Queuing strategy parameters changed from optional (?) to default parameters:

    Before:

    constructor(
      options?: CSVLexerTransformerOptions,
      writableStrategy?: QueuingStrategy<string>,
      readableStrategy?: QueuingStrategy<Token>
    )

    After:

    constructor(
      options: CSVLexerTransformerOptions = {},
      writableStrategy: QueuingStrategy<string> = DEFAULT_WRITABLE_STRATEGY,
      readableStrategy: QueuingStrategy<Token> = DEFAULT_READABLE_STRATEGY
    )

    Impact: This is technically a breaking change in the type signature, but functionally backward compatible since all parameters still have defaults. Existing code will continue to work without modifications.

    New Features

    1. Default Strategy Constants

    Default queuing strategies are now module-level constants (a character-counting strategy for the lexer's writable side, CountQueuingStrategy elsewhere):

    // CSVLexerTransformer
    const DEFAULT_WRITABLE_STRATEGY: QueuingStrategy<string> = {
      highWaterMark: 65536,
      size: (chunk) => chunk.length,
    };
    const DEFAULT_READABLE_STRATEGY = new CountQueuingStrategy({
      highWaterMark: 1024,
    });
    
    // CSVRecordAssemblerTransformer
    const DEFAULT_WRITABLE_STRATEGY = new CountQueuingStrategy({
      highWaterMark: 1024,
    });
    const DEFAULT_READABLE_STRATEGY = new CountQueuingStrategy({
      highWaterMark: 256,
    });

    2. Type Tests

    Added comprehensive type tests in src/common/types.test-d.ts to validate the discriminated union behavior:

    // Validates type narrowing
    const config: EngineConfig = { worker: true };
    expectTypeOf(config).toExtend<WorkerEngineConfig>();
    
    // Validates property exclusion
    expectTypeOf<MainThreadEngineConfig>().not.toHaveProperty("workerURL");

    Migration Guide

    For TypeScript Users

    If you're passing EngineConfig objects explicitly typed, you may need to update:

    // Before: Could accidentally mix incompatible properties
    const config: EngineConfig = {
      worker: false,
      workerURL: "./worker.js", // Silently ignored
    };
    
    // After: TypeScript catches the mistake
    const config: EngineConfig = {
      worker: false,
      // workerURL: './worker.js'  // ❌ Type error - removed
    };

    For EnginePresets Users

    Update preset option types if explicitly typed:

    // Before
    const options: EnginePresetOptions = {
      workerPool: myPool,
    };
    EnginePresets.mainThread(options); // No error, but workerPool ignored
    
    // After
    const options: WorkerPresetOptions = {
      // or MainThreadPresetOptions
      workerPool: myPool,
    };
    EnginePresets.fastest(options); // ✅ Correct usage
    // EnginePresets.mainThread(options);  // ❌ Type error - use MainThreadPresetOptions

    For Transformer Users

    No code changes required. Existing usage continues to work:

    // Still works exactly as before
    new CSVLexerTransformer();
    new CSVLexerTransformer({ delimiter: "," });
    new CSVLexerTransformer({}, customWritable, customReadable);

    Benefits

    1. IDE Autocomplete: Better suggestions based on worker setting
    2. Type Safety: Prevents invalid property combinations
    3. Self-Documenting: Type system enforces valid configurations
    4. Catch Errors Early: TypeScript catches configuration mistakes at compile time
    5. Standards Compliance: Uses CountQueuingStrategy from Web Streams API
  • #608 24f04d7 Thanks @kamiazya! - refactor!: rename engine presets to clarify optimization targets

    This release improves the naming of engine presets to clearly indicate what each preset optimizes for. The new names focus on performance characteristics (stability, UI responsiveness, parse speed, memory efficiency) rather than implementation details.

    Breaking Changes

    Engine Preset Renaming

    Engine presets have been renamed to better communicate their optimization targets:

    - import { EnginePresets } from 'web-csv-toolbox';
    + import { EnginePresets } from 'web-csv-toolbox';
    
    - engine: EnginePresets.mainThread()
    + engine: EnginePresets.stable()
    
    - engine: EnginePresets.worker()
    + engine: EnginePresets.responsive()
    
    - engine: EnginePresets.workerStreamTransfer()
    + engine: EnginePresets.memoryEfficient()
    
    - engine: EnginePresets.wasm()
    + engine: EnginePresets.fast()
    
    - engine: EnginePresets.workerWasm()
    + engine: EnginePresets.responsiveFast()

    Optimization targets:

    | Preset            | Optimizes For                                  |
    | ----------------- | ---------------------------------------------- |
    | stable()          | Stability (uses only standard JavaScript APIs) |
    | responsive()      | UI responsiveness (non-blocking)               |
    | memoryEfficient() | Memory efficiency (zero-copy streams)          |
    | fast()            | Parse speed (fastest execution time)           |
    | responsiveFast()  | UI responsiveness + parse speed                |
    | balanced()        | Balanced (general-purpose)                     |

    Removed Presets

    Two presets have been removed:

    - engine: EnginePresets.fastest()
    + engine: EnginePresets.responsiveFast()
    
    - engine: EnginePresets.strict()
      // No replacement - limited use case

    Why removed:

    • fastest(): Misleading name - prioritized UI responsiveness over raw execution speed due to worker communication overhead
    • strict(): Limited use case - primarily for testing/debugging

    Improvements

    Clearer Performance Documentation

    Each preset now explicitly documents its performance characteristics:

    • Parse speed: How fast CSV parsing executes
    • UI responsiveness: Whether parsing blocks the main thread
    • Memory efficiency: Memory usage patterns
    • Stability: API stability level (Most Stable, Stable, Experimental)

    Trade-offs Transparency

    Documentation now clearly explains the trade-offs for each preset:

    // stable() - Most stable, blocks main thread
    // ✅ Most stable: Uses only standard JavaScript APIs
    // ✅ No worker communication overhead
    // ❌ Blocks main thread during parsing

    // responsive() - Non-blocking, stable
    // ✅ Non-blocking UI: Parsing runs in worker thread
    // ⚠️ Worker communication overhead

    // fast() - Fastest parse speed, blocks main thread
    // ✅ Fast parse speed: Compiled WASM code
    // ✅ No worker communication overhead
    // ❌ Blocks main thread
    // ❌ UTF-8 encoding only

    // responsiveFast() - Non-blocking + fast, stable
    // ✅ Non-blocking UI + fast parsing
    // ⚠️ Worker communication overhead
    // ❌ UTF-8 encoding only

    Migration Guide

    Quick Migration

    Replace old preset names with new names:

    1. mainThread() → stable() - If you need maximum stability
    2. worker() → responsive() - If you need non-blocking UI
    3. workerStreamTransfer() → memoryEfficient() - If you need memory efficiency
    4. wasm() → fast() - If you need fastest parse speed (and blocking is acceptable)
    5. workerWasm() → responsiveFast() - If you need non-blocking UI + fast parsing
    6. fastest() → responsiveFast() - Despite the name, this is the correct replacement
    7. strict() → Remove - Or use custom config with strict: true

    Choosing the Right Preset

    By priority:

    • Stability first: stable() - Most stable, uses only standard JavaScript APIs
    • UI responsiveness first: responsive() or balanced() - Non-blocking execution
    • Parse speed first: fast() - Fastest execution time (blocks main thread)
    • General-purpose: balanced() - Balanced performance characteristics

    By use case:

    • Server-side parsing: stable() or fast() - Blocking acceptable
    • Browser with interactive UI: responsive() or balanced() - Non-blocking required
    • UTF-8 files only: fast() or responsiveFast() - WASM acceleration
    • Streaming large files: memoryEfficient() or balanced() - Constant memory usage

    Example Migration

    Before:

    import { parseString, EnginePresets } from "web-csv-toolbox";
    
    // Old: Unclear what "fastest" optimizes for
    for await (const record of parseString(csv, {
      engine: EnginePresets.fastest(),
    })) {
      console.log(record);
    }

    After:

    import { parseString, EnginePresets } from "web-csv-toolbox";
    
    // New: Clear that this optimizes for UI responsiveness + parse speed
    for await (const record of parseString(csv, {
      engine: EnginePresets.responsiveFast(),
    })) {
      console.log(record);
    }

    Documentation Updates

    All documentation has been updated to reflect the new preset names and include detailed performance characteristics, trade-offs, and use case guidance.

    See the Engine Presets Reference for complete documentation.

  • #608 24f04d7 Thanks @kamiazya! - Add experimental performance tuning options to Engine configuration: backpressureCheckInterval and queuingStrategy

    New Experimental Features

    Added advanced performance tuning options for fine-grained control over streaming behavior:

    engine.backpressureCheckInterval

    Controls how frequently the internal parsers check for backpressure during streaming operations (count-based).

    Default:

    {
      lexer: 100,      // Check every 100 tokens processed
      assembler: 10    // Check every 10 records processed
    }

    Trade-offs:

    • Lower values: More frequent backpressure checks, more responsive to downstream consumers
    • Higher values: Less frequent backpressure checks, reduced checking overhead

    Potential Use Cases:

    • Memory-constrained environments: Consider lower values for more responsive backpressure
    • Scenarios where checking overhead is a concern: Consider higher values
    • Slow consumers: Consider lower values to propagate backpressure more quickly

    engine.queuingStrategy

    Controls the internal queuing behavior of the CSV parser's streaming pipeline.

    Default: Designed to balance memory usage and buffering behavior

    Structure:

    {
      lexerWritable?: QueuingStrategy<string>;
      lexerReadable?: QueuingStrategy<Token>;
      assemblerWritable?: QueuingStrategy<Token>;
      assemblerReadable?: QueuingStrategy<CSVRecord<any>>;
    }

    Pipeline Stages:
    The CSV parser uses a two-stage pipeline:

    1. Lexer: String → Token
    2. Assembler: Token → CSVRecord

    Each stage has both writable (input) and readable (output) sides:

    1. lexerWritable - Lexer input (string chunks)
    2. lexerReadable - Lexer output (tokens)
    3. assemblerWritable - Assembler input (tokens from lexer)
    4. assemblerReadable - Assembler output (CSV records)
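
    To illustrate the two-stage pipeline, here is a minimal sketch that wires the public transformer classes mentioned elsewhere in these notes; the writable sink is a placeholder:

    import {
      CSVLexerTransformer,
      CSVRecordAssemblerTransformer,
    } from "web-csv-toolbox";

    const response = await fetch("data.csv");
    await response.body!
      .pipeThrough(new TextDecoderStream()) // bytes → string chunks
      .pipeThrough(new CSVLexerTransformer()) // string → tokens
      .pipeThrough(new CSVRecordAssemblerTransformer()) // tokens → records
      .pipeTo(
        new WritableStream({
          write(record) {
            console.log(record);
          },
        }),
      );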

    Theoretical Trade-offs:

    • Small highWaterMark (1-10): Less memory for buffering, backpressure applied more quickly
    • Medium highWaterMark (default): Balanced memory and buffering
    • Large highWaterMark (100+): More memory for buffering, backpressure applied less frequently

    Note: Actual performance characteristics depend on your specific use case and runtime environment. Profile your application to determine optimal values.

    Potential Use Cases:

    • IoT/Embedded: Consider smaller highWaterMark for minimal memory footprint
    • Server-side batch processing: Consider larger highWaterMark for more buffering
    • Real-time streaming: Consider smaller highWaterMark for faster backpressure propagation

    Usage Examples

    Configuration Example: Tuning for Potential High-Throughput Scenarios

    import { parseString, EnginePresets } from "web-csv-toolbox";
    
    const config = EnginePresets.fastest({
      backpressureCheckInterval: {
        lexer: 200, // Check every 200 tokens (less frequent)
        assembler: 20, // Check every 20 records (less frequent)
      },
      queuingStrategy: {
        lexerReadable: new CountQueuingStrategy({ highWaterMark: 100 }),
        assemblerReadable: new CountQueuingStrategy({ highWaterMark: 50 }),
      },
    });
    
    for await (const record of parseString(csv, { engine: config })) {
      console.log(record);
    }

    Memory-Constrained Environment

    import { parseString, EnginePresets } from "web-csv-toolbox";
    
    const config = EnginePresets.balanced({
      backpressureCheckInterval: {
        lexer: 10, // Check every 10 tokens (frequent checks)
        assembler: 5, // Check every 5 records (frequent checks)
      },
      queuingStrategy: {
        // Minimal buffers throughout entire pipeline
        lexerWritable: new CountQueuingStrategy({ highWaterMark: 1 }),
        lexerReadable: new CountQueuingStrategy({ highWaterMark: 1 }),
        assemblerWritable: new CountQueuingStrategy({ highWaterMark: 1 }),
        assemblerReadable: new CountQueuingStrategy({ highWaterMark: 1 }),
      },
    });
    
    for await (const record of parseString(csv, { engine: config })) {
      console.log(record);
    }

    ⚠️ Experimental Status

    These APIs are marked as experimental and may change in future versions based on ongoing performance research. The default values are designed to work well for most use cases, but optimal values may vary depending on your specific environment and workload.

    Recommendation: Only adjust these settings if you're experiencing specific performance issues with large streaming operations or have specific memory/throughput requirements.

    Design Philosophy

    These options belong to engine configuration because they affect performance and behavior only, not the parsing result specification. This follows the design principle:

    • Top-level options: Affect specification (result changes)
    • Engine options: Affect performance/behavior (same result, different execution)
  • #608 24f04d7 Thanks @kamiazya! - feat: introduce "slim" entry point for optimized bundle size

    This release introduces a new slim entry point that significantly reduces bundle size by excluding the inlined WebAssembly binary.

    New Entry Points

    The package now offers two distinct entry points:

    1. Main (web-csv-toolbox): The default entry point.

      • Features: Zero-configuration, works out of the box.
      • Trade-off: Includes the WASM binary inlined as base64 (~110KB), resulting in a larger bundle size.
      • Best for: Prototyping, quick starts, or when bundle size is not a critical constraint.
    2. Slim (web-csv-toolbox/slim): The new optimized entry point.

      • Features: Smaller bundle size, streaming WASM loading.
      • Trade-off: Requires manual initialization of the WASM binary.
      • Best for: Production applications where bundle size and load performance are critical.

    How to use the "Slim" version

    When using the slim version, you must manually load the WASM binary before using any WASM-dependent features (like parseStringToArraySyncWASM or high-performance parsing presets).

    import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox/slim";
    // You need to provide the URL to the WASM file
    import wasmUrl from "web-csv-toolbox/csv.wasm?url";
    
    async function init() {
      // 1. Manually initialize WASM
      await loadWASM(wasmUrl);
    
      // 2. Now you can use WASM-powered functions
      const data = parseStringToArraySyncWASM("a,b,c\n1,2,3");
      console.log(data);
    }
    
    init();

    Worker Exports

    Corresponding worker exports are also available:

    • web-csv-toolbox/worker (Main)
    • web-csv-toolbox/worker/slim (Slim)
  • #608 24f04d7 Thanks @kamiazya! - feat!: add Parser models and streams with improved architecture

    Summary

    This release introduces a new Parser layer that composes Lexer and Assembler components, providing a cleaner architecture and improved streaming support. The implementation follows the design patterns established by the recently developed CSVObjectRecordAssembler and CSVArrayRecordAssembler.

    New Features

    Parser Models

    FlexibleStringCSVParser

    • Composes FlexibleStringCSVLexer and CSV Record Assembler
    • Stateful parser for string CSV data
    • Supports both object and array output formats
    • Streaming mode support via parse(chunk, { stream: true })
    • Full options support (delimiter, quotation, columnCountStrategy, etc.)

    FlexibleBinaryCSVParser

    • Composes TextDecoder with FlexibleStringCSVParser
    • Accepts any BufferSource (Uint8Array, ArrayBuffer, or other TypedArray views)
    • Uses TextDecoder with stream: true option for proper streaming
    • Supports multiple character encodings (utf-8, shift_jis, etc.)
    • BOM handling via ignoreBOM option
    • Fatal error mode via fatal option

    Factory Functions

    • createStringCSVParser() - Creates FlexibleStringCSVParser instances
    • createBinaryCSVParser() - Creates FlexibleBinaryCSVParser instances
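
    A minimal sketch of the binary parser with a non-UTF-8 encoding; the charset option name is an assumption based on the package's existing binary options:

    import { createBinaryCSVParser } from "web-csv-toolbox";

    const parser = createBinaryCSVParser({
      header: ["name", "age"],
      charset: "shift_jis", // assumed option name for the TextDecoder encoding
      ignoreBOM: true,
      fatal: true, // throw on malformed byte sequences
    });

    declare const shiftJisBytes: Uint8Array;
    const records = parser.parse(shiftJisBytes);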

    Stream Classes

    StringCSVParserStream

    • TransformStream<string, CSVRecord> for streaming string parsing
    • Wraps Parser instances (rather than constructing them internally)
    • Configurable backpressure handling
    • Custom queuing strategies support
    • Follows existing CSVLexerTransformer pattern

    BinaryCSVParserStream

    • TransformStream<BufferSource, CSVRecord> for streaming binary parsing
    • Accepts any BufferSource (Uint8Array, ArrayBuffer, or other TypedArray views)
    • Handles UTF-8 multi-byte characters across chunk boundaries
    • Integration-ready for fetch API and file streaming
    • Backpressure management with configurable check intervals

    Breaking Changes

    Object Format Behavior (Reverted)

    While initially explored, the final implementation maintains the existing behavior:

    • Empty fields (,value,): Filled with ""
    • Missing fields (short rows): Remain as undefined

    This preserves backward compatibility and allows users to distinguish between explicitly empty fields and missing fields.
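
    A short sketch of the distinction, assuming missing fields surface as undefined values in the record as described above:

    import { parseString } from "web-csv-toolbox";

    const csv = `name,age,city
    Alice,,Tokyo
    Bob,25`;

    for await (const record of parseString(csv)) {
      console.log(record);
    }
    // { name: 'Alice', age: '', city: 'Tokyo' }   // empty field → ""
    // { name: 'Bob', age: '25', city: undefined } // missing field → undefined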

    Array Format Behavior (No Change)

    • Empty fields: Filled with ""
    • Missing fields with columnCountStrategy: 'pad': Filled with undefined

    Public API Exports (common.ts)

    Added exports for:

    • FlexibleStringCSVParser
    • FlexibleBinaryCSVParser
    • createStringCSVParser
    • createBinaryCSVParser
    • StringCSVParserStream
    • BinaryCSVParserStream

    Architecture Improvements

    Composition Over Implementation

    • Parsers compose Lexer + Assembler instead of reimplementing
    • Reduces code duplication across the codebase
    • Easier to maintain and extend

    Streaming Support

    • TextDecoder with stream: true for proper multi-byte character handling
    • Backpressure handling in Stream classes
    • Configurable check intervals for performance tuning

    Type Safety

    • Maintains full TypeScript strict mode compliance
    • Generic type parameters for header types
    • Proper CSVRecord type inference based on outputFormat

    Migration Guide

    For Users of Existing APIs

    No changes required. All existing functions (parseString, parseBinary, etc.) continue to work as before.

    For Direct Lexer/Assembler Users

    Consider migrating to Parser classes for simplified usage:

    // Before (manual composition)
    const lexer = new FlexibleStringCSVLexer(options);
    const assembler = createCSVRecordAssembler(options);
    const tokens = lexer.lex(csv);
    const records = Array.from(assembler.assemble(tokens));
    
    // After (using Parser)
    const parser = new FlexibleStringCSVParser(options);
    const records = parser.parse(csv);

    For Stream Users

    New stream classes provide cleaner API:

    // String streaming
    const parser = new FlexibleStringCSVParser({ header: ["name", "age"] });
    const stream = new StringCSVParserStream(parser);
    
    await fetch("data.csv")
      .then((res) => res.body)
      .pipeThrough(new TextDecoderStream())
      .pipeThrough(stream)
      .pipeTo(yourProcessor);
    
    // Binary streaming
    const parser = new FlexibleBinaryCSVParser({ header: ["name", "age"] });
    const stream = new BinaryCSVParserStream(parser);
    
    await fetch("data.csv")
      .then((res) => res.body)
      .pipeThrough(stream)
      .pipeTo(yourProcessor);

    Performance Considerations

    • Backpressure check interval defaults to 100 records
    • Writable side: 64KB highWaterMark (byte/character counting)
    • Readable side: 256 records highWaterMark
    • Configurable via queuing strategies
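
    As a sketch of custom queuing strategies, assuming the stream wrappers accept trailing writable/readable strategy parameters in the same way as CSVLexerTransformer:

    import {
      FlexibleStringCSVParser,
      StringCSVParserStream,
    } from "web-csv-toolbox";

    const parser = new FlexibleStringCSVParser({ header: ["name", "age"] });
    const stream = new StringCSVParserStream(
      parser,
      { highWaterMark: 65536, size: (chunk) => chunk.length }, // writable: count characters
      new CountQueuingStrategy({ highWaterMark: 256 }), // readable: count records
    );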

    Documentation

    All new classes include comprehensive JSDoc documentation with:

    • Usage examples
    • Parameter descriptions
    • Return type documentation
    • Remarks on streaming behavior
    • Performance characteristics
  • #608 24f04d7 Thanks @kamiazya! - feat!: add array output format support for CSV parsing

    CSV parsing results can now be returned as arrays in addition to objects, with TypeScript Named Tuple support for type-safe column access.

    New Features

    Array Output Format

    Parse CSV data into arrays instead of objects using the outputFormat option:

    import { parseString } from "web-csv-toolbox";
    
    const csv = `name,age,city
    Alice,30,Tokyo
    Bob,25,Osaka`;
    
    // Array output (new)
    for await (const record of parseString(csv, { outputFormat: "array" })) {
      console.log(record); // ['Alice', '30', 'Tokyo']
      console.log(record[0]); // 'Alice' - type-safe access with Named Tuples
    }
    
    // Object output (default, unchanged)
    for await (const record of parseString(csv)) {
      console.log(record); // { name: 'Alice', age: '30', city: 'Tokyo' }
    }

    Named Tuple Type Support

    When headers are provided, array output uses TypeScript Named Tuples for type-safe access:

    const csv = `name,age
    Alice,30`;
    
    for await (const record of parseString(csv, { outputFormat: "array" })) {
      // record type: { readonly [K in keyof ['name', 'age']]: string }
      // Equivalent to: { readonly 0: string, readonly 1: string, readonly length: 2 }
      console.log(record[0]); // Type-safe: 'Alice'
      console.log(record.length); // 2
    }

    Include Header Option

    Include the header row in the output (array format only):

    for await (const record of parseString(csv, {
      outputFormat: "array",
      includeHeader: true,
    })) {
      console.log(record);
    }
    // ['name', 'age', 'city']  ← Header row
    // ['Alice', '30', 'Tokyo']
    // ['Bob', '25', 'Osaka']

    Column Count Strategy

    Control how mismatched column counts are handled (array format with header):

    // Second row is missing 'city'; third row has an extra column
    const csv = `name,age,city
    Alice,30
    Bob,25,Osaka,JP`;
    
    // Strategy: 'pad' - Pad short rows with undefined, truncate long rows
    for await (const record of parseString(csv, {
      outputFormat: "array",
      columnCountStrategy: "pad",
    })) {
      console.log(record);
    }
    // ['Alice', '30', undefined]
    // ['Bob', '25', 'Osaka']
    
    // Strategy: 'strict' - Throw error on mismatch
    // Strategy: 'truncate' - Truncate long rows, keep short rows as-is
    // Strategy: 'keep' - Keep all columns as-is (default)

    Available strategies:

    • 'keep' (default): Return rows as-is, regardless of header length
    • 'pad': Pad short rows with undefined, truncate long rows to header length
    • 'strict': Throw ParseError if row length doesn't match header length
    • 'truncate': Truncate long rows to header length, keep short rows as-is
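
    A minimal sketch of the 'strict' strategy, assuming ParseError is exported from the package entry point:

    import { parseString, ParseError } from "web-csv-toolbox";

    const csv = `name,age,city
    Alice,30`; // second row is missing 'city'

    try {
      for await (const record of parseString(csv, {
        outputFormat: "array",
        columnCountStrategy: "strict",
      })) {
        console.log(record);
      }
    } catch (error) {
      if (error instanceof ParseError) {
        console.error(error.message); // row length does not match header length
      }
    }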

    Breaking Changes

    CSVRecordAssembler Interface Separation

    For better Rust/WASM implementation, the CSVRecordAssembler interface has been separated:

    • CSVObjectRecordAssembler<Header> - For object format output
    • CSVArrayRecordAssembler<Header> - For array format output

    The unified CSVRecordAssembler<Header, Format> type remains as a deprecated type alias for backward compatibility.

    New specialized classes:

    import {
      FlexibleCSVObjectRecordAssembler,
      FlexibleCSVArrayRecordAssembler,
      createCSVRecordAssembler,
    } from "web-csv-toolbox";
    
    // Option 1: Factory function (recommended)
    const assembler = createCSVRecordAssembler({
      outputFormat: "array",
      includeHeader: true,
    });
    
    // Option 2: Specialized class for object output
    const objectAssembler = new FlexibleCSVObjectRecordAssembler({
      header: ["name", "age"],
    });
    
    // Option 3: Specialized class for array output
    const arrayAssembler = new FlexibleCSVArrayRecordAssembler({
      header: ["name", "age"],
      columnCountStrategy: "strict",
    });

    Type structure:

    // Before
    type CSVRecordAssembler<Header, Format> = {
      assemble(tokens): IterableIterator<CSVRecord<Header, Format>>;
    };
    
    // After
    interface CSVObjectRecordAssembler<Header> {
      assemble(tokens): IterableIterator<CSVObjectRecord<Header>>;
    }
    
    interface CSVArrayRecordAssembler<Header> {
      assemble(tokens): IterableIterator<CSVArrayRecord<Header>>;
    }
    
    // Deprecated type alias (backward compatibility)
    type CSVRecordAssembler<Header, Format> = Format extends "array"
      ? CSVArrayRecordAssembler<Header>
      : CSVObjectRecordAssembler<Header>;

    Migration Guide

    For Most Users

    No changes required. All existing code continues to work:

    // Existing code works without changes
    for await (const record of parseString(csv)) {
      console.log(record); // Still returns objects by default
    }

    Using New Array Output Format

    Simply add the outputFormat option:

    // New: Array output
    for await (const record of parseString(csv, { outputFormat: "array" })) {
      console.log(record); // Returns arrays
    }

    For Advanced Users Using Low-Level APIs

    The existing FlexibleCSVRecordAssembler class continues to work. Optionally migrate to specialized classes:

    // Option 1: Continue using FlexibleCSVRecordAssembler (no changes needed)
    const assembler = new FlexibleCSVRecordAssembler({ outputFormat: "array" });
    
    // Option 2: Use factory function (recommended)
    const assembler = createCSVRecordAssembler({ outputFormat: "array" });
    
    // Option 3: Use specialized classes directly
    const assembler = new FlexibleCSVArrayRecordAssembler({
      header: ["name", "age"],
      columnCountStrategy: "pad",
    });

    Use Cases

    Machine Learning / Data Science

    // Easily convert CSV to training data arrays
    const features = [];
    for await (const record of parseString(csv, { outputFormat: "array" })) {
      features.push(record.map(Number));
    }

    Headerless CSV Files

    const csv = `Alice,30,Tokyo
    Bob,25,Osaka`;
    
    for await (const record of parseString(csv, {
      outputFormat: "array",
      header: [], // Headerless
    })) {
      console.log(record); // ['Alice', '30', 'Tokyo']
    }

    Type-Safe Column Access

    const csv = `name,age,city
    Alice,30,Tokyo`;
    
    for await (const record of parseString(csv, { outputFormat: "array" })) {
      // TypeScript knows the tuple structure
      const name: string = record[0]; // Type-safe
      const age: string = record[1]; // Type-safe
      const city: string = record[2]; // Type-safe
    }

    Benefits

    • Memory efficiency: Arrays use less memory than objects for large datasets
    • Type safety: Named Tuples provide compile-time type checking
    • Flexibility: Choose output format based on your use case
    • Compatibility: Easier integration with ML libraries and data processing pipelines
    • Better Rust/WASM support: Separated interfaces simplify native implementation
  • #608 24f04d7 Thanks @kamiazya! - refactor!: rename core classes and simplify type system

    This release contains breaking changes for users of low-level APIs. Most users are not affected.

    Breaking Changes

    1. Class Naming

    Low-level CSV processing classes have been renamed:

    - import { CSVLexer } from 'web-csv-toolbox';
    + import { FlexibleStringCSVLexer } from 'web-csv-toolbox';
    
    - const lexer = new CSVLexer(options);
    + const lexer = new FlexibleStringCSVLexer(options);

    For CSV record assembly, use the factory function or specialized classes:

    - import { CSVRecordAssembler } from 'web-csv-toolbox';
    + import { createCSVRecordAssembler, FlexibleCSVObjectRecordAssembler, FlexibleCSVArrayRecordAssembler } from 'web-csv-toolbox';
    
    - const assembler = new CSVRecordAssembler(options);
    + // Option 1: Use factory function (recommended)
    + const assembler = createCSVRecordAssembler({ outputFormat: 'object', ...options });
    +
    + // Option 2: Use specialized class directly
    + const assembler = new FlexibleCSVObjectRecordAssembler(options);

    2. Type Renaming

    The CSV type has been renamed to CSVData:

    - import type { CSV } from 'web-csv-toolbox';
    + import type { CSVData } from 'web-csv-toolbox';
    
    - function processCSV(data: CSV) {
    + function processCSV(data: CSVData) {
        // ...
      }

    Bug Fixes

    • Fixed stream reader locks not being released when AbortSignal was triggered
    • Fixed Node.js WASM module loading
    • Improved error handling

    Migration Guide

    For most users: No changes required if you only use high-level functions like parse(), parseString(), parseBlob(), etc.

    For advanced users using low-level APIs:

    1. Rename CSV type to CSVData
    2. Rename CSVLexer to FlexibleStringCSVLexer
    3. Replace CSVRecordAssembler with createCSVRecordAssembler() factory function or specialized classes (FlexibleCSVObjectRecordAssembler / FlexibleCSVArrayRecordAssembler)

Patch Changes

  • #608 24f04d7 Thanks @kamiazya! - Consolidate and enhance benchmark suite

    This changeset focuses on benchmark organization and expansion:

    Benchmark Consolidation:

    • Integrated 3 separate benchmark files (concurrent-performance.ts, queuing-strategy.bench.ts, worker-performance.ts) into main.ts
    • Unified benchmark suite now contains 57 comprehensive tests
    • Added conditional Worker support for Node.js vs browser environments

    API Migration:

    • Migrated from deprecated { execution: ['worker'] } API to new EnginePresets API
    • Added tests for all engine presets: mainThread, wasm, worker, workerStreamTransfer, workerWasm, balanced, fastest, strict

    Bottleneck Detection:

    • Added 23 new benchmarks for systematic bottleneck detection:
      • Row count scaling (50-5000 rows)
      • Field length scaling (10 chars - 10KB)
      • Quote ratio impact (0%-100%)
      • Column count scaling (10-10,000 columns)
      • Line ending comparison (LF vs CRLF)
      • Engine comparison at different scales

    Documentation Scenario Coverage:

    • Added benchmarks for all scenarios mentioned in documentation
    • Included WASM performance tests
    • Added custom delimiter tests
    • Added parseStringStream tests
    • Added data transformation overhead tests

    Key Findings:

    • Column count is the most critical bottleneck (99.7% slower at 10k columns)
    • Field length has non-linear behavior at 1KB threshold
    • WASM advantage increases with data size (+18% → +32%)
    • Quote processing overhead is minimal (1.1-10% depending on scale)
  • #608 24f04d7 Thanks @kamiazya! - fix: add charset validation to prevent malicious Content-Type header manipulation

    This patch addresses a security vulnerability where malicious or invalid charset values in Content-Type headers could cause parsing failures or unexpected behavior.

    Changes:

    • Fixed parseMime to handle Content-Type parameters without values (prevents undefined.trim() errors)
    • Added charset validation similar to existing compression validation pattern
    • Created SUPPORTED_CHARSETS constants for commonly used character encodings
    • Added allowNonStandardCharsets option to BinaryOptions for opt-in support of non-standard charsets
    • Added error handling in convertBinaryToString to catch TextDecoder instantiation failures
    • Charset values are now validated against a whitelist and normalized to lowercase
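
    A minimal sketch of the opt-in, assuming the charset is supplied via the existing binary charset option; "koi8-r" stands in for any value outside the default whitelist:

    import { parseBinary } from "web-csv-toolbox";

    declare const bytes: Uint8Array;

    for await (const record of parseBinary(bytes, {
      charset: "koi8-r",
      allowNonStandardCharsets: true, // accept charsets outside SUPPORTED_CHARSETS
    })) {
      console.log(record);
    }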

    Security Impact:

    • Invalid or malicious charset values are now rejected with clear error messages
    • Prevents DoS attacks via malformed Content-Type headers
    • Reduces risk of charset-based injection attacks

    Breaking Changes: None - existing valid charset values continue to work as before.

  • #608 24f04d7 Thanks @kamiazya! - Add bundler integration guide for Workers and WebAssembly

    This release adds comprehensive documentation for using web-csv-toolbox with modern JavaScript bundlers (Vite, Webpack, Rollup) when using Worker-based or WebAssembly execution.

    Package Structure Improvements:

    • Moved worker files to root level for cleaner package exports
      • src/execution/worker/helpers/worker.{node,web}.ts → src/worker.{node,web}.ts
    • Added ./worker export with environment-specific resolution (node/browser/default)
    • Added ./web_csv_toolbox_wasm_bg.wasm export for explicit WASM file access
    • Updated internal relative paths in createWorker.{node,web}.ts to reflect new structure

    New Documentation:

    • How-to Guide: Use with Bundlers - Step-by-step configuration for Vite, Webpack, and Rollup

      • Worker configuration with ?url imports
      • WASM configuration with explicit URL handling
      • WorkerPool reuse patterns
      • Common issues and troubleshooting
    • Explanation: Package Exports - Deep dive into environment detection mechanism

      • Conditional exports for node/browser environments
      • Worker implementation differences
      • Bundler compatibility
    • Reference: Package Exports - API reference for all package exports

      • Export paths and their resolutions
      • Conditional export conditions

    Updated Documentation:

    Added bundler usage notes to all Worker and WASM-related documentation:

    • README.md
    • docs/explanation/execution-strategies.md
    • docs/explanation/worker-pool-architecture.md
    • docs/how-to-guides/choosing-the-right-api.md
    • docs/how-to-guides/wasm-performance-optimization.md

    Key Differences: Workers vs WASM with Bundlers

    Workers 🟢:

    • Bundled automatically as data URLs using ?url suffix
    • Works out of the box with Vite
    • Example: import workerUrl from 'web-csv-toolbox/worker?url'

    WASM 🟡:

    • Requires explicit URL configuration via ?url import
    • Must call loadWASM(wasmUrl) before parsing
    • Example: import wasmUrl from 'web-csv-toolbox/web_csv_toolbox_wasm_bg.wasm?url'
    • Alternative: Copy WASM file to public directory

    Migration Guide:

    For users already using Workers with bundlers, no changes are required. The package now explicitly documents the workerURL option that was previously implicit.

    For new users, follow the bundler integration guide:

    import { parseString, EnginePresets } from "web-csv-toolbox";
    import workerUrl from "web-csv-toolbox/worker?url"; // Vite
    
    for await (const record of parseString(csv, {
      engine: EnginePresets.worker({ workerURL: workerUrl }),
    })) {
      console.log(record);
    }

    Breaking Changes:

    None - this is purely additive documentation and package export improvements. Existing code continues to work without modifications.

  • #608 24f04d7 Thanks @kamiazya! - Refactor CI workflows to separate TypeScript and Rust environments

    This change improves CI efficiency by:

    • Splitting setup actions into setup-typescript, setup-rust, and setup-full
    • Separating WASM build and TypeScript build jobs with clear dependencies
    • Removing unnecessary tool installations from jobs that don't need them
    • Clarifying dependencies between TypeScript tests and WASM artifacts
  • #608 24f04d7 Thanks @kamiazya! - chore: eliminate circular dependencies and improve code quality

    This patch improves the internal code structure by eliminating all circular dependencies and adding tooling to prevent future issues.

    Changes:

    • Introduced madge for circular dependency detection and visualization
    • Eliminated circular dependencies:
      • common/types.ts ⇄ utils/types.ts: Merged type definitions into common/types.ts
      • parseFile.ts ⇄ parseFileToArray.ts: Refactored to use direct dependencies
    • Fixed import paths in test files to consistently use .ts extension
    • Added npm scripts for dependency analysis:
      • check:circular: Detect circular dependencies
      • graph:main: Visualize main entry point dependencies
      • graph:worker: Visualize worker entry point dependencies
      • graph:json, graph:summary, graph:orphans, graph:leaves: Various analysis tools
    • Added circular dependency check to CI pipeline (.github/workflows/.build.yaml)
    • Updated .gitignore to exclude generated dependency graph files

    Impact:

    • No runtime behavior changes
    • Better maintainability and code structure
    • Faster build times due to cleaner dependency graph
    • Automated prevention of circular dependency introduction

    Breaking Changes: None - this is purely an internal refactoring with no API changes.

  • #608 24f04d7 Thanks @kamiazya! - docs: comprehensive documentation update and new examples

    This release brings significant improvements to the documentation and examples, making it easier to get started and use advanced features.

    New Examples

    Added comprehensive example projects for various environments and bundlers:

    • Deno: examples/deno-main, examples/deno-slim
    • Node.js: examples/node-main, examples/node-slim, examples/node-worker-main
    • Vite: examples/vite-bundle-main, examples/vite-bundle-slim, examples/vite-bundle-worker-main, examples/vite-bundle-worker-slim
    • Webpack: examples/webpack-bundle-worker-main, examples/webpack-bundle-worker-slim

    These examples demonstrate:

    • How to use the new slim entry point
    • Worker integration with different bundlers
    • Configuration for Vite and Webpack
    • TypeScript setup

    Documentation Improvements

    • Engine Presets: Detailed guide on choosing the right engine preset for your use case
    • Main vs Slim: Explanation of the trade-offs between the main (auto-init) and slim (manual-init) entry points (a hedged usage sketch follows this list)
    • WASM Architecture: Updated architecture documentation reflecting the new module structure
    • Performance Guide: Improved guide on optimizing performance with WASM and Workers
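
    As a rough sketch of the main-vs-slim trade-off: the main entry point initializes WASM automatically on import, while the slim entry point leaves initialization to the caller. The import path ("web-csv-toolbox/slim") and the explicit loadWASM() call below are assumptions for illustration only; consult the examples listed above (for instance examples/node-slim or examples/vite-bundle-slim) for the exact setup in your environment.

    // Slim-style usage (illustrative; verify the entry path against the examples above)
    import { loadWASM, parseString } from "web-csv-toolbox/slim";

    await loadWASM(); // manual initialization instead of auto-init on import

    const csv = "name,age\nAlice,30";
    for await (const record of parseString(csv)) {
      console.log(record);
    }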
  • #608 24f04d7 Thanks @kamiazya! - Expand browser testing coverage and improve documentation

    Testing Infrastructure Improvements:

    • macOS Browser Testing: Added Chrome and Firefox testing on macOS in CI/CD
      • Vitest 4 stable browser mode enabled headless testing on macOS
      • Previously blocked due to Safari headless limitations
    • Parallel Browser Execution: Multiple browsers now run in parallel within each OS job
      • Linux: Chrome + Firefox in parallel
      • macOS: Chrome + Firefox in parallel
      • Windows: Chrome + Firefox + Edge in parallel
    • Dynamic Browser Configuration: Browser instances are determined automatically by platform (see the sketch after this list)
      • Uses process.platform to select the appropriate browsers
      • Eliminates the need for environment variables
    • Explicit Browser Project Targeting: Updated test:browser script to explicitly run only browser tests
      • Added --project browser flag to prevent running Node.js tests during browser test execution
      • Ensures CI jobs run only their intended test suites
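
    A minimal sketch of the platform-based selection idea, assuming a helper that maps process.platform to browser names; this is not the project's actual Vitest configuration, only an illustration of the matrix described above:

    import process from "node:process";

    // Illustrative helper: pick browsers per OS, mirroring the CI matrix above.
    function browsersForPlatform(): string[] {
      switch (process.platform) {
        case "win32":
          return ["chrome", "firefox", "edge"]; // Windows: Chrome + Firefox + Edge
        case "darwin":
          return ["chrome", "firefox"]; // macOS: Chrome + Firefox
        default:
          return ["chrome", "firefox"]; // Linux: Chrome + Firefox
      }
    }

    console.log(browsersForPlatform());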

    Documentation Improvements:

    • Quick Overview Section: Added comprehensive support matrix and metrics
      • Visual support matrix showing all environment/platform combinations
      • Tier summary with coverage statistics
      • Testing coverage breakdown by category
      • Clear legend explaining all support status icons
    • Clearer Support Tiers: Improved distinction between support levels
      • βœ… Full Support (Tier 1): Tested and officially supported
      • 🟑 Active Support (Tier 2): Limited testing, active maintenance
      • πŸ”΅ Community Support (Tier 3): Not tested, best-effort support
    • Cross-Platform Runtime Support: Clarified Node.js and Deno support across all platforms
      • Node.js LTS: Tier 1 support on Linux, macOS, and Windows
      • Deno LTS: Tier 2 support on Linux, macOS, and Windows
      • Testing is performed on Linux only, relying on the cross-platform design of these runtimes
      • Clarifies that the remaining platforms are still expected to work even though they are not tested directly
    • Simplified Tables: Converted redundant tables to concise bullet lists
      • Removed repetitive "Full Support" entries
      • Easier to scan and understand

    Browser Testing Coverage:

    • Chrome: Tested on Linux, macOS, and Windows (Tier 1)
    • Firefox: Tested on Linux, macOS, and Windows (Tier 1)
    • Edge: Tested on Windows only (Tier 1)
    • Safari: Community support (headless mode not supported by Vitest)

    Breaking Changes:

    None - this release only improves testing infrastructure and documentation.

  • #608 24f04d7 Thanks @kamiazya! - Add regression tests and documentation for prototype pollution safety

    This changeset adds comprehensive tests and documentation to ensure that CSVRecordAssembler does not cause prototype pollution when processing CSV headers with dangerous property names.

    Security Verification:

    • Verified that Object.fromEntries() is safe from prototype pollution attacks
    • Confirmed that dangerous property names (__proto__, constructor, prototype) are handled safely
    • Added 8 comprehensive regression tests in FlexibleCSVRecordAssembler.prototype-safety.test.ts

    Test Coverage:

    • Tests with __proto__ as CSV header
    • Tests with constructor as CSV header
    • Tests with prototype as CSV header
    • Tests with multiple dangerous property names
    • Tests with multiple records
    • Tests with quoted fields
    • Baseline tests documenting Object.fromEntries() behavior

    Documentation:

    • Added detailed safety comments to all Object.fromEntries() usage in CSVRecordAssembler
    • Documented why the implementation is safe from prototype pollution
    • Added references to regression tests for verification

    Conclusion:
    The AI security report suggesting a prototype pollution vulnerability was a false positive. Object.fromEntries() creates own properties (not prototype properties), making it inherently safe from prototype pollution attacks. This changeset provides regression tests to prevent future concerns and documents the safety guarantees.
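
    A short, self-contained demonstration of the property the tests assert (this is generic JavaScript behavior, not the library's internal code): Object.fromEntries() defines each key as an own data property, so dangerous header names neither change an object's prototype nor touch Object.prototype.

    // CSV headers with dangerous names become plain own properties of each record.
    const record = Object.fromEntries([
      ["__proto__", "evil"],
      ["constructor", "evil"],
      ["name", "Alice"],
    ]);

    console.log(Object.getOwnPropertyNames(record));
    // ["__proto__", "constructor", "name"] - own data properties on the record itself

    console.log(Object.getPrototypeOf(record) === Object.prototype);
    // true - the "__proto__" entry did not change the record's prototype

    console.log(({}).constructor === Object);
    // true - fresh objects are unaffected; nothing leaked onto Object.prototype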

  • #608 24f04d7 Thanks @kamiazya! - Improve Rust/WASM development environment and add comprehensive tests

    Internal Improvements

    • Migrated from Homebrew Rust to rustup for better toolchain management
    • Updated Rust dependencies to latest versions (csv 1.4, wasm-bindgen 0.2.105, serde 1.0.228)
    • Added 10 comprehensive unit tests for CSV parsing functionality
    • Added Criterion-based benchmarks for performance tracking
    • Improved error handling in WASM bindings
    • Configured rust-analyzer and development tools (rustfmt, clippy)
    • Added pkg/ directory to .gitignore (build artifacts should not be tracked)
    • Added Rust tests to CI pipeline (GitHub Actions Dynamic Tests workflow)
    • Integrated Rust coverage with Codecov (reported separately from TypeScript coverage under a rust flag)
    • Integrated Rust benchmarks with CodSpeed for performance regression detection

    These changes improve code quality and maintainability without affecting the public API or functionality.

  • #608 24f04d7 Thanks @kamiazya! - chore: upgrade Biome to 2.3.4 and update configuration

    Upgraded development dependency @biomejs/biome from 1.9.4 to 2.3.4 and updated configuration for compatibility with Biome v2. This change has no impact on the runtime behavior or public API.

  • #608 24f04d7 Thanks @kamiazya! - chore: upgrade TypeScript to 5.9.3 and typedoc to 0.28.14 with enhanced documentation

    Developer Experience Improvements:

    • Upgraded TypeScript from 5.8.3 to 5.9.3
    • Upgraded typedoc from 0.28.5 to 0.28.14
    • Enabled strict type checking options (noUncheckedIndexedAccess, exactOptionalPropertyTypes)
    • Enhanced TypeDoc configuration with version display, improved sorting, and navigation
    • Integrated all documentation markdown files with TypeDoc using native projectDocuments support
    • Added YAML frontmatter to all documentation files for better organization

    Type Safety Enhancements:

    • Added explicit | undefined to all optional properties for stricter type checking (see the sketch after this list)
    • Added proper undefined checks for array/object indexed access
    • Improved TextDecoderOptions usage to avoid explicit undefined values
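
    A short illustration (not code from the library) of what the two new compiler options require, which is the kind of change applied across the codebase; the DecodeOptions interface here is hypothetical:

    // With noUncheckedIndexedAccess, indexed access includes undefined in its type,
    // so a guard is needed before use:
    const headers: string[] = ["name", "age"];
    const first = headers[0]; // type: string | undefined under this option
    if (first !== undefined) {
      console.log(first.toUpperCase());
    }

    // With exactOptionalPropertyTypes, an optional property accepts an explicit undefined
    // only when undefined is spelled out in its type:
    interface DecodeOptions {
      fatal?: boolean | undefined; // the "| undefined" permits passing undefined explicitly
    }
    const opts: DecodeOptions = { fatal: undefined };
    console.log(opts);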

    Documentation Improvements:

    • Enhanced TypeDoc navigation with categories, groups, and folders
    • Added sidebar and navigation links to GitHub and npm
    • Organized documentation into Tutorials, How-to Guides, Explanation, and Reference sections
    • Improved documentation discoverability with YAML frontmatter grouping

    Breaking Changes: None - all changes are backward compatible

  • #608 24f04d7 Thanks @kamiazya! - feat(wasm): add input size validation and source option for error reporting

    This patch enhances the WASM CSV parser with security improvements and better error reporting capabilities.

    Security Enhancements:

    • Input Size Validation: Added validation to prevent memory exhaustion attacks
      • Validates the CSV input size against the maxBufferSize parameter before processing
      • Returns a clear error message when the size limit is exceeded
      • Default limit: 10 MB (configurable via the TypeScript maxBufferSize option)
      • Addresses a potential DoS vulnerability from maliciously large CSV inputs

    Error Reporting Improvements:

    • Source Option: Added optional source parameter for better error context
      • Allows specifying a source identifier (e.g., filename) in error messages
      • Error format: Error message in "filename"
      • Significantly improves debugging when processing multiple CSV files
      • Aligns with TypeScript implementation's CommonOptions.source

    Performance Optimizations:

    • Optimized format_error() to take ownership of its String argument
      • Avoids an unnecessary allocation when source is None
      • Improves error-path performance by eliminating a to_string() call
      • Zero-cost abstraction in the common case (no source identifier)

    Code Quality Improvements:

    • Used bool::then_some() for more idiomatic Option handling
    • Fixed Clippy needless_borrow warnings in tests
    • Applied cargo fmt formatting for consistency

    Implementation Details:

    Rust (web-csv-toolbox-wasm/src/lib.rs):

    • Added format_error() helper function for consistent error formatting
    • Updated parse_csv_to_json() to accept max_buffer_size and source parameters
    • Implemented input size validation at parse entry point
    • Applied source context to all error types (headers, records, JSON serialization)

    TypeScript (src/parseStringToArraySyncWASM.ts):

    • Updated to pass maxBufferSize from the parse options to the WASM function
    • Updated to pass source from the parse options to the WASM function (a usage sketch follows)
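
    A hedged usage sketch of how the two new options are passed from TypeScript; the option names come from this changeset, but treat the surrounding wiring (importing parseStringToArraySyncWASM and awaiting loadWASM() first) as an illustration rather than a verbatim recipe:

    import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox";

    await loadWASM(); // the WASM module must be initialized before the sync parser runs

    const csvText = "name,age\nAlice,30";
    const records = parseStringToArraySyncWASM(csvText, {
      maxBufferSize: 10 * 1024 * 1024, // reject inputs larger than 10 MB (the default limit)
      source: "users.csv", // reported in errors as: ... in "users.csv"
    });
    console.log(records);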

    Breaking Changes: None - this is a backward-compatible enhancement with sensible defaults.

    Migration: No action required. Existing code continues to work without modification.