Minor Changes
- #608 24f04d7 Thanks @kamiazya! - feat!: rename binary stream APIs for consistency and add BufferSource support

Summary
This release standardizes the naming of binary stream parsing APIs to match the existing `parseBinary*` family and extends support to accept any BufferSource type (ArrayBuffer, Uint8Array, and other TypedArray views).

Breaking Changes
API Renaming for Consistency
All `parseUint8Array*` functions have been renamed to `parseBinary*` to maintain consistency with existing binary parsing APIs.

Function Names:

- `parseUint8ArrayStream()` → `parseBinaryStream()`
- `parseUint8ArrayStreamToStream()` → `parseBinaryStreamToStream()`
Type Names:
- `ParseUint8ArrayStreamOptions` → `ParseBinaryStreamOptions`
Internal Functions (for reference):
- `parseUint8ArrayStreamInMain()` → `parseBinaryStreamInMain()`
- `parseUint8ArrayStreamInWorker()` → `parseBinaryStreamInWorker()`
- `parseUint8ArrayStreamInWorkerWASM()` → `parseBinaryStreamInWorkerWASM()`
Rationale:
The previous naming was inconsistent with the rest of the binary API family (`parseBinary`, `parseBinaryToArraySync`, `parseBinaryToIterableIterator`, `parseBinaryToStream`). The new naming provides:

- Consistency across all binary parsing APIs
- Clear indication that these functions accept any binary data format
- Better predictability for API discovery
BufferSource Support
`FlexibleBinaryCSVParser` and `BinaryCSVParserStream` now accept `BufferSource` (= `ArrayBuffer | ArrayBufferView`) instead of just `Uint8Array`.

Before:
```ts
const parser = new FlexibleBinaryCSVParser({ header: ["name", "age"] });
const data = new Uint8Array([...]); // Only Uint8Array
const records = parser.parse(data);
```
After:
```ts
const parser = new FlexibleBinaryCSVParser({ header: ["name", "age"] });

// Uint8Array still works
const uint8Data = new Uint8Array([...]);
const records1 = parser.parse(uint8Data);

// ArrayBuffer now works directly
const buffer = await fetch("data.csv").then((r) => r.arrayBuffer());
const records2 = parser.parse(buffer);

// Other TypedArray views also work
const int8Data = new Int8Array([...]);
const records3 = parser.parse(int8Data);
```
Benefits:
- Direct use of `fetch().then(r => r.arrayBuffer())` without conversion
- Flexibility to work with any TypedArray view
- Alignment with Web API standards (BufferSource is widely used)
Migration Guide
Automatic Migration
Use find-and-replace in your codebase:
```
# Function calls
parseUint8ArrayStream → parseBinaryStream
parseUint8ArrayStreamToStream → parseBinaryStreamToStream

# Type references
ParseUint8ArrayStreamOptions → ParseBinaryStreamOptions
```
TypeScript Users
If you were explicitly typing with `Uint8Array`, you can now use the more general `BufferSource`:

```ts
// Before
function processCSV(data: Uint8Array) {
  return parseBinaryStream(data);
}

// After (more flexible)
function processCSV(data: BufferSource) {
  return parseBinaryStream(data);
}
```
Updated API Consistency
All binary parsing APIs now follow a consistent naming pattern:
```ts
// Single-value binary data
parseBinary();                   // Binary → AsyncIterableIterator<Record>
parseBinaryToArraySync();        // Binary → Array<Record> (sync)
parseBinaryToIterableIterator(); // Binary → IterableIterator<Record>
parseBinaryToStream();           // Binary → ReadableStream<Record>

// Streaming binary data
parseBinaryStream();             // ReadableStream<Uint8Array> → AsyncIterableIterator<Record>
parseBinaryStreamToStream();     // ReadableStream<Uint8Array> → ReadableStream<Record>
```
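For reference, a minimal consumption sketch of the renamed streaming API (the `ReadableStream<Uint8Array>` input matches the signature above; reading a fetch response body is assumed for illustration):

```ts
import { parseBinaryStream } from "web-csv-toolbox";

const response = await fetch("data.csv");
// response.body is a ReadableStream<Uint8Array>, the documented input type
for await (const record of parseBinaryStream(response.body!)) {
  console.log(record);
}
```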
Note: While the stream input type remains `ReadableStream<Uint8Array>` (the Web Streams API standard), the internal parsers now accept `BufferSource` for individual chunks.

Documentation Updates
README.md
- Updated the Low-level APIs section to reflect the `parseBinaryStream*` naming
- Added flush procedure documentation for streaming mode
- Added BufferSource examples
API Reference (docs/reference/package-exports.md)
- Added comprehensive Low-level API Reference section
- Documented all Parser Models (Tier 1) and Lexer + Assembler (Tier 2)
- Included usage examples and code snippets
Architecture Guide (docs/explanation/parsing-architecture.md)
- Updated Binary CSV Parser section to document BufferSource support
- Added detailed streaming mode examples with flush procedures
- Clarified multi-byte character handling across chunk boundaries
Flush Procedure Clarification
Documentation now explicitly covers the requirement to call `parse()` without arguments when using streaming mode:

```ts
const parser = createBinaryCSVParser({ header: ["name", "age"] });
const encoder = new TextEncoder();

// Process chunks
const records1 = parser.parse(encoder.encode("Alice,30\nBob,"), {
  stream: true,
});
const records2 = parser.parse(encoder.encode("25\n"), { stream: true });

// IMPORTANT: Flush remaining data (required!)
const records3 = parser.parse();
```
This prevents data loss from incomplete records or multi-byte character buffers.
Type Safety
All changes maintain full TypeScript strict mode compliance with proper type inference and generic constraints.
- #608 24f04d7 Thanks @kamiazya! - Add an `arrayBufferThreshold` option to the Engine configuration for automatic Blob reading strategy selection

New Feature
Added an `engine.arrayBufferThreshold` option that automatically selects the optimal Blob reading strategy based on file size:

- Files smaller than the threshold: use `blob.arrayBuffer()` + `parseBinary()` (6-8x faster, confirmed by benchmarks)
- Files equal to or larger than the threshold: use `blob.stream()` + `parseBinaryStream()` (memory-efficient)
Default: 1MB (1,048,576 bytes), determined by comprehensive benchmarks
Applies to: `parseBlob()` and `parseFile()` only

Benchmark Results

| File Size | Binary (ops/sec) | Stream (ops/sec) | Performance Gain |
| --------- | ---------------- | ---------------- | ---------------- |
| 1KB       | 21,691           | 2,685            | 8.08x faster     |
| 10KB      | 2,187            | 311              | 7.03x faster     |
| 100KB     | 219              | 32               | 6.84x faster     |
| 1MB       | 20               | 3                | 6.67x faster     |

Usage

```ts
import { parseBlob, EnginePresets } from "web-csv-toolbox";

// Use default (1MB threshold)
for await (const record of parseBlob(file)) {
  console.log(record);
}

// Always use streaming (memory-efficient)
for await (const record of parseBlob(largeFile, {
  engine: { arrayBufferThreshold: 0 },
})) {
  console.log(record);
}

// Custom threshold (512KB)
for await (const record of parseBlob(file, {
  engine: { arrayBufferThreshold: 512 * 1024 },
})) {
  console.log(record);
}

// With preset
for await (const record of parseBlob(file, {
  engine: EnginePresets.fastest({
    arrayBufferThreshold: 2 * 1024 * 1024, // 2MB
  }),
})) {
  console.log(record);
}
```
Special Values
- `0` - Always use streaming (maximum memory efficiency)
- `Infinity` - Always use arrayBuffer (maximum performance for small files)
Security Note
When using `arrayBufferThreshold > 0`, files must stay below `maxBufferSize` (default 10MB) to prevent excessive memory allocation. Files exceeding this limit will throw a `RangeError`.

Design Philosophy

This option belongs to the `engine` configuration because it affects performance and behavior only, not the parsing result specification. This follows the design principle:

- Top-level options: Affect specification (result changes)
- Engine options: Affect performance/behavior (same result, different execution)
- #608 24f04d7 Thanks @kamiazya! - Add support for Blob, File, and Request objects

This release adds native support for parsing CSV data from Web Standard `Blob`, `File`, and `Request` objects, making the library more versatile across different environments.

New Functions:
- `parseBlob(blob, options)` - Parse CSV from Blob or File objects
  - Automatic charset detection from the `blob.type` property
  - Supports compression via the `decompression` option
  - Returns `AsyncIterableIterator<CSVRecord>`
  - Includes `.toArray()` and `.toStream()` namespace methods
- `parseFile(file, options)` - Enhanced File parsing with automatic error source tracking
  - Built on top of `parseBlob` with additional functionality
  - Automatically sets `file.name` as the error source for better error reporting
  - Provides clearer intent when working specifically with File objects
  - Useful for file inputs and drag-and-drop scenarios
  - Includes `.toArray()` and `.toStream()` namespace methods
- `parseRequest(request, options)` - Server-side Request parsing
  - Automatic `Content-Type` validation and charset extraction
  - Automatic `Content-Encoding` detection and decompression
  - Designed for Cloudflare Workers, Service Workers, and edge platforms
  - Includes `.toArray()` and `.toStream()` namespace methods
High-level API Integration:
The `parse()` function now automatically detects and handles these new input types:

```ts
import { parse } from "web-csv-toolbox";

// Blob/File (browser file uploads)
// File objects automatically include the filename in error messages
const file = input.files[0];
for await (const record of parse(file)) {
  console.log(record);
}

// Request (server-side)
export default {
  async fetch(request: Request) {
    for await (const record of parse(request)) {
      console.log(record);
    }
  },
};
```
Type System Updates:
- Updated the `CSVBinary` type to include `Blob` and `Request`
- Added proper type overloads to the `parse()` function
- Full TypeScript support with generic header types
- New `source` field in `CommonOptions`, `CSVRecordAssemblerOptions`, and `ParseError`
  - Allows custom error source identification (e.g., filename, description)
  - Automatically populated for File objects
  - Improves error messages with contextual information
- Improved internal type naming for better clarity
  - `Join` → `JoinCSVFields` - More descriptive CSV field joining utility type
  - `Split` → `SplitCSVFields` - More descriptive CSV field splitting utility type
  - These are internal utility types used for CSV type-level string manipulation
- Enhanced terminology in type definitions
  - `TokenLocation.rowNumber` - Logical CSV row number (includes the header)
  - Clear distinction between physical line numbers (`line`) and logical row numbers (`rowNumber`), as illustrated below
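The distinction matters when quoted fields contain newlines. A minimal illustration (plain data only, no library calls):

```ts
// A quoted field spanning a physical line break:
const csv = 'name,note\nAlice,"multi\nline note"\nBob,ok\n';
// Physical lines ("line"):      4 — the quoted note adds one extra line
// Logical rows ("rowNumber"):   3 — header = row 1, Alice = row 2, Bob = row 3
```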
Compression Support:
All binary input types support compressed data:

- Blob/File: Manual specification via the `decompression` option

  ```ts
  parseBlob(file, { decompression: "gzip" });
  ```

- Request: Automatic detection from the `Content-Encoding` header

  ```ts
  // No configuration needed - automatic
  parseRequest(request);
  ```

- Supported formats: `gzip`, `deflate`, `deflate-raw` (environment-dependent)
Helper Functions:
- `getOptionsFromBlob()` - Extracts the charset from the Blob MIME type
- `getOptionsFromFile()` - Extracts options from a File (charset + automatic source naming)
- `getOptionsFromRequest()` - Processes Request headers (Content-Type, Content-Encoding)
- `parseBlobToStream()` - Stream conversion helper
- `parseFileToArray()` - Parse a File to an array of records (sketched below)
- `parseFileToStream()` - Parse a File to a ReadableStream
- `parseRequestToStream()` - Stream conversion helper
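A hedged convenience sketch (the helper name comes from the list above; its exact signature is assumed for illustration):

```ts
import { parseFileToArray } from "web-csv-toolbox";

// Parse a user-selected File straight into an array of records (assumed async return)
const file = input.files[0];
const records = await parseFileToArray(file);
console.log(records.length);
```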
Documentation:
Comprehensive documentation following the Diátaxis framework:
- API Reference:
  - `parseBlob.md` - Complete API reference with examples
  - `parseFile.md` - Alias documentation
  - `parseRequest.md` - Server-side API reference with examples
  - Updated `parse.md` to include the new input types
- How-to Guides:
  - NEW: `platform-usage/` - Environment-specific usage patterns organized by platform
    - Each topic has its own dedicated guide for easy navigation
    - Browser: File input, drag-and-drop, FormData, Fetch API
    - Node.js: Buffer, fs.ReadStream, HTTP requests, stdin/stdout
    - Deno: Deno.readFile, Deno.open, fetch API
    - Organized in a `{environment}/{topic}.md` structure for maintainability
- Examples:
  - File input elements with HTML samples
  - Drag-and-drop file uploads
  - Compressed file handling (.csv.gz)
  - Validation and error handling patterns
  - NEW: Node.js Buffer usage (supported via BufferSource compatibility)
  - NEW: FormData integration patterns
  - NEW: Node.js stream conversion (fs.ReadStream → Web Streams)
- Updated:
  - `README.md` - Added usage examples and API listings
  - `choosing-the-right-api.md` - Updated decision tree
Enhanced Error Reporting:
The `source` field provides better error context when parsing multiple files:

```ts
import { parseFile } from "web-csv-toolbox";

// Automatic source tracking
try {
  for await (const record of parseFile(file)) {
    // ...
  }
} catch (error) {
  console.error(error.message);
  // "Field count (100001) exceeded maximum allowed count of 100000 at row 5 in "data.csv""
  console.error(error.source); // "data.csv"
}

// Manual source specification
parseString(csv, { source: "API-Export-2024" });
// Error: "... at row 5 in "API-Export-2024""
```
Security Note: The `source` field should not contain sensitive information (API keys, tokens, URLs with credentials), as it may be exposed in error messages and logs.

Use Cases:

✅ Browser File Uploads:

- File input elements (`<input type="file">`)
- Drag-and-drop interfaces
- Compressed file support (.csv.gz)

✅ Server-Side Processing:

- Node.js servers
- Deno applications
- Service Workers

✅ Automatic Header Processing:
- Content-Type validation
- Charset detection
- Content-Encoding decompression
Platform Support:
All new APIs work across:
- Modern browsers (Chrome, Firefox, Edge, Safari)
- Node.js 18+ (via undici Request/Blob)
- Deno
- Service Workers
Breaking Changes:
None - this is a purely additive feature. All existing APIs remain unchanged.
Migration:
No migration needed. New functions are available immediately:
```ts
// Before (still works)
import { parse } from "web-csv-toolbox";
const response = await fetch("data.csv");
for await (const record of parse(response)) {
}

// After (new capabilities)
import { parseBlob, parseFile, parseRequest } from "web-csv-toolbox";

// Blob support
for await (const record of parseBlob(blob)) {
}

// File support with automatic error source
const file = input.files[0];
for await (const record of parseFile(file)) {
}
// Errors will include: 'in "data.csv"'

// Server-side Request support
for await (const record of parseRequest(request)) {
}

// Custom error source for any parser
import { parseString } from "web-csv-toolbox";
for await (const record of parseString(csv, { source: "user-import.csv" })) {
}
```

- #608 24f04d7 Thanks @kamiazya! - Implement a discriminated union pattern for `EngineConfig` to improve type safety

Breaking Changes
1. EngineConfig Type Structure
`EngineConfig` is now a discriminated union based on the `worker` property.

Before:

```ts
interface EngineConfig {
  worker?: boolean;
  workerURL?: string | URL;
  workerPool?: WorkerPool;
  workerStrategy?: WorkerCommunicationStrategy;
  strict?: boolean;
  onFallback?: (info: EngineFallbackInfo) => void;
  wasm?: boolean;
  // ... other properties
}
```

After:

```ts
// Base configuration shared by all modes
interface BaseEngineConfig {
  wasm?: boolean;
  arrayBufferThreshold?: number;
  backpressureCheckInterval?: BackpressureCheckInterval;
  queuingStrategy?: QueuingStrategyConfig;
}

// Main thread configuration (worker is false or undefined)
interface MainThreadEngineConfig extends BaseEngineConfig {
  worker?: false;
}

// Worker configuration (worker must be true)
interface WorkerEngineConfig extends BaseEngineConfig {
  worker: true;
  workerURL?: string | URL;
  workerPool?: WorkerPool;
  workerStrategy?: WorkerCommunicationStrategy;
  strict?: boolean;
  onFallback?: (info: EngineFallbackInfo) => void;
}

// Union type
type EngineConfig = MainThreadEngineConfig | WorkerEngineConfig;
```
2. Type Safety Improvements
Worker-specific properties are now only available when `worker: true`:

```ts
// ✅ Valid - worker: true allows worker-specific properties
const config1: EngineConfig = {
  worker: true,
  workerURL: "./worker.js", // ✅ Type-safe
  workerStrategy: "stream-transfer",
  strict: true,
};

// ✅ Valid - worker: false doesn't require worker properties
const config2: EngineConfig = {
  worker: false,
  wasm: true,
};

// ❌ Type Error - worker: false cannot have workerURL
const config3: EngineConfig = {
  worker: false,
  workerURL: "./worker.js", // ❌ Type error!
};
```
3. EnginePresets Options Split
`EnginePresetOptions` is now split into two interfaces for better type safety.

Before:

```ts
interface EnginePresetOptions {
  workerPool?: WorkerPool;
  workerURL?: string | URL;
  onFallback?: (info: EngineFallbackInfo) => void;
  arrayBufferThreshold?: number;
  // ...
}

EnginePresets.mainThread(options?: EnginePresetOptions)
EnginePresets.fastest(options?: EnginePresetOptions)
```

After:

```ts
// For main thread presets (mainThread, wasm)
interface MainThreadPresetOptions extends BasePresetOptions {
  // No worker-related options
}

// For worker-based presets (worker, fastest, balanced, etc.)
interface WorkerPresetOptions extends BasePresetOptions {
  workerPool?: WorkerPool;
  workerURL?: string | URL;
  onFallback?: (info: EngineFallbackInfo) => void;
}

EnginePresets.mainThread(options?: MainThreadPresetOptions)
EnginePresets.fastest(options?: WorkerPresetOptions)
```
Migration:
```ts
// Before: No type error, but logically incorrect
EnginePresets.mainThread({ workerURL: "./worker.js" }); // Accepted but ignored

// After: Type error prevents mistakes
EnginePresets.mainThread({ workerURL: "./worker.js" }); // ❌ Type error!
```
4. Transformer Constructor Changes
Queuing strategy parameters changed from optional (`?`) to default parameters.

Before:

```ts
constructor(
  options?: CSVLexerTransformerOptions,
  writableStrategy?: QueuingStrategy<string>,
  readableStrategy?: QueuingStrategy<Token>
)
```

After:

```ts
constructor(
  options: CSVLexerTransformerOptions = {},
  writableStrategy: QueuingStrategy<string> = DEFAULT_WRITABLE_STRATEGY,
  readableStrategy: QueuingStrategy<Token> = DEFAULT_READABLE_STRATEGY
)
```
Impact: This is technically a breaking change in the type signature, but functionally backward compatible since all parameters still have defaults. Existing code will continue to work without modifications.
New Features
1. Default Strategy Constants
Default queuing strategies are now module-level constants using `CountQueuingStrategy`:

```ts
// CSVLexerTransformer
const DEFAULT_WRITABLE_STRATEGY: QueuingStrategy<string> = {
  highWaterMark: 65536,
  size: (chunk) => chunk.length,
};
const DEFAULT_READABLE_STRATEGY = new CountQueuingStrategy({
  highWaterMark: 1024,
});

// CSVRecordAssemblerTransformer
const DEFAULT_WRITABLE_STRATEGY = new CountQueuingStrategy({
  highWaterMark: 1024,
});
const DEFAULT_READABLE_STRATEGY = new CountQueuingStrategy({
  highWaterMark: 256,
});
```
2. Type Tests
Added comprehensive type tests in `src/common/types.test-d.ts` to validate the discriminated union behavior:

```ts
// Validates type narrowing
const config: EngineConfig = { worker: true };
expectTypeOf(config).toExtend<WorkerEngineConfig>();

// Validates property exclusion
expectTypeOf<MainThreadEngineConfig>().not.toHaveProperty("workerURL");
```
Migration Guide
For TypeScript Users
If you pass explicitly typed `EngineConfig` objects, you may need to update them:

```ts
// Before: Could accidentally mix incompatible properties
const config: EngineConfig = {
  worker: false,
  workerURL: "./worker.js", // Silently ignored
};

// After: TypeScript catches the mistake
const config: EngineConfig = {
  worker: false,
  // workerURL: './worker.js' // ❌ Type error - removed
};
```
For EnginePresets Users
Update preset option types if explicitly typed:
```ts
// Before
const options: EnginePresetOptions = {
  workerPool: myPool,
};
EnginePresets.mainThread(options); // No error, but workerPool ignored

// After
const options: WorkerPresetOptions = { // or MainThreadPresetOptions
  workerPool: myPool,
};
EnginePresets.fastest(options); // ✅ Correct usage
// EnginePresets.mainThread(options); // ❌ Type error - use MainThreadPresetOptions
```
For Transformer Users
No code changes required. Existing usage continues to work:
```ts
// Still works exactly as before
new CSVLexerTransformer();
new CSVLexerTransformer({ delimiter: "," });
new CSVLexerTransformer({}, customWritable, customReadable);
```
Benefits
- IDE Autocomplete: Better suggestions based on the `worker` setting
- Type Safety: Prevents invalid property combinations
- Self-Documenting: The type system enforces valid configurations
- Catch Errors Early: TypeScript catches configuration mistakes at compile time
- Standards Compliance: Uses `CountQueuingStrategy` from the Web Streams API

- #608 24f04d7 Thanks @kamiazya! - refactor!: rename engine presets to clarify optimization targets

This release improves the naming of engine presets to clearly indicate what each preset optimizes for. The new names focus on performance characteristics (stability, UI responsiveness, parse speed, memory efficiency) rather than implementation details.
Breaking Changes
Engine Preset Renaming
Engine presets have been renamed to better communicate their optimization targets:
```diff
- import { EnginePresets } from 'web-csv-toolbox';
+ import { EnginePresets } from 'web-csv-toolbox';

- engine: EnginePresets.mainThread()
+ engine: EnginePresets.stable()

- engine: EnginePresets.worker()
+ engine: EnginePresets.responsive()

- engine: EnginePresets.workerStreamTransfer()
+ engine: EnginePresets.memoryEfficient()

- engine: EnginePresets.wasm()
+ engine: EnginePresets.fast()

- engine: EnginePresets.workerWasm()
+ engine: EnginePresets.responsiveFast()
```
Optimization targets:
| Preset              | Optimizes For                                  |
| ------------------- | ---------------------------------------------- |
| `stable()`          | Stability (uses only standard JavaScript APIs) |
| `responsive()`      | UI responsiveness (non-blocking)               |
| `memoryEfficient()` | Memory efficiency (zero-copy streams)          |
| `fast()`            | Parse speed (fastest execution time)           |
| `responsiveFast()`  | UI responsiveness + parse speed                |
| `balanced()`        | Balanced (general-purpose)                     |

Removed Presets
Two presets have been removed:
```diff
- engine: EnginePresets.fastest()
+ engine: EnginePresets.responsiveFast()

- engine: EnginePresets.strict() // No replacement - limited use case
```
Why removed:
- `fastest()`: Misleading name - it prioritized UI responsiveness over raw execution speed due to worker communication overhead
- `strict()`: Limited use case - primarily for testing/debugging
Improvements
Clearer Performance Documentation
Each preset now explicitly documents its performance characteristics:
- Parse speed: How fast CSV parsing executes
- UI responsiveness: Whether parsing blocks the main thread
- Memory efficiency: Memory usage patterns
- Stability: API stability level (Most Stable, Stable, Experimental)
Trade-offs Transparency
Documentation now clearly explains the trade-offs for each preset:
```ts
// stable() - Most stable, blocks main thread
// ✅ Most stable: Uses only standard JavaScript APIs
// ✅ No worker communication overhead
// ❌ Blocks main thread during parsing

// responsive() - Non-blocking, stable
// ✅ Non-blocking UI: Parsing runs in worker thread
// ⚠️ Worker communication overhead

// fast() - Fastest parse speed, blocks main thread
// ✅ Fast parse speed: Compiled WASM code
// ✅ No worker communication overhead
// ❌ Blocks main thread
// ❌ UTF-8 encoding only

// responsiveFast() - Non-blocking + fast, stable
// ✅ Non-blocking UI + fast parsing
// ⚠️ Worker communication overhead
// ❌ UTF-8 encoding only
```
Migration Guide
Quick Migration
Replace old preset names with new names:
- `mainThread()` → `stable()` - If you need maximum stability
- `worker()` → `responsive()` - If you need a non-blocking UI
- `workerStreamTransfer()` → `memoryEfficient()` - If you need memory efficiency
- `wasm()` → `fast()` - If you need the fastest parse speed (and blocking is acceptable)
- `workerWasm()` → `responsiveFast()` - If you need a non-blocking UI + fast parsing
- `fastest()` → `responsiveFast()` - Despite the name, this is the correct replacement
- `strict()` → Remove - Or use a custom config with `strict: true`
Choosing the Right Preset
By priority:
- Stability first: `stable()` - Most stable, uses only standard JavaScript APIs
- UI responsiveness first: `responsive()` or `balanced()` - Non-blocking execution
- Parse speed first: `fast()` - Fastest execution time (blocks the main thread)
- General-purpose: `balanced()` - Balanced performance characteristics
By use case:
- Server-side parsing: `stable()` or `fast()` - Blocking acceptable
- Browser with interactive UI: `responsive()` or `balanced()` - Non-blocking required
- UTF-8 files only: `fast()` or `responsiveFast()` - WASM acceleration
- Streaming large files: `memoryEfficient()` or `balanced()` - Constant memory usage (see the sketch below)
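For example, the streaming-large-files case might look like this (preset name from this release; `parseBlob` with an `engine` option as shown earlier in these notes):

```ts
import { parseBlob, EnginePresets } from "web-csv-toolbox";

// Stream a large file with constant memory usage
for await (const record of parseBlob(largeFile, {
  engine: EnginePresets.memoryEfficient(),
})) {
  console.log(record);
}
```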
Example Migration
Before:
```ts
import { parseString, EnginePresets } from "web-csv-toolbox";

// Old: Unclear what "fastest" optimizes for
for await (const record of parseString(csv, {
  engine: EnginePresets.fastest(),
})) {
  console.log(record);
}
```
After:
```ts
import { parseString, EnginePresets } from "web-csv-toolbox";

// New: Clear that this optimizes for UI responsiveness + parse speed
for await (const record of parseString(csv, {
  engine: EnginePresets.responsiveFast(),
})) {
  console.log(record);
}
```
Documentation Updates
All documentation has been updated to reflect the new preset names and include detailed performance characteristics, trade-offs, and use case guidance.
See the Engine Presets Reference for complete documentation.
- #608 24f04d7 Thanks @kamiazya! - Add experimental performance tuning options to the Engine configuration: `backpressureCheckInterval` and `queuingStrategy`

New Experimental Features
Added advanced performance tuning options for fine-grained control over streaming behavior:
`engine.backpressureCheckInterval`

Controls how frequently the internal parsers check for backpressure during streaming operations (count-based).

Default:

```ts
{
  lexer: 100,    // Check every 100 tokens processed
  assembler: 10, // Check every 10 records processed
}
```
Trade-offs:
- Lower values: More frequent backpressure checks, more responsive to downstream consumers
- Higher values: Less frequent backpressure checks, reduced checking overhead
Potential Use Cases:
- Memory-constrained environments: Consider lower values for more responsive backpressure
- Scenarios where checking overhead is a concern: Consider higher values
- Slow consumers: Consider lower values to propagate backpressure more quickly
`engine.queuingStrategy`

Controls the internal queuing behavior of the CSV parser's streaming pipeline.

Default: Designed to balance memory usage and buffering behavior

Structure:

```ts
{
  lexerWritable?: QueuingStrategy<string>;
  lexerReadable?: QueuingStrategy<Token>;
  assemblerWritable?: QueuingStrategy<Token>;
  assemblerReadable?: QueuingStrategy<CSVRecord<any>>;
}
```
Pipeline Stages:
The CSV parser uses a two-stage pipeline:

- Lexer: String → Token
- Assembler: Token → CSVRecord
Each stage has both writable (input) and readable (output) sides:
- `lexerWritable` - Lexer input (string chunks)
- `lexerReadable` - Lexer output (tokens)
- `assemblerWritable` - Assembler input (tokens from the lexer)
- `assemblerReadable` - Assembler output (CSV records)
Theoretical Trade-offs:
- Small highWaterMark (1-10): Less memory for buffering, backpressure applied more quickly
- Medium highWaterMark (default): Balanced memory and buffering
- Large highWaterMark (100+): More memory for buffering, backpressure applied less frequently
Note: Actual performance characteristics depend on your specific use case and runtime environment. Profile your application to determine optimal values.
Potential Use Cases:
- IoT/Embedded: Consider smaller highWaterMark for minimal memory footprint
- Server-side batch processing: Consider larger highWaterMark for more buffering
- Real-time streaming: Consider smaller highWaterMark for faster backpressure propagation
Usage Examples
Configuration Example: Tuning for Potential High-Throughput Scenarios
```ts
import { parseString, EnginePresets } from "web-csv-toolbox";

const config = EnginePresets.fastest({
  backpressureCheckInterval: {
    lexer: 200,    // Check every 200 tokens (less frequent)
    assembler: 20, // Check every 20 records (less frequent)
  },
  queuingStrategy: {
    lexerReadable: new CountQueuingStrategy({ highWaterMark: 100 }),
    assemblerReadable: new CountQueuingStrategy({ highWaterMark: 50 }),
  },
});

for await (const record of parseString(csv, { engine: config })) {
  console.log(record);
}
```
Memory-Constrained Environment
```ts
import { parseString, EnginePresets } from "web-csv-toolbox";

const config = EnginePresets.balanced({
  backpressureCheckInterval: {
    lexer: 10,    // Check every 10 tokens (frequent checks)
    assembler: 5, // Check every 5 records (frequent checks)
  },
  queuingStrategy: {
    // Minimal buffers throughout the entire pipeline
    lexerWritable: new CountQueuingStrategy({ highWaterMark: 1 }),
    lexerReadable: new CountQueuingStrategy({ highWaterMark: 1 }),
    assemblerWritable: new CountQueuingStrategy({ highWaterMark: 1 }),
    assemblerReadable: new CountQueuingStrategy({ highWaterMark: 1 }),
  },
});

for await (const record of parseString(csv, { engine: config })) {
  console.log(record);
}
```

⚠️ Experimental Status

These APIs are marked as experimental and may change in future versions based on ongoing performance research. The default values are designed to work well for most use cases, but optimal values may vary depending on your specific environment and workload.
Recommendation: Only adjust these settings if you're experiencing specific performance issues with large streaming operations or have specific memory/throughput requirements.
Design Philosophy
These options belong to the `engine` configuration because they affect performance and behavior only, not the parsing result specification. This follows the design principle:

- Top-level options: Affect specification (result changes)
- Engine options: Affect performance/behavior (same result, different execution)
- #608 24f04d7 Thanks @kamiazya! - feat: introduce a "slim" entry point for optimized bundle size

This release introduces a new `slim` entry point that significantly reduces bundle size by excluding the inlined WebAssembly binary.

New Entry Points
The package now offers two distinct entry points:
- Main (`web-csv-toolbox`): The default entry point.
  - Features: Zero-configuration, works out of the box.
  - Trade-off: Includes the WASM binary inlined as base64 (~110KB), resulting in a larger bundle size.
  - Best for: Prototyping, quick starts, or when bundle size is not a critical constraint.
- Slim (`web-csv-toolbox/slim`): The new optimized entry point.
  - Features: Smaller bundle size, streaming WASM loading.
  - Trade-off: Requires manual initialization of the WASM binary.
  - Best for: Production applications where bundle size and load performance are critical.
How to use the "Slim" version
When using the slim version, you must manually load the WASM binary before using any WASM-dependent features (like `parseStringToArraySyncWASM` or high-performance parsing presets).

```ts
import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox/slim";

// You need to provide the URL to the WASM file
import wasmUrl from "web-csv-toolbox/csv.wasm?url";

async function init() {
  // 1. Manually initialize WASM
  await loadWASM(wasmUrl);

  // 2. Now you can use WASM-powered functions
  const data = parseStringToArraySyncWASM("a,b,c\n1,2,3");
  console.log(data);
}

init();
```
Worker Exports
Corresponding worker exports are also available:
- `web-csv-toolbox/worker` (Main)
- `web-csv-toolbox/worker/slim` (Slim)

- #608 24f04d7 Thanks @kamiazya! - feat!: add Parser models and streams with improved architecture

Summary
This release introduces a new Parser layer that composes Lexer and Assembler components, providing a cleaner architecture and improved streaming support. The implementation follows the design patterns established by the recently developed CSVObjectRecordAssembler and CSVArrayRecordAssembler.
New Features
Parser Models
FlexibleStringCSVParser
- Composes `FlexibleStringCSVLexer` and a CSV Record Assembler
- Stateful parser for string CSV data
- Supports both object and array output formats
- Streaming mode support via `parse(chunk, { stream: true })`
- Full options support (delimiter, quotation, columnCountStrategy, etc.)
FlexibleBinaryCSVParser
- Composes `TextDecoder` with `FlexibleStringCSVParser`
- Accepts any BufferSource (Uint8Array, ArrayBuffer, or other TypedArray views)
- Uses `TextDecoder` with the `stream: true` option for proper streaming
- Supports multiple character encodings (utf-8, shift_jis, etc.)
- BOM handling via the `ignoreBOM` option
- Fatal error mode via the `fatal` option
Factory Functions
- `createStringCSVParser()` - Creates `FlexibleStringCSVParser` instances (see the sketch below)
- `createBinaryCSVParser()` - Creates `FlexibleBinaryCSVParser` instances
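A minimal usage sketch of the string factory (the factory takes the same options as the parser; the `header` option is assumed here for illustration):

```ts
import { createStringCSVParser } from "web-csv-toolbox";

const parser = createStringCSVParser({ header: ["name", "age"] });
const records = parser.parse("Alice,30\nBob,25\n");
console.log(records); // parsed records for the two rows
```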
Stream Classes
StringCSVParserStream
- `TransformStream<string, CSVRecord>` for streaming string parsing
- Wraps Parser instances (not constructing them internally)
- Configurable backpressure handling
- Custom queuing strategies support
- Follows existing CSVLexerTransformer pattern
BinaryCSVParserStream
- `TransformStream<BufferSource, CSVRecord>` for streaming binary parsing
- Accepts any BufferSource (Uint8Array, ArrayBuffer, or other TypedArray views)
- Handles UTF-8 multi-byte characters across chunk boundaries
- Integration-ready for fetch API and file streaming
- Backpressure management with configurable check intervals
Breaking Changes
Object Format Behavior (Reverted)
While initially explored, the final implementation maintains the existing behavior:
- Empty fields (`,value,`): Filled with `""`
- Missing fields (short rows): Remain as `undefined`

This preserves backward compatibility and allows users to distinguish between explicitly empty fields and missing fields.

Array Format Behavior (No Change)

- Empty fields: Filled with `""`
- Missing fields with `columnCountStrategy: 'pad'`: Filled with `undefined`
Public API Exports (common.ts)
Added exports for:
- `FlexibleStringCSVParser`
- `FlexibleBinaryCSVParser`
- `createStringCSVParser`
- `createBinaryCSVParser`
- `StringCSVParserStream`
- `BinaryCSVParserStream`
Architecture Improvements
Composition Over Implementation
- Parsers compose Lexer + Assembler instead of reimplementing
- Reduces code duplication across the codebase
- Easier to maintain and extend
Streaming Support
- `TextDecoder` with `stream: true` for proper multi-byte character handling
- Backpressure handling in Stream classes
- Configurable check intervals for performance tuning
Type Safety
- Maintains full TypeScript strict mode compliance
- Generic type parameters for header types
- Proper CSVRecord type inference based on outputFormat
Migration Guide
For Users of Existing APIs
No changes required. All existing functions (`parseString`, `parseBinary`, etc.) continue to work as before.

For Direct Lexer/Assembler Users
Consider migrating to Parser classes for simplified usage:
```ts
// Before (manual composition)
const lexer = new FlexibleStringCSVLexer(options);
const assembler = createCSVRecordAssembler(options);
const tokens = lexer.lex(csv);
const records = Array.from(assembler.assemble(tokens));

// After (using Parser)
const parser = new FlexibleStringCSVParser(options);
const records = parser.parse(csv);
```
For Stream Users
The new stream classes provide a cleaner API:
```ts
// String streaming
const stringParser = new FlexibleStringCSVParser({ header: ["name", "age"] });
const stringStream = new StringCSVParserStream(stringParser);

const textResponse = await fetch("data.csv");
await textResponse.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(stringStream)
  .pipeTo(yourProcessor);

// Binary streaming
const binaryParser = new FlexibleBinaryCSVParser({ header: ["name", "age"] });
const binaryStream = new BinaryCSVParserStream(binaryParser);

const binaryResponse = await fetch("data.csv");
await binaryResponse.body
  .pipeThrough(binaryStream)
  .pipeTo(yourProcessor);
```
Performance Considerations
- Backpressure check interval defaults to 100 records
- Writable side: 64KB highWaterMark (byte/character counting)
- Readable side: 256 records highWaterMark
- Configurable via queuing strategies
Documentation
All new classes include comprehensive JSDoc documentation with:
- Usage examples
- Parameter descriptions
- Return type documentation
- Remarks on streaming behavior
- Performance characteristics
- #608 24f04d7 Thanks @kamiazya! - feat!: add array output format support for CSV parsing

CSV parsing results can now be returned as arrays in addition to objects, with TypeScript Named Tuple support for type-safe column access.
New Features
Array Output Format
Parse CSV data into arrays instead of objects using the `outputFormat` option:

```ts
import { parseString } from "web-csv-toolbox";

const csv = `name,age,city
Alice,30,Tokyo
Bob,25,Osaka`;

// Array output (new)
for await (const record of parseString(csv, { outputFormat: "array" })) {
  console.log(record); // ['Alice', '30', 'Tokyo']
  console.log(record[0]); // 'Alice' - type-safe access with Named Tuples
}

// Object output (default, unchanged)
for await (const record of parseString(csv)) {
  console.log(record); // { name: 'Alice', age: '30', city: 'Tokyo' }
}
```
Named Tuple Type Support
When headers are provided, array output uses TypeScript Named Tuples for type-safe access:
```ts
const csv = `name,age
Alice,30`;

for await (const record of parseString(csv, { outputFormat: "array" })) {
  // record type: { readonly [K in keyof ['name', 'age']]: string }
  // Equivalent to: { readonly 0: string, readonly 1: string, readonly length: 2 }
  console.log(record[0]); // Type-safe: 'Alice'
  console.log(record.length); // 2
}
```
Include Header Option
Include the header row in the output (array format only):
```ts
for await (const record of parseString(csv, {
  outputFormat: "array",
  includeHeader: true,
})) {
  console.log(record);
}
// ['name', 'age', 'city'] ← Header row
// ['Alice', '30', 'Tokyo']
// ['Bob', '25', 'Osaka']
```
Column Count Strategy
Control how mismatched column counts are handled (array format with header):
```ts
const csv = `name,age,city
Alice,30
Bob,25,Osaka,JP`;
// Row 2 is missing 'city'; row 3 has an extra column

// Strategy: 'pad' - Pad short rows with undefined, truncate long rows
for await (const record of parseString(csv, {
  outputFormat: "array",
  columnCountStrategy: "pad",
})) {
  console.log(record);
}
// ['Alice', '30', undefined]
// ['Bob', '25', 'Osaka']

// Strategy: 'strict'   - Throw an error on mismatch
// Strategy: 'truncate' - Truncate long rows, keep short rows as-is
// Strategy: 'keep'     - Keep all columns as-is (default)
```
Available strategies:
- `'keep'` (default): Return rows as-is, regardless of header length
- `'pad'`: Pad short rows with `undefined`, truncate long rows to the header length
- `'strict'`: Throw a `ParseError` if the row length doesn't match the header length
- `'truncate'`: Truncate long rows to the header length, keep short rows as-is
Breaking Changes
CSVRecordAssembler Interface Separation
To better support the Rust/WASM implementation, the `CSVRecordAssembler` interface has been split:

- `CSVObjectRecordAssembler<Header>` - For object format output
- `CSVArrayRecordAssembler<Header>` - For array format output

The unified `CSVRecordAssembler<Header, Format>` type remains as a deprecated type alias for backward compatibility.

New specialized classes:

```ts
import {
  FlexibleCSVObjectRecordAssembler,
  FlexibleCSVArrayRecordAssembler,
  createCSVRecordAssembler,
} from "web-csv-toolbox";

// Option 1: Factory function (recommended)
const assembler = createCSVRecordAssembler({
  outputFormat: "array",
  includeHeader: true,
});

// Option 2: Specialized class for object output
const objectAssembler = new FlexibleCSVObjectRecordAssembler({
  header: ["name", "age"],
});

// Option 3: Specialized class for array output
const arrayAssembler = new FlexibleCSVArrayRecordAssembler({
  header: ["name", "age"],
  columnCountStrategy: "strict",
});
```
Type structure:
```ts
// Before
type CSVRecordAssembler<Header, Format> = {
  assemble(tokens): IterableIterator<CSVRecord<Header, Format>>;
};

// After
interface CSVObjectRecordAssembler<Header> {
  assemble(tokens): IterableIterator<CSVObjectRecord<Header>>;
}
interface CSVArrayRecordAssembler<Header> {
  assemble(tokens): IterableIterator<CSVArrayRecord<Header>>;
}

// Deprecated type alias (backward compatibility)
type CSVRecordAssembler<Header, Format> = Format extends "array"
  ? CSVArrayRecordAssembler<Header>
  : CSVObjectRecordAssembler<Header>;
```
Migration Guide
For Most Users
No changes required. All existing code continues to work:
```ts
// Existing code works without changes
for await (const record of parseString(csv)) {
  console.log(record); // Still returns objects by default
}
```
Using New Array Output Format
Simply add the `outputFormat` option:

```ts
// New: Array output
for await (const record of parseString(csv, { outputFormat: "array" })) {
  console.log(record); // Returns arrays
}
```
For Advanced Users Using Low-Level APIs
The existing `FlexibleCSVRecordAssembler` class continues to work. Optionally migrate to the specialized classes:

```ts
// Option 1: Continue using FlexibleCSVRecordAssembler (no changes needed)
const assembler = new FlexibleCSVRecordAssembler({ outputFormat: "array" });

// Option 2: Use the factory function (recommended)
const assembler = createCSVRecordAssembler({ outputFormat: "array" });

// Option 3: Use the specialized classes directly
const assembler = new FlexibleCSVArrayRecordAssembler({
  header: ["name", "age"],
  columnCountStrategy: "pad",
});
```
Use Cases
Machine Learning / Data Science
```ts
// Easily convert CSV to training data arrays
const features = [];
for await (const record of parseString(csv, { outputFormat: "array" })) {
  features.push(record.map(Number));
}
```
Headerless CSV Files
```ts
const csv = `Alice,30,Tokyo
Bob,25,Osaka`;

for await (const record of parseString(csv, {
  outputFormat: "array",
  header: [], // Headerless
})) {
  console.log(record); // ['Alice', '30', 'Tokyo']
}
```
Type-Safe Column Access
```ts
const csv = `name,age,city
Alice,30,Tokyo`;

for await (const record of parseString(csv, { outputFormat: "array" })) {
  // TypeScript knows the tuple structure
  const name: string = record[0]; // Type-safe
  const age: string = record[1]; // Type-safe
  const city: string = record[2]; // Type-safe
}
```
Benefits
- Memory efficiency: Arrays use less memory than objects for large datasets
- Type safety: Named Tuples provide compile-time type checking
- Flexibility: Choose output format based on your use case
- Compatibility: Easier integration with ML libraries and data processing pipelines
- Better Rust/WASM support: Separated interfaces simplify native implementation
- #608 24f04d7 Thanks @kamiazya! - refactor!: rename core classes and simplify the type system

This release contains breaking changes for users of low-level APIs. Most users are not affected.
Breaking Changes
1. Class Naming
Low-level CSV processing classes have been renamed:
```diff
- import { CSVLexer } from 'web-csv-toolbox';
+ import { FlexibleStringCSVLexer } from 'web-csv-toolbox';

- const lexer = new CSVLexer(options);
+ const lexer = new FlexibleStringCSVLexer(options);
```
For CSV record assembly, use the factory function or specialized classes:
```diff
- import { CSVRecordAssembler } from 'web-csv-toolbox';
+ import { createCSVRecordAssembler, FlexibleCSVObjectRecordAssembler, FlexibleCSVArrayRecordAssembler } from 'web-csv-toolbox';

- const assembler = new CSVRecordAssembler(options);
+ // Option 1: Use the factory function (recommended)
+ const assembler = createCSVRecordAssembler({ outputFormat: 'object', ...options });
+
+ // Option 2: Use a specialized class directly
+ const assembler = new FlexibleCSVObjectRecordAssembler(options);
```
2. Type Renaming
The `CSV` type has been renamed to `CSVData`:

```diff
- import type { CSV } from 'web-csv-toolbox';
+ import type { CSVData } from 'web-csv-toolbox';

- function processCSV(data: CSV) {
+ function processCSV(data: CSVData) {
    // ...
  }
```
Bug Fixes
- Fixed stream reader locks not being released when AbortSignal was triggered
- Fixed Node.js WASM module loading
- Improved error handling
Migration Guide
For most users: No changes are required if you only use high-level functions like `parse()`, `parseString()`, `parseBlob()`, etc.

For advanced users using low-level APIs:
- Rename the `CSV` type to `CSVData`
- Rename `CSVLexer` to `FlexibleStringCSVLexer`
- Replace `CSVRecordAssembler` with the `createCSVRecordAssembler()` factory function or the specialized classes (`FlexibleCSVObjectRecordAssembler` / `FlexibleCSVArrayRecordAssembler`)
Patch Changes
- #608 24f04d7 Thanks @kamiazya! - Consolidate and enhance the benchmark suite

This changeset focuses on benchmark organization and expansion:
Benchmark Consolidation:
- Integrated 3 separate benchmark files (concurrent-performance.ts, queuing-strategy.bench.ts, worker-performance.ts) into main.ts
- Unified benchmark suite now contains 57 comprehensive tests
- Added conditional Worker support for Node.js vs browser environments
API Migration:
- Migrated from the deprecated `{ execution: ['worker'] }` API to the new EnginePresets API
- Added tests for all engine presets: mainThread, wasm, worker, workerStreamTransfer, workerWasm, balanced, fastest, strict
Bottleneck Detection:
- Added 23 new benchmarks for systematic bottleneck detection:
- Row count scaling (50-5000 rows)
- Field length scaling (10 chars - 10KB)
- Quote ratio impact (0%-100%)
- Column count scaling (10-10,000 columns)
- Line ending comparison (LF vs CRLF)
- Engine comparison at different scales
Documentation Scenario Coverage:
- Added benchmarks for all scenarios mentioned in documentation
- Included WASM performance tests
- Added custom delimiter tests
- Added parseStringStream tests
- Added data transformation overhead tests
Key Findings:
- Column count is the most critical bottleneck (99.7% slower at 10k columns)
- Field length has non-linear behavior at 1KB threshold
- WASM advantage increases with data size (+18% → +32%)
- Quote processing overhead is minimal (1.1-10% depending on scale)
- #608 24f04d7 Thanks @kamiazya! - fix: add charset validation to prevent malicious Content-Type header manipulation

This patch addresses a security vulnerability where malicious or invalid charset values in Content-Type headers could cause parsing failures or unexpected behavior.
Changes:
- Fixed `parseMime` to handle Content-Type parameters without values (prevents `undefined.trim()` errors)
- Added charset validation similar to the existing compression validation pattern
- Created `SUPPORTED_CHARSETS` constants for commonly used character encodings
- Added an `allowNonStandardCharsets` option to `BinaryOptions` for opt-in support of non-standard charsets (see the sketch below)
- Added error handling in `convertBinaryToString` to catch TextDecoder instantiation failures
- Charset values are now validated against a whitelist and normalized to lowercase
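A hedged usage sketch (the option name comes from this changelog; passing it through `parseBlob`'s binary options is assumed):

```ts
import { parseBlob } from "web-csv-toolbox";

// Opt in only when a trusted source reports a charset outside the whitelist
for await (const record of parseBlob(blob, { allowNonStandardCharsets: true })) {
  console.log(record);
}
```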
Security Impact:
- Invalid or malicious charset values are now rejected with clear error messages
- Prevents DoS attacks via malformed Content-Type headers
- Reduces risk of charset-based injection attacks
Breaking Changes: None - existing valid charset values continue to work as before.
- #608 24f04d7 Thanks @kamiazya! - Add a bundler integration guide for Workers and WebAssembly

This release adds comprehensive documentation for using web-csv-toolbox with modern JavaScript bundlers (Vite, Webpack, Rollup) when using Worker-based or WebAssembly execution.
Package Structure Improvements:
- Moved worker files to the root level for cleaner package exports
  - `src/execution/worker/helpers/worker.{node,web}.ts` → `src/worker.{node,web}.ts`
- Added a `./worker` export with environment-specific resolution (node/browser/default)
- Added a `./web_csv_toolbox_wasm_bg.wasm` export for explicit WASM file access
- Updated internal relative paths in `createWorker.{node,web}.ts` to reflect the new structure
New Documentation:
- How-to Guide: Use with Bundlers - Step-by-step configuration for Vite, Webpack, and Rollup
  - Worker configuration with `?url` imports
  - WASM configuration with explicit URL handling
  - WorkerPool reuse patterns
  - Common issues and troubleshooting
- Explanation: Package Exports - Deep dive into the environment detection mechanism
  - Conditional exports for node/browser environments
  - Worker implementation differences
  - Bundler compatibility
- Reference: Package Exports - API reference for all package exports
  - Export paths and their resolutions
  - Conditional export conditions
Updated Documentation:
Added bundler usage notes to all Worker and WASM-related documentation:
- `README.md`
- `docs/explanation/execution-strategies.md`
- `docs/explanation/worker-pool-architecture.md`
- `docs/how-to-guides/choosing-the-right-api.md`
- `docs/how-to-guides/wasm-performance-optimization.md`
Key Differences: Workers vs WASM with Bundlers
Workers 🟢:

- Bundled automatically as data URLs using the `?url` suffix
- Works out of the box with Vite
- Example: `import workerUrl from 'web-csv-toolbox/worker?url'`

WASM 🟡:

- Requires explicit URL configuration via a `?url` import
- Must call `loadWASM(wasmUrl)` before parsing
- Example: `import wasmUrl from 'web-csv-toolbox/web_csv_toolbox_wasm_bg.wasm?url'`
- Alternative: Copy the WASM file to the public directory
Migration Guide:
For users already using Workers with bundlers, no changes are required. The package now explicitly documents the `workerURL` option that was previously implicit.

For new users, follow the bundler integration guide:
```ts
import { parseString, EnginePresets } from "web-csv-toolbox";
import workerUrl from "web-csv-toolbox/worker?url"; // Vite

for await (const record of parseString(csv, {
  engine: EnginePresets.worker({ workerURL: workerUrl }),
})) {
  console.log(record);
}
```
Breaking Changes:
None - this is purely additive documentation and package export improvements. Existing code continues to work without modifications.
- #608 24f04d7 Thanks @kamiazya! - Refactor CI workflows to separate TypeScript and Rust environments

This change improves CI efficiency by:
- Splitting setup actions into setup-typescript, setup-rust, and setup-full
- Separating WASM build and TypeScript build jobs with clear dependencies
- Removing unnecessary tool installations from jobs that don't need them
- Clarifying dependencies between TypeScript tests and WASM artifacts
- #608 24f04d7 Thanks @kamiazya! - chore: eliminate circular dependencies and improve code quality

This patch improves the internal code structure by eliminating all circular dependencies and adding tooling to prevent future issues.
Changes:
- Introduced `madge` for circular dependency detection and visualization
- Eliminated circular dependencies:
  - `common/types.ts` ↔ `utils/types.ts`: Merged the type definitions into `common/types.ts`
  - `parseFile.ts` ↔ `parseFileToArray.ts`: Refactored to use direct dependencies
- Fixed import paths in test files to consistently use the `.ts` extension
- Added npm scripts for dependency analysis:
  - `check:circular`: Detect circular dependencies
  - `graph:main`: Visualize main entry point dependencies
  - `graph:worker`: Visualize worker entry point dependencies
  - `graph:json`, `graph:summary`, `graph:orphans`, `graph:leaves`: Various analysis tools
- Added a circular dependency check to the CI pipeline (`.github/workflows/.build.yaml`)
- Updated `.gitignore` to exclude generated dependency graph files
- No runtime behavior changes
- Better maintainability and code structure
- Faster build times due to cleaner dependency graph
- Automated prevention of circular dependency introduction
Breaking Changes: None - this is purely an internal refactoring with no API changes.
- #608 24f04d7 Thanks @kamiazya! - docs: comprehensive documentation update and new examples

This release brings significant improvements to the documentation and examples, making it easier to get started and use advanced features.
New Examples
Added comprehensive example projects for various environments and bundlers:
- Deno: `examples/deno-main`, `examples/deno-slim`
- Node.js: `examples/node-main`, `examples/node-slim`, `examples/node-worker-main`
- Vite: `examples/vite-bundle-main`, `examples/vite-bundle-slim`, `examples/vite-bundle-worker-main`, `examples/vite-bundle-worker-slim`
- Webpack: `examples/webpack-bundle-worker-main`, `examples/webpack-bundle-worker-slim`
These examples demonstrate:
- How to use the new `slim` entry point
- Worker integration with different bundlers
- Configuration for Vite and Webpack
- TypeScript setup
Documentation Improvements
- Engine Presets: Detailed guide on choosing the right engine preset for your use case
- Main vs Slim: Explanation of the trade-offs between the main (auto-init) and slim (manual-init) entry points
- WASM Architecture: Updated architecture documentation reflecting the new module structure
- Performance Guide: Improved guide on optimizing performance with WASM and Workers
- #608 24f04d7 Thanks @kamiazya! - Expand browser testing coverage and improve documentation

Testing Infrastructure Improvements:
- macOS Browser Testing: Added Chrome and Firefox testing on macOS in CI/CD
- Vitest 4 stable browser mode enabled headless testing on macOS
- Previously blocked due to Safari headless limitations
- Parallel Browser Execution: Multiple browsers now run in parallel within each OS job
- Linux: Chrome + Firefox in parallel
- macOS: Chrome + Firefox in parallel
- Windows: Chrome + Firefox + Edge in parallel
- Dynamic Browser Configuration: Browser instances automatically determined by platform
  - Uses `process.platform` to select the appropriate browsers
  - Eliminates the need for environment variables
- Explicit Browser Project Targeting: Updated the `test:browser` script to explicitly run only browser tests
  - Added the `--project browser` flag to prevent running Node.js tests during browser test execution
  - Ensures CI jobs run only their intended test suites
Documentation Improvements:
- Quick Overview Section: Added comprehensive support matrix and metrics
- Visual support matrix showing all environment/platform combinations
- Tier summary with coverage statistics
- Testing coverage breakdown by category
- Clear legend explaining all support status icons
- Clearer Support Tiers: Improved distinction between support levels
  - ✅ Full Support (Tier 1): Tested and officially supported
  - 🟡 Active Support (Tier 2): Limited testing, active maintenance
  - 🔵 Community Support (Tier 3): Not tested, best-effort support
- Cross-Platform Runtime Support: Clarified Node.js and Deno support across all platforms
- Node.js LTS: Tier 1 support on Linux, macOS, and Windows
- Deno LTS: Tier 2 support on Linux, macOS, and Windows
- Testing performed on Linux only due to cross-platform runtime design
- Eliminates unnecessary concern about untested platforms
- Simplified Tables: Converted redundant tables to concise bullet lists
- Removed repetitive "Full Support" entries
- Easier to scan and understand
Browser Testing Coverage:
- Chrome: Tested on Linux, macOS, and Windows (Tier 1)
- Firefox: Tested on Linux, macOS, and Windows (Tier 1)
- Edge: Tested on Windows only (Tier 1)
- Safari: Community support (headless mode not supported by Vitest)
Breaking Changes:
None - this release only improves testing infrastructure and documentation.
- #608 24f04d7 Thanks @kamiazya! - Add regression tests and documentation for prototype pollution safety

This changeset adds comprehensive tests and documentation to ensure that CSVRecordAssembler does not cause prototype pollution when processing CSV headers with dangerous property names.
Security Verification:
- Verified that `Object.fromEntries()` is safe from prototype pollution attacks
- Confirmed that dangerous property names (`__proto__`, `constructor`, `prototype`) are handled safely
- Added 8 comprehensive regression tests in `FlexibleCSVRecordAssembler.prototype-safety.test.ts`
Test Coverage:
- Tests with `__proto__` as a CSV header
- Tests with `constructor` as a CSV header
- Tests with `prototype` as a CSV header
- Tests with multiple dangerous property names
- Tests with multiple records
- Tests with quoted fields
- Baseline tests documenting `Object.fromEntries()` behavior
Documentation:
- Added detailed safety comments to all `Object.fromEntries()` usage in CSVRecordAssembler
- Documented why the implementation is safe from prototype pollution
- Added references to regression tests for verification
Conclusion:
The AI security report suggesting a prototype pollution vulnerability was a false positive. `Object.fromEntries()` creates own properties (not prototype properties), making it inherently safe from prototype pollution attacks. This changeset provides regression tests to prevent future concerns and documents the safety guarantees.
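The underlying language behavior can be checked directly in plain TypeScript (a standalone illustration, not the library's test code):

```ts
// Object.fromEntries defines own data properties, so a "__proto__" key
// shadows the accessor instead of rewriting the prototype chain.
const record = Object.fromEntries([
  ["__proto__", { polluted: true }],
  ["name", "Alice"],
]);

console.log(Object.getOwnPropertyNames(record)); // ["__proto__", "name"]
console.log(Object.getPrototypeOf(record) === Object.prototype); // true - prototype unchanged
console.log(({} as { polluted?: boolean }).polluted); // undefined - no global pollution
```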
- #608 24f04d7 Thanks @kamiazya! - Improve the Rust/WASM development environment and add comprehensive tests

Internal Improvements
- Migrated from Homebrew Rust to rustup for better toolchain management
- Updated Rust dependencies to latest versions (csv 1.4, wasm-bindgen 0.2.105, serde 1.0.228)
- Added 10 comprehensive unit tests for CSV parsing functionality
- Added Criterion-based benchmarks for performance tracking
- Improved error handling in WASM bindings
- Configured rust-analyzer and development tools (rustfmt, clippy)
- Added the `pkg/` directory to `.gitignore` (build artifacts should not be tracked)
- Added Rust tests to the CI pipeline (GitHub Actions Dynamic Tests workflow)
- Integrated Rust coverage with Codecov (separate from TypeScript, using the `rust` flag)
- Integrated Rust benchmarks with CodSpeed for performance regression detection
These changes improve code quality and maintainability without affecting the public API or functionality.
- #608 24f04d7 Thanks @kamiazya! - chore: upgrade Biome to 2.3.4 and update configuration

Upgraded the development dependency @biomejs/biome from 1.9.4 to 2.3.4 and updated the configuration for compatibility with Biome v2. This change has no impact on runtime behavior or the public API.
- #608 24f04d7 Thanks @kamiazya! - chore: upgrade TypeScript to 5.9.3 and typedoc to 0.28.14 with enhanced documentation

Developer Experience Improvements:
- Upgraded TypeScript from 5.8.3 to 5.9.3
- Upgraded typedoc from 0.28.5 to 0.28.14
- Enabled strict type checking options (`noUncheckedIndexedAccess`, `exactOptionalPropertyTypes`)
- Enhanced the TypeDoc configuration with version display, improved sorting, and navigation
- Integrated all documentation markdown files with TypeDoc using native `projectDocuments` support
- Added YAML frontmatter to all documentation files for better organization
Type Safety Enhancements:
- Added an explicit `| undefined` to all optional properties for stricter type checking
- Added proper undefined checks for array/object indexed access (illustrated below)
- Improved `TextDecoderOptions` usage to avoid explicit undefined values
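For context, this is the kind of narrowing those compiler options require (a generic illustration, not code from the library):

```ts
// With noUncheckedIndexedAccess enabled, indexed access includes undefined in its type.
const headers: string[] = ["name", "age"];
const first = headers[0]; // type: string | undefined

if (first !== undefined) {
  console.log(first.toUpperCase()); // narrowed to string before use
}
```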
Documentation Improvements:
- Enhanced TypeDoc navigation with categories, groups, and folders
- Added sidebar and navigation links to GitHub and npm
- Organized documentation into Tutorials, How-to Guides, Explanation, and Reference sections
- Improved documentation discoverability with YAML frontmatter grouping
Breaking Changes: None - all changes are backward compatible
- #608 24f04d7 Thanks @kamiazya! - feat(wasm): add input size validation and a source option for error reporting

This patch enhances the WASM CSV parser with security improvements and better error reporting capabilities.
Security Enhancements:
- Input Size Validation: Added validation to prevent memory exhaustion attacks
  - Validates the CSV input size against the `maxBufferSize` parameter before processing
  - Returns a clear error message when the size limit is exceeded
  - Default limit: 10MB (configurable via TypeScript options)
  - Addresses a potential DoS vulnerability from maliciously large CSV inputs

Error Reporting Improvements:

- Source Option: Added an optional `source` parameter for better error context
  - Allows specifying a source identifier (e.g., a filename) in error messages
  - Error format: `"Error message in \"filename\""`
  - Significantly improves debugging when processing multiple CSV files
  - Aligns with the TypeScript implementation's `CommonOptions.source`
Performance Optimizations:
- Optimized `format_error()` to take ownership of the `String`
  - Avoids an unnecessary allocation when the source is `None`
  - Improves error path performance by eliminating a `to_string()` call
  - Zero-cost abstraction in the common case (no source identifier)
Code Quality Improvements:
- Used `bool::then_some()` for more idiomatic `Option` handling
- Fixed Clippy `needless_borrow` warnings in tests
- Applied cargo fmt formatting for consistency
Implementation Details:
Rust (`web-csv-toolbox-wasm/src/lib.rs`):

- Added a `format_error()` helper function for consistent error formatting
- Updated `parse_csv_to_json()` to accept `max_buffer_size` and `source` parameters
- Implemented input size validation at the parse entry point
- Applied source context to all error types (headers, records, JSON serialization)
TypeScript (`src/parseStringToArraySyncWASM.ts`):

- Updated to pass `maxBufferSize` from the options to the WASM function
- Updated to pass `source` from the options to the WASM function (see the sketch below)
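A hedged usage sketch from the TypeScript side (the option names `maxBufferSize` and `source` come from this changelog; the main auto-initializing entry point is assumed, so no explicit `loadWASM()` call is shown):

```ts
import { parseStringToArraySyncWASM } from "web-csv-toolbox";

const records = parseStringToArraySyncWASM(csvText, {
  maxBufferSize: 10 * 1024 * 1024, // reject inputs larger than 10MB before parsing
  source: "export-2024.csv",       // included in error messages: ... in "export-2024.csv"
});
```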
Breaking Changes: None - this is a backward-compatible enhancement with sensible defaults.
Migration: No action required. Existing code continues to work without modification.