-
Notifications
You must be signed in to change notification settings - Fork 2
A lot of improvements. Thanks Claude. #224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| --- | ||
| name: CI | ||
|
|
||
| on: | ||
| push: | ||
| branches: [main, master] | ||
| pull_request: | ||
| branches: [main, master] | ||
|
|
||
| env: | ||
| CARGO_TERM_COLOR: always | ||
|
|
||
| jobs: | ||
| test: | ||
| name: Build and Test | ||
| runs-on: ${{ matrix.os }} | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| os: [ubuntu-latest, macos-latest, windows-latest] | ||
| include: | ||
| - os: ubuntu-latest | ||
| libxml2_install: sudo apt-get update && sudo apt-get install -y libxml2-dev | ||
| - os: macos-latest | ||
| libxml2_install: brew install libxml2 | ||
| - os: windows-latest | ||
| libxml2_install: | | ||
| choco install libxml2 | ||
| echo "LIBXML2_LIB_DIR=C:\tools\libxml2\lib" >> $GITHUB_ENV | ||
| echo "LIBXML2_INCLUDE_DIR=C:\tools\libxml2\include" >> $GITHUB_ENV | ||
|
|
||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v5 | ||
|
|
||
| - name: Install Rust toolchain | ||
| uses: dtolnay/rust-toolchain@stable | ||
| with: | ||
| components: rustfmt, clippy | ||
|
Comment on lines
+37
to
+39
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chainVerify clippy runs on all code (tests and benches). Line 54 runs 🏁 Script executed: #!/bin/bash
# Verify current clippy invocation and check if tests contain lintable code
# Expected: If tests exist, consider adding --all-targets to catch issues there
# Check for test files
fd -e rs -g '*/tests/*' . | head -5
# Show current behavior of cargo clippy (docs only)
echo "Default clippy targets library code and examples, not tests/benches unless --all-targets is used"Length of output: 96 🏁 Script executed: #!/bin/bash
# Verify the actual CI workflow content and search for test files more comprehensively
echo "=== Lines 50-60 of .github/workflows/ci.yml ==="
sed -n '50,60p' .github/workflows/ci.yml
echo ""
echo "=== Search for all Rust test files (broader patterns) ==="
fd -e rs | grep -E '(test|spec)' | head -10
echo ""
echo "=== Check if tests/ directory exists ==="
ls -la tests/ 2>/dev/null || echo "No tests/ directory found"
echo ""
echo "=== Check for inline tests in src ==="
rg -n '#\[cfg\(test\)\]' --type rs | head -5Length of output: 1708 Add Line 54 runs Suggested change: 🤖 Prompt for AI Agents |
||
|
|
||
| - name: Cache Rust dependencies and build artifacts | ||
| uses: Swatinem/rust-cache@v2 | ||
|
|
||
| - name: Install libxml2 | ||
| run: ${{ matrix.libxml2_install }} | ||
|
|
||
| - name: Build | ||
| run: cargo build --release | ||
|
|
||
| - name: Run tests | ||
| run: cargo test | ||
|
|
||
| - name: Run clippy | ||
| run: cargo clippy -- -D warnings | ||
|
|
||
| - name: Check formatting | ||
| run: cargo fmt --check | ||
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,278 @@ | ||
| # CLAUDE.md | ||
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Project Overview | ||
|
|
||
| validate-xml is a high-performance XML schema validator written in Rust. It validates thousands of XML files against XSD schemas using concurrent processing and intelligent two-tier caching (memory + disk). Built with libxml2 FFI bindings and async I/O throughout. | ||
|
|
||
| **Key Performance**: Validates 20,000 files in ~2 seconds (cached) or ~30 seconds (first run with schema downloads). | ||
|
|
||
| ## Common Commands | ||
|
|
||
| ### Building and Testing | ||
|
|
||
| ```bash | ||
| # Development build | ||
| cargo build | ||
|
|
||
| # Release build (optimized) | ||
| cargo build --release | ||
|
|
||
| # Run all tests (deterministic, no network calls) | ||
| cargo test | ||
|
|
||
| # Run a specific test | ||
| cargo test test_name | ||
|
|
||
| # Run tests with output visible | ||
| cargo test -- --nocapture | ||
|
|
||
| # Run only library tests (fastest) | ||
| cargo test --lib | ||
|
|
||
| # Run ignored network tests (requires internet) | ||
| cargo test -- --ignored | ||
|
|
||
| # Run a single test file | ||
| cargo test --test http_client_test | ||
| ``` | ||
|
|
||
| ### Running the Binary | ||
|
|
||
| ```bash | ||
| # Run with development build | ||
| cargo run -- /path/to/xml/files | ||
|
|
||
| # Run with release build (much faster) | ||
| cargo run --release -- /path/to/xml/files | ||
|
|
||
| # With options | ||
| cargo run --release -- --verbose --extensions xml,cmdi /path/to/files | ||
|
|
||
| # With debug logging | ||
| RUST_LOG=debug cargo run -- /path/to/files | ||
| ``` | ||
|
|
||
| ### Code Quality | ||
|
|
||
| ```bash | ||
| # Format code | ||
| cargo fmt | ||
|
|
||
| # Check formatting without changes | ||
| cargo fmt --check | ||
|
|
||
| # Run clippy linter | ||
| cargo clippy | ||
|
|
||
| # Fix clippy warnings automatically | ||
| cargo clippy --fix | ||
| ``` | ||
|
|
||
| ## Architecture | ||
|
|
||
| ### Core Components | ||
|
|
||
| The codebase follows a modular async-first architecture with clear separation of concerns: | ||
|
|
||
| 1. **File Discovery** (`file_discovery.rs`) | ||
| - Recursively traverses directories to find XML files | ||
| - Filters by extension using glob patterns | ||
| - Single-threaded sequential operation | ||
|
|
||
| 2. **Schema Loading** (`schema_loader.rs`) | ||
| - Extracts schema URLs from XML using regex (xsi:schemaLocation, xsi:noNamespaceSchemaLocation) | ||
| - Downloads remote schemas via async HTTP client | ||
| - Validates schema content before caching | ||
| - Integrates with two-tier cache system | ||
|
|
||
| 3. **Two-Tier Caching** (`cache.rs`) | ||
| - **L1 (Memory)**: moka cache for in-run reuse (microsecond lookups) | ||
| - **L2 (Disk)**: cacache for cross-run persistence (millisecond lookups) | ||
| - Thread-safe via Arc wrapping | ||
| - Configurable TTL and size limits | ||
|
|
||
| 4. **Validation Engine** (`validator.rs`) | ||
| - **Hybrid architecture**: Async I/O orchestration + sync CPU-bound validation | ||
| - Spawns concurrent async tasks (bounded by semaphore) | ||
| - Each task: load XML → fetch schema → validate via libxml2 (synchronous, thread-safe) | ||
| - Collects results and statistics | ||
| - Default concurrency = CPU core count | ||
|
|
||
| 5. **libxml2 FFI** (`libxml2.rs`) | ||
| - Safe Rust wrappers around unsafe C FFI calls | ||
| - Memory management via RAII patterns | ||
| - Schema parsing and XML validation | ||
| - **CRITICAL Thread Safety**: | ||
| - Schema parsing is NOT thread-safe (serialized via cache) | ||
| - Validation IS thread-safe (parallel execution, no global locks) | ||
|
|
||
| 6. **Error Handling** (`error.rs`, `error_reporter.rs`) | ||
| - Structured error types using thiserror | ||
| - Context-rich error messages with recovery hints | ||
| - Line/column precision for validation errors | ||
| - Both human-readable and JSON output formats | ||
|
|
||
| 7. **Configuration** (`config.rs`) | ||
| - Environment variable support via `EnvProvider` trait pattern | ||
| - File-based config (TOML/JSON) | ||
| - CLI argument merging (CLI > env > file > defaults) | ||
| - **IMPORTANT**: Uses dependency injection for testability | ||
|
|
||
| ### Data Flow | ||
|
|
||
| ``` | ||
| CLI Args → Config Merge → File Discovery → Schema Extraction | ||
| ↓ | ||
| Schema Cache Check | ||
| (L1 → L2 → HTTP) | ||
| ↓ | ||
| Concurrent Validation Tasks | ||
| (bounded by semaphore) | ||
| ↓ | ||
| Error Aggregation → Output | ||
| (Text or JSON format) | ||
| ``` | ||
|
|
||
| ### Key Design Patterns | ||
|
|
||
| 1. **Async-First**: All I/O operations use tokio async runtime | ||
| 2. **Dependency Injection**: Config system uses `EnvProvider` trait for testability | ||
| 3. **Two-Tier Caching**: Memory (fast) + Disk (persistent) for optimal performance | ||
| 4. **Bounded Concurrency**: Semaphore limits prevent resource exhaustion | ||
| 5. **RAII for FFI**: Proper cleanup of libxml2 resources via Drop trait | ||
|
|
||
| ## Testing Philosophy | ||
|
|
||
| ### Test Structure | ||
|
|
||
| The project has **214+ passing tests** organized as: | ||
| - **115 unit tests** in `src/` modules (fast, no I/O) | ||
| - **99 integration tests** in `tests/` (slower, includes I/O simulation) | ||
| - **24 ignored tests** (network-dependent, run explicitly with `--ignored`) | ||
|
|
||
| ### Critical Testing Rules | ||
|
|
||
| 1. **No Unsafe Code in Tests**: All environment variable manipulation must use `MockEnvProvider` pattern (see `src/config.rs` tests) | ||
|
|
||
| 2. **No Real Network Calls**: Tests making HTTP requests to external services (httpbin.org) must be marked `#[ignore]` | ||
| ```rust | ||
| #[tokio::test] | ||
| #[ignore] // Requires internet connectivity - run with: cargo test -- --ignored | ||
| async fn test_network_operation() { ... } | ||
| ``` | ||
|
|
||
| 3. **Deterministic Tests Only**: Never use: | ||
| - `tokio::time::sleep()` without proper synchronization | ||
| - `tokio::spawn()` without waiting for completion | ||
| - Real system time for timing assertions | ||
|
|
||
| 4. **Race Condition Prevention**: When testing concurrent code, use proper synchronization: | ||
| ```rust | ||
| // BAD: Race condition | ||
| tokio::spawn(async move { /* ... */ }); | ||
| tokio::time::sleep(Duration::from_millis(50)).await; // Hope it finishes | ||
|
|
||
| // GOOD: Proper synchronization | ||
| let handle = tokio::spawn(async move { /* ... */ }); | ||
| handle.await.unwrap(); // Wait for completion | ||
| ``` | ||
|
|
||
| ### Running Flaky/Network Tests | ||
|
|
||
| Network tests are ignored by default to ensure CI reliability: | ||
| ```bash | ||
| # Run only network tests | ||
| cargo test -- --ignored | ||
|
|
||
| # Run all tests including network tests | ||
| cargo test -- --include-ignored | ||
| ``` | ||
|
|
||
| ## Environment Variables | ||
|
|
||
| The config system supports environment variable overrides: | ||
|
|
||
| ```bash | ||
| # Cache configuration | ||
| export VALIDATE_XML_CACHE_DIR=/custom/cache | ||
| export VALIDATE_XML_CACHE_TTL=48 | ||
|
|
||
| # Validation settings | ||
| export VALIDATE_XML_THREADS=4 | ||
| export VALIDATE_XML_TIMEOUT=120 | ||
|
|
||
| # Output settings | ||
| export VALIDATE_XML_VERBOSE=true | ||
| export VALIDATE_XML_FORMAT=json | ||
| ``` | ||
|
|
||
| ## libxml2 FFI Critical Notes | ||
|
|
||
| When working with `libxml2.rs`: | ||
|
|
||
| 1. **Memory Safety**: All pointers must be checked for null before dereferencing | ||
| 2. **Cleanup**: Schema contexts must be freed via `xmlSchemaFree` in Drop implementations | ||
| 3. **Thread Safety** (see ARCHITECTURE_CHANGES.md for details): | ||
| - **Schema parsing** (`xmlSchemaParse`): NOT thread-safe, serialized via cache | ||
| - **Validation** (`xmlSchemaValidateFile`): IS thread-safe, runs in parallel | ||
| - Arc-wrapped schemas enable safe sharing across tasks | ||
| - Each validation creates its own context (per-task isolation) | ||
| 4. **Error Handling**: libxml2 prints errors to stderr - this is expected in tests (e.g., "Schemas parser error" messages) | ||
|
|
||
| Example safe pattern: | ||
| ```rust | ||
| impl Drop for SchemaContext { | ||
| fn drop(&mut self) { | ||
| unsafe { | ||
| if !self.schema.is_null() { | ||
| xmlSchemaFree(self.schema); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ## Dependency Injection Pattern | ||
|
|
||
| For testability, the config system uses trait-based dependency injection: | ||
|
|
||
| ```rust | ||
| // Production: uses real environment variables | ||
| ConfigManager::apply_environment_overrides(config) | ||
|
|
||
| // Testing: uses mock provider (no unsafe code) | ||
| let mut mock_env = MockEnvProvider::new(); | ||
| mock_env.set("VALIDATE_XML_THREADS", "16"); | ||
| ConfigManager::apply_environment_overrides_with(&mock_env, config) | ||
| ``` | ||
|
|
||
| **Never** use `std::env::set_var` or `std::env::remove_var` in tests - always use `MockEnvProvider`. | ||
|
|
||
| ## Performance Considerations | ||
|
|
||
| 1. **Schema Caching**: First run downloads schemas (~30s for 20k files), subsequent runs use cache (~2s) | ||
| 2. **Concurrency**: Default = CPU cores, but can be limited for memory-constrained systems | ||
| 3. **Memory**: Bounded by L1 cache size (default 100 entries) and concurrent task count | ||
| 4. **Network**: HTTP client uses connection pooling and retry logic with exponential backoff | ||
|
|
||
| ## Common Gotchas | ||
|
|
||
| 1. **libxml2 Errors to stderr**: The message "Schemas parser error : The XML document 'in_memory_buffer' is not a schema document" is EXPECTED in test output - it's from tests validating error handling | ||
|
|
||
| 2. **Timing Tests**: Any test using `tokio::time::sleep()` is likely flaky - refactor to use proper synchronization | ||
|
|
||
| 3. **Environment Pollution**: Tests must not modify global environment state - use `MockEnvProvider` pattern | ||
|
|
||
| 4. **Ignored Tests**: Running full test suite may show "24 ignored" - this is correct (network tests) | ||
|
|
||
| ## Code Generation and AI Assistance | ||
|
|
||
| This project was collaboratively developed with Claude Code. When making changes: | ||
|
|
||
| 1. Maintain the existing architecture patterns (async-first, dependency injection, trait-based abstractions) | ||
| 2. Add tests for all new functionality (aim for 100% coverage) | ||
| 3. Update documentation strings for public APIs | ||
| 4. Run full test suite before committing: `cargo test && cargo clippy` | ||
| 5. For network-dependent code, mark tests with `#[ignore]` and document why |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify Windows libxml2 setup reliability.
The Windows libxml2 installation uses
chocoand hardcoded environment paths. This may fail if Chocolatey package structure changes or paths vary across CI runners.🏁 Script executed:
Length of output: 51
🌐 Web query:
💡 Result:
Short answer — recommended options (pick one):
MSYS2 (recommended for MinGW toolchains / pkg-config workflows)
vcpkg (recommended for MSVC/CMake toolchains)
Build libxml2 from source with CMake (or the Windows project files) and point your crate to the install
Other package managers / ecosystems
Notes:
Sources:
[1] MSYS2 libxml2 / libxml2-devel / mingw-w64 packages (pacman instructions).
[2] MSYS2 mingw-w64 libxml2 package page (commands and packages).
[3] vcpkg libxml2 package docs (vcpkg install libxml2, CMake targets).
[4] Example vcpkg usage output / discussion showing libxml2 installed for x64-windows.
[5] libxml2 Windows README / win32 docs (recommend CMake; build/install guidance).
[6] Libxml2 CMake/autotools notes and upstream release/build info.
[7] xrepo/xmake and other package manager entries for libxml2.
If you want, tell me which Windows toolchain (MSVC or GNU) and Rust crate you’re using and I’ll give exact commands and env vars.
I need to gather more context about the repository's setup before making a final assessment. Let me examine the CI configuration more thoroughly and check the project's Rust dependencies.
Now let me examine the repository's actual configuration:
I found a critical mismatch. Let me examine the repository's actual structure and dependencies:
Based on my investigation, the Chocolatey community repository does not guarantee reliability, and I've identified a critical issue with the current setup:
The CI workflow installs libxml2 via Chocolatey but then sets environment variables pointing to
C:\tools\libxml2. However, Chocolatey defaults to installing to C:\ProgramData\Chocolatey, making the hardcoded path incorrect. Historical examples show package paths contain version numbers like C:\Chocolatey\lib\libxml2.2.7.8.7\build\native, which are fragile and break when package versions change.Additionally, GitHub-hosted runner images are maintained in the actions/runner-images repository with tools updated weekly, but libxml2 is not preinstalled on Windows runners, requiring manual setup.
Fix the path mismatch: either correct env vars to reference Chocolatey's actual install location, or use a more reliable approach (MSYS2, vcpkg, or downloading binaries from https://www.zlatkovic.com/projects/libxml/) as recommended by projects like ROS 2.