
@gouhongshen (Contributor) commented Nov 19, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #17994

What this PR does / why we need it:

PR Summary

  • Added OR-aware parsing in BasePKFilter: the parser now distributes AND over OR, builds Disjuncts, and invalidates the whole OR if any branch can’t be mapped to a PK predicate. This enables
    expressions such as (pk = 1 AND pk BETWEEN 5 AND 9) OR pk IN (20, 30) to produce multiple atomic filters.
  • ConstructBlockPKFilter handles those disjuncts: it builds a search function per atom and unions their offsets, so a single
    filter can cover mixed operators (e.g. pk prefix_eq 'ab' OR pk >= 1024). Mem filters remain single-op.
  • Strengthened unit tests (TestConstructBasePKFilterWithOr, TestConstructBlockPKFilterWithOr) to cover composite OR shapes and multiple data types; vectors are freed after each test.
  • Updated BVT SQL suites (block_or_single_pk.sql, block_or_composite_pk.sql, block_or_no_pk.sql) to insert 8K rows with fault injection hooks, add rich OR queries (<10 rows per result, direct row
    outputs), and cover (a AND b) OR c plus long OR chains across all supported PK types.
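A minimal Go sketch of the distribution rule described above, on a toy expression tree. Expr, Leaf, And, Or, Conj and toDNF are hypothetical names, not MatrixOne's plan types, and each disjunct is kept as a plain list of PK leaves rather than a merged BasePKFilter. It shows the two safety rules: an AND may drop a conjunct that is not on the PK (the filter only gets wider), while an OR with any unmappable branch invalidates the whole OR.

```go
package main

// Expr is a hypothetical, simplified expression node (Leaf, And or Or).
type Expr interface{}

// Leaf is a single comparison; OnPK reports whether it constrains the PK column.
type Leaf struct {
	Text string
	OnPK bool
}

type And struct{ L, R Expr }
type Or struct{ L, R Expr }

// Conj is one disjunct: PK leaves that are AND-ed together.
type Conj []Leaf

// toDNF distributes AND over OR and returns the PK-only disjuncts.
// ok == false means no PK filter can be derived from e.
func toDNF(e Expr) (dnf []Conj, ok bool) {
	switch n := e.(type) {
	case Leaf:
		if !n.OnPK {
			return nil, false
		}
		return []Conj{{n}}, true
	case And:
		l, lok := toDNF(n.L)
		r, rok := toDNF(n.R)
		switch {
		case !lok && !rok:
			return nil, false
		case !lok:
			// Dropping a non-PK conjunct only widens the filter, so it stays safe.
			return r, true
		case !rok:
			return l, true
		}
		// (a OR b) AND (c OR d) => (a AND c) OR (a AND d) OR (b AND c) OR (b AND d)
		out := make([]Conj, 0, len(l)*len(r))
		for _, lc := range l {
			for _, rc := range r {
				c := make(Conj, 0, len(lc)+len(rc))
				c = append(c, lc...)
				c = append(c, rc...)
				out = append(out, c)
			}
		}
		return out, true
	case Or:
		l, lok := toDNF(n.L)
		r, rok := toDNF(n.R)
		if !lok || !rok {
			// An unmappable branch may match any PK, so the whole OR is invalid.
			return nil, false
		}
		return append(append([]Conj{}, l...), r...), true
	}
	return nil, false
}
```

For (pk = 1 AND pk BETWEEN 5 AND 9) OR pk IN (20, 30) this yields two disjuncts: [pk = 1, pk BETWEEN 5 AND 9] and [pk IN (20, 30)].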

Examples

  1. Query SELECT ... WHERE pk = 5 OR pk BETWEEN 100 AND 120 OR pk IN (512, 1024) now yields a BlockPKFilter whose SortedSearchFunc merges the binary-search offsets of all three predicates,
    so matches for every predicate are found in each overlapping block.
  2. Complex condition (pk >= 10 AND pk <= 20) OR pk prefix_eq 'ab' OR pk IN ('z1','z2') now expands into three disjuncts; blocks matching any disjunct are scanned, while MemPKFilter correctly deems
    multi-op filters invalid.
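To make Example 1 concrete, the per-atom lookups can be sketched as binary searches over one block's PK column. This assumes the column is already sorted and duplicate-free and uses a plain []int64 instead of MatrixOne vectors; searchEq, searchBetween and searchIn are hypothetical names, not functions from pk_filter.go.

```go
package main

import "sort"

// searchEq returns the offset of v in a sorted, duplicate-free column, if present.
func searchEq(col []int64, v int64) []int32 {
	i := sort.Search(len(col), func(i int) bool { return col[i] >= v })
	if i < len(col) && col[i] == v {
		return []int32{int32(i)}
	}
	return nil
}

// searchBetween returns the offsets of all values in the closed range [lo, hi].
func searchBetween(col []int64, lo, hi int64) []int32 {
	start := sort.Search(len(col), func(i int) bool { return col[i] >= lo })
	end := sort.Search(len(col), func(i int) bool { return col[i] > hi })
	var out []int32
	for i := start; i < end; i++ {
		out = append(out, int32(i))
	}
	return out
}

// searchIn looks up each IN-list value with its own binary search.
func searchIn(col []int64, vals []int64) []int32 {
	var out []int32
	for _, v := range vals {
		out = append(out, searchEq(col, v)...)
	}
	return out
}
```

Each predicate yields its own offset list; the combined SortedSearchFunc then has to union them.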

PR Type

Enhancement, Tests


Description

  • Added OR-aware parsing in BasePKFilter that distributes AND over OR, builds disjunctive normal form, and invalidates the whole OR when any branch cannot be mapped to a PK predicate

  • Introduced Disjuncts field to BasePKFilter to store multiple atomic filters from OR expressions

  • Refactored ConstructBlockPKFilter to support disjuncts by extracting search function building into buildBlockPKSearchFuncs helper

  • Added combineOffsetFuncs to merge offset results from multiple disjunctive predicates with deduplication and sorting

  • Restricted memory path filters to reject disjuncts, as they only support single atomic predicates

  • Added comprehensive unit tests (TestConstructBasePKFilterWithOr, TestConstructBlockPKFilterWithOr) covering composite OR shapes and multiple data types

  • Added extensive BVT test suites for single PK, composite PK, and non-PK tables with OR conditions across 8+ data types (int, uint, double, decimal, date, timestamp, uuid, varchar)

  • Tests cover mixed operators (equality, IN, BETWEEN, comparison) and complex OR chains with 8K-row datasets and fault injection


Diagram Walkthrough

flowchart LR
  A["OR Expression<br/>e.g. pk=5 OR pk BETWEEN 100-120"] --> B["ConstructBasePKFilter<br/>Distribute AND over OR"]
  B --> C["Build Disjuncts<br/>Multiple atomic filters"]
  C --> D["ConstructBlockPKFilter<br/>Process Disjuncts"]
  D --> E["buildBlockPKSearchFuncs<br/>Per-atom search functions"]
  E --> F["combineOffsetFuncs<br/>Merge offsets"]
  F --> G["Single BlockPKFilter<br/>Covers all OR branches"]
  C --> H["MemPKFilter<br/>Reject disjuncts"]

File Walkthrough

Relevant files
Enhancement
2 files
pk_filter.go
Refactor block PK filter to support OR conditions with disjuncts

pkg/vm/engine/readutil/pk_filter.go

  • Refactored ConstructBlockPKFilter to support OR conditions by
    extracting search function building into a separate
    buildBlockPKSearchFuncs function
  • Added combineOffsetFuncs helper to merge offset results from multiple
    disjunctive predicates using deduplication and sorting
  • Introduced support for Disjuncts field in BasePKFilter to handle
    multiple atomic filters from OR expressions
  • Moved large type-switch logic for search function construction into
    the new buildBlockPKSearchFuncs function for better code organization
+470/-390
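combineOffsetFuncs is described as merging the offsets of several disjunctive predicates with deduplication and sorting. A minimal sketch of that idea, assuming a simplified SearchFunc that maps a sorted []int64 column to []int32 offsets; the real signature in pk_filter.go works on vectors and is not reproduced here.

```go
package main

import "sort"

// SearchFunc is a hypothetical stand-in for a per-atom sorted search:
// given a sorted PK column, it returns candidate row offsets.
type SearchFunc func(col []int64) []int32

// combineOffsetFuncs returns a SearchFunc that runs every per-disjunct
// search and yields the sorted, de-duplicated union of their offsets.
func combineOffsetFuncs(fns []SearchFunc) SearchFunc {
	return func(col []int64) []int32 {
		var all []int32
		for _, fn := range fns {
			all = append(all, fn(col)...)
		}
		sort.Slice(all, func(i, j int) bool { return all[i] < all[j] })
		// Compact in place: after sorting, duplicates are adjacent.
		out := all[:0]
		for _, off := range all {
			if len(out) == 0 || out[len(out)-1] != off {
				out = append(out, off)
			}
		}
		return out
	}
}
```

Sorting and then compacting keeps the union in ascending order and drops offsets hit by more than one predicate, as with pk = 110 OR pk BETWEEN 100 AND 120.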
pk_filter_base.go
Add disjunctive normal form support to base PK filter       

pkg/vm/engine/readutil/pk_filter_base.go

  • Added Disjuncts field to BasePKFilter struct to store OR-ed atomic
    filters
  • Enhanced OR clause handling in ConstructBasePKFilter to distribute AND
    over OR and build disjunctive normal form
  • Added toDisjuncts helper function to flatten filters into disjunct
    lists
  • Improved error handling to continue processing valid OR branches
    instead of failing immediately on invalid ones
+48/-14 
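A minimal sketch of how a Disjuncts field and a toDisjuncts helper might fit together when two OR branches are combined. The struct is a hypothetical, trimmed-down BasePKFilter (only Valid, Op and Disjuncts) and orFilters is an illustrative name; the rule it encodes, that an invalid branch invalidates the whole OR, follows the PR summary.

```go
package main

// BasePKFilter is a hypothetical simplification; the real struct in
// pk_filter_base.go carries operator values, vector types and more.
type BasePKFilter struct {
	Valid     bool
	Op        string // e.g. "=", "between", "in" (atomic filters only)
	Disjuncts []BasePKFilter
}

// toDisjuncts flattens a filter into the atoms it OR-s together:
// a filter that already holds disjuncts returns them, an atom returns itself.
func toDisjuncts(f BasePKFilter) []BasePKFilter {
	if len(f.Disjuncts) > 0 {
		return f.Disjuncts
	}
	return []BasePKFilter{f}
}

// orFilters combines two OR branches. If either branch could not be
// mapped to a PK predicate, the whole OR is invalid.
func orFilters(l, r BasePKFilter) BasePKFilter {
	if !l.Valid || !r.Valid {
		return BasePKFilter{Valid: false}
	}
	ds := append(append([]BasePKFilter{}, toDisjuncts(l)...), toDisjuncts(r)...)
	return BasePKFilter{Valid: true, Disjuncts: ds}
}
```

Nested ORs stay flat: (a OR b) OR c yields three disjuncts rather than a tree.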
Error handling
1 file
pk_filter_mem.go
Restrict memory filters to single atomic predicates           

pkg/vm/engine/readutil/pk_filter_mem.go

  • Added validation to reject memory path filters with disjuncts, as
    memory filters currently only support single atomic predicates
  • Prevents incorrect behavior when OR conditions are present in
    memory-based filtering
+5/-0     
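The memory-path restriction amounts to a single guard. In this sketch both structs and newMemPKFilter are hypothetical simplifications of pk_filter_mem.go; it assumes an invalid MemPKFilter means the caller does not prune with it, rather than applying only one of the OR-ed predicates.

```go
package main

// BasePKFilter and MemPKFilter are hypothetical, trimmed-down shapes.
type BasePKFilter struct {
	Valid     bool
	Op        string
	Disjuncts []BasePKFilter
}

type MemPKFilter struct {
	Op    string
	Valid bool
}

// newMemPKFilter accepts only a single atomic predicate. A filter that
// expanded into disjuncts is reported as invalid instead of being
// silently reduced to one of its OR branches.
func newMemPKFilter(base BasePKFilter) MemPKFilter {
	if !base.Valid || len(base.Disjuncts) > 0 {
		return MemPKFilter{Valid: false}
	}
	return MemPKFilter{Op: base.Op, Valid: true}
}
```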
Tests
7 files
filter_test.go
Add comprehensive unit tests for OR condition support       

pkg/vm/engine/readutil/filter_test.go

  • Added TestConstructBasePKFilterWithOr to test OR-aware parsing with
    composite OR shapes and multiple data types
  • Added TestConstructBlockPKFilterWithOr to verify combined search
    functions correctly merge offsets from disjunctive predicates
  • Updated Test_ConstructBasePKFilter to skip OR expressions during
    validation
  • Enhanced TestConstructBlockPKFilterWithBloomFilter with helper
    functions for flexible result validation
+487/-7 
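The unit tests are described as checking that combined search functions merge offsets from disjunctive predicates correctly. One common way to check such a filter, shown here as a hypothetical sketch rather than the contents of filter_test.go, is to compare it against a brute-force linear scan over the same sorted column. bruteForce, viaSearch and sameOffsets are illustrative names.

```go
package main

import "sort"

// bruteForce is a test oracle: a linear scan returning the offsets of
// every row that satisfies pred.
func bruteForce(col []int64, pred func(int64) bool) []int32 {
	var out []int32
	for i, v := range col {
		if pred(v) {
			out = append(out, int32(i))
		}
	}
	return out
}

// viaSearch evaluates "pk = eq OR pk BETWEEN lo AND hi" with binary
// searches on a sorted column and returns the sorted union of offsets.
func viaSearch(col []int64, eq, lo, hi int64) []int32 {
	seen := map[int32]bool{}
	i := sort.Search(len(col), func(i int) bool { return col[i] >= eq })
	if i < len(col) && col[i] == eq {
		seen[int32(i)] = true
	}
	start := sort.Search(len(col), func(i int) bool { return col[i] >= lo })
	end := sort.Search(len(col), func(i int) bool { return col[i] > hi })
	for j := start; j < end; j++ {
		seen[int32(j)] = true
	}
	out := make([]int32, 0, len(seen))
	for off := range seen {
		out = append(out, off)
	}
	sort.Slice(out, func(a, b int) bool { return out[a] < out[b] })
	return out
}

// sameOffsets reports whether two offset lists are element-wise equal.
func sameOffsets(a, b []int32) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}
```

The cases worth covering are overlapping branches, a branch that matches nothing, and an empty range with lo > hi.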
block_or_single_pk.result
Add BVT test results for single PK OR queries                       

test/distributed/cases/disttae/disttae_filters/reader_filters/block_reader/block_or_single_pk.result

  • Added BVT test results for single-column primary key tables with OR
    conditions across multiple data types (int, uint, double, decimal,
    date, timestamp, uuid, varchar)
  • Tests cover mixed operators: equality, IN, BETWEEN, comparison
    operators, and complex OR chains
  • Validates that queries like pk = 5 OR pk BETWEEN 100 AND 120 OR pk IN
    (512, 1024) return correct results
+412/-0 
block_or_composite_pk.result
Add BVT test results for composite PK OR queries                 

test/distributed/cases/disttae/disttae_filters/reader_filters/block_reader/block_or_composite_pk.result

  • Added BVT test results for composite primary key tables with OR
    conditions on the first column
  • Tests cover int and varchar composite keys with mixed operators and
    complex OR chains
  • Validates correct filtering behavior for multi-column primary keys
    with disjunctive predicates
+105/-0 
block_or_no_pk.sql
Add BVT test suite for non-PK OR queries                                 

test/distributed/cases/disttae/disttae_filters/reader_filters/block_reader/block_or_no_pk.sql

  • Added BVT test suite for tables without primary keys to verify OR
    filtering works correctly in non-PK scenarios
  • Tests cover int and varchar columns with various OR conditions and
    mixed operators
  • Ensures fault injection hooks are properly configured for block
    flushing
+32/-0   
block_or_single_pk.sql
Single PK OR condition test suite with multi-type coverage

test/distributed/cases/disttae/disttae_filters/reader_filters/block_reader/block_or_single_pk.sql

  • New test suite for single primary key OR conditions with fault
    injection across 8 data types (int, bigint unsigned, double, decimal,
    date, timestamp, uuid, varchar)
  • Inserts 8192 rows per table and tests various OR query patterns
    including equality, IN clauses, BETWEEN ranges, and comparison
    operators
  • Covers edge cases like contradictory AND conditions within OR
    expressions and long OR chains with multiple operators
  • Uses fault injection to control block flushing for comprehensive
    block-level filter testing
+109/-0 
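The contradictory-AND case matters once AND has been distributed over OR. Assuming each disjunct's PK conjuncts are intersected into one range, (pk = 1 AND pk BETWEEN 5 AND 9) OR pk IN (20, 30) leaves the first disjunct as [1, 1] ∩ [5, 9], which is empty and must contribute no offsets while the IN branch still does. A minimal sketch for integer PKs; Range and intersect are hypothetical names, and the PR does not say ranges are represented this way.

```go
package main

// Range is a hypothetical closed interval [Lo, Hi] of int64 PK values.
type Range struct{ Lo, Hi int64 }

// Empty reports whether no value can satisfy the range.
func (r Range) Empty() bool { return r.Lo > r.Hi }

// intersect AND-s two PK ranges, e.g. pk = 1 is [1, 1] and
// pk BETWEEN 5 AND 9 is [5, 9].
func intersect(a, b Range) Range {
	lo, hi := a.Lo, a.Hi
	if b.Lo > lo {
		lo = b.Lo
	}
	if b.Hi < hi {
		hi = b.Hi
	}
	return Range{lo, hi}
}
```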
block_or_no_pk.result
No-PK table OR condition test expected results                     

test/distributed/cases/disttae/disttae_filters/reader_filters/block_reader/block_or_no_pk.result

  • Expected output results for no-primary-key table OR condition tests
    with int and varchar columns
  • Validates correct row filtering for OR queries including equality, IN,
    BETWEEN, and comparison operators
  • Tests 8192-row datasets with fault injection to verify block-level
    filtering behavior on non-PK tables
  • Demonstrates that OR conditions work correctly even without primary
    key constraints
+105/-0 
block_or_composite_pk.sql
Composite PK OR condition test suite with multi-column keys

test/distributed/cases/disttae/disttae_filters/reader_filters/block_reader/block_or_composite_pk.sql

  • New test suite for composite primary key OR conditions with fault
    injection across 2 data type combinations (int pairs and varchar
    pairs)
  • Inserts 8192 rows per table and tests OR query patterns on the first
    PK column with various operators
  • Covers equality, IN clauses, BETWEEN ranges, comparison operators, and
    long OR chains
  • Validates that composite PK filters correctly handle disjunctive
    conditions across multiple blocks
+32/-0   

@mergify bot (Contributor) commented Dec 4, 2025

Merge Queue Status Beta

✅ The pull request has been merged

This pull request spent 8 seconds in the queue, with no time running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage


Labels

kind/enhancement, Review effort 4/5, size/XXL (denotes a PR that changes 2000+ lines)

4 participants