Compute Dynamic Filters only when a consumer supports them #18938
Conversation
ebc401a to 3b31893 (Compare)
```rust
pub fn is_used(self: &Arc<Self>) -> bool {
    // Strong count > 1 means at least one consumer is holding a reference beyond the producer.
    Arc::strong_count(self) > 1
}
```
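The strong-count idea can be illustrated in plain Rust. This is a minimal standalone sketch, not the actual DataFusion type: `DynFilter` is a hypothetical stand-in for `DynamicFilterPhysicalExpr`.

```rust
use std::sync::Arc;

// Hypothetical stand-in for DynamicFilterPhysicalExpr.
struct DynFilter;

// A filter counts as "used" when someone besides the producer holds a clone of the Arc.
fn is_used(f: &Arc<DynFilter>) -> bool {
    Arc::strong_count(f) > 1
}

fn main() {
    let filter = Arc::new(DynFilter);
    assert!(!is_used(&filter)); // only the producer holds it

    let consumer = Arc::clone(&filter); // a probe-side consumer takes a reference
    assert!(is_used(&filter));

    drop(consumer); // consumer goes away
    assert!(!is_used(&filter));
}
```

Note that this is exactly why cloning the expression anywhere else (for example during execute) would perturb the count, which is the edge case discussed below.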
I'm not really sure how to test a condition where `is_used()` returns false without adding too much machinery, or without making the `dynamic_filter` attribute of HashJoin public, which would make it easy to mess with the Arc reference count.
Can’t you just make a new `DynamicFilterPhysicalExpr` and check that `is_used` is false?
Looks like you have a test already?
adriangb left a comment
I know this is draft but the current code looks good to me, I’ll approve once it’s ready :)
Something funky is going on here with the Arc count: some queries are not pushing down the filter to the probe because the Arc count remains at 1. It doesn't happen in all the tests though, which is strange.

Edit: For the tests that fail, it seems like partition 0 never gets to see a … I think this might be because we clone the `DynamicFilterPhysicalExpr` directly in the execute phase of `DataSourceExec` (at least for this kind of node).

In any case, seems like the strong_count approach might hit some edge cases here...
62688c7 to 7664b0d (Compare)
7664b0d to 8c467c9 (Compare)
alamb left a comment
Makes sense to me -- thank you @LiaCastaneda
The only thing I think is needed is to add a note to the upgrade guide
@adriangb do you want to take a look at this PR prior to merge?
```diff
 /// Discriminant for the result of pushing down a filter into a child node.
-#[derive(Debug, Clone, Copy)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum PushedDown {
```
Since this is a public enum (doc link) I think this will be an API change (I marked the PR as such)
Can you please add a note to the DataFusion 52 upgrade guide explaining how people need to update their existing code (e.g. if they used to return Yes or No what should they return after this change)?
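For the upgrade guide, the mapping might look roughly like this. The variant names come from this PR, but the helper and its parameters are illustrative, not part of DataFusion's actual API surface:

```rust
// Sketch of the expanded result enum proposed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PushedDown {
    Exact,       // the child applies the filter exactly (rows are fully filtered)
    Inexact,     // the child uses the filter only opportunistically, e.g. stats pruning
    Unsupported, // the child cannot use the filter at all
}

// Hypothetical helper showing how code that used to return Yes/No might map over:
// old `Yes` -> Exact; old `No` splits into Inexact (can still prune) or Unsupported.
pub fn migrate(old_was_yes: bool, can_prune_with_stats: bool) -> PushedDown {
    match (old_was_yes, can_prune_with_stats) {
        (true, _) => PushedDown::Exact,
        (false, true) => PushedDown::Inexact,
        (false, false) => PushedDown::Unsupported,
    }
}

fn main() {
    assert_eq!(migrate(true, false), PushedDown::Exact);
    assert_eq!(migrate(false, true), PushedDown::Inexact);
    assert_eq!(migrate(false, false), PushedDown::Unsupported);
}
```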
run benchmarks

🤖: Benchmark completed Details
adriangb left a comment
> I think this might be because we clone the `DynamicFilterPhysicalExpr` directly in the execute phase of `DataSourceExec` (at least for this kind of node)
>
> In any case, seems like the strong_count approach might hit some edge cases here...
This is interesting. Could it indicate a bug? I don't think we downcast into `DynamicFilterPhysicalExpr` anywhere in the Parquet-specific code, so if we keep a reference it must be via `Arc::clone`.
I ask because I think this approach is what we actually want. As per https://github.com/apache/datafusion/pull/18938/files#r2578193421 I'm not even sure this change ends up having a different behavior at runtime?
```rust
// Only create the dynamic filter if the probe side will actually use it (Exact or Inexact).
// If it's Unsupported, don't compute the filter since it won't be used.
let will_be_used = !matches!(filter.discriminant, PushedDown::Unsupported);
```
If this is the case, don't we end up in the same place as Yes/No? I.e. this change only seems helpful if we did something like "only create the filter if the child said Exact".
I think it lets scans say "I can't use this at all" (Unsupported), so we can skip computing filters entirely if stats pruning is not supported either. The Yes/No system had no way to express that: if we had stats pruning with the filters, it would fall under the No discriminant, but we would still need them. I'm also thinking: if we know a scan will only use the filter for stats pruning (Inexact), maybe it would make sense to compute just the min/max bounds instead of both the IN LIST and the bounds?
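The bounds-only idea could be sketched roughly like this. All types here (`JoinFilter`, `build_filter`) are hypothetical; DataFusion's actual filter construction is different:

```rust
// Sketch types; DataFusion's real filter construction differs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushedDown { Exact, Inexact, Unsupported }

#[derive(Debug, PartialEq)]
enum JoinFilter {
    // Min/max bounds of the build-side keys: enough for stats pruning.
    Bounds { min: i64, max: i64 },
    // Full IN list plus bounds: supports row-level filtering too.
    InListAndBounds { keys: Vec<i64>, min: i64, max: i64 },
}

// Build only as much filter as the consumer said it can use.
fn build_filter(discriminant: PushedDown, build_keys: &[i64]) -> Option<JoinFilter> {
    if discriminant == PushedDown::Unsupported {
        return None; // consumer can't use it: skip all computation
    }
    let min = *build_keys.iter().min()?;
    let max = *build_keys.iter().max()?;
    match discriminant {
        PushedDown::Inexact => Some(JoinFilter::Bounds { min, max }),
        _ => Some(JoinFilter::InListAndBounds { keys: build_keys.to_vec(), min, max }),
    }
}

fn main() {
    assert_eq!(build_filter(PushedDown::Unsupported, &[1, 5, 3]), None);
    assert_eq!(
        build_filter(PushedDown::Inexact, &[1, 5, 3]),
        Some(JoinFilter::Bounds { min: 1, max: 5 })
    );
    assert!(matches!(
        build_filter(PushedDown::Exact, &[1, 5, 3]),
        Some(JoinFilter::InListAndBounds { .. })
    ));
}
```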
I think these are both good reasons to make this API change. Maybe we can justify the change by doing what you are saying and skipping pushing down the entire hash table if it won't be used? But then again that is basically free... and where do bloom filters fall into this calculation? i.e. bloom filter pruning only works if HashJoinExec produces an InListExpr and the scan node is a Parquet node (or other format that supports bloom filters). That seems like an awful lot of complex coordination between the producer/consumer that is specific to each file (some may have bloom filters some don't) and the filter being pushed down (min/max vs. InList vs. Hash Table).
Yeah, it seems like it adds complexity to the “which filter to use” decision. Maybe the only clear use case is:
> it lets scans say "I can't use this at all" (Unsupported), so we can skip computing filters entirely if stats pruning is not supported either
I was also thinking: what if we just let consumers communicate what kinds of filters they support, and the producer only adjusts that decision based on memory or row-count limits? Or would that be an anti-pattern? In any case, that still wouldn’t let the producer understand the purpose of the dynamic filters (whether for stats pruning or row-level filtering).
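The "consumers advertise what they support" idea could look something like the following. This is purely a speculative sketch of the comment above, not anything that exists in DataFusion, and the mapping from capabilities to discriminant is one possible interpretation:

```rust
// Hypothetical capability set a scan could advertise (not a DataFusion API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FilterCapabilities {
    stats_pruning: bool, // can use min/max bounds
    row_level: bool,     // can evaluate the filter per row
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushedDown { Exact, Inexact, Unsupported }

// One possible mapping from advertised capabilities to the pushdown discriminant.
fn discriminant_for(caps: FilterCapabilities) -> PushedDown {
    match (caps.row_level, caps.stats_pruning) {
        (true, _) => PushedDown::Exact,
        (false, true) => PushedDown::Inexact,
        (false, false) => PushedDown::Unsupported,
    }
}

fn main() {
    // A scan that can do both row-level filtering and stats pruning.
    let full = FilterCapabilities { stats_pruning: true, row_level: true };
    assert_eq!(discriminant_for(full), PushedDown::Exact);

    // A scan that can only prune with statistics.
    let stats_only = FilterCapabilities { stats_pruning: true, row_level: false };
    assert_eq!(discriminant_for(stats_only), PushedDown::Inexact);
}
```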
Yeah, sorry for the confusion -- I switched the approach to exact/inexact/unsupported just as an alternative since it was getting really hard to get
I took a look the other day, and the thing is that for some reason the strong count is not accounting for the leaf node that supposedly also holds a reference to the `DynamicFilterPhysicalExpr`.

The 1 -> 2 strong count is actually expected here, because the first partition creates the SharedBuildAccumulator, which also holds a reference to the `DynamicFilterPhysicalExpr`; but if the hash join + the leaf keep the ref (which is what we should be seeing) I would have expected to see something like: …
run benchmarks

show benchmark queue

🤖 Hi @alamb, you asked to view the benchmark queue (#18938 (comment)).

🤖: Benchmark completed Details
I didn't have time to get back to this :( I’ll be on vacation until the week after next, so I won’t be able to look at this again until then. If anyone wants to take over, feel free; otherwise I’ll continue when I’m back!
Which issue does this PR close?
Closes #17527
Rationale for this change
Currently, DataFusion computes bounds for all queries that contain a `HashJoinExec` node whenever the option `enable_dynamic_filter_pushdown` is set to true (the default). It might make sense to compute these bounds only when we explicitly know there is a consumer that will use them.
What changes are included in this PR?
This PR expands the filter pushdown result enum from two variants (Yes/No) to three variants (Exact/Inexact/Unsupported), as suggested in #18856 and #17527.
In `handle_child_pushdown_result`, the HashJoinExec checks the discriminant returned by its probe-side child to determine whether the dynamic filter will be used. If the child returns Unsupported, the HashJoinExec skips creating the dynamic filter accumulator, avoiding unnecessary computation.

Are these changes tested?

Added a test `test_hash_join_dynamic_filter_with_unsupported_scan` that verifies that the DynamicFilter placeholder is not present in the probe node.

Are there any user-facing changes?
Yes, the PushedDown enum now has three variants instead of two.