Conversation

@kenwenzel
Contributor

@kenwenzel kenwenzel commented Nov 10, 2025

GitHub issue resolved: #4218

Briefly describe the changes proposed in this PR:


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@kenwenzel
Contributor Author

kenwenzel commented Nov 10, 2025

I've played a bit with DUPSORT and have the following results:

  • the database is slightly smaller if only DUPSORT with variable-length values is used
  • the database is much larger if DUPFIXED is used (at least around 80% more)
  • the benchmarks are slightly slower because both keys and values have to be matched
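
One plausible contributor to the size difference can be sketched under stated assumptions: if IDs are stored as variable-length integers, small IDs take far fewer bytes than the fixed-width slots that DUPFIXED requires. The sketch below uses a generic LEB128-style varint and a hypothetical fixed width of 8 bytes per ID. Neither the encoding nor the sample IDs are taken from this PR.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a LEB128-style varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_fixed(value: int) -> bytes:
    """Encode an integer in a fixed 8-byte big-endian slot (assumed width)."""
    return struct.pack(">Q", value)


def encoded_size(ids, encoder) -> int:
    """Total number of bytes needed to encode all ids with the given encoder."""
    return sum(len(encoder(i)) for i in ids)


# Hypothetical sample IDs, chosen only to illustrate the effect.
ids = [5, 300, 70_000, 2_000_000]
variable_size = encoded_size(ids, encode_varint)
fixed_size = encoded_size(ids, encode_fixed)
print(f"variable-length: {variable_size} bytes, fixed-width: {fixed_size} bytes")
```

How large the gap is in practice depends on the real ID distribution and on LMDB page overhead, which this sketch does not model.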

Performance with DUPSORT:

Benchmark                                                             (numThreads)  Mode  Cnt     Score      Error  Units
QueryBenchmark.complexQuery                                                    N/A  avgt    3     8.351 ±    2.485  ms/op
QueryBenchmark.different_datasets_with_similar_distributions                   N/A  avgt    3     4.911 ±   10.699  ms/op
QueryBenchmark.groupByQuery                                                    N/A  avgt    3     2.189 ±    0.573  ms/op
QueryBenchmark.long_chain                                                      N/A  avgt    3  1459.867 ± 1600.677  ms/op
QueryBenchmark.lots_of_optional                                                N/A  avgt    3   408.424 ±  170.489  ms/op
QueryBenchmark.minus                                                           N/A  avgt    3    19.063 ±   30.869  ms/op
QueryBenchmark.multiple_sub_select                                             N/A  avgt    3   101.625 ±   30.560  ms/op
QueryBenchmark.nested_optionals                                                N/A  avgt    3   292.365 ±   59.615  ms/op
QueryBenchmark.optional_lhs_filter                                             N/A  avgt    3    69.231 ±    7.340  ms/op
QueryBenchmark.optional_rhs_filter                                             N/A  avgt    3   106.239 ±   50.879  ms/op
QueryBenchmark.ordered_union_limit                                             N/A  avgt    3   177.760 ±  602.567  ms/op
QueryBenchmark.pathExpressionQuery1                                            N/A  avgt    3    43.933 ±    6.755  ms/op
QueryBenchmark.pathExpressionQuery2                                            N/A  avgt    3     6.644 ±    4.060  ms/op
QueryBenchmark.query_distinct_predicates                                       N/A  avgt    3   131.304 ±   82.359  ms/op
QueryBenchmark.simple_filter_not                                               N/A  avgt    3    11.713 ±    2.413  ms/op
QueryBenchmark.sub_select                                                      N/A  avgt    3   166.433 ±   36.140  ms/op
QueryBenchmarkFoaf.groupByCount                                                N/A  avgt    5  1203.649 ±   38.835  ms/op
QueryBenchmarkFoaf.groupByCountSorted                                          N/A  avgt    5  1169.111 ±   57.376  ms/op
QueryBenchmarkFoaf.personsAndFriends                                           N/A  avgt    5   299.565 ±   10.615  ms/op
QueryBenchmarkParallel.complexQuery                                              4  avgt    3    21.410 ±   12.238  ms/op
QueryBenchmarkParallel.different_datasets_with_similar_distributions             4  avgt    3    11.305 ±    4.164  ms/op
QueryBenchmarkParallel.groupByQuery                                              4  avgt    3     8.376 ±    1.347  ms/op
QueryBenchmarkParallel.lots_of_optional                                          4  avgt    3  1005.969 ±  160.791  ms/op

Performance on develop:

Benchmark                                                             (numThreads)  Mode  Cnt     Score     Error  Units
QueryBenchmark.complexQuery                                                    N/A  avgt    3     7.803 ±   2.515  ms/op
QueryBenchmark.different_datasets_with_similar_distributions                   N/A  avgt    3     4.154 ±   1.314  ms/op
QueryBenchmark.groupByQuery                                                    N/A  avgt    3     1.868 ±   1.031  ms/op
QueryBenchmark.long_chain                                                      N/A  avgt    3  1257.513 ± 272.696  ms/op
QueryBenchmark.lots_of_optional                                                N/A  avgt    3   392.749 ±  93.742  ms/op
QueryBenchmark.minus                                                           N/A  avgt    3    17.783 ±  15.425  ms/op
QueryBenchmark.multiple_sub_select                                             N/A  avgt    3    96.446 ±  17.314  ms/op
QueryBenchmark.nested_optionals                                                N/A  avgt    3   273.396 ±  49.543  ms/op
QueryBenchmark.optional_lhs_filter                                             N/A  avgt    3    66.929 ±  25.232  ms/op
QueryBenchmark.optional_rhs_filter                                             N/A  avgt    3    92.581 ±  28.442  ms/op
QueryBenchmark.ordered_union_limit                                             N/A  avgt    3   134.559 ±  20.988  ms/op
QueryBenchmark.pathExpressionQuery1                                            N/A  avgt    3    38.629 ±  13.478  ms/op
QueryBenchmark.pathExpressionQuery2                                            N/A  avgt    3     6.942 ±   3.485  ms/op
QueryBenchmark.query_distinct_predicates                                       N/A  avgt    3   155.704 ± 396.561  ms/op
QueryBenchmark.simple_filter_not                                               N/A  avgt    3    11.171 ±   2.426  ms/op
QueryBenchmark.sub_select                                                      N/A  avgt    3   131.768 ±  44.619  ms/op
QueryBenchmarkFoaf.groupByCount                                                N/A  avgt    5  1174.969 ±  57.989  ms/op
QueryBenchmarkFoaf.groupByCountSorted                                          N/A  avgt    5  1091.847 ± 100.255  ms/op
QueryBenchmarkFoaf.personsAndFriends                                           N/A  avgt    5   300.028 ±  31.409  ms/op
QueryBenchmarkParallel.complexQuery                                              4  avgt    3    19.595 ±  11.311  ms/op
QueryBenchmarkParallel.different_datasets_with_similar_distributions             4  avgt    3    10.852 ±   1.590  ms/op
QueryBenchmarkParallel.groupByQuery                                              4  avgt    3     7.837 ±   0.967  ms/op
QueryBenchmarkParallel.lots_of_optional                                          4  avgt    3   965.467 ±  61.832  ms/op
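
To put the two tables side by side, a small script can compute the relative change per benchmark and check whether the reported Score ± Error ranges overlap. This is only a sketch: the numbers are copied from the tables above for a handful of benchmarks, and the helper names are made up for illustration.

```python
# (score, error) in ms/op: first pair is DUPSORT, second pair is develop.
results = {
    "QueryBenchmark.long_chain": ((1459.867, 1600.677), (1257.513, 272.696)),
    "QueryBenchmark.sub_select": ((166.433, 36.140), (131.768, 44.619)),
    "QueryBenchmarkFoaf.groupByCountSorted": ((1169.111, 57.376), (1091.847, 100.255)),
    "QueryBenchmarkFoaf.personsAndFriends": ((299.565, 10.615), (300.028, 31.409)),
}


def relative_change(dupsort_score: float, develop_score: float) -> float:
    """Percentage change of the DUPSORT score relative to develop (positive = slower)."""
    return (dupsort_score - develop_score) / develop_score * 100.0


def intervals_overlap(a, b) -> bool:
    """True if the ranges score ± error of both measurements overlap."""
    (score_a, err_a), (score_b, err_b) = a, b
    return score_a - err_a <= score_b + err_b and score_b - err_b <= score_a + err_a


for name, (dupsort, develop) in results.items():
    change = relative_change(dupsort[0], develop[0])
    overlap = intervals_overlap(dupsort, develop)
    print(f"{name}: {change:+.1f}% (ranges overlap: {overlap})")
```

With only 3 to 5 iterations per benchmark the error ranges are wide, so an overlap means the difference may be within measurement noise.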

@kenwenzel kenwenzel force-pushed the lmdb-dupsort branch 2 times, most recently from 6f953d9 to a7fa590 Compare November 11, 2025 07:41
@hmottestad
Contributor

I tested DUPSORT in my branch and implemented it specifically for the SP** index. It seemed to give some small performance improvements, but I'm not sure it was much.
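
As a rough illustration of what a DUPSORT layout for an SP-prefixed index could look like: LMDB's MDB_DUPSORT allows several sorted data items under one key, so a shared prefix needs to be stored only once per group. The model below is plain Python, not LMDB code. The split after subject and predicate follows the SP** mention above, while the sample quads and function names are hypothetical and the actual layout in either branch may differ.

```python
from collections import defaultdict


def plain_layout(quads):
    """One entry per quad: the whole (s, p, o, c) tuple acts as the key."""
    return sorted(quads)


def dupsort_layout(quads, prefix_len=2):
    """Group quads by their first prefix_len components.

    The prefix becomes the key; the remaining components become the
    sorted duplicate values stored under that key.
    """
    table = defaultdict(list)
    for quad in quads:
        table[quad[:prefix_len]].append(quad[prefix_len:])
    return {key: sorted(values) for key, values in sorted(table.items())}


# Hypothetical quads of (subject, predicate, object, context) IDs.
quads = [(1, 10, 100, 0), (1, 10, 101, 0), (1, 10, 102, 0), (2, 10, 100, 0)]

plain = plain_layout(quads)
grouped = dupsort_layout(quads)

# Count stored ID components as a crude stand-in for space usage.
plain_components = sum(len(entry) for entry in plain)
dupsort_components = sum(
    len(key) + sum(len(value) for value in values)
    for key, values in grouped.items()
)
print(plain_components, dupsort_components)
```

A lookup in the grouped layout has to match the key first and then position within the duplicates, which is the extra key-and-value matching step mentioned earlier in this thread.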

@kenwenzel
Contributor Author

@hmottestad My primary goal is to reduce the DB size on disk. I've also experimented with Morton (Z-order) codes to get by with only one index, but this performed badly.
Do you think that DUPFIXED with MDB_NEXT_MULTIPLE is worth considering?
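
For readers unfamiliar with the Morton (Z-order) codes mentioned above: they interleave the bits of several coordinates into a single integer, so that one sorted index covers all dimensions at once. The following is a generic sketch with hypothetical function names and a made-up bit width. It is not the encoding that was tried for this PR.

```python
def morton_encode(coords, bits):
    """Interleave the lowest `bits` bits of each coordinate into one integer.

    Bit b of coordinate d ends up at position b * len(coords) + d.
    """
    n = len(coords)
    code = 0
    for bit in range(bits):
        for dim, value in enumerate(coords):
            code |= ((value >> bit) & 1) << (bit * n + dim)
    return code


def morton_decode(code, n, bits):
    """Inverse of morton_encode for n coordinates of `bits` bits each."""
    coords = [0] * n
    for bit in range(bits):
        for dim in range(n):
            coords[dim] |= ((code >> (bit * n + dim)) & 1) << bit
    return tuple(coords)


# Hypothetical (subject, predicate, object, context) IDs, 4 bits each.
quad = (5, 9, 2, 7)
code = morton_encode(quad, bits=4)
assert morton_decode(code, n=4, bits=4) == quad
```

Range queries on such a code generally have to scan or skip over entries that fall outside the requested box, which is one general trade-off of Z-order indexing.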

@kenwenzel kenwenzel force-pushed the lmdb-dupsort branch 9 times, most recently from 09b18cf to d8e4017 Compare November 18, 2025 20:18
@kenwenzel kenwenzel force-pushed the lmdb-dupsort branch 2 times, most recently from 5033f2a to 4b8ec4b Compare November 20, 2025 16:24
@kenwenzel kenwenzel force-pushed the lmdb-dupsort branch 4 times, most recently from 5dfe593 to eabd5ca Compare November 25, 2025 07:24
2 participants