-
Notifications
You must be signed in to change notification settings - Fork 177
LMDB Store: working on new ID based join iterator #5549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds timeout protections to test cases and implements a safeguard against infinite query result iterations in the LMDB store. The changes focus on preventing tests from hanging indefinitely by adding explicit timeouts and result count limits.
Key Changes
- Added 5-second timeouts to 28 test methods across snapshot and serializable isolation level tests
- Added 10-second timeout to one delete-insert test
- Implemented a safety check in the
eval()method to prevent collecting more than 1 million results - Added timeout handling with explicit time units to
CountDownLatch.await()calls in concurrency tests
Reviewed Changes
Copilot reviewed 95 out of 96 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
SnapshotTest.java |
Added 5-second timeout annotations to 28 test methods |
SerializableTest.java |
Added 5-second timeout annotations to 19 test methods |
DeleteInsertTest.java |
Added 10-second timeout annotation to test method |
IsolationLevelTest.java |
Replaced indefinite await() calls with 10-second timeouts on CountDownLatches |
DeadLockTest.java |
Added TimeUnit import and 10-second timeouts to CountDownLatch waits |
SparqlOrderByTest.java |
Added class-level 10-second timeout annotation |
pom.xml |
Updated jacoco-maven-plugin version from 0.8.13 to 0.8.14 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (list.size() > 1000000) { | ||
| throw new RuntimeException("Too many results: " + list.size()); | ||
| } |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message includes list.size() which will always be 1000001 when the exception is thrown due to the condition checking > 1000000. Consider using a constant for the limit value to keep the message and condition in sync, or simplify the message to just state 'Too many results (exceeded limit of 1000000)'.
| try (RepositoryConnection write = store.getConnection()) { | ||
| start.countDown(); | ||
| start.await(); | ||
| start.await(10, TimeUnit.SECONDS); |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout value of 10 seconds is hardcoded across multiple await() calls in this file. Consider extracting this as a named constant (e.g., private static final long LATCH_TIMEOUT_SECONDS = 10;) to improve maintainability and make it easier to adjust if needed.
| Varint.writeUnsigned(bb, pred); | ||
| } | ||
|
|
||
| void toDupValue(ByteBuffer bb, long subj, long pred, long obj, long context) { |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter 'subj' is never used.
| Varint.writeUnsigned(bb, pred); | ||
| } | ||
|
|
||
| void toDupValue(ByteBuffer bb, long subj, long pred, long obj, long context) { |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter 'pred' is never used.
| private final QueryEvaluationStep fallbackStep; | ||
| private final boolean fallbackImmediately; | ||
|
|
||
| public LmdbIdJoinQueryEvaluationStep(EvaluationStrategy strategy, Join join, QueryEvaluationContext context, |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter 'strategy' is never used.
| } | ||
| } | ||
| for (String v : right.getVariableNames()) { | ||
| Integer mask = right.getPositionsMask(v); |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable 'mask' is only assigned values of primitive type and is never 'null', but it is declared with the boxed type 'Integer'.
| Integer mask = right.getPositionsMask(v); | |
| int mask = right.getPositionsMask(v); |
|
|
||
| ValueFactory vf = SimpleValueFactory.getInstance(); | ||
| IRI alice = vf.createIRI(NS, "alice"); | ||
| IRI bob = vf.createIRI(NS, "bob"); |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable 'IRI bob' is never read.
| * The higher the score, the better the index is suited for matching the pattern. Lowest score is 0, which means | ||
| * that the index will perform a sequential scan. | ||
| */ | ||
| public int getPatternScore(long subj, long pred, long obj, long context) { |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method overrides DupIndex.getPatternScore; it is advisable to add an Override annotation.
| return dupsortEnabled; | ||
| } | ||
|
|
||
| public int getDupDB(boolean explicit) { |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method overrides DupIndex.getDupDB; it is advisable to add an Override annotation.
| this(fieldSeq, false); | ||
| } | ||
|
|
||
| public char[] getFieldSeq() { |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method overrides DupIndex.getFieldSeq; it is advisable to add an Override annotation.
|
@kenwenzel I’m working on an ID-based join for the LMDB store where we don’t need to create LmdbValue objects or binding sets. There is something not quite right though, causing a bunch of tests that test SNAPSHOT and SERIALIZABLE isolation to fail. I think that ID-based joins won’t work for SERIALIZABLE isolation, but I don’t quite understand what makes SNAPSHOT fail. |
|
@hmottestad This is a great idea. Can you please explain why you would expect ID-based joins to fail for SERIALIZABLE? What is different in this case from joining on the value objects? |
|
@hmottestad I think in both cases the triples are only in memory - managed by SnapshotSailStore. |
|
I thought that SERIALIZABLE relied on reporting which statement patterns that have been read so that you can monitor for concurrent writes that would potentially invalidate your reads. I don't really want to have to reimplement that using IDs, so I would much rather prefer to rely on the existing solution that uses Values. |
Yes, I understand. Nevertheless, something like an ID service for query evaluation (a query-local map with long IDs and weak keys) could help to get rid of value objects at all. |
BeforeAfterQueryBenchmark.query_distinct_predicates is actually a bit slower. Otherwise everything else is a lot faster. I still haven't fixed SNAPSHOT isolation. |
e73f154 to
9852af0
Compare
|
The numbers are looking really good :-) |
|
@hmottestad Maybe you also like to check out #5558 I've played a bit with DUPSORT and have the following results:
I've seen that DUPFIXED allows us to fetch multiple values but due to the "explosion" of database size I would not recommend to go that route. QLever seems to be pretty fast when querying indexes. Maybe we can draw some inspiration from it: |
|
@hmottestad For a DUPSORT-based implementation see also #5558 Is it possible that you remove anything related to DUPSORT from this PR and we implement it somewhere else? |
|
@hmottestad This commit here also implements pooling for cursors and internal state of LmdbRecordIterator: It has no problems with different isolation levels as cursors are closed if they get invalid. |
|
I got a decent performance improvement from pooling cursors. Did you benchmark your code? |
Not fully, but for the complexQuery benchmark I have seen something like 10-20%. |
GitHub issue resolved: #
Briefly describe the changes proposed in this PR:
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resourcesto format from the command line)