Skip to content

Conversation

@WaVEV
Copy link
Collaborator

@WaVEV WaVEV commented Nov 9, 2025

INTPYTHON-824

Summary

This PR refactors the generation of $lookup stages to use the simpler localField / foreignField syntax when possible.
Previously, all joins were built using the let and pipeline form, even for simple one-field equality joins. This change improves readability and allows MongoDB to optimize execution internally for straightforward joins, while still supporting additional filtering logic through let variables when necessary.

Changes in this PR

  • Added logic to detect when a $lookup stage can safely use localField and foreignField instead of let and pipeline.

  • Updated the query compiler to:

    • Use localField / foreignField when the join condition involves a single equality comparison between plain fields.
    • Combine localField / foreignField with a let / pipeline when additional filter expressions are required beyond the join condition.
  • Ensured compatibility with existing $lookup stages in subqueries and nested joins.

  • Added test coverage to confirm behavior parity between both forms.

Test Plan

  • Added new unit tests verifying that:

    • Simple joins now generate $lookup using localField / foreignField.
    • Complex joins continue using the let / pipeline form.
    • Mixed correctly fallback to the pipeline form.
  • Verified all existing tests pass, ensuring backward compatibility.

  • Manual validation with sample queries confirmed that results remain identical before and after the refactor.

Checklist

Checklist for Author

  • Did you update the changelog (if necessary)?
  • Is the intention of the code captured in relevant tests?
  • If there are new TODOs, has a related JIRA ticket been created? No

Checklist for Reviewer @Jibola @aclark4life @timgraham

  • Does the title of the PR reference a JIRA Ticket?
  • Do you fully understand the implementation? (Would you be comfortable explaining how this code works to someone else?)
  • Have you checked for spelling & grammar errors?
  • Is all relevant documentation (README or docstring) updated?

Focus Areas for Reviewer

  • Confirm that the mixed $lookup cases (using both localField and let) are generated correctly and still behave as expected.
  • Ensure that the logic distinguishing foreign key joins from filter conditions works as intended inside the loop that builds join conditions.
  • Validate that all pre-existing $lookup scenarios, including nested and subquery joins, remain fully supported.

@WaVEV WaVEV force-pushed the INTPYTHON-824-Refactor-lookup-stages-to-use-localField-and-foreignField-when-applicable branch from c57b03f to 8330b8f Compare November 9, 2025 21:02
@aclark4life
Copy link
Collaborator

@WaVEV Awesome! I merged this with the QE PR for testing. With a separate encrypted connection, I get these 2 errors:

======================================================================
FAIL: test_join (encryption_.test_fields.QueryTests.test_join)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/django-mongodb-backend/tests/encryption_/test_fields.py", line 320, in test_join
    with self.assertRaisesMessage(DatabaseError, msg):
         ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/.pyenv/versions/3.13.9/lib/python3.13/contextlib.py", line 148, in __exit__
    next(self.gen)
    ~~~~^^^^^^^^^^
AssertionError: DatabaseError not raised

----------------------------------------------------------------------
(0.001) db.encryption__book.aggregate([{'$lookup': {'from': 'encryption__author', 'pipeline': [{'$match': {'name': 'xxx'}}], 'as': 'encryption__author', 'localField': 'author_id', 'foreignField': '_id'}}, {'$unwind': '$encryption__author'}, {'$match': {'encryption__author.name': 'xxx'}}]);

ALIAS=ENCRYPTED
======================================================================
FAIL: test_select_related (encryption_.test_fields.QueryTests.test_select_related)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/django-mongodb-backend/tests/encryption_/test_fields.py", line 333, in test_select_related
    with self.assertRaisesMessage(DatabaseError, msg):
         ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/.pyenv/versions/3.13.9/lib/python3.13/contextlib.py", line 148, in __exit__
    next(self.gen)
    ~~~~^^^^^^^^^^
AssertionError: DatabaseError not raised

----------------------------------------------------------------------
(0.001) db.encryption__book.aggregate([{'$lookup': {'from': 'encryption__author', 'pipeline': [], 'as': 'encryption__author', 'localField': 'author_id', 'foreignField': '_id'}}, {'$unwind': '$encryption__author'}]);

ALIAS=ENCRYPTED
----------------------------------------------------------------------
Ran 55 tests in 36.017s

FAILED (failures=2)

With a single encrypted connection I get these errors:


Creating example.com Site object
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 129, in _wrap_encryption_errors
    yield
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 472, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/libmongocrypt/bindings/python/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
    return run_state_machine(ctx, self.callback)
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/libmongocrypt/bindings/python/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
    result = callback.mark_command(ctx.database, mongocryptd_cmd)
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 292, in mark_command
    res = self.mongocryptd_client[database].command(
        inflated_cmd, codec_options=DEFAULT_RAW_BSON_OPTIONS
    )
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 938, in command
    return self._command(
           ~~~~~~~~~~~~~^
        connection,
        ^^^^^^^^^^^
    ...<7 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 778, in _command
    return conn.command(
           ~~~~~~~~~~~~^
        self._name,
        ^^^^^^^^^^^
    ...<8 lines>...
        client=self._client,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 45, in inner
    return func(*args, **kwargs)
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 413, in command
    return command(
        self,
    ...<20 lines>...
        write_concern=write_concern,
    )
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
    helpers_shared._check_command_response(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        response_doc,
        ^^^^^^^^^^^^^
    ...<2 lines>...
        parse_write_concern_error=parse_write_concern_error,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 284, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Pipelines in updates are not allowed to modify encrypted fields or add fields which are not present in the schema, full error: RawBSONDocument(b'\xb5\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00r\x00\x00\x00Pipelines in updates are not allowed to modify encrypted fields or add fields which are not present in the schema\x00\x10code\x00\xaay\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location31146\x00\x00', codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))

The above exception was the direct cause of the following exception:
...

@timgraham
Copy link
Collaborator

The first two errors about "DatabaseError not raised" demonstrate queries that couldn't run before this change but now no longer use $let. I'll have to add new tests with queries that still do.

The error "Pipelines in updates are not allowed to modify encrypted fields or add fields which are not present in the schema" suggests a query that uses order_by(). (There's a list of error messages and the corresponding QuerySet method in the QE design doc.)

@timgraham timgraham changed the title Refactor $lookup stages to use localField and foreignField when applicable INTPYTHON-824 Refactor $lookup stages to use localField and foreignField Nov 13, 2025
@WaVEV WaVEV marked this pull request as ready for review November 13, 2025 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants