Skip to content

Conversation

@tanujnay112
Copy link
Contributor

@tanujnay112 tanujnay112 commented Dec 5, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Allow get_statistics to take in multiple filter keys
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Contributor Author

tanujnay112 commented Dec 5, 2025

@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Dec 5, 2025

Enable multi-key filtering in get_statistics with guardrails

This PR extends get_statistics in chromadb/utils/statistics.py to accept one or many filter keys and enforces a hard cap of 30 keys to avoid oversized $in queries. Callers now pass the keys argument, which is normalized via maybe_cast_one_to_many, and an informative ValueError is raised when the limit is exceeded.
A new test in chromadb/test/distributed/test_statistics_wrapper.py covers the over-limit scenario and updates existing key-filter tests to use the list-form API, ensuring the revised interface continues to return summaries alongside filtered statistics.

Key Changes

• Updated get_statistics signature to accept keys: Optional[OneOrMany[str]] and convert inputs with maybe_cast_one_to_many.
• Added validation that raises ValueError when more than 30 keys are supplied, including user-facing guidance in the error message.
• Adjusted key-filter integration tests to pass keys=[...] and added test_statistics_wrapper_key_filter_too_many_keys that asserts the new guard using pytest.raises.

Affected Areas

chromadb/utils/statistics.py
chromadb/test/distributed/test_statistics_wrapper.py

This summary was automatically generated by @propel-code-bot

Copy link
Contributor

@kylediaz kylediaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's gooo

collection: "Collection", stats_collection_name: str, key: Optional[str] = None
collection: "Collection",
stats_collection_name: str,
keys: Optional[List[str]] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to OneOrMany imo, always annoying to single list an input

Comment on lines 123 to 127
def get_statistics(
collection: "Collection", stats_collection_name: str, key: Optional[str] = None
collection: "Collection",
stats_collection_name: str,
keys: Optional[OneOrMany[str]] = None,
) -> Dict[str, Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Maintainability] Renaming the key parameter to keys is a breaking API change for any callers using keyword arguments. While the new name is more descriptive, this could be made backward-compatible to avoid breaking existing user code and provide a smoother transition.

If this breaking change is intentional, it should be clearly documented in the project's release notes. If not, you could consider handling the old key parameter with a deprecation warning.

Here's an example of how you could support both for a transition period by modifying the function signature and adding logic to the function body:

Signature:

def get_statistics(
    collection: "Collection",
    stats_collection_name: str,
    keys: Optional[OneOrMany[str]] = None,
    key: Optional[str] = None,  # Deprecated
) -> Dict[str, Any]:

Function Body Logic:

    if key is not None:
        import warnings
        warnings.warn(
            "The 'key' parameter is deprecated and will be removed in a future version. Please use 'keys' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if keys is not None:
            raise ValueError("Cannot provide both 'key' and 'keys' arguments.")
        keys = key
Context for Agents
Renaming the `key` parameter to `keys` is a breaking API change for any callers using keyword arguments. While the new name is more descriptive, this could be made backward-compatible to avoid breaking existing user code and provide a smoother transition.

If this breaking change is intentional, it should be clearly documented in the project's release notes. If not, you could consider handling the old `key` parameter with a deprecation warning.

Here's an example of how you could support both for a transition period by modifying the function signature and adding logic to the function body:

**Signature:**
```python
def get_statistics(
    collection: "Collection",
    stats_collection_name: str,
    keys: Optional[OneOrMany[str]] = None,
    key: Optional[str] = None,  # Deprecated
) -> Dict[str, Any]:
```

**Function Body Logic:**
```python
    if key is not None:
        import warnings
        warnings.warn(
            "The 'key' parameter is deprecated and will be removed in a future version. Please use 'keys' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if keys is not None:
            raise ValueError("Cannot provide both 'key' and 'keys' arguments.")
        keys = key
```

File: chromadb/utils/statistics.py
Line: 127

keys_list = maybe_cast_one_to_many(keys)

# Validate keys count to avoid issues with large $in queries
MAX_KEYS = 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended

[Maintainability] Consider defining MAX_KEYS as a module-level constant (e.g., at the top of the file) instead of inside the function. This improves readability, makes it clear that this is a fixed configuration value, and avoids re-declaration on every function call.

Context for Agents
Consider defining `MAX_KEYS` as a module-level constant (e.g., at the top of the file) instead of inside the function. This improves readability, makes it clear that this is a fixed configuration value, and avoids re-declaration on every function call.

File: chromadb/utils/statistics.py
Line: 188

Copy link
Contributor Author

tanujnay112 commented Dec 9, 2025

Merge activity

  • Dec 9, 2:51 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Dec 9, 2:53 AM UTC: Graphite rebased this pull request as part of a merge.
  • Dec 9, 3:18 AM UTC: Graphite couldn't merge this PR because it was not satisfying all requirements (Failed CI: 'all-required-pr-checks-passed', 'Python tests / test-windows-smoke (3.9)').

@tanujnay112 tanujnay112 changed the base branch from wrapper_change to graphite-base/5963 December 9, 2025 02:51
@tanujnay112 tanujnay112 changed the base branch from graphite-base/5963 to main December 9, 2025 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants