-
Notifications
You must be signed in to change notification settings - Fork 34
Implement Validation for Non-Queryable Attributes in Filtering #532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…lds in QueryablesCache
| | `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional | | ||
| | `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional | | ||
| | `EXCLUDED_FROM_ITEMS` | Specifies fields to exclude from STAC item responses. Supports comma-separated field names and dot notation for nested fields (e.g., `private_data,properties.confidential,assets.internal`). | `None` | Optional | | ||
| | `VALIDATE_QUERYABLES` | Enable validation of query parameters against the collection's queryables. If set to `true`, the API will reject queries containing fields that are not defined in the collection's queryables. | `false` | Optional | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bountx How does this interact with the EXCLUDED_FROM_QUERYABLES environment variable? Should both these env vars be combined into one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did miss this in my implementation so I will fix it, but these two variables serve a different purpose: VALIDATE_QUERYABLES should validate on set of all queryables and EXCLUDED_FROM_QUERYABLES can be used to make this set of all queryables smaller.
Their interaction when both used should look something along:
Cache of all queryables -> Removes from this set all excluded queryables -> Validates on that set
If you had something other in mind let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonhealy1 Fixed this interaction to work the way I mentioned beforehand.
README.md
Outdated
| - If a request contains a query parameter or filter field that is not in the list of allowed queryables, the API returns a `400 Bad Request` error with a message indicating the invalid field(s). | ||
| - The cache is automatically refreshed based on the `QUERYABLES_CACHE_TTL` setting. | ||
|
|
||
| This feature helps prevent queries on unindexed fields which could lead to poor performance or unexpected results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong but I think Opensearch automatically indexes everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Poor wording on my side. Unindexed fields = nonexistant fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed wording
README.md
Outdated
| | `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional | | ||
| | `USE_DATETIME` | Configures the datetime search behavior in SFEOS. When enabled, searches both datetime field and falls back to start_datetime/end_datetime range for items with null datetime. When disabled, searches only by start_datetime/end_datetime range. | `true` | Optional | | ||
| | `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional | | ||
| | `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a line here explaining the expected behavior if VALIDATE_QUERYABLES is also set.
| get_properties_from_cql2_filter, | ||
| initialize_queryables_cache, | ||
| validate_queryables, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are importing stac_fastapi.sfeos_helpers into stac_fastapi.core. The Core library is a generic abstraction and must not depend on specific implementation helpers. This breaks the modularity of the library.
Query validation is a generic feature that belongs in Core. However, we cannot import it from sfeos_helpers.
- Move to Core Please move queryables.py (the Cache and Validator logic) out of stac_fastapi/sfeos_helpers/ and into stac_fastapi/core/.
Once the code lives in Core, CoreClient can import it safely without creating a dependency on the Elasticsearch/OpenSearch specific package. The Validator should rely strictly on BaseDatabaseLogic interfaces. (Note: You will likely need to add get_queryables_mapping as an abstract method to BaseDatabaseLogic so the validator can call it generically).
- Remove Global State Please remove the global variable (_queryables_cache_instance). Instead, instantiate QueryablesCache directly inside CoreClient.init as self.queryables_cache. This resolves the threading and testing fragility issues inherent in global state.
| ```bash | ||
| VALIDATE_QUERYABLES=true | ||
| QUERYABLES_CACHE_TTL=3600 # Optional, defaults to 3600 seconds (1 hour) | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this seem like a long default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what my team suggested at first as the cache query for queryable parameters doesn't seem that costly. I'll rediscuss that with them though in detail.
|
|
||
|
|
||
| _queryables_cache_instance: Optional[QueryablesCache] = None | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above. Please remove _queryables_cache_instance
class CoreClient(AsyncBaseCoreClient):
def __init__(self, database: BaseDatabaseLogic, ...):
self.database = database
# Initialize the cache as an instance attribute
# This solves the threading and testing fragility issues
self.queryables_cache = QueryablesCache(database)```
|
|
||
| class TestGlobalFunctions: | ||
| @pytest.fixture(autouse=True) | ||
| def reset_global_cache(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to remove this fixture after the other changes have been made.
|
Related Issue(s):
Description:
Created query validation to return bad request 400 on requests with parameters not existent in any queryables without unnecessary searching (optional: turned off by default - turned on by environment variable
VALIDATE_QUERYABLES).To do that PR also introduces local caching system (global variable
_queryables_cache_instance: QueryablesCache) that collects a set of queryables parameters from all the collections and updates it hourly by default (set with environment variableQUERYABLES_CACHE_TTLin seconds - default to3600, 1 hour).PR Checklist:
pre-commit run --all-files)make test)