-
Notifications
You must be signed in to change notification settings - Fork 34
Implement Validation for Non-Queryable Attributes in Filtering #532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 6 commits
c488a39
7434493
bb939bd
15ad1b9
e3ff023
5de02fc
3c3b5cb
7eef065
6cdb67c
a46665e
487f3f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -369,6 +369,8 @@ You can customize additional settings in your `.env` file: | |
| | `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional | | ||
| | `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional | | ||
| | `EXCLUDED_FROM_ITEMS` | Specifies fields to exclude from STAC item responses. Supports comma-separated field names and dot notation for nested fields (e.g., `private_data,properties.confidential,assets.internal`). | `None` | Optional | | ||
| | `VALIDATE_QUERYABLES` | Enable validation of query parameters against the collection's queryables. If set to `true`, the API will reject queries containing fields that are not defined in the collection's queryables. | `false` | Optional | | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @bountx How does this interact with the EXCLUDED_FROM_QUERYABLES environment variable? Should both these env vars be combined into one?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did miss this in my implementation so I will fix it, but these two variables serve a different purpose: Their interaction when both used should look something along: If you had something other in mind let me know
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jonhealy1 Fixed this interaction to work the way I mentioned beforehand. |
||
| | `QUERYABLES_CACHE_TTL` | Time-to-live (in seconds) for the queryables cache. Used when `VALIDATE_QUERYABLES` is enabled. | `3600` | Optional | | ||
|
|
||
|
|
||
| > [!NOTE] | ||
|
|
@@ -424,6 +426,28 @@ EXCLUDED_FROM_QUERYABLES="properties.auth:schemes,properties.storage:schemes,pro | |
| - Excluded fields and their nested children will be skipped during field traversal | ||
| - Both the field itself and any nested properties will be excluded | ||
|
|
||
| ## Queryables Validation | ||
|
|
||
| SFEOS supports validating query parameters against the collection's defined queryables. This ensures that users only query fields that are explicitly exposed and indexed. | ||
|
|
||
| **Configuration:** | ||
|
|
||
| To enable queryables validation, set the following environment variables: | ||
|
|
||
| ```bash | ||
| VALIDATE_QUERYABLES=true | ||
| QUERYABLES_CACHE_TTL=3600 # Optional, defaults to 3600 seconds (1 hour) | ||
| ``` | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this seem like a long default?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is what my team suggested at first as the cache query for queryable parameters doesn't seem that costly. I'll rediscuss that with them though in detail. |
||
|
|
||
| **Behavior:** | ||
|
|
||
| - When enabled, the API maintains a cache of all queryable fields across all collections. | ||
| - Search requests (both GET and POST) are checked against this cache. | ||
| - If a request contains a query parameter or filter field that is not in the list of allowed queryables, the API returns a `400 Bad Request` error with a message indicating the invalid field(s). | ||
| - The cache is automatically refreshed based on the `QUERYABLES_CACHE_TTL` setting. | ||
|
|
||
| This feature helps prevent queries on unindexed fields which could lead to poor performance or unexpected results. | ||
|
||
|
|
||
| ## Datetime-Based Index Management | ||
|
|
||
| ### Overview | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -39,6 +39,11 @@ | |
| BulkTransactionMethod, | ||
| Items, | ||
| ) | ||
| from stac_fastapi.sfeos_helpers.queryables import ( | ||
| get_properties_from_cql2_filter, | ||
| initialize_queryables_cache, | ||
| validate_queryables, | ||
| ) | ||
|
||
| from stac_fastapi.types import stac as stac_types | ||
| from stac_fastapi.types.conformance import BASE_CONFORMANCE_CLASSES | ||
| from stac_fastapi.types.core import AsyncBaseCoreClient | ||
|
|
@@ -88,6 +93,10 @@ class CoreClient(AsyncBaseCoreClient): | |
| title: str = attr.ib(default="stac-fastapi") | ||
| description: str = attr.ib(default="stac-fastapi") | ||
|
|
||
| def __attrs_post_init__(self): | ||
| """Initialize the queryables cache.""" | ||
| initialize_queryables_cache(self.database) | ||
|
|
||
| def _landing_page( | ||
| self, | ||
| base_url: str, | ||
|
|
@@ -815,6 +824,8 @@ async def post_search( | |
| ) | ||
|
|
||
| if hasattr(search_request, "query") and getattr(search_request, "query"): | ||
| query_fields = set(getattr(search_request, "query").keys()) | ||
| await validate_queryables(query_fields) | ||
| for field_name, expr in getattr(search_request, "query").items(): | ||
| field = "properties__" + field_name | ||
| for op, value in expr.items(): | ||
|
|
@@ -833,7 +844,11 @@ async def post_search( | |
|
|
||
| if cql2_filter is not None: | ||
| try: | ||
| query_fields = get_properties_from_cql2_filter(cql2_filter) | ||
| await validate_queryables(query_fields) | ||
| search = await self.database.apply_cql2_filter(search, cql2_filter) | ||
| except HTTPException: | ||
| raise | ||
| except Exception as e: | ||
| raise HTTPException( | ||
| status_code=400, detail=f"Error with cql2 filter: {e}" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,131 @@ | ||
| """A module for managing queryable attributes.""" | ||
|
|
||
| import asyncio | ||
| import os | ||
| import time | ||
| from typing import Any, Dict, List, Optional, Set | ||
|
|
||
| from fastapi import HTTPException | ||
|
|
||
| from stac_fastapi.core.base_database_logic import BaseDatabaseLogic | ||
|
|
||
|
|
||
| class QueryablesCache: | ||
| """A thread-safe, time-based cache for queryable properties.""" | ||
|
|
||
| def __init__(self, database_logic: Any): | ||
| """ | ||
| Initialize the QueryablesCache. | ||
|
|
||
| Args: | ||
| database_logic: An instance of a class with a `get_queryables_mapping` method. | ||
| """ | ||
| self._db_logic = database_logic | ||
| self._cache: Dict[str, List[str]] = {} | ||
| self._all_queryables: Set[str] = set() | ||
| self._last_updated: float = 0 | ||
| self._lock = asyncio.Lock() | ||
| self.validation_enabled: bool = False | ||
| self.cache_ttl: int = 3600 # How often to refresh cache (in seconds) | ||
| self.reload_settings() | ||
|
|
||
| def reload_settings(self): | ||
| """Reload settings from environment variables.""" | ||
| self.validation_enabled = ( | ||
| os.getenv("VALIDATE_QUERYABLES", "false").lower() == "true" | ||
| ) | ||
| self.cache_ttl = int(os.getenv("QUERYABLES_CACHE_TTL", "3600")) | ||
|
|
||
| async def _update_cache(self): | ||
| """Update the cache with the latest queryables from the database.""" | ||
| if not self.validation_enabled: | ||
| return | ||
|
|
||
| async with self._lock: | ||
| if (time.time() - self._last_updated < self.cache_ttl) and self._cache: | ||
| return | ||
|
|
||
| queryables_mapping = await self._db_logic.get_queryables_mapping() | ||
| all_queryables_set = set(queryables_mapping.keys()) | ||
|
|
||
| self._all_queryables = all_queryables_set | ||
|
|
||
| self._cache = {"*": list(all_queryables_set)} | ||
| self._last_updated = time.time() | ||
|
|
||
| async def get_all_queryables(self) -> Set[str]: | ||
| """ | ||
| Return a set of all queryable attributes across all collections. | ||
|
|
||
| This method will update the cache if it's stale or has been cleared. | ||
| """ | ||
| if not self.validation_enabled: | ||
| return set() | ||
|
|
||
| if (time.time() - self._last_updated >= self.cache_ttl) or not self._cache: | ||
| await self._update_cache() | ||
| return self._all_queryables | ||
|
|
||
| async def validate(self, fields: Set[str]) -> None: | ||
| """ | ||
| Validate if the provided fields are queryable. | ||
|
|
||
| Raises HTTPException if invalid fields are found. | ||
| """ | ||
| if not self.validation_enabled: | ||
| return | ||
|
|
||
| allowed_fields = await self.get_all_queryables() | ||
| invalid_fields = fields - allowed_fields | ||
| if invalid_fields: | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail=f"Invalid query fields: {', '.join(invalid_fields)}.", | ||
| ) | ||
|
|
||
|
|
||
| _queryables_cache_instance: Optional[QueryablesCache] = None | ||
|
|
||
|
||
|
|
||
| def initialize_queryables_cache(database_logic: BaseDatabaseLogic): | ||
| """ | ||
| Initialize the global queryables cache. | ||
|
|
||
| :param database_logic: An instance of DatabaseLogic. | ||
| """ | ||
| global _queryables_cache_instance | ||
| if _queryables_cache_instance is None: | ||
| _queryables_cache_instance = QueryablesCache(database_logic) | ||
|
|
||
|
|
||
| async def all_queryables() -> Set[str]: | ||
| """Get all queryable properties from the cache.""" | ||
| if _queryables_cache_instance is None: | ||
| raise Exception("Queryables cache not initialized.") | ||
| return await _queryables_cache_instance.get_all_queryables() | ||
|
|
||
|
|
||
| async def validate_queryables(fields: Set[str]) -> None: | ||
| """Validate if the provided fields are queryable.""" | ||
| if _queryables_cache_instance is None: | ||
| return | ||
| await _queryables_cache_instance.validate(fields) | ||
|
|
||
|
|
||
| def reload_queryables_settings(): | ||
| """Reload queryables settings from environment variables.""" | ||
| if _queryables_cache_instance: | ||
| _queryables_cache_instance.reload_settings() | ||
|
|
||
|
|
||
| def get_properties_from_cql2_filter(cql2_filter: Dict[str, Any]) -> Set[str]: | ||
| """Recursively extract property names from a CQL2 filter.""" | ||
| props: Set[str] = set() | ||
| if "op" in cql2_filter and "args" in cql2_filter: | ||
| for arg in cql2_filter["args"]: | ||
| if isinstance(arg, dict): | ||
| if "op" in arg: | ||
| props.update(get_properties_from_cql2_filter(arg)) | ||
| elif "property" in arg: | ||
| props.add(arg["property"]) | ||
| return props | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| import json | ||
| import os | ||
| from unittest import mock | ||
|
|
||
| import pytest | ||
|
|
||
| from stac_fastapi.sfeos_helpers.queryables import reload_queryables_settings | ||
|
|
||
|
|
||
| @pytest.fixture(autouse=True) | ||
| def enable_validation(): | ||
| with mock.patch.dict(os.environ, {"VALIDATE_QUERYABLES": "true"}): | ||
| reload_queryables_settings() | ||
| yield | ||
| reload_queryables_settings() | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_search_post_query_valid_param(app_client, ctx): | ||
| """Test POST /search with a valid query parameter""" | ||
| query = {"query": {"eo:cloud_cover": {"lt": 10}}} | ||
| resp = await app_client.post("/search", json=query) | ||
| assert resp.status_code == 200 | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_search_post_query_invalid_param(app_client, ctx): | ||
| """Test POST /search with an invalid query parameter""" | ||
| query = {"query": {"invalid_param": {"eq": "test"}}} | ||
| resp = await app_client.post("/search", json=query) | ||
| assert resp.status_code == 400 | ||
| resp_json = resp.json() | ||
| assert "Invalid query fields: invalid_param" in resp_json["detail"] | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_item_collection_get_filter_valid_param(app_client, ctx): | ||
| """Test GET /collections/{collection_id}/items with a valid filter parameter""" | ||
| collection_id = ctx.item["collection"] | ||
| filter_body = { | ||
| "op": "<", | ||
| "args": [{"property": "eo:cloud_cover"}, 10], | ||
| } | ||
| params = { | ||
| "filter-lang": "cql2-json", | ||
| "filter": json.dumps(filter_body), | ||
| } | ||
| resp = await app_client.get(f"/collections/{collection_id}/items", params=params) | ||
| assert resp.status_code == 200 | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_item_collection_get_filter_invalid_param(app_client, ctx): | ||
| """Test GET /collections/{collection_id}/items with an invalid filter parameter""" | ||
| collection_id = ctx.item["collection"] | ||
| filter_body = { | ||
| "op": "=", | ||
| "args": [{"property": "invalid_param"}, "test"], | ||
| } | ||
| params = { | ||
| "filter-lang": "cql2-json", | ||
| "filter": json.dumps(filter_body), | ||
| } | ||
| resp = await app_client.get(f"/collections/{collection_id}/items", params=params) | ||
| assert resp.status_code == 400 | ||
| resp_json = resp.json() | ||
| assert "Invalid query fields: invalid_param" in resp_json["detail"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a line here explaining the expected behavior if VALIDATE_QUERYABLES is also set.