Skip to content
Open
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

- Environment variable `VALIDATE_QUERYABLES` to enable/disable validation of queryables in search/filter requests. When set to `true`, search requests will be validated against the defined queryables, returning an error for any unsupported fields. Defaults to `false` for backward compatibility.[#532](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/532)

- Environment variable `QUERYABLES_CACHE_TTL` to configure the TTL (in seconds) for caching queryables. Default is `3600` seconds (1 hour) to balance performance and freshness of queryables data. [#532](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/532)

- Added retry with back-off logic for Redis related functions. [#528](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/528)

- Added nanosecond precision datetime filtering that ensures nanosecond precision support in filtering by datetime. This is configured via the `USE_DATETIME_NANOS` environment variable, while maintaining microseconds compatibility for datetime precision. [#529](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/529)

### Changed
Expand Down
24 changes: 24 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -369,6 +369,8 @@ You can customize additional settings in your `.env` file:
| `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional |
| `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a line here explaining the expected behavior if VALIDATE_QUERYABLES is also set.

| `EXCLUDED_FROM_ITEMS` | Specifies fields to exclude from STAC item responses. Supports comma-separated field names and dot notation for nested fields (e.g., `private_data,properties.confidential,assets.internal`). | `None` | Optional |
| `VALIDATE_QUERYABLES` | Enable validation of query parameters against the collection's queryables. If set to `true`, the API will reject queries containing fields that are not defined in the collection's queryables. | `false` | Optional |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bountx How does this interact with the EXCLUDED_FROM_QUERYABLES environment variable? Should both these env vars be combined into one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did miss this in my implementation so I will fix it, but these two variables serve a different purpose: VALIDATE_QUERYABLES should validate on set of all queryables and EXCLUDED_FROM_QUERYABLES can be used to make this set of all queryables smaller.

Their interaction when both used should look something along:
Cache of all queryables -> Removes from this set all excluded queryables -> Validates on that set

If you had something other in mind let me know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonhealy1 Fixed this interaction to work the way I mentioned beforehand.

| `QUERYABLES_CACHE_TTL` | Time-to-live (in seconds) for the queryables cache. Used when `VALIDATE_QUERYABLES` is enabled. | `3600` | Optional |


> [!NOTE]
Expand Down Expand Up @@ -424,6 +426,28 @@ EXCLUDED_FROM_QUERYABLES="properties.auth:schemes,properties.storage:schemes,pro
- Excluded fields and their nested children will be skipped during field traversal
- Both the field itself and any nested properties will be excluded

## Queryables Validation

SFEOS supports validating query parameters against the collection's defined queryables. This ensures that users only query fields that are explicitly exposed and indexed.

**Configuration:**

To enable queryables validation, set the following environment variables:

```bash
VALIDATE_QUERYABLES=true
QUERYABLES_CACHE_TTL=3600 # Optional, defaults to 3600 seconds (1 hour)
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this seem like a long default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what my team suggested at first as the cache query for queryable parameters doesn't seem that costly. I'll rediscuss that with them though in detail.


**Behavior:**

- When enabled, the API maintains a cache of all queryable fields across all collections.
- Search requests (both GET and POST) are checked against this cache.
- If a request contains a query parameter or filter field that is not in the list of allowed queryables, the API returns a `400 Bad Request` error with a message indicating the invalid field(s).
- The cache is automatically refreshed based on the `QUERYABLES_CACHE_TTL` setting.

This feature helps prevent queries on unindexed fields which could lead to poor performance or unexpected results.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong but I think Opensearch automatically indexes everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poor wording on my side. Unindexed fields = nonexistant fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed wording


## Datetime-Based Index Management

### Overview
Expand Down
15 changes: 15 additions & 0 deletions stac_fastapi/core/stac_fastapi/core/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,11 @@
BulkTransactionMethod,
Items,
)
from stac_fastapi.sfeos_helpers.queryables import (
get_properties_from_cql2_filter,
initialize_queryables_cache,
validate_queryables,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are importing stac_fastapi.sfeos_helpers into stac_fastapi.core. The Core library is a generic abstraction and must not depend on specific implementation helpers. This breaks the modularity of the library.

Query validation is a generic feature that belongs in Core. However, we cannot import it from sfeos_helpers.

  1. Move to Core Please move queryables.py (the Cache and Validator logic) out of stac_fastapi/sfeos_helpers/ and into stac_fastapi/core/.

Once the code lives in Core, CoreClient can import it safely without creating a dependency on the Elasticsearch/OpenSearch specific package. The Validator should rely strictly on BaseDatabaseLogic interfaces. (Note: You will likely need to add get_queryables_mapping as an abstract method to BaseDatabaseLogic so the validator can call it generically).

  1. Remove Global State Please remove the global variable (_queryables_cache_instance). Instead, instantiate QueryablesCache directly inside CoreClient.init as self.queryables_cache. This resolves the threading and testing fragility issues inherent in global state.

from stac_fastapi.types import stac as stac_types
from stac_fastapi.types.conformance import BASE_CONFORMANCE_CLASSES
from stac_fastapi.types.core import AsyncBaseCoreClient
Expand Down Expand Up @@ -88,6 +93,10 @@ class CoreClient(AsyncBaseCoreClient):
title: str = attr.ib(default="stac-fastapi")
description: str = attr.ib(default="stac-fastapi")

def __attrs_post_init__(self):
"""Initialize the queryables cache."""
initialize_queryables_cache(self.database)

def _landing_page(
self,
base_url: str,
Expand Down Expand Up @@ -815,6 +824,8 @@ async def post_search(
)

if hasattr(search_request, "query") and getattr(search_request, "query"):
query_fields = set(getattr(search_request, "query").keys())
await validate_queryables(query_fields)
for field_name, expr in getattr(search_request, "query").items():
field = "properties__" + field_name
for op, value in expr.items():
Expand All @@ -833,7 +844,11 @@ async def post_search(

if cql2_filter is not None:
try:
query_fields = get_properties_from_cql2_filter(cql2_filter)
await validate_queryables(query_fields)
search = await self.database.apply_cql2_filter(search, cql2_filter)
except HTTPException:
raise
except Exception as e:
raise HTTPException(
status_code=400, detail=f"Error with cql2 filter: {e}"
Expand Down
131 changes: 131 additions & 0 deletions stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/queryables.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
"""A module for managing queryable attributes."""

import asyncio
import os
import time
from typing import Any, Dict, List, Optional, Set

from fastapi import HTTPException

from stac_fastapi.core.base_database_logic import BaseDatabaseLogic


class QueryablesCache:
"""A thread-safe, time-based cache for queryable properties."""

def __init__(self, database_logic: Any):
"""
Initialize the QueryablesCache.

Args:
database_logic: An instance of a class with a `get_queryables_mapping` method.
"""
self._db_logic = database_logic
self._cache: Dict[str, List[str]] = {}
self._all_queryables: Set[str] = set()
self._last_updated: float = 0
self._lock = asyncio.Lock()
self.validation_enabled: bool = False
self.cache_ttl: int = 3600 # How often to refresh cache (in seconds)
self.reload_settings()

def reload_settings(self):
"""Reload settings from environment variables."""
self.validation_enabled = (
os.getenv("VALIDATE_QUERYABLES", "false").lower() == "true"
)
self.cache_ttl = int(os.getenv("QUERYABLES_CACHE_TTL", "3600"))

async def _update_cache(self):
"""Update the cache with the latest queryables from the database."""
if not self.validation_enabled:
return

async with self._lock:
if (time.time() - self._last_updated < self.cache_ttl) and self._cache:
return

queryables_mapping = await self._db_logic.get_queryables_mapping()
all_queryables_set = set(queryables_mapping.keys())

self._all_queryables = all_queryables_set

self._cache = {"*": list(all_queryables_set)}
self._last_updated = time.time()

async def get_all_queryables(self) -> Set[str]:
"""
Return a set of all queryable attributes across all collections.

This method will update the cache if it's stale or has been cleared.
"""
if not self.validation_enabled:
return set()

if (time.time() - self._last_updated >= self.cache_ttl) or not self._cache:
await self._update_cache()
return self._all_queryables

async def validate(self, fields: Set[str]) -> None:
"""
Validate if the provided fields are queryable.

Raises HTTPException if invalid fields are found.
"""
if not self.validation_enabled:
return

allowed_fields = await self.get_all_queryables()
invalid_fields = fields - allowed_fields
if invalid_fields:
raise HTTPException(
status_code=400,
detail=f"Invalid query fields: {', '.join(invalid_fields)}.",
)


_queryables_cache_instance: Optional[QueryablesCache] = None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above. Please remove _queryables_cache_instance


class CoreClient(AsyncBaseCoreClient):
    def __init__(self, database: BaseDatabaseLogic, ...):
        self.database = database
        # Initialize the cache as an instance attribute
        # This solves the threading and testing fragility issues
        self.queryables_cache = QueryablesCache(database)```


def initialize_queryables_cache(database_logic: BaseDatabaseLogic):
"""
Initialize the global queryables cache.

:param database_logic: An instance of DatabaseLogic.
"""
global _queryables_cache_instance
if _queryables_cache_instance is None:
_queryables_cache_instance = QueryablesCache(database_logic)


async def all_queryables() -> Set[str]:
"""Get all queryable properties from the cache."""
if _queryables_cache_instance is None:
raise Exception("Queryables cache not initialized.")
return await _queryables_cache_instance.get_all_queryables()


async def validate_queryables(fields: Set[str]) -> None:
"""Validate if the provided fields are queryable."""
if _queryables_cache_instance is None:
return
await _queryables_cache_instance.validate(fields)


def reload_queryables_settings():
"""Reload queryables settings from environment variables."""
if _queryables_cache_instance:
_queryables_cache_instance.reload_settings()


def get_properties_from_cql2_filter(cql2_filter: Dict[str, Any]) -> Set[str]:
"""Recursively extract property names from a CQL2 filter."""
props: Set[str] = set()
if "op" in cql2_filter and "args" in cql2_filter:
for arg in cql2_filter["args"]:
if isinstance(arg, dict):
if "op" in arg:
props.update(get_properties_from_cql2_filter(arg))
elif "property" in arg:
props.add(arg["property"])
return props
67 changes: 67 additions & 0 deletions stac_fastapi/tests/api/test_api_query_validation.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
import json
import os
from unittest import mock

import pytest

from stac_fastapi.sfeos_helpers.queryables import reload_queryables_settings


@pytest.fixture(autouse=True)
def enable_validation():
with mock.patch.dict(os.environ, {"VALIDATE_QUERYABLES": "true"}):
reload_queryables_settings()
yield
reload_queryables_settings()


@pytest.mark.asyncio
async def test_search_post_query_valid_param(app_client, ctx):
"""Test POST /search with a valid query parameter"""
query = {"query": {"eo:cloud_cover": {"lt": 10}}}
resp = await app_client.post("/search", json=query)
assert resp.status_code == 200


@pytest.mark.asyncio
async def test_search_post_query_invalid_param(app_client, ctx):
"""Test POST /search with an invalid query parameter"""
query = {"query": {"invalid_param": {"eq": "test"}}}
resp = await app_client.post("/search", json=query)
assert resp.status_code == 400
resp_json = resp.json()
assert "Invalid query fields: invalid_param" in resp_json["detail"]


@pytest.mark.asyncio
async def test_item_collection_get_filter_valid_param(app_client, ctx):
"""Test GET /collections/{collection_id}/items with a valid filter parameter"""
collection_id = ctx.item["collection"]
filter_body = {
"op": "<",
"args": [{"property": "eo:cloud_cover"}, 10],
}
params = {
"filter-lang": "cql2-json",
"filter": json.dumps(filter_body),
}
resp = await app_client.get(f"/collections/{collection_id}/items", params=params)
assert resp.status_code == 200


@pytest.mark.asyncio
async def test_item_collection_get_filter_invalid_param(app_client, ctx):
"""Test GET /collections/{collection_id}/items with an invalid filter parameter"""
collection_id = ctx.item["collection"]
filter_body = {
"op": "=",
"args": [{"property": "invalid_param"}, "test"],
}
params = {
"filter-lang": "cql2-json",
"filter": json.dumps(filter_body),
}
resp = await app_client.get(f"/collections/{collection_id}/items", params=params)
assert resp.status_code == 400
resp_json = resp.json()
assert "Invalid query fields: invalid_param" in resp_json["detail"]
5 changes: 5 additions & 0 deletions stac_fastapi/tests/data/test_collection.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,11 @@
"type": "Collection",
"description": "Landat 8 imagery radiometrically calibrated and orthorectified using gound points and Digital Elevation Model (DEM) data to correct relief displacement.",
"stac_version": "1.0.0",
"queryables": {
"eo:cloud_cover": {
"$ref": "https://stac-extensions.github.io/eo/v1.0.0/schema.json#/definitions/fields/properties/eo:cloud_cover"
}
},
"license": "PDDL-1.0",
"summaries": {
"platform": [
Expand Down
Loading