Skip to content
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

- Environment variable `VALIDATE_QUERYABLES` to enable/disable validation of queryables in search/filter requests. When set to `true`, search requests will be validated against the defined queryables, returning an error for any unsupported fields. Defaults to `false` for backward compatibility.[#532](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/532)

- Environment variable `QUERYABLES_CACHE_TTL` to configure the TTL (in seconds) for caching queryables. Default is `3600` seconds (1 hour) to balance performance and freshness of queryables data. [#532](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/532)

### Changed

### Fixed
Expand Down
27 changes: 26 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -367,8 +367,10 @@ You can customize additional settings in your `.env` file:
| `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional |
| `USE_DATETIME` | Configures the datetime search behavior in SFEOS. When enabled, searches both datetime field and falls back to start_datetime/end_datetime range for items with null datetime. When disabled, searches only by start_datetime/end_datetime range. | `true` | Optional |
| `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional |
| `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional |
| `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. If `VALIDATE_QUERYABLES` is enabled, these fields will also be considered invalid for filtering. | None | Optional |
| `EXCLUDED_FROM_ITEMS` | Specifies fields to exclude from STAC item responses. Supports comma-separated field names and dot notation for nested fields (e.g., `private_data,properties.confidential,assets.internal`). | `None` | Optional |
| `VALIDATE_QUERYABLES` | Enable validation of query parameters against the collection's queryables. If set to `true`, the API will reject queries containing fields that are not defined in the collection's queryables. | `false` | Optional |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bountx How does this interact with the EXCLUDED_FROM_QUERYABLES environment variable? Should both these env vars be combined into one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did miss this in my implementation so I will fix it, but these two variables serve a different purpose: VALIDATE_QUERYABLES should validate on set of all queryables and EXCLUDED_FROM_QUERYABLES can be used to make this set of all queryables smaller.

Their interaction when both used should look something along:
Cache of all queryables -> Removes from this set all excluded queryables -> Validates on that set

If you had something other in mind let me know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonhealy1 Fixed this interaction to work the way I mentioned beforehand.

| `QUERYABLES_CACHE_TTL` | Time-to-live (in seconds) for the queryables cache. Used when `VALIDATE_QUERYABLES` is enabled. | `3600` | Optional |


> [!NOTE]
Expand Down Expand Up @@ -424,6 +426,29 @@ EXCLUDED_FROM_QUERYABLES="properties.auth:schemes,properties.storage:schemes,pro
- Excluded fields and their nested children will be skipped during field traversal
- Both the field itself and any nested properties will be excluded

## Queryables Validation

SFEOS supports validating query parameters against the collection's defined queryables. This ensures that users only query fields that are explicitly exposed and indexed.

**Configuration:**

To enable queryables validation, set the following environment variables:

```bash
VALIDATE_QUERYABLES=true
QUERYABLES_CACHE_TTL=3600 # Optional, defaults to 3600 seconds (1 hour)
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this seem like a long default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what my team suggested at first as the cache query for queryable parameters doesn't seem that costly. I'll rediscuss that with them though in detail.


**Behavior:**

- When enabled, the API maintains a cache of all queryable fields across all collections.
- Search requests (both GET and POST) are checked against this cache.
- If a request contains a query parameter or filter field that is not in the list of allowed queryables, the API returns a `400 Bad Request` error with a message indicating the invalid field(s).
- The cache is automatically refreshed based on the `QUERYABLES_CACHE_TTL` setting.
- **Interaction with `EXCLUDED_FROM_QUERYABLES`**: If `VALIDATE_QUERYABLES` is enabled, fields listed in `EXCLUDED_FROM_QUERYABLES` will also be considered invalid for filtering. This effectively enforces the exclusion of these fields from search queries.

This feature helps prevent queries on non-queryable fields which could lead to unnecessary load on the database.

## Datetime-Based Index Management

### Overview
Expand Down
5 changes: 5 additions & 0 deletions stac_fastapi/core/stac_fastapi/core/base_database_logic.py
Original file line number Diff line number Diff line change
Expand Up @@ -138,3 +138,8 @@ async def delete_collection(
) -> None:
"""Delete a collection from the database."""
pass

@abc.abstractmethod
async def get_queryables_mapping(self, collection_id: str = "*") -> Dict[str, Any]:
"""Retrieve mapping of Queryables for search."""
pass
14 changes: 14 additions & 0 deletions stac_fastapi/core/stac_fastapi/core/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@
from stac_fastapi.core.base_settings import ApiBaseSettings
from stac_fastapi.core.datetime_utils import format_datetime_range
from stac_fastapi.core.models.links import PagingLinks
from stac_fastapi.core.queryables import (
QueryablesCache,
get_properties_from_cql2_filter,
)
from stac_fastapi.core.redis_utils import redis_pagination_links
from stac_fastapi.core.serializers import CollectionSerializer, ItemSerializer
from stac_fastapi.core.session import Session
Expand Down Expand Up @@ -88,6 +92,10 @@ class CoreClient(AsyncBaseCoreClient):
title: str = attr.ib(default="stac-fastapi")
description: str = attr.ib(default="stac-fastapi")

def __attrs_post_init__(self):
"""Initialize the queryables cache."""
self.queryables_cache = QueryablesCache(self.database)

def _landing_page(
self,
base_url: str,
Expand Down Expand Up @@ -816,6 +824,8 @@ async def post_search(
)

if hasattr(search_request, "query") and getattr(search_request, "query"):
query_fields = set(getattr(search_request, "query").keys())
await self.queryables_cache.validate(query_fields)
for field_name, expr in getattr(search_request, "query").items():
field = "properties__" + field_name
for op, value in expr.items():
Expand All @@ -834,7 +844,11 @@ async def post_search(

if cql2_filter is not None:
try:
query_fields = get_properties_from_cql2_filter(cql2_filter)
await self.queryables_cache.validate(query_fields)
search = await self.database.apply_cql2_filter(search, cql2_filter)
except HTTPException:
raise
except Exception as e:
raise HTTPException(
status_code=400, detail=f"Error with cql2 filter: {e}"
Expand Down
110 changes: 110 additions & 0 deletions stac_fastapi/core/stac_fastapi/core/queryables.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
"""A module for managing queryable attributes."""

import asyncio
import os
import time
from typing import Any, Dict, List, Set

from fastapi import HTTPException


class QueryablesCache:
"""A thread-safe, time-based cache for queryable properties."""

def __init__(self, database_logic: Any):
"""
Initialize the QueryablesCache.

Args:
database_logic: An instance of a class with a `get_queryables_mapping` method.
"""
self._db_logic = database_logic
self._cache: Dict[str, List[str]] = {}
self._all_queryables: Set[str] = set()
self._last_updated: float = 0
self._lock = asyncio.Lock()
self.validation_enabled: bool = False
self.cache_ttl: int = 3600 # How often to refresh cache (in seconds)
self.excluded_queryables: Set[str] = set()
self.reload_settings()

def reload_settings(self):
"""Reload settings from environment variables."""
self.validation_enabled = (
os.getenv("VALIDATE_QUERYABLES", "false").lower() == "true"
)
self.cache_ttl = int(os.getenv("QUERYABLES_CACHE_TTL", "3600"))

excluded = os.getenv("EXCLUDED_FROM_QUERYABLES", "")
self.excluded_queryables = set()
if excluded:
for field in excluded.split(","):
field = field.strip()
if field:
# Remove 'properties.' prefix if present
if field.startswith("properties."):
field = field[11:]
self.excluded_queryables.add(field)

async def _update_cache(self):
"""Update the cache with the latest queryables from the database."""
if not self.validation_enabled:
return

async with self._lock:
if (time.time() - self._last_updated < self.cache_ttl) and self._cache:
return

queryables_mapping = await self._db_logic.get_queryables_mapping()
all_queryables_set = set(queryables_mapping.keys())

if self.excluded_queryables:
all_queryables_set = all_queryables_set - self.excluded_queryables

self._all_queryables = all_queryables_set

self._cache = {"*": list(all_queryables_set)}
self._last_updated = time.time()

async def get_all_queryables(self) -> Set[str]:
"""
Return a set of all queryable attributes across all collections.

This method will update the cache if it's stale or has been cleared.
"""
if not self.validation_enabled:
return set()

if (time.time() - self._last_updated >= self.cache_ttl) or not self._cache:
await self._update_cache()
return self._all_queryables

async def validate(self, fields: Set[str]) -> None:
"""
Validate if the provided fields are queryable.

Raises HTTPException if invalid fields are found.
"""
if not self.validation_enabled:
return

allowed_fields = await self.get_all_queryables()
invalid_fields = fields - allowed_fields
if invalid_fields:
raise HTTPException(
status_code=400,
detail=f"Invalid query fields: {', '.join(invalid_fields)}.",
)


def get_properties_from_cql2_filter(cql2_filter: Dict[str, Any]) -> Set[str]:
"""Recursively extract property names from a CQL2 filter."""
props: Set[str] = set()
if "op" in cql2_filter and "args" in cql2_filter:
for arg in cql2_filter["args"]:
if isinstance(arg, dict):
if "op" in arg:
props.update(get_properties_from_cql2_filter(arg))
elif "property" in arg:
props.add(arg["property"])
return props
116 changes: 116 additions & 0 deletions stac_fastapi/tests/api/test_api_query_validation.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
import json
import os
from unittest import mock

import pytest

if os.getenv("BACKEND", "elasticsearch").lower() == "opensearch":
from stac_fastapi.opensearch.app import app_config
else:
from stac_fastapi.elasticsearch.app import app_config


def get_core_client():
if os.getenv("BACKEND", "elasticsearch").lower() == "opensearch":
from stac_fastapi.opensearch.app import app_config
else:
from stac_fastapi.elasticsearch.app import app_config
return app_config["client"]


def reload_queryables_settings():
client = get_core_client()
if hasattr(client, "queryables_cache"):
client.queryables_cache.reload_settings()


@pytest.fixture(autouse=True)
def enable_validation():

client = app_config["client"]
with mock.patch.dict(os.environ, {"VALIDATE_QUERYABLES": "true"}):
client.queryables_cache.reload_settings()
yield
client.queryables_cache.reload_settings()


@pytest.mark.asyncio
async def test_search_post_query_valid_param(app_client, ctx):
"""Test POST /search with a valid query parameter"""
query = {"query": {"eo:cloud_cover": {"lt": 10}}}
resp = await app_client.post("/search", json=query)
assert resp.status_code == 200


@pytest.mark.asyncio
async def test_search_post_query_invalid_param(app_client, ctx):
"""Test POST /search with an invalid query parameter"""
query = {"query": {"invalid_param": {"eq": "test"}}}
resp = await app_client.post("/search", json=query)
assert resp.status_code == 400
resp_json = resp.json()
assert "Invalid query fields: invalid_param" in resp_json["detail"]


@pytest.mark.asyncio
async def test_item_collection_get_filter_valid_param(app_client, ctx):
"""Test GET /collections/{collection_id}/items with a valid filter parameter"""
collection_id = ctx.item["collection"]
filter_body = {
"op": "<",
"args": [{"property": "eo:cloud_cover"}, 10],
}
params = {
"filter-lang": "cql2-json",
"filter": json.dumps(filter_body),
}
resp = await app_client.get(f"/collections/{collection_id}/items", params=params)
assert resp.status_code == 200


@pytest.mark.asyncio
async def test_item_collection_get_filter_invalid_param(app_client, ctx):
"""Test GET /collections/{collection_id}/items with an invalid filter parameter"""
collection_id = ctx.item["collection"]
filter_body = {
"op": "=",
"args": [{"property": "invalid_param"}, "test"],
}
params = {
"filter-lang": "cql2-json",
"filter": json.dumps(filter_body),
}
resp = await app_client.get(f"/collections/{collection_id}/items", params=params)
assert resp.status_code == 400
resp_json = resp.json()
assert "Invalid query fields: invalid_param" in resp_json["detail"]


@pytest.mark.asyncio
async def test_validate_queryables_excluded(app_client, ctx):
"""Test that excluded queryables are rejected when validation is enabled."""

excluded_field = "eo:cloud_cover"
client = app_config["client"]

with mock.patch.dict(
os.environ,
{
"VALIDATE_QUERYABLES": "true",
"EXCLUDED_FROM_QUERYABLES": excluded_field,
"QUERYABLES_CACHE_TTL": "0",
},
):
client.queryables_cache.reload_settings()

query = {"query": {excluded_field: {"lt": 10}}}
resp = await app_client.post("/search", json=query)
assert resp.status_code == 400
assert "Invalid query fields" in resp.json()["detail"]
assert excluded_field in resp.json()["detail"]

query = {"query": {"id": {"eq": "test-item"}}}
resp = await app_client.post("/search", json=query)
assert resp.status_code == 200

client.queryables_cache.reload_settings()
Loading