Skip to content

Conversation

@bountx
Copy link
Contributor

@bountx bountx commented Nov 20, 2025

Related Issue(s):

Description:

Created query validation to return bad request 400 on requests with parameters not existent in any queryables without unnecessary searching (optional: turned off by default - turned on by environment variable VALIDATE_QUERYABLES).

To do that PR also introduces local caching system (global variable _queryables_cache_instance: QueryablesCache) that collects a set of queryables parameters from all the collections and updates it hourly by default (set with environment variable QUERYABLES_CACHE_TTL in seconds - default to 3600, 1 hour).

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

| `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional |
| `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional |
| `EXCLUDED_FROM_ITEMS` | Specifies fields to exclude from STAC item responses. Supports comma-separated field names and dot notation for nested fields (e.g., `private_data,properties.confidential,assets.internal`). | `None` | Optional |
| `VALIDATE_QUERYABLES` | Enable validation of query parameters against the collection's queryables. If set to `true`, the API will reject queries containing fields that are not defined in the collection's queryables. | `false` | Optional |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bountx How does this interact with the EXCLUDED_FROM_QUERYABLES environment variable? Should both these env vars be combined into one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did miss this in my implementation so I will fix it, but these two variables serve a different purpose: VALIDATE_QUERYABLES should validate on set of all queryables and EXCLUDED_FROM_QUERYABLES can be used to make this set of all queryables smaller.

Their interaction when both used should look something along:
Cache of all queryables -> Removes from this set all excluded queryables -> Validates on that set

If you had something other in mind let me know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonhealy1 Fixed this interaction to work the way I mentioned beforehand.

README.md Outdated
- If a request contains a query parameter or filter field that is not in the list of allowed queryables, the API returns a `400 Bad Request` error with a message indicating the invalid field(s).
- The cache is automatically refreshed based on the `QUERYABLES_CACHE_TTL` setting.

This feature helps prevent queries on unindexed fields which could lead to poor performance or unexpected results.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong but I think Opensearch automatically indexes everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poor wording on my side. Unindexed fields = nonexistant fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed wording

@bountx bountx requested a review from jonhealy1 November 27, 2025 11:26
README.md Outdated
| `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional |
| `USE_DATETIME` | Configures the datetime search behavior in SFEOS. When enabled, searches both datetime field and falls back to start_datetime/end_datetime range for items with null datetime. When disabled, searches only by start_datetime/end_datetime range. | `true` | Optional |
| `USE_DATETIME_NANOS` | Enables nanosecond precision handling for `datetime` field searches as per the `date_nanos` type. When `False`, it uses 3 millisecond precision as per the type `date`. | `true` | Optional |
| `EXCLUDED_FROM_QUERYABLES` | Comma-separated list of fully qualified field names to exclude from the queryables endpoint and filtering. Use full paths like `properties.auth:schemes,properties.storage:schemes`. Excluded fields and their nested children will not be exposed in queryables. | None | Optional |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a line here explaining the expected behavior if VALIDATE_QUERYABLES is also set.

get_properties_from_cql2_filter,
initialize_queryables_cache,
validate_queryables,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are importing stac_fastapi.sfeos_helpers into stac_fastapi.core. The Core library is a generic abstraction and must not depend on specific implementation helpers. This breaks the modularity of the library.

Query validation is a generic feature that belongs in Core. However, we cannot import it from sfeos_helpers.

  1. Move to Core Please move queryables.py (the Cache and Validator logic) out of stac_fastapi/sfeos_helpers/ and into stac_fastapi/core/.

Once the code lives in Core, CoreClient can import it safely without creating a dependency on the Elasticsearch/OpenSearch specific package. The Validator should rely strictly on BaseDatabaseLogic interfaces. (Note: You will likely need to add get_queryables_mapping as an abstract method to BaseDatabaseLogic so the validator can call it generically).

  1. Remove Global State Please remove the global variable (_queryables_cache_instance). Instead, instantiate QueryablesCache directly inside CoreClient.init as self.queryables_cache. This resolves the threading and testing fragility issues inherent in global state.

```bash
VALIDATE_QUERYABLES=true
QUERYABLES_CACHE_TTL=3600 # Optional, defaults to 3600 seconds (1 hour)
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this seem like a long default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what my team suggested at first as the cache query for queryable parameters doesn't seem that costly. I'll rediscuss that with them though in detail.



_queryables_cache_instance: Optional[QueryablesCache] = None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above. Please remove _queryables_cache_instance


class CoreClient(AsyncBaseCoreClient):
    def __init__(self, database: BaseDatabaseLogic, ...):
        self.database = database
        # Initialize the cache as an instance attribute
        # This solves the threading and testing fragility issues
        self.queryables_cache = QueryablesCache(database)```


class TestGlobalFunctions:
@pytest.fixture(autouse=True)
def reset_global_cache(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to remove this fixture after the other changes have been made.

@bountx
Copy link
Contributor Author

bountx commented Dec 1, 2025

  1. Moved to Core - The queryables.py module (containing QueryablesCache and validation logic) has been moved from sfeos_helpers to core. The Core library no longer depends on implementation-specific helpers.

  2. Added get_queryables_mapping to BaseDatabaseLogic - added get_queryables_mapping as an abstract method to BaseDatabaseLogic in base_database_logic.py. Both Elasticsearch and OpenSearch implementations now implement this method, allowing the validator to work generically without knowing about specific database implementations.

  3. Removed Global State - The global _queryables_cache_instance variable has been completely removed. Instead, QueryablesCache is now instantiated as an instance attribute in CoreClient.__attrs_post_init__()
    This resolves threading and testing fragility issues.

  4. Updated Tests - Removed the reset_global_cache fixture from stac_fastapi/tests/sfeos_helpers/test_queryables.py as it's no longer needed with instance-based caching.

  5. Clean Module Structure - All imports now flow correctly: CoreClient imports from stac_fastapi.core.queryables, and the queryables module only depends on BaseDatabaseLogic interfaces.

@jonhealy1

@bountx bountx requested a review from jonhealy1 December 1, 2025 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants