-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Open
Labels
Sponsor NeededNew component seeking sponsorNew component seeking sponsorneeds triageNew item requiring triageNew item requiring triagewaiting for author
Description
Building and distributing outside this repository
- I understand I can use and distribute my component outside of this repository.
The purpose and use-cases of the new component
Purpose
The sqlquerymasking processor masks sensitive data in database logs while preserving query structure for performance analysis and observability.
Problem Statement
Database logs are essential for performance monitoring but often contain:
- Sensitive user data in WHERE clauses (
WHERE email = 'john@example.com') - Authentication credentials in connection strings
Simple regex-based masking makes structurally identical queries appear different, preventing proper query analysis and performance aggregation.
Solution
This processor uses SQL-aware parsing to normalize queries by replacing literal values with placeholders while maintaining query structure:
-- Original (contains sensitive data)
INSERT INTO users VALUES ('john@example.com', 'password123', 30)
-- Masked (preserves structure)
INSERT INTO users VALUES ($1, $2, $3)Additionally, it generates query fingerprints (cryptographic hashes) to uniquely identify query patterns for aggregation and performance tracking.
Libraries Used
- PostgreSQL:
pg_query_golibrary (BSD-3-Clause license) - MySQL:
percona/go-mysqllibrary BSD-3-Clause license
Key Features
- SQL-Aware Parsing: Uses production-tested parser libraries instead of regex patterns
- Query Fingerprinting: Generates stable cryptographic hashes for identical query structures
- Multi-DBMS Support: PostgreSQL, MySQL supported
Example configuration for the component
Basic Configuration
processors:
sqlquerymasking:
# DBMS type: postgresql, mysql
dbms_type: postgresql
# Field path containing the SQL query
query_field: body.statement
# Output configuration
output:
# Where to write the normalized/masked query
masked_query_field: attributes.masking_query
# Where to write the query fingerprint (cryptographic hash)
fingerprint_field: attributes.masking_query_id
service:
pipelines:
logs/postgresql-slow:
receivers: [filelog/postgresql]
processors:
- sqlquerymasking
- batch
exporters: [otlphttp]Input/Output Example
Input log record:
{
"timestamp": "2025-05-20T15:03:15Z",
"severity": "INFO",
"body": {
"statement": "SELECT * FROM users WHERE email = 'john@example.com' AND age > 25",
"duration": "1003.405 ms"
},
"attributes": {
"dbms_type": "postgresql",
"database_name": "production_db"
}
}Output log record (after processing):
{
"timestamp": "2025-05-20T15:03:15Z",
"severity": "INFO",
"body": {
"statement": "SELECT * FROM users WHERE email = 'john@example.com' AND age > 25",
"duration": "1003.405 ms"
},
"attributes": {
"dbms_type": "postgresql",
"database_name": "production_db",
"masking_query": "SELECT * FROM users WHERE email = $1 AND age > $2",
"masking_query_id": "3a8f7b2c1d4e5f6a7b8c9d0e1f2a3b4c"
}
}Telemetry data types supported
Logs only
Code Owner(s)
vidam-io
Sponsor (optional)
Kakao Corp
Additional context
No response
Tip
React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.
Metadata
Metadata
Assignees
Labels
Sponsor NeededNew component seeking sponsorNew component seeking sponsorneeds triageNew item requiring triageNew item requiring triagewaiting for author