
New component: sqlquerymasking processor #44097

@vidam-io

Description


Building and distributing outside this repository

  • I understand I can use and distribute my component outside of this repository.

The purpose and use-cases of the new component

Purpose

The sqlquerymasking processor masks sensitive data in database logs while preserving query structure for performance analysis and observability.

Problem Statement

Database logs are essential for performance monitoring but often contain:

  • Sensitive user data in WHERE clauses (WHERE email = 'john@example.com')
  • Authentication credentials in connection strings

Simple regex-based masking makes structurally identical queries appear different, preventing proper query analysis and performance aggregation.

Solution

This processor uses SQL-aware parsing to normalize queries by replacing literal values with placeholders while maintaining query structure:

-- Original (contains sensitive data)
INSERT INTO users VALUES ('john@example.com', 'password123', 30)

-- Masked (preserves structure)
INSERT INTO users VALUES ($1, $2, $3)

Additionally, it generates query fingerprints (cryptographic hashes) to uniquely identify query patterns for aggregation and performance tracking.
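The normalize-then-hash flow can be sketched as follows. This is illustrative only: the literal scanner below is a deliberately simplified stand-in for the DBMS-aware parser libraries the component would actually use, and SHA-256 (truncated) is an assumed hash choice, not necessarily the one the implementation picks.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"unicode"
)

// normalize replaces string and numeric literals with positional
// placeholders ($1, $2, ...). Simplified sketch: a real implementation
// would use a full SQL parser, not this character scanner.
func normalize(query string) string {
	var out strings.Builder
	n := 0
	runes := []rune(query)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case r == '\'': // single-quoted string literal
			i++
			for i < len(runes) && runes[i] != '\'' {
				i++
			}
			i++ // skip closing quote
			n++
			fmt.Fprintf(&out, "$%d", n)
		case unicode.IsDigit(r) && (i == 0 || !unicode.IsLetter(runes[i-1])):
			// numeric literal (but not a digit inside an identifier)
			for i < len(runes) && (unicode.IsDigit(runes[i]) || runes[i] == '.') {
				i++
			}
			n++
			fmt.Fprintf(&out, "$%d", n)
		default:
			out.WriteRune(r)
			i++
		}
	}
	return out.String()
}

// fingerprint hashes the normalized query so structurally identical
// queries share one ID (assumption: truncated SHA-256).
func fingerprint(query string) string {
	sum := sha256.Sum256([]byte(normalize(query)))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	q := "SELECT * FROM users WHERE email = 'john@example.com' AND age > 25"
	fmt.Println(normalize(q)) // SELECT * FROM users WHERE email = $1 AND age > $2
	fmt.Println(fingerprint(q))
}
```

Because the hash is taken over the normalized text, two queries that differ only in their literal values produce the same fingerprint.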

Libraries Used

Key Features

  • SQL-Aware Parsing: Uses production-tested parser libraries instead of regex patterns
  • Query Fingerprinting: Generates stable cryptographic hashes for identical query structures
  • Multi-DBMS Support: PostgreSQL and MySQL

Example configuration for the component

Basic Configuration

processors:
  sqlquerymasking:
    # DBMS type: postgresql, mysql
    dbms_type: postgresql

    # Field path containing the SQL query
    query_field: body.statement

    # Output configuration
    output:
      # Where to write the normalized/masked query
      masked_query_field: attributes.masking_query

      # Where to write the query fingerprint (cryptographic hash)
      fingerprint_field: attributes.masking_query_id

service:
  pipelines:
    logs/postgresql-slow:
      receivers: [filelog/postgresql]
      processors:
        - sqlquerymasking
        - batch
      exporters: [otlphttp]

Input/Output Example

Input log record:

{
  "timestamp": "2025-05-20T15:03:15Z",
  "severity": "INFO",
  "body": {
    "statement": "SELECT * FROM users WHERE email = 'john@example.com' AND age > 25",
    "duration": "1003.405 ms"
  },
  "attributes": {
    "dbms_type": "postgresql",
    "database_name": "production_db"
  }
}

Output log record (after processing):

{
  "timestamp": "2025-05-20T15:03:15Z",
  "severity": "INFO",
  "body": {
    "statement": "SELECT * FROM users WHERE email = 'john@example.com' AND age > 25",
    "duration": "1003.405 ms"
  },
  "attributes": {
    "dbms_type": "postgresql",
    "database_name": "production_db",
    "masking_query": "SELECT * FROM users WHERE email = $1 AND age > $2",
    "masking_query_id": "3a8f7b2c1d4e5f6a7b8c9d0e1f2a3b4c"
  }
}
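A downstream consumer can then aggregate on the fingerprint rather than the raw statement. The sketch below is hypothetical (the Record fields and fingerprint values are illustrative, mirroring the attributes written above); it shows why stable fingerprints matter: executions of the same query shape collapse onto one key.

```go
package main

import "fmt"

// Record models a masked log record; field names are illustrative,
// corresponding to attributes.masking_query_id and body.duration.
type Record struct {
	QueryID    string
	DurationMs float64
}

// avgDurations groups records by query fingerprint and returns the
// mean duration per structurally identical query pattern.
func avgDurations(recs []Record) map[string]float64 {
	totals := map[string]float64{}
	counts := map[string]float64{}
	for _, r := range recs {
		totals[r.QueryID] += r.DurationMs
		counts[r.QueryID]++
	}
	avg := map[string]float64{}
	for id, t := range totals {
		avg[id] = t / counts[id]
	}
	return avg
}

func main() {
	// Two executions of the same query shape share one fingerprint.
	recs := []Record{
		{"3a8f7b2c", 1003.4},
		{"3a8f7b2c", 996.6},
		{"9c1d2e3f", 12.0},
	}
	for id, ms := range avgDurations(recs) {
		fmt.Printf("%s avg=%.1f ms\n", id, ms)
	}
}
```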

Telemetry data types supported

Logs only

Code Owner(s)

vidam-io

Sponsor (optional)

Kakao Corp

Additional context

No response

