From 21178de48014c61f561c690716fbb9b8ea78372d Mon Sep 17 00:00:00 2001 From: Oleg Mukhin Date: Sat, 19 Jul 2025 15:50:40 +0100 Subject: [PATCH 1/5] pipeline: filters: lookup added new filter New documentation page outlines description, example configuration and CSV handling for the new LookUp filter. Signed-off-by: Oleg Mukhin --- pipeline/filters/lookup.md | 134 +++++++++++++++++++++++++++++++++++++ 1 file changed, 134 insertions(+) create mode 100644 pipeline/filters/lookup.md diff --git a/pipeline/filters/lookup.md b/pipeline/filters/lookup.md new file mode 100644 index 000000000..298c57187 --- /dev/null +++ b/pipeline/filters/lookup.md @@ -0,0 +1,134 @@ +# Lookup + +The Lookup plugin looks up a key value from a record in a specified CSV file and, if a match is found, adds the corresponding value from the CSV as a new key-value pair to the record. + +## Configuration parameters + +The plugin supports the following configuration parameters + +| Key | Description | Default | +| :-- | :---------- | :------ | +| `file` | The CSV file that Fluent Bit will use as a lookup table. The file should contain two columns (key and value), with the first row as an optional header that is skipped. Supports quoted fields and escaped quotes. | _none_ | +| `lookup_key` | The specific key in the input record to look up in the CSV file's first column. Supports [record accessor](../../administration/configuring-fluent-bit/record-accessor). | _none_ | +| `result_key` | The name of the key to add to the output record with the matched value from the CSV file's second column if a match is found. | _none_ | +| `ignore_case` | Ignore case when matching the lookup key against the CSV keys. 
| `false` | + +## Example configuration + +{% tabs %} +{% tab title="fluent-bit.yaml" %} + +```yaml +parsers: + - name: json + format: json + +pipeline: + inputs: + - name: tail + tag: test + path: devices.log + read_from_head: true + parser: json + + filters: + - name: lookup + match: test + file: device-bu.csv + lookup_key: $hostname + result_key: business_line + ignore_case: true + + outputs: + - name: stdout + match: test +``` + +{% endtab %} +{% tab title="fluent-bit.conf" %} + +```text +[PARSER] + Name json + Format json + +[INPUT] + Name tail + Tag test + Path devices.log + Read_from_head On + Parser json + +[FILTER] + Name lookup + Match test + File device-bu.csv + Lookup_key $hostname + Result_key business_line + Ignore_case On + +[OUTPUT] + Name stdout + Match test +``` + +{% endtab %} +{% endtabs %} + +The following configuration reads log records from `devices.log` that includes the following values for device hostnames: + +```text +{"hostname": "server-prod-001"} +{"hostname": "Server-Prod-001"} +{"hostname": "db-test-abc"} +{"hostname": 123} +{"hostname": true} +{"hostname": " host with space "} +{"hostname": "quoted \"host\""} +{"hostname": "unknown-host"} +{} +{"hostname": [1,2,3]} +{"hostname": {"sub": "val"}} +{"hostname": " "} +``` + +It uses the value of the `hostname` field (which has been set as the `lookup_key`) to find matching values in column 1 of the (`device-bu.csv`) CSV file. + +```text +hostname,business_line +server-prod-001,Finance +db-test-abc,Engineering +db-test-abc,Marketing +web-frontend-xyz,Marketing +app-backend-123,Operations +"legacy-system true","Legacy IT" +" host with space ","Infrastructure" +"quoted ""host""", "R&D" +no-match-host,Should Not Appear +``` + +Where a match is found the filter adds new key (name of which is set by the `result_key` input) with the value from the second column of the CSV file of the matched row. 
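To make the match-and-enrich behavior concrete, here is a minimal Python sketch of the documented semantics (illustrative only — the actual filter is implemented in C inside Fluent Bit, and the function names here are invented for the example; non-string record values are simply passed through in this sketch):

```python
import csv
import io

def load_lookup_table(csv_text, ignore_case=False):
    """Build an in-memory key/value map from the first two CSV columns."""
    table = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)                    # skip the optional header row
    for row in reader:
        if len(row) < 2:
            continue                      # ignore malformed rows
        key = row[0].lower() if ignore_case else row[0]
        table[key] = row[1]
    return table

def apply_lookup(record, table, lookup_key, result_key, ignore_case=False):
    """Return the record, enriched with result_key when the lookup matches."""
    value = record.get(lookup_key)
    if not isinstance(value, str):
        return record                     # non-string handling omitted here
    needle = value.lower() if ignore_case else value
    if needle in table:
        return {**record, result_key: table[needle]}
    return record

table = load_lookup_table("hostname,business_line\nserver-prod-001,Finance\n",
                          ignore_case=True)
print(apply_lookup({"hostname": "Server-Prod-001"}, table,
                   "hostname", "business_line", ignore_case=True))
# → {'hostname': 'Server-Prod-001', 'business_line': 'Finance'}
```

A record whose `hostname` matches a CSV key gains the `business_line` field; everything else passes through untouched.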
+ +For above configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true): + +```text +{"hostname"=>"server-prod-001", "business_line"=>"Finance"} +{"hostname"=>"Server-Prod-001", "business_line"=>"Finance"} +{"hostname"=>"db-test-abc", "business_line"=>"Marketing"} +{"hostname"=>123} +{"hostname"=>true} +{"hostname"=>" host with space ", "business_line"=>"Infrastructure"} +{"hostname"=>"quoted "host"", "business_line"=>"R&D"} +{"hostname"=>"unknown-host"} +{} +{"hostname"=>[1, 2, 3]} +{"hostname"=>{"sub"=>"val"}} +``` + +## CSV import + +The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored. + +This filter is intended for static datasets. CSV is loaded once when Fluent Bit starts and is not reloaded. + +Multiline values in CSV file are not currently supported. From e0d77618dd50b814ce60540cad639765984ca30a Mon Sep 17 00:00:00 2001 From: Oleg Mukhin Date: Mon, 24 Nov 2025 17:24:05 +0000 Subject: [PATCH 2/5] pipeline: filters: lookup added new filter Updated inputs based on code changes. Added metrics section. Made key considerations clearer with a separate section. Signed-off-by: Oleg Mukhin --- pipeline/filters/lookup.md | 58 +++++++++++++++++++++++++------------- 1 file changed, 38 insertions(+), 20 deletions(-) diff --git a/pipeline/filters/lookup.md b/pipeline/filters/lookup.md index 298c57187..bf80e5225 100644 --- a/pipeline/filters/lookup.md +++ b/pipeline/filters/lookup.md @@ -4,14 +4,15 @@ The Lookup plugin looks up a key value from a record in a specified CSV file and ## Configuration parameters -The plugin supports the following configuration parameters +The plugin supports the following configuration parameters: | Key | Description | Default | | :-- | :---------- | :------ | -| `file` | The CSV file that Fluent Bit will use as a lookup table. 
The file should contain two columns (key and value), with the first row as an optional header that is skipped. Supports quoted fields and escaped quotes. | _none_ | -| `lookup_key` | The specific key in the input record to look up in the CSV file's first column. Supports [record accessor](../../administration/configuring-fluent-bit/record-accessor). | _none_ | -| `result_key` | The name of the key to add to the output record with the matched value from the CSV file's second column if a match is found. | _none_ | -| `ignore_case` | Ignore case when matching the lookup key against the CSV keys. | `false` | +| `data_source` | Path to the CSV file that the Lookup filter will use as a lookup table. This file must contain one column of keys and one column of values. See [Key Considerations](#key-considerations) for details. | _none_ (required) | +| `lookup_key` | Specifies the record key whose value to search for in the CSV file's first column. Supports [record accessor](../administration/configuring-fluent-bit/classic-mode/record-accessor) syntax for nested fields and array indexing (e.g., `$user['profile']['id']`, `$users[0]['id']`). | _none_ (required) | +| `result_key` | If a CSV entry whose value matches the value of `lookup_key` is found, specifies the name of the new key to add to the output record. This new key uses the corresponding value from the second column of the CSV file in the same row where `lookup_key` was found. If this key already exists in the record, it will be overwritten. | _none_ (required) | +| `ignore_case` | Specifies whether to ignore case when searching for `lookup_key`. If `true`, searches are case-insensitive. If `false`, searches are case-sensitive. Case normalization applies to both the lookup key from the record and the keys in the CSV file. | `false` | +| `skip_header_row` | If `true`, the filter skips the first row of the CSV file, treating it as a header. If `false`, the first row is processed as data. 
| `false` | ## Example configuration @@ -34,10 +35,11 @@ pipeline: filters: - name: lookup match: test - file: device-bu.csv + data_source: device-bu.csv lookup_key: $hostname result_key: business_line ignore_case: true + skip_header_row: true outputs: - name: stdout @@ -60,12 +62,13 @@ pipeline: Parser json [FILTER] - Name lookup - Match test - File device-bu.csv - Lookup_key $hostname - Result_key business_line - Ignore_case On + Name lookup + Match test + data_source device-bu.csv + Lookup_key $hostname + Result_key business_line + Ignore_case On + Skip_header_row On [OUTPUT] Name stdout @@ -75,7 +78,7 @@ pipeline: {% endtab %} {% endtabs %} -The following configuration reads log records from `devices.log` that includes the following values for device hostnames: +The previous configuration reads log records from `devices.log` that includes the following values in the `hostname` field: ```text {"hostname": "server-prod-001"} @@ -92,7 +95,7 @@ The following configuration reads log records from `devices.log` that includes t {"hostname": " "} ``` -It uses the value of the `hostname` field (which has been set as the `lookup_key`) to find matching values in column 1 of the (`device-bu.csv`) CSV file. +Because `hostname` was set as the `lookup_key`, the Lookup filter uses the value of each `hostname` key within the record to search for matching values in the first column of the CSV file. ```text hostname,business_line @@ -107,9 +110,9 @@ app-backend-123,Operations no-match-host,Should Not Appear ``` -Where a match is found the filter adds new key (name of which is set by the `result_key` input) with the value from the second column of the CSV file of the matched row. +When the filter finds a match, it adds a new key with the name specified by `result_key` and a value from the second column of the CSV file of the row where `lookup_key` was found. 
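Because `lookup_key` supports record accessor syntax, a nested value can be resolved before the CSV lookup happens. The following Python snippet roughly illustrates how a path such as `$user['profile']['id']` resolves against a record, assuming the path is already split into parts (`resolve_accessor` is an invented helper, not Fluent Bit's actual accessor parser):

```python
def resolve_accessor(record, parts):
    """Walk a pre-split accessor path, e.g. ['user', 'profile', 'id']."""
    current = record
    for part in parts:
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and isinstance(part, int) and 0 <= part < len(current):
            current = current[part]
        else:
            return None       # path missing: the record passes through unmatched
    return current

record = {"user": {"profile": {"id": "u-42"}}, "users": [{"id": "u-1"}]}
print(resolve_accessor(record, ["user", "profile", "id"]))   # → u-42
print(resolve_accessor(record, ["users", 0, "id"]))          # → u-1
print(resolve_accessor(record, ["user", "missing"]))         # → None
```

When the path cannot be resolved, there is nothing to look up, which matches the documented pass-through behavior for records without the `lookup_key` field.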
-For above configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true): +For the above configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true): ```text {"hostname"=>"server-prod-001", "business_line"=>"Finance"} @@ -125,10 +128,25 @@ For above configuration the following output can be expected (when matching case {"hostname"=>{"sub"=>"val"}} ``` -## CSV import +## Metrics -The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored. +When metrics support is enabled, the Lookup filter exposes the following counters to help monitor filter performance and effectiveness: -This filter is intended for static datasets. CSV is loaded once when Fluent Bit starts and is not reloaded. +| Metric Name | Description | +| :---------- | :---------- | +| `fluentbit_filter_lookup_processed_records_total` | Total number of records processed by the filter | +| `fluentbit_filter_lookup_matched_records_total` | Total number of records where a lookup match was found and the result key was added | +| `fluentbit_filter_lookup_skipped_records_total` | Total number of records skipped due to encoding errors or other processing failures | -Multiline values in CSV file are not currently supported. +Each metric includes a `name` label to identify the filter instance. + +## Key considerations + +- The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored. +- CSV fields can be enclosed in double quotes (`"`). Lines with unmatched quotes are logged as warnings and skipped. +- Multiline values in CSV file are not currently supported. 
+- Duplicate keys (values in the first column) in the CSV will use the last occurrence (hash-table behavior).
+- Leading and trailing whitespace is automatically trimmed from both keys and values.
+- The `lookup_key` can be of various types: strings are used directly, integers and floats are converted to their string representation, booleans become "true" or "false", and null becomes "null". Records with array or object values for the lookup key are passed through unchanged.
+- Records without the `lookup_key` field or with no matching CSV entry are passed through unchanged.
+- This filter is currently intended for static datasets. CSV is loaded once when Fluent Bit starts and is not reloaded.

From 604b0b1a1b2c7f90c01b25c876b9581ccb33e5c5 Mon Sep 17 00:00:00 2001
From: Oleg Mukhin
Date: Mon, 24 Nov 2025 17:29:43 +0000
Subject: [PATCH 3/5] pipeline: filters: added link from summary page

Added link from SUMMARY.md (overwritten by conflict).

Signed-off-by: Oleg Mukhin
---
 SUMMARY.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/SUMMARY.md b/SUMMARY.md
index de9f63829..48bba1231 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -150,6 +150,7 @@
   * [Grep](pipeline/filters/grep.md)
   * [Kubernetes](pipeline/filters/kubernetes.md)
   * [Logs to metrics](pipeline/filters/log_to_metrics.md)
+  * [Lookup](pipeline/filters/lookup.md)
   * [Lua](pipeline/filters/lua.md)
   * [Modify](pipeline/filters/modify.md)
   * [Multiline](pipeline/filters/multiline-stacktrace.md)

From dab162178d4682f3e7daaf426829c7f1eb7392ab Mon Sep 17 00:00:00 2001
From: Oleg Mukhin
Date: Tue, 25 Nov 2025 10:49:53 +0000
Subject: [PATCH 4/5] pipeline: filters: addressed review comments

Minor changes in language as recommended by Vale.
Signed-off-by: Oleg Mukhin --- pipeline/filters/lookup.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pipeline/filters/lookup.md b/pipeline/filters/lookup.md index bf80e5225..140c4e322 100644 --- a/pipeline/filters/lookup.md +++ b/pipeline/filters/lookup.md @@ -1,6 +1,6 @@ # Lookup -The Lookup plugin looks up a key value from a record in a specified CSV file and, if a match is found, adds the corresponding value from the CSV as a new key-value pair to the record. +The Lookup plugin searches for a record key's value in a CSV file's first column and adds the matching row's second column value as a new key-value pair if found. ## Configuration parameters @@ -9,7 +9,7 @@ The plugin supports the following configuration parameters: | Key | Description | Default | | :-- | :---------- | :------ | | `data_source` | Path to the CSV file that the Lookup filter will use as a lookup table. This file must contain one column of keys and one column of values. See [Key Considerations](#key-considerations) for details. | _none_ (required) | -| `lookup_key` | Specifies the record key whose value to search for in the CSV file's first column. Supports [record accessor](../administration/configuring-fluent-bit/classic-mode/record-accessor) syntax for nested fields and array indexing (e.g., `$user['profile']['id']`, `$users[0]['id']`). | _none_ (required) | +| `lookup_key` | Specifies the record key whose value to search for in the CSV file's first column. Supports [record accessor](../administration/configuring-fluent-bit/classic-mode/record-accessor) syntax for nested fields and array indexing (for example, `$user['profile']['id']`, `$users[0]['id']`). | _none_ (required) | | `result_key` | If a CSV entry whose value matches the value of `lookup_key` is found, specifies the name of the new key to add to the output record. This new key uses the corresponding value from the second column of the CSV file in the same row where `lookup_key` was found. 
If this key already exists in the record, it will be overwritten. | _none_ (required) | | `ignore_case` | Specifies whether to ignore case when searching for `lookup_key`. If `true`, searches are case-insensitive. If `false`, searches are case-sensitive. Case normalization applies to both the lookup key from the record and the keys in the CSV file. | `false` | | `skip_header_row` | If `true`, the filter skips the first row of the CSV file, treating it as a header. If `false`, the first row is processed as data. | `false` | @@ -112,7 +112,7 @@ no-match-host,Should Not Appear When the filter finds a match, it adds a new key with the name specified by `result_key` and a value from the second column of the CSV file of the row where `lookup_key` was found. -For the above configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true): +For the previous configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true): ```text {"hostname"=>"server-prod-001", "business_line"=>"Finance"} @@ -144,9 +144,9 @@ Each metric includes a `name` label to identify the filter instance. - The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored. - CSV fields can be enclosed in double quotes (`"`). Lines with unmatched quotes are logged as warnings and skipped. -- Multiline values in CSV file are not currently supported. +- Multiline values in CSV file aren't currently supported. - Duplicate keys (values in first column) in the CSV will use the last occurrence (hash table behavior) - Leading and trailing whitespace is automatically trimmed from both keys and values. 
-- The `lookup_key` can be of various types: strings are used directly, integers and floats are converted to their string representation, booleans become "true" or "false", and null becomes "null". Records with array or object values for the lookup key are passed through unchanged. +- The `lookup_key` can be of various types: strings are used directly, integers and floats are converted to their string representation, booleans become `true` or `false`, and null becomes `null`. Records with array or object values for the lookup key are passed through unchanged. - Records without the `lookup_key` field or with no matching CSV entry are passed through unchanged. -- This filter is currently intended for static datasets. CSV is loaded once when Fluent Bit starts and is not reloaded. +- This filter is currently intended for static datasets. CSV is loaded once when Fluent Bit starts and isn't reloaded. From 5600eddaa391668ff10d9713b303946ac0e8052e Mon Sep 17 00:00:00 2001 From: Oleg Mukhin Date: Tue, 25 Nov 2025 12:35:02 +0000 Subject: [PATCH 5/5] pipeline: filters: fixed ra link Fixed broken record assessor link Minor grammar enhancement Signed-off-by: Oleg Mukhin --- pipeline/filters/lookup.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pipeline/filters/lookup.md b/pipeline/filters/lookup.md index 140c4e322..6f9e17017 100644 --- a/pipeline/filters/lookup.md +++ b/pipeline/filters/lookup.md @@ -9,7 +9,7 @@ The plugin supports the following configuration parameters: | Key | Description | Default | | :-- | :---------- | :------ | | `data_source` | Path to the CSV file that the Lookup filter will use as a lookup table. This file must contain one column of keys and one column of values. See [Key Considerations](#key-considerations) for details. | _none_ (required) | -| `lookup_key` | Specifies the record key whose value to search for in the CSV file's first column. 
Supports [record accessor](../administration/configuring-fluent-bit/classic-mode/record-accessor) syntax for nested fields and array indexing (for example, `$user['profile']['id']`, `$users[0]['id']`). | _none_ (required) | +| `lookup_key` | Specifies the record key whose value to search for in the CSV file's first column. Supports [record accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md) syntax for nested fields and array indexing (for example, `$user['profile']['id']`, `$users[0]['id']`). | _none_ (required) | | `result_key` | If a CSV entry whose value matches the value of `lookup_key` is found, specifies the name of the new key to add to the output record. This new key uses the corresponding value from the second column of the CSV file in the same row where `lookup_key` was found. If this key already exists in the record, it will be overwritten. | _none_ (required) | | `ignore_case` | Specifies whether to ignore case when searching for `lookup_key`. If `true`, searches are case-insensitive. If `false`, searches are case-sensitive. Case normalization applies to both the lookup key from the record and the keys in the CSV file. | `false` | | `skip_header_row` | If `true`, the filter skips the first row of the CSV file, treating it as a header. If `false`, the first row is processed as data. | `false` | @@ -142,7 +142,7 @@ Each metric includes a `name` label to identify the filter instance. ## Key considerations -- The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored. +- The CSV is used to create an in-memory key-value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored. - CSV fields can be enclosed in double quotes (`"`). Lines with unmatched quotes are logged as warnings and skipped. 
- Multiline values in the CSV file aren't currently supported.
- Duplicate keys (values in the first column) in the CSV will use the last occurrence (hash-table behavior).
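The type-coercion, whitespace-trimming, and duplicate-key rules listed under Key considerations can be summarized in a short sketch (illustrative Python with invented helper names, not the filter's C implementation):

```python
import csv
import io

def coerce_lookup_value(value):
    """Mirror the documented lookup_key type handling."""
    if isinstance(value, bool):
        return "true" if value else "false"   # booleans become true/false
    if value is None:
        return "null"                         # null becomes "null"
    if isinstance(value, (int, float)):
        return str(value)                     # numbers use their string form
    if isinstance(value, str):
        return value                          # strings are used directly
    return None                               # arrays/objects: no lookup at all

def build_table(csv_text):
    """Two-column CSV to dict: trimmed keys/values, last duplicate wins."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            table[row[0].strip()] = row[1].strip()
    return table

print(build_table("a, 1 \na,2\nb,3\n"))   # → {'a': '2', 'b': '3'}
print(coerce_lookup_value(True), coerce_lookup_value(3.5), coerce_lookup_value(None))
# → true 3.5 null
```

Because later duplicates overwrite earlier ones in the dict, the CSV row that appears last determines the value used, matching the hash-table behavior described above.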