docs/reference/pipeline/built-in-pipelines.md
+42 -31 (42 additions & 31 deletions)
@@ -12,21 +12,22 @@ Additionally, the "greptime_" prefix of the pipeline name is reserved.
 
 ## `greptime_identity`
 
-The `greptime_identity` pipeline is designed for writing JSON logs and automatically creates columns for each field in the JSON log.
+The `greptime_identity` pipeline is designed for writing JSON logs and automatically creates columns for each field in the JSON log. Nested JSON objects are automatically flattened into separate columns using dot notation.
 
-- The first-level keys in the JSON log are used as column names.
-- An error is returned if the same field has different types.
-- Fields with `null` values are ignored.
-- If time index is not specified, an additional column, `greptime_timestamp`, is added to the table as the time index to indicate when the log was written.
+- An error is returned if the same field has different types
+- Fields with `null` values are ignored
+- If time index is not specified, an additional column, `greptime_timestamp`, is added to the table as the time index to indicate when the log was written
 
 ### Type conversion rules
 
 - `string` -> `string`
 - `number` -> `int64` or `float64`
 - `boolean` -> `bool`
 - `null` -> ignore
-- `array` -> `json`
-- `object` -> `json`
+- `array` -> `string` (JSON-stringified)
+- `object` -> automatically flattened into separate columns (see [Flatten JSON objects](#flatten-json-objects))
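
The conversion rules in this hunk, together with the rule that a field must keep one type across rows, can be sketched in Python. This is only an illustration under stated assumptions, not GreptimeDB's implementation: `column_type` and `merge_schema` are hypothetical names, the sample batch is made up, rows are assumed to be already flattened (no nested objects), and the split between `int64` and `float64` is assumed to follow whether the JSON number has a fractional part.

```python
import json


def column_type(value):
    """Map one parsed JSON value to a column type name (illustrative only)."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "string"  # arrays are stored JSON-stringified
    raise TypeError(f"unsupported value (flatten objects first): {value!r}")


def merge_schema(rows):
    """Merge the per-row schemas of a batch; reject a field that changes type."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue  # fields with null values are ignored
            inferred = column_type(value)
            known = schema.setdefault(name, inferred)
            if known != inferred:
                raise ValueError(
                    f"field {name!r} is {known} in one row and {inferred} in another"
                )
    return schema


# Made-up sample batch, not taken from the documentation.
batch = json.loads(
    '[{"name": "Alice", "age": 20, "score": 90.5, "tags": ["a"], "note": null},'
    ' {"name": "Bob", "is_student": true}]'
)
print(merge_schema(batch))
```

The final schema is the union of the columns seen in any row, which is why a field that appears only in the second row still becomes a column.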
 
 
 For example, if we have the following json data:
@@ -39,7 +40,7 @@ For example, if we have the following json data:
 ]
 ```
 
-We'll merge the schema for each row of this batch to get the final schema. The table schema will be:
+We'll merge the schema for each row of this batch to get the final schema. Note that nested objects are automatically flattened into separate columns (e.g., `object.a`, `object.b`), and arrays are converted to JSON strings. The table schema will be:
 
 ```sql
 mysql> desc pipeline_logs;
@@ -49,26 +50,27 @@ mysql> desc pipeline_logs;
 | age | Int64 | | YES | | FIELD |
 | is_student | Boolean | | YES | | FIELD |
 | name | String | | YES | | FIELD |
-| object | Json | | YES | | FIELD |
+| object.a | Int64 | | YES | | FIELD |
+| object.b | Int64 | | YES | | FIELD |
 | score | Float64 | | YES | | FIELD |
 | company | String | | YES | | FIELD |
-| array | Json | | YES | | FIELD |
+| array | String | | YES | | FIELD |
 | greptime_timestamp | TimestampNanosecond | PRI | NO | | TIMESTAMP |
@@ -121,33 +123,38 @@ Here are some example of using `custom_time_index` assuming the time variable is
 
 ### Flatten JSON objects
 
-If flattening a JSON object into a single-level structure is needed, add the `x-greptime-pipeline-params` header to the request and set `flatten_json_object` to `true`.
+The `greptime_identity` pipeline **automatically flattens** nested JSON objects into a single-level structure. This behavior is always enabled and creates separate columns for each nested field using dot notation (e.g., `a.b.c`).
+
+#### Controlling flattening depth
+
+You can control how deeply nested objects are flattened using the `max_nested_levels` parameter in the `x-greptime-pipeline-params` header. The default value is 10 levels.
-With this configuration, GreptimeDB will automatically flatten each field of the JSON object into separate columns. For example:
+When the maximum nesting level is reached, any remaining nested structure is converted to a JSON string and stored in a single column. For example, with `max_nested_levels=3`:
 
 ```JSON
 {
     "a": {
         "b": {
-            "c": [1, 2, 3]
+            "c": {
+                "d": [1, 2, 3]
+            }
         }
     },
-    "d": [
+    "e": [
         "foo",
         "bar"
     ],
-    "e": {
-        "f": [7, 8, 9],
+    "f": {
         "g": {
             "h": 123,
             "i": "hello",
@@ -163,14 +170,18 @@ Will be flattened to:
 
 ```json
 {
-    "a.b.c": [1,2,3],
-    "d": ["foo","bar"],
-    "e.f": [7,8,9],
-    "e.g.h": 123,
-    "e.g.i": "hello",
-    "e.g.j.k": true
+    "a.b.c": "{\"d\":[1,2,3]}",
+    "e": "[\"foo\",\"bar\"]",
+    "f.g.h": 123,
+    "f.g.i": "hello",
+    "f.g.j": "{\"k\":true}"
 }
 ```
 
+Note that:
+- Arrays at any level are always converted to JSON strings (e.g., `"e"` becomes `"[\"foo\",\"bar\"]"`)
+- When the nesting level limit is reached (level 3 in this example), the remaining nested objects are converted to JSON strings (e.g., `"a.b.c"` and `"f.g.j"`)
+- Regular scalar values within the depth limit are stored as their native types (e.g., `"f.g.h"` as integer, `"f.g.i"` as string)
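
The flattening behavior described in this hunk can be sketched in Python to make the level counting concrete. This is a minimal illustration, not the pipeline's actual code: `flatten_json` is a hypothetical name, counting top-level keys as level 1 is inferred from the documented example, and compact JSON separators are chosen only to match the strings shown above. The `"j": {"k": true}` member of the input is reconstructed from the flattened output, since the input is truncated in this diff.

```python
import json


def flatten_json(obj, max_nested_levels=10, _prefix="", _level=1):
    """Flatten nested objects into dot-notation keys up to a depth limit."""
    flat = {}
    for key, value in obj.items():
        name = f"{_prefix}.{key}" if _prefix else key
        if value is None:
            continue  # fields with null values are ignored
        if isinstance(value, dict):
            if _level < max_nested_levels:
                # Still within the limit: descend one level.
                flat.update(flatten_json(value, max_nested_levels, name, _level + 1))
            else:
                # Limit reached: keep the remaining structure as a JSON string.
                flat[name] = json.dumps(value, separators=(",", ":"))
        elif isinstance(value, list):
            # Arrays at any level are always JSON-stringified.
            flat[name] = json.dumps(value, separators=(",", ":"))
        else:
            flat[name] = value  # scalars keep their native type
    return flat


doc = {
    "a": {"b": {"c": {"d": [1, 2, 3]}}},
    "e": ["foo", "bar"],
    "f": {"g": {"h": 123, "i": "hello", "j": {"k": True}}},
}
print(json.dumps(flatten_json(doc, max_nested_levels=3), indent=4))
```

With `max_nested_levels=3`, the keys `a.b.c` and `f.g.j` sit at level 3, so their object values are stringified instead of being expanded further.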
This example demonstrates how to use the `greptimedb_logs` sink to write generated demo log data to GreptimeDB. For more information, refer to the [Vector greptimedb_logs sink](https://vector.dev/docs/reference/configuration/sinks/greptimedb_logs/) documentation.