docs/best-practices/json_type.md
Lines changed: 8 additions & 8 deletions
Original file line number
Diff line number
Diff line change
@@ -25,23 +25,23 @@ If your data structure is known and consistent, there is rarely a need for the J
25
25
* **Predictable nesting**: use Tuple, Array, or Nested types for these structures.
26
26
* **Predictable structure with varying types**: consider Dynamic or Variant types instead.
27
27
28
-
You can also mix approaches - for example, use static columns for predictable top-level fields and a single JSON column for a dynamic section of the payload.
28
+
You can also mix approaches—for example, use static columns for predictable top-level fields and a single JSON column for a dynamic section of the payload.
29
29
30
30
## Considerations and tips for using JSON {#considerations-and-tips-for-using-json}
31
31
32
32
The JSON type enables efficient columnar storage by flattening paths into subcolumns. But with flexibility comes responsibility. To use it effectively:
33
33
34
-
* **Specify path types** using [hints in the column definition](/sql-reference/data-types/newjson) to specify types for known sub columns, avoiding unnecessary type inference.
34
+
* **Specify path types** using [hints in the column definition](/sql-reference/data-types/newjson) to specify types for known subcolumns, avoiding unnecessary type inference.
35
35
* **Skip paths** if you don't need the values, with [SKIP and SKIP REGEXP](/sql-reference/data-types/newjson) to reduce storage and improve performance.
36
-
* **Avoid setting [`max_dynamic_paths`](/sql-reference/data-types/newjson#reaching-the-limit-of-dynamic-paths-inside-json) too high** - large values increase resource consumption and reduce efficiency. As a rule of thumb, keep it below 10,000.
36
+
* **Avoid setting [`max_dynamic_paths`](/sql-reference/data-types/newjson#reaching-the-limit-of-dynamic-paths-inside-json) too high**—large values increase resource consumption and reduce efficiency. As a rule of thumb, keep it below 10,000.
37
37
38
38
:::note Type hints
39
-
Type hints offer more than just a way to avoid unnecessary type inference - they eliminate storage and processing indirection entirely. JSON paths with type hints are always stored just like traditional columns, bypassing the need for [**discriminator columns**](https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse#storage-extension-for-dynamically-changing-data) or dynamic resolution during query time. This means that with well-defined type hints, nested JSON fields achieve the same performance and efficiency as if they were modeled as top-level fields from the outset. As a result, for datasets that are mostly consistent but still benefit from the flexibility of JSON, type hints provide a convenient way to preserve performance without needing to restructure your schema or ingest pipeline.
39
+
Type hints offer more than just a way to avoid unnecessary type inference—they eliminate storage and processing indirection entirely. JSON paths with type hints are always stored just like traditional columns, bypassing the need for [**discriminator columns**](https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse#storage-extension-for-dynamically-changing-data) or dynamic resolution during query time. This means that with well-defined type hints, nested JSON fields achieve the same performance and efficiency as if they were modeled as top-level fields from the outset. As a result, for datasets that are mostly consistent but still benefit from the flexibility of JSON, type hints provide a convenient way to preserve performance without needing to restructure your schema or ingest pipeline.
40
40
:::
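
The tips above can be combined in a single column definition. The following is a minimal sketch rather than anything taken from the changed file: the table `events`, its columns, every path name (`user.id`, `request.status`, `debug`, `tmp_...`), and the `max_dynamic_paths` value are hypothetical and chosen only for illustration. It also shows the mixed approach of static columns for predictable top-level fields plus one JSON column for the dynamic part of the payload.

```sql
-- Hypothetical table: all names and values here are illustrative assumptions.
CREATE TABLE events
(
    -- Predictable top-level fields stay as ordinary static columns.
    event_id UInt64,
    event_time DateTime,
    -- Only the dynamic section of the payload uses the JSON type.
    payload JSON(
        max_dynamic_paths=256,   -- kept far below the 10,000 rule of thumb
        user.id UInt64,          -- type hint for a known path
        request.status UInt16,   -- type hint for another known path
        SKIP debug,              -- skip a path whose values are never needed
        SKIP REGEXP 'tmp_.*'     -- skip every path matching this pattern
    )
)
ENGINE = MergeTree
ORDER BY (event_time, event_id);
```

Depending on the ClickHouse release, the JSON type may first need to be enabled through a server or session setting, so check the data type reference for the version in use.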
41
41
42
42
## Advanced features {#advanced-features}
43
43
44
-
* JSON columns **can be used in primary keys** like any other columns. Codecs cannot be specified for a sub-column.
44
+
* JSON columns **can be used in primary keys** like any other columns. Codecs cannot be specified for a subcolumn.
45
45
* They support introspection via functions like [`JSONAllPathsWithTypes()` and `JSONDynamicPaths()`](/sql-reference/data-types/newjson#introspection-functions).
46
46
* You can read nested sub-objects using the `.^` syntax.
47
47
* Query syntax may differ from standard SQL and may require special casting or operators for nested fields.
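
As a rough illustration of the features listed above, the statements below use a hinted JSON path in the sorting key, read a whole sub-object with `.^`, read a dynamic path with an explicit type, and then inspect the stored paths. This is a sketch under stated assumptions: the table `json_demo`, its column `data`, and the sample document are invented for the example.

```sql
-- Hypothetical table and data, used only to demonstrate the syntax.
CREATE TABLE json_demo
(
    data JSON(user.id UInt64)   -- type hint so the path can be used in the key
)
ENGINE = MergeTree
ORDER BY data.user.id;          -- JSON subcolumn used in the sorting/primary key

INSERT INTO json_demo VALUES ('{"user": {"id": 1, "name": "alice"}, "score": 10}');

SELECT
    data.user.id      AS user_id,      -- hinted path, read like a regular column
    data.^user        AS user_object,  -- nested sub-object `user`, returned as JSON
    data.score.:Int64 AS score_int     -- dynamic path read as Int64
FROM json_demo;

-- Introspection: which paths exist, and which of them are stored dynamically.
SELECT
    JSONAllPathsWithTypes(data) AS paths_with_types,
    JSONDynamicPaths(data)      AS dynamic_paths
FROM json_demo;
```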
@@ -156,7 +156,7 @@ INSERT INTO arxiv FORMAT JSONEachRow
156
156
{"id":"2101.11408","submitter":"Daniel Lemire","authors":"Daniel Lemire","title":"Number Parsing at a Gigabyte per Second","comments":"Software at https://github.com/fastfloat/fast_float and\n https://github.com/lemire/simple_fastfloat_benchmark/","journal-ref":"Software: Practice and Experience 51 (8), 2021","doi":"10.1002/spe.2984","report-no":null,"categories":"cs.DS cs.MS","license":"http://creativecommons.org/licenses/by/4.0/","abstract":"With disks and networks providing gigabytes per second ....\n","versions":[{"created":"Mon, 11 Jan 2021 20:31:27 GMT","version":"v1"},{"created":"Sat, 30 Jan 2021 23:57:29 GMT","version":"v2"}],"update_date":"2022-11-07","authors_parsed":[["Lemire","Daniel",""]]}
157
157
```
158
158
159
-
Suppose another column called `tags` is added. If this was simply a list of strings we could model as an `Array(String)`, but let's assume users can add arbitrary tag structures with mixed types (notice score is a string or integer). Our modified JSON document:
159
+
Suppose another column called `tags` is added. If this was simply a list of strings we could model this as an `Array(String)`, but let's assume users can add arbitrary tag structures with mixed types (notice `score` is a string or integer). Our modified JSON document:
160
160
161
161
```sql
162
162
{
@@ -222,7 +222,7 @@ ORDER BY doc.update_date
222
222
```
223
223
224
224
:::note
225
-
We provide a type hint for the `update_date` column in the JSON definition, as we use it in the ordering/primary key. This helps ClickHouse to know that this column won't be null and ensures it knows which `update_date` sub-column to use (there may be multiple for each type, so this is ambiguous otherwise).
225
+
We provide a type hint for the `update_date` column in the JSON definition, as we use it in the ordering/primary key. This helps ClickHouse to know that this column won't be null and ensures it knows which `update_date` subcolumn to use (there may be multiple for each type, so this is ambiguous otherwise).
226
226
:::
227
227
228
228
We can insert into this table and view the subsequently inferred schema using the [`JSONAllPathsWithTypes`](/sql-reference/functions/json-functions#JSONAllPathsWithTypes) function and [`PrettyJSONEachRow`](/interfaces/formats/PrettyJSONEachRow) output format:
@@ -295,7 +295,7 @@ INSERT INTO arxiv FORMAT JSONEachRow
295
295
{"id":"2101.11408","submitter":"Daniel Lemire","authors":"Daniel Lemire","title":"Number Parsing at a Gigabyte per Second","comments":"Software at https://github.com/fastfloat/fast_float and\n https://github.com/lemire/simple_fastfloat_benchmark/","journal-ref":"Software: Practice and Experience 51 (8), 2021","doi":"10.1002/spe.2984","report-no":null,"categories":"cs.DS cs.MS","license":"http://creativecommons.org/licenses/by/4.0/","abstract":"With disks and networks providing gigabytes per second ....\n","versions":[{"created":"Mon, 11 Jan 2021 20:31:27 GMT","version":"v1"},{"created":"Sat, 30 Jan 2021 23:57:29 GMT","version":"v2"}],"update_date":"2022-11-07","authors_parsed":[["Lemire","Daniel",""]],"tags":{"tag_1":{"name":"ClickHouse user","score":"A+","comment":"A good read, applicable to ClickHouse"},"28_03_2025":{"name":"professor X","score":10,"comment":"Didn't learn much","updates":[{"name":"professor X","comment":"Wolverine found more interesting"}]}}}
296
296
```
297
297
298
-
We can now infer the types of the sub column tags.
298
+
We can now infer the types of the subcolumn `tags`.
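
One way to do this, reusing the `JSONAllPathsWithTypes` function and `PrettyJSONEachRow` format mentioned earlier, is sketched below. It assumes the JSON column of `arxiv` is named `doc`, as the `ORDER BY doc.update_date` clause suggests. The paths under `tags` would then be listed alongside the other paths of each row.

```sql
-- Assumption: the JSON column of `arxiv` is called `doc`.
SELECT JSONAllPathsWithTypes(doc) AS paths_with_types
FROM arxiv
FORMAT PrettyJSONEachRow;
```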