Skip to content

Commit adeda46

Browse files
committed
Update json_type.md
1 parent bee7a35 commit adeda46

File tree

1 file changed

+8
-8
lines changed

1 file changed

+8
-8
lines changed

docs/best-practices/json_type.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -25,23 +25,23 @@ If your data structure is known and consistent, there is rarely a need for the J
2525
* **Predictable nesting**: use Tuple, Array, or Nested types for these structures.
2626
* **Predictable structure with varying types**: consider Dynamic or Variant types instead.
2727

28-
You can also mix approaches - for example, use static columns for predictable top-level fields and a single JSON column for a dynamic section of the payload.
28+
You can also mix approachesfor example, use static columns for predictable top-level fields and a single JSON column for a dynamic section of the payload.
2929

3030
## Considerations and tips for using JSON {#considerations-and-tips-for-using-json}
3131

3232
The JSON type enables efficient columnar storage by flattening paths into subcolumns. But with flexibility comes responsibility. To use it effectively:
3333

34-
* **Specify path types** using [hints in the column definition](/sql-reference/data-types/newjson) to specify types for known sub columns, avoiding unnecessary type inference.
34+
* **Specify path types** using [hints in the column definition](/sql-reference/data-types/newjson) to specify types for known subcolumns, avoiding unnecessary type inference.
3535
* **Skip paths** if you don't need the values, with [SKIP and SKIP REGEXP](/sql-reference/data-types/newjson) to reduce storage and improve performance.
36-
* **Avoid setting [`max_dynamic_paths`](/sql-reference/data-types/newjson#reaching-the-limit-of-dynamic-paths-inside-json) too high** - large values increase resource consumption and reduce efficiency. As a rule of thumb, keep it below 10,000.
36+
* **Avoid setting [`max_dynamic_paths`](/sql-reference/data-types/newjson#reaching-the-limit-of-dynamic-paths-inside-json) too high**large values increase resource consumption and reduce efficiency. As a rule of thumb, keep it below 10,000.
3737

3838
:::note Type hints
39-
Type hints offer more than just a way to avoid unnecessary type inference - they eliminate storage and processing indirection entirely. JSON paths with type hints are always stored just like traditional columns, bypassing the need for [**discriminator columns**](https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse#storage-extension-for-dynamically-changing-data) or dynamic resolution during query time. This means that with well-defined type hints, nested JSON fields achieve the same performance and efficiency as if they were modeled as top-level fields from the outset. As a result, for datasets that are mostly consistent but still benefit from the flexibility of JSON, type hints provide a convenient way to preserve performance without needing to restructure your schema or ingest pipeline.
39+
Type hints offer more than just a way to avoid unnecessary type inferencethey eliminate storage and processing indirection entirely. JSON paths with type hints are always stored just like traditional columns, bypassing the need for [**discriminator columns**](https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse#storage-extension-for-dynamically-changing-data) or dynamic resolution during query time. This means that with well-defined type hints, nested JSON fields achieve the same performance and efficiency as if they were modeled as top-level fields from the outset. As a result, for datasets that are mostly consistent but still benefit from the flexibility of JSON, type hints provide a convenient way to preserve performance without needing to restructure your schema or ingest pipeline.
4040
:::
4141

4242
## Advanced features {#advanced-features}
4343

44-
* JSON columns **can be used in primary keys** like any other columns. Codecs cannot be specified for a sub-column.
44+
* JSON columns **can be used in primary keys** like any other columns. Codecs cannot be specified for a subcolumn.
4545
* They support introspection via functions like [`JSONAllPathsWithTypes()` and `JSONDynamicPaths()`](/sql-reference/data-types/newjson#introspection-functions).
4646
* You can read nested sub-objects using the `.^` syntax.
4747
* Query syntax may differ from standard SQL and may require special casting or operators for nested fields.
@@ -156,7 +156,7 @@ INSERT INTO arxiv FORMAT JSONEachRow
156156
{"id":"2101.11408","submitter":"Daniel Lemire","authors":"Daniel Lemire","title":"Number Parsing at a Gigabyte per Second","comments":"Software at https://github.com/fastfloat/fast_float and\n https://github.com/lemire/simple_fastfloat_benchmark/","journal-ref":"Software: Practice and Experience 51 (8), 2021","doi":"10.1002/spe.2984","report-no":null,"categories":"cs.DS cs.MS","license":"http://creativecommons.org/licenses/by/4.0/","abstract":"With disks and networks providing gigabytes per second ....\n","versions":[{"created":"Mon, 11 Jan 2021 20:31:27 GMT","version":"v1"},{"created":"Sat, 30 Jan 2021 23:57:29 GMT","version":"v2"}],"update_date":"2022-11-07","authors_parsed":[["Lemire","Daniel",""]]}
157157
```
158158

159-
Suppose another column called `tags` is added. If this was simply a list of strings we could model as an `Array(String)`, but let's assume users can add arbitrary tag structures with mixed types (notice score is a string or integer). Our modified JSON document:
159+
Suppose another column called `tags` is added. If this was simply a list of strings we could model this as an `Array(String)`, but let's assume users can add arbitrary tag structures with mixed types (notice `score` is a string or integer). Our modified JSON document:
160160

161161
```sql
162162
{
@@ -222,7 +222,7 @@ ORDER BY doc.update_date
222222
```
223223

224224
:::note
225-
We provide a type hint for the `update_date` column in the JSON definition, as we use it in the ordering/primary key. This helps ClickHouse to know that this column won't be null and ensures it knows which `update_date` sub-column to use (there may be multiple for each type, so this is ambiguous otherwise).
225+
We provide a type hint for the `update_date` column in the JSON definition, as we use it in the ordering/primary key. This helps ClickHouse to know that this column won't be null and ensures it knows which `update_date` subcolumn to use (there may be multiple for each type, so this is ambiguous otherwise).
226226
:::
227227

228228
We can insert into this table and view the subsequently inferred schema using the [`JSONAllPathsWithTypes`](/sql-reference/functions/json-functions#JSONAllPathsWithTypes) function and [`PrettyJSONEachRow`](/interfaces/formats/PrettyJSONEachRow) output format:
@@ -295,7 +295,7 @@ INSERT INTO arxiv FORMAT JSONEachRow
295295
{"id":"2101.11408","submitter":"Daniel Lemire","authors":"Daniel Lemire","title":"Number Parsing at a Gigabyte per Second","comments":"Software at https://github.com/fastfloat/fast_float and\n https://github.com/lemire/simple_fastfloat_benchmark/","journal-ref":"Software: Practice and Experience 51 (8), 2021","doi":"10.1002/spe.2984","report-no":null,"categories":"cs.DS cs.MS","license":"http://creativecommons.org/licenses/by/4.0/","abstract":"With disks and networks providing gigabytes per second ....\n","versions":[{"created":"Mon, 11 Jan 2021 20:31:27 GMT","version":"v1"},{"created":"Sat, 30 Jan 2021 23:57:29 GMT","version":"v2"}],"update_date":"2022-11-07","authors_parsed":[["Lemire","Daniel",""]],"tags":{"tag_1":{"name":"ClickHouse user","score":"A+","comment":"A good read, applicable to ClickHouse"},"28_03_2025":{"name":"professor X","score":10,"comment":"Didn't learn much","updates":[{"name":"professor X","comment":"Wolverine found more interesting"}]}}}
296296
```
297297

298-
We can now infer the types of the sub column tags.
298+
We can now infer the types of the subcolumn `tags`.
299299

300300
```sql
301301
SELECT JSONAllPathsWithTypes(tags)

0 commit comments

Comments
 (0)