---
description: 'Frequently asked questions about ClickPipes for MongoDB.'
slug: /integrations/clickpipes/mongodb/faq
sidebar_position: 2
title: 'ClickPipes for MongoDB FAQ'
---
# ClickPipes for MongoDB FAQ

### Can I query for individual fields in the JSON datatype? {#can-i-query-for-individual-fields-in-the-json-datatype}

For direct field access, such as `{"user_id": 123}`, you can use **dot notation**:
```sql
SELECT doc.user_id AS user_id FROM your_table;
```
For direct field access of nested object fields, such as `{"address": { "city": "San Francisco", "state": "CA" }}`, use the `^` operator:

```sql
SELECT doc.^address.city AS city FROM your_table;
```
For aggregations, cast the field to the appropriate type with the `CAST` function or `::` syntax:

```sql
SELECT sum(doc.shipping.cost::Float32) AS total_shipping_cost FROM t1;
```
To learn more about working with JSON, see our [Working with JSON guide](./quickstart).

### How do I flatten the nested MongoDB documents in ClickHouse? {#how-do-i-flatten-the-nested-mongodb-documents-in-clickhouse}
MongoDB documents are replicated to ClickHouse as the JSON type by default, preserving the nested structure. There are several ways to flatten this data into columns:

1. **Normal Views**: Use normal views to encapsulate flattening logic.
2. **Materialized Views**: For smaller datasets, you can use refreshable materialized views with the [`FINAL` modifier](/sql-reference/statements/select/from#final-modifier) to periodically flatten and deduplicate data. For larger datasets, we recommend using incremental materialized views without `FINAL` to flatten the data in real time, and then deduplicating the data at query time.
3. **Query-time Access**: Instead of flattening, use dot notation to access nested fields directly in queries.

For detailed examples, see our [Working with JSON guide](./quickstart).
### Can I connect MongoDB databases that don't have a public IP or are in private networks? {#can-i-connect-mongodb-databases-that-dont-have-a-public-ip-or-are-in-private-networks}
We support AWS PrivateLink for connecting to MongoDB databases that don't have a public IP or are in private networks. Azure Private Link and GCP Private Service Connect are currently not supported.
### What happens if I delete a database/table from my MongoDB database? {#what-happens-if-i-delete-a-database-table-from-my-mongodb-database}
When you delete a database/table from MongoDB, ClickPipes will continue running, but the dropped database/table will stop replicating changes. The corresponding tables in ClickHouse are preserved.
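If you no longer need the preserved data, you can drop the corresponding table on the ClickHouse side yourself. The database and table names below are placeholders:

```sql
-- Run in ClickHouse once the replicated data is no longer needed
DROP TABLE IF EXISTS your_database.your_table;
```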
### How does MongoDB CDC Connector handle transactions? {#how-does-mongodb-cdc-connector-handle-transactions}
Each document change within a transaction is processed individually and written to ClickHouse. Changes are applied in the order they appear in the oplog, and only committed changes are replicated to ClickHouse. If a MongoDB transaction is rolled back, those changes won't appear in the change stream.
For more examples, see our [Working with JSON guide](./quickstart).
### How do I handle `resume of change stream was not possible, as the resume point may no longer be in the oplog.` error? {#resume-point-may-no-longer-be-in-the-oplog-error}
This error typically occurs when the oplog is truncated and the ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, [resync the ClickPipe](./resync.md). To prevent this issue from recurring, we recommend [increasing the oplog retention period](./source/atlas#enable-oplog-retention) (or [here](./source/generic#enable-oplog-retention) if you are on a self-managed MongoDB).
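Before raising retention, it can help to check the current oplog window on the source. As a sketch, assuming `mongosh` access to your replica set (the connection string is a placeholder):

```shell
# Prints the oplog size and the time range it currently covers
mongosh "mongodb+srv://<your-cluster>" --eval "rs.printReplicationInfo()"
```

If the reported window is shorter than your longest expected sync pause, the resume point can fall off the oplog.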
### How is replication managed? {#how-is-replication-managed}
We use MongoDB's native Change Streams API to track changes in the database. The Change Streams API provides a resumable stream of database changes by leveraging MongoDB's oplog (operations log). ClickPipes uses MongoDB's resume tokens to track the position in the oplog and ensure that every change is replicated to ClickHouse.
### Which read preference should I use? {#which-read-preference-should-i-use}
Which read preference to use depends on your specific use case. If you want to minimize the load on your primary node, we recommend the `secondaryPreferred` read preference. If you want to optimize ingestion latency, we recommend `primaryPreferred`. For more details, see the [MongoDB documentation](https://www.mongodb.com/docs/manual/core/read-preference/#read-preference-modes-1).
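The read preference is typically expressed as a standard MongoDB connection string option; the host and credentials below are placeholders (check your ClickPipes connection settings for where to supply it):

```shell
mongodb+srv://user:password@your-cluster.mongodb.net/?readPreference=secondaryPreferred
```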
### Does the MongoDB ClickPipe support Sharded Cluster? {#does-the-mongodb-clickpipe-support-sharded-cluster}
Yes, the MongoDB ClickPipe supports both Replica Sets and Sharded Clusters.
The replicated tables use this standard schema:

```shell
┌─name───────────────┬─type──────────┐
│ _id                │ String        │
│ doc                │ JSON          │
│ _peerdb_synced_at  │ DateTime64(9) │
│ _peerdb_version    │ Int64         │
│ _peerdb_is_deleted │ Int8          │
└────────────────────┴───────────────┘
```

- `_id`: Primary key from MongoDB
- `doc`: MongoDB document replicated as JSON data type
- `_peerdb_synced_at`: Records when the row was last synced
- `_peerdb_version`: Tracks the version of the row; incremented when the row is updated or deleted
- `_peerdb_is_deleted`: Marks whether the row is deleted
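As a quick illustration of how these metadata columns work together, a deduplicated count that excludes soft-deleted rows might look like this (the table name `t1` is assumed):

```sql
SELECT count() FROM t1 FINAL WHERE _peerdb_is_deleted = 0;
```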
ClickPipes maps MongoDB collections into ClickHouse using the `ReplacingMergeTree` table engine family. With this engine, updates are modeled as inserts with a newer version (`_peerdb_version`) of the document for a given primary key (`_id`), enabling efficient handling of updates, replaces, and deletes as versioned inserts.

`ReplacingMergeTree` clears out duplicates asynchronously in the background. To guarantee the absence of duplicates for the same row, use the [`FINAL` modifier](/sql-reference/statements/select/from#final-modifier). For example:

```sql
SELECT * FROM t1 FINAL;
```
You can directly query JSON fields using dot syntax. When querying _nested object fields_, make sure to add the [`^`](https://clickhouse.com/docs/sql-reference/data-types/newjson#reading-json-sub-objects-as-sub-columns) operator:

```sql title="Query"
SELECT doc.^shipping AS shipping_info FROM t1;
```
In ClickHouse, each field in JSON has `Dynamic` type. Dynamic type allows ClickHouse to store values of any type without knowing the type in advance. You can verify this with the `toTypeName` function:

```sql title="Query"
SELECT toTypeName(doc.customer_id) AS type FROM t1;
```
To examine the underlying data type(s) for a field, you can check with the `dynamicType` function. Note that it's possible to have different data types for the same field name in different rows:

```sql title="Query"
SELECT dynamicType(doc.customer_id) AS type FROM t1;
```
**Example 1: Date parsing**

```sql title="Query"
SELECT parseDateTimeBestEffortOrNull(doc.order_date) AS order_date FROM t1;
```
```sql title="Query"
SELECT length(doc.items) AS item_count FROM t1;
```
[Aggregation functions](https://clickhouse.com/docs/sql-reference/aggregate-functions/combinators) in ClickHouse don't work with dynamic type directly. For example, if you attempt to directly use the `sum` function on a dynamic type, you get the following error:

```sql
SELECT sum(doc.shipping.cost) AS shipping_cost FROM t1;
-- DB::Exception: Illegal type Dynamic of argument for aggregate function sum. (ILLEGAL_TYPE_OF_ARGUMENT)
```
To use aggregation functions, cast the field to the appropriate type with the `CAST` function or `::` syntax:

```sql title="Query"
SELECT sum(doc.shipping.cost::Float32) AS shipping_cost FROM t1;
```
You can create normal views on top of the JSON table to encapsulate flattening/casting logic:

```sql
CREATE VIEW v1 AS
SELECT
    CAST(doc._id, 'String') AS object_id,
    CAST(doc.order_id, 'String') AS order_id,
    CAST(doc.customer_id, 'Int64') AS customer_id,
    CAST(doc.status, 'String') AS status,
    CAST(doc.total_amount, 'Decimal64(2)') AS total_amount,
    CAST(parseDateTime64BestEffortOrNull(doc.order_date, 3), 'DATETIME(3)') AS order_date,
    doc.^shipping AS shipping_info,
    doc.items AS items
FROM t1;
```
You can create [Refreshable Materialized Views](https://clickhouse.com/docs/materialized-view/refreshable-materialized-view), which enable you to schedule query execution for deduplicating rows and storing the results in a flattened destination table. With each scheduled refresh, the destination table is replaced with the latest query results.

The key advantage of this method is that the query using the `FINAL` keyword runs only once during the refresh, eliminating the need for subsequent queries on the destination table to use `FINAL`.

A drawback is that the data in the destination table is only as up-to-date as the most recent refresh. For many use cases, refresh intervals ranging from several minutes to a few hours provide a good balance between data freshness and query performance.
If you want to access flattened columns in real-time, you can create [Incremental Materialized Views](https://clickhouse.com/docs/materialized-view/incremental-materialized-view). If your table has frequent updates, it's not recommended to use the `FINAL` modifier in your materialized view as every update will trigger a merge. Instead, you can deduplicate the data at query time by building a normal view on top of the materialized view.

```sql
CREATE TABLE flattened_t1 (
    `_id` String,
    `order_id` String,
    `customer_id` Int64,
    `status` String,
    `total_amount` Decimal(18, 2),
    `order_date` DateTime64(3),
    `shipping_info` JSON,
    `items` Dynamic,
    `_peerdb_version` Int64,
    `_peerdb_synced_at` DateTime64(9),
    `_peerdb_is_deleted` Int8
)
ENGINE = ReplacingMergeTree()
PRIMARY KEY _id
ORDER BY _id;

CREATE MATERIALIZED VIEW imv TO flattened_t1 AS
SELECT
    CAST(doc._id, 'String') AS _id,
    CAST(doc.order_id, 'String') AS order_id,
    CAST(doc.customer_id, 'Int64') AS customer_id,
    CAST(doc.status, 'String') AS status,
    CAST(doc.total_amount, 'Decimal64(2)') AS total_amount,
    CAST(parseDateTime64BestEffortOrNull(doc.order_date, 3), 'DATETIME(3)') AS order_date,
    doc.^shipping AS shipping_info,
    doc.items,
    _peerdb_version,
    _peerdb_synced_at,
    _peerdb_is_deleted
FROM t1;

CREATE VIEW flattened_t1_final AS
SELECT * FROM flattened_t1 FINAL WHERE _peerdb_is_deleted = 0;
```
You can now query the view `flattened_t1_final` as follows:
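For example, a minimal query sketch against the deduplicated view (the column choice is illustrative):

```sql
SELECT order_id, status, total_amount
FROM flattened_t1_final
ORDER BY order_date DESC
LIMIT 10;
```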
`docs/integrations/data-ingestion/clickpipes/mysql/index.md`
<BetaBadge/>

:::info
Ingesting data from MySQL to ClickHouse Cloud via ClickPipes is in public beta.
:::

You can use ClickPipes to ingest data from your source MySQL database into ClickHouse Cloud. The source MySQL database can be hosted on-premises or in the cloud using services like Amazon RDS, Google Cloud SQL, and others.