diff --git a/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md b/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md index b67fbfc15de..96cd8189d99 100644 --- a/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md +++ b/docs/integrations/data-ingestion/clickpipes/aws-privatelink.md @@ -33,21 +33,25 @@ data source types: - Kafka - Postgres - MySQL +- MongoDB ## Supported AWS PrivateLink endpoint types {#aws-privatelink-endpoint-types} ClickPipes reverse private endpoint can be configured with one of the following AWS PrivateLink approaches: -- [VPC resource](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html) -- [MSK multi-VPC connectivity for MSK ClickPipe](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access-mult-vpc.html) -- [VPC endpoint service](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html) +- [VPC resource](#vpc-resource) +- [MSK multi-VPC connectivity for MSK ClickPipe](#msk-multi-vpc) +- [VPC endpoint service](#vpc-endpoint-service) ### VPC resource {#vpc-resource} -Your VPC resources can be accessed in ClickPipes using PrivateLink and [AWS VPC Lattice](https://docs.aws.amazon.com/vpc-lattice/latest/ug/what-is-vpc-lattice.html). This approach doesn't require setting up a load balancer in front of your data source. +:::info +Cross-region is not supported. +::: + +Your [VPC resources](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html) can be accessed in ClickPipes using PrivateLink and [AWS VPC Lattice](https://docs.aws.amazon.com/vpc-lattice/latest/ug/what-is-vpc-lattice.html). Unlike VPC endpoint service, this approach doesn't require setting up a load balancer in front of your data source. Resource configuration can be targeted with a specific host or RDS cluster ARN. -Cross-region is not supported. It's the preferred choice for Postgres CDC ingesting data from an RDS cluster. 
@@ -151,8 +155,7 @@ Follow our [MSK setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup It requires setting up a NLB (Network Load Balancer) in front of your data source and configuring the VPC endpoint service to use the NLB. -VPC endpoint service can be [configured with a private DNS](https://docs.aws.amazon.com/vpc/latest/privatelink/manage-dns-names.html), -that will be accessible in a ClickPipes VPC. +VPC endpoint service can be [configured with a private DNS name](https://docs.aws.amazon.com/vpc/latest/privatelink/manage-dns-names.html) that will be accessible in a ClickPipes VPC. It's a preferred choice for: diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/faq.md b/docs/integrations/data-ingestion/clickpipes/mongodb/faq.md index d542744890e..697f0f60036 100644 --- a/docs/integrations/data-ingestion/clickpipes/mongodb/faq.md +++ b/docs/integrations/data-ingestion/clickpipes/mongodb/faq.md @@ -52,7 +52,7 @@ For more examples, see our [Working with JSON guide](./quickstart). ### How do I handle `resume of change stream was not possible, as the resume point may no longer be in the oplog.` error? {#resume-point-may-no-longer-be-in-the-oplog-error} -This error typically occurs when the oplog is truncated and ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, [resync the ClickPipe](./resync.md). To avoid this issue from recurring, we recommend [increasing the oplog retention period](./source/atlas#enable-oplog-retention) (or [here](./source/generic#enable-oplog-retention) if you are on a self-managed MongoDB). +This error typically occurs when the oplog is truncated and the ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, [resync the ClickPipe](./resync.md). To prevent this issue from recurring, we recommend increasing the oplog retention period. 
See instructions for [MongoDB Atlas](./source/atlas#enable-oplog-retention), [self-managed MongoDB](./source/generic#enable-oplog-retention), or [Amazon DocumentDB](./source/docdb#configure-change-stream-log-retention). ### How is replication managed? {#how-is-replication-managed} @@ -63,4 +63,21 @@ We use MongoDB's native Change Streams API to track changes in the database. Cha Which read preference to use depends on your specific use case. If you want to minimize the load on your primary node, we recommend using `secondaryPreferred` read preference. If you want to optimize ingestion latency, we recommend using `primaryPreferred` read preference. For more details, see [MongoDB documentation](https://www.mongodb.com/docs/manual/core/read-preference/#read-preference-modes-1). ### Does the MongoDB ClickPipe support Sharded Cluster? {#does-the-mongodb-clickpipe-support-sharded-cluster} + Yes, the MongoDB ClickPipe supports both Replica Set and Sharded Cluster. + +### Does MongoDB ClickPipe support Amazon DocumentDB? {#documentdb-support} + +Yes, the MongoDB ClickPipe supports Amazon DocumentDB 5.0. See the [Amazon DocumentDB source setup guide](./source/docdb.md) for details. + +### Does MongoDB ClickPipe support PrivateLink? {#privatelink-support} + +We support PrivateLink for MongoDB (and DocumentDB) clusters in AWS only. + +Note that, unlike a single-node relational database, the MongoDB client requires successful replica set discovery to respect the configured `ReadPreference`. This requires setting up PrivateLink with all the nodes in the cluster so that the MongoDB client can establish a replica set connection and redirect to another node when the connected node goes down. + +If you prefer to connect to a single node in your cluster, you can skip replica set discovery by specifying `/?directConnection=true` in the connection string during ClickPipes setup. 
The PrivateLink setup in this case will be similar to that of a single-node relational database, and it is the simplest option for PrivateLink support. + +For a replica set connection, you can set up PrivateLink for MongoDB with either VPC Resource or VPC Endpoint Service. If you go with VPC Resource, you need to create a `GROUP` resource configuration, plus a `CHILD` resource configuration for each node in the cluster. If you go with VPC Endpoint Service, you need to create a separate Endpoint Service (and a separate NLB) for each node in the cluster. + +See the [AWS PrivateLink for ClickPipes](../aws-privatelink.md) documentation for more details. If you need assistance, please reach out to ClickHouse support. diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/index.md b/docs/integrations/data-ingestion/clickpipes/mongodb/index.md index 7c3be25166e..d882069bee2 100644 --- a/docs/integrations/data-ingestion/clickpipes/mongodb/index.md +++ b/docs/integrations/data-ingestion/clickpipes/mongodb/index.md @@ -15,6 +15,7 @@ import mongodb_connection_details from '@site/static/images/integrations/data-in import select_destination_db from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/select-destination-db.png' import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg' import Image from '@theme/IdealImage'; +import ssh_tunnel from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ssh-tunnel.jpg' # Ingesting data from MongoDB to ClickHouse (using CDC) @@ -38,6 +39,8 @@ To get started, you first need to ensure that your MongoDB database is correctly 2. [Generic MongoDB](./mongodb/source/generic) +3. [Amazon DocumentDB](./mongodb/source/docdb) + Once your source MongoDB database is set up, you can continue creating your ClickPipe. ## Create your ClickPipe {#create-your-clickpipe} @@ -67,6 +70,22 @@ Make sure you are logged in to your ClickHouse Cloud account. 
If you don't have Fill in connection details +#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling} + +You can specify SSH tunneling details if your source MongoDB database is not publicly accessible. + +1. Enable the "Use SSH Tunnelling" toggle. +2. Fill in the SSH connection details. + + SSH tunneling + +3. To use key-based authentication, click "Revoke and generate key pair" to generate a new key pair, then copy the generated public key to your SSH server under `~/.ssh/authorized_keys`. +4. Click "Verify Connection" to verify the connection. + +:::note +Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel. +::: + Once the connection details are filled in, click `Next`. #### Configure advanced settings {#advanced-settings} diff --git a/docs/integrations/data-ingestion/clickpipes/mongodb/source/docdb.md b/docs/integrations/data-ingestion/clickpipes/mongodb/source/docdb.md new file mode 100644 index 00000000000..7535eb9c954 --- /dev/null +++ b/docs/integrations/data-ingestion/clickpipes/mongodb/source/docdb.md @@ -0,0 +1,70 @@ +--- +sidebar_label: 'Amazon DocumentDB' +description: 'Step-by-step guide on how to set up Amazon DocumentDB as a source for ClickPipes' +slug: /integrations/clickpipes/mongodb/source/docdb +title: 'Amazon DocumentDB source setup guide' +doc_type: 'guide' +keywords: ['clickpipes', 'mongodb', 'documentdb', 'cdc', 'data ingestion', 'real-time sync'] +--- + +import docdb_select_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-select-parameter-group.png' +import docdb_modify_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-modify-parameter-group.png' +import docdb_apply_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-apply-parameter-group.png' +import 
docdb_parameter_group_status from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-parameter-group-status.png' +import Image from '@theme/IdealImage'; + +# Amazon DocumentDB source setup guide + +## Supported DocumentDB versions {#supported-documentdb-versions} + +ClickPipes supports DocumentDB version 5.0. + +## Configure change stream log retention {#configure-change-stream-log-retention} + +By default, Amazon DocumentDB has a 3-hour change stream log retention period, while the initial load may take much longer depending on the existing data volume in your DocumentDB cluster. We recommend setting the change stream log retention to 72 hours or longer to ensure that it is not truncated before the initial snapshot is completed. + +### Update change stream log retention via AWS Console {#update-change-stream-log-retention-via-aws-console} + +1. Click `Parameter groups` in the left panel and find the parameter group used by your DocumentDB cluster (if you are using the default parameter group, you will need to create a new parameter group first, as the default one cannot be modified). +Select parameter group + +2. Search for `change_stream_log_retention_duration`, select it, and set its value to `259200` (72 hours). +Modify parameter group + +3. Click `Apply Changes` to apply the modified parameter group to your DocumentDB cluster immediately. You should see the status of the parameter group transition to `applying`, and then to `in-sync` when the change is applied. 
+Apply parameter group + +Parameter group status + +### Update change stream log retention via AWS CLI {#update-change-stream-log-retention-via-aws-cli} + +To check the current change stream log retention period via AWS CLI (replace `<PARAMETER_GROUP_NAME>` with the name of your cluster parameter group): +```shell +aws docdb describe-db-cluster-parameters --db-cluster-parameter-group-name <PARAMETER_GROUP_NAME> --query "Parameters[?ParameterName=='change_stream_log_retention_duration'].{Name:ParameterName,Value:ParameterValue}" +``` + +To set the change stream log retention period to 72 hours via AWS CLI: +```shell +aws docdb modify-db-cluster-parameter-group --db-cluster-parameter-group-name <PARAMETER_GROUP_NAME> --parameters "ParameterName=change_stream_log_retention_duration,ParameterValue=259200,ApplyMethod=immediate" +``` + +## Configure a database user {#configure-database-user} + +Connect to your DocumentDB cluster as an admin user and execute the following command to create a database user for MongoDB CDC ClickPipes: + +```javascript +db.getSiblingDB("admin").createUser({ + user: "clickpipes_user", + pwd: "some_secure_password", + roles: ["readAnyDatabase", "clusterMonitor"], +}) +``` + +:::note +Make sure to replace `clickpipes_user` and `some_secure_password` with your desired username and password. +::: + +## What's next? {#whats-next} + +You can now [create your ClickPipe](../index.md) and start ingesting data from your DocumentDB instance into ClickHouse Cloud. +Make sure to note down the connection details you used while setting up your DocumentDB cluster, as you will need them during the ClickPipe creation process. 
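The retention value used in this guide, `259200`, is 72 hours expressed in seconds. As a minimal sketch, assuming you want a different retention window, the value to pass as `ParameterValue` for `change_stream_log_retention_duration` can be computed in the shell. `RETENTION_HOURS` and `RETENTION_SECONDS` are hypothetical variable names chosen for illustration and are not part of the AWS CLI.

```shell
# Hypothetical variable names; adjust the number of hours as needed.
RETENTION_HOURS=72

# Convert hours to seconds (1 hour = 3600 seconds).
RETENTION_SECONDS=$((RETENTION_HOURS * 3600))

# Print the parameter name together with the computed value.
echo "change_stream_log_retention_duration=${RETENTION_SECONDS}"
```

With `RETENTION_HOURS=72` this yields the same `259200` used in the commands in this guide.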
diff --git a/scripts/aspell-ignore/en/aspell-dict.txt b/scripts/aspell-ignore/en/aspell-dict.txt index 47912a4586a..64050f52161 100644 --- a/scripts/aspell-ignore/en/aspell-dict.txt +++ b/scripts/aspell-ignore/en/aspell-dict.txt @@ -366,6 +366,7 @@ DistributedFilesToInsert DistributedProductMode DistributedSend DockerHub +DocumentDB Doron DoubleDelta Doxygen diff --git a/sidebars.js b/sidebars.js index 64045b9da0a..f39057fd543 100644 --- a/sidebars.js +++ b/sidebars.js @@ -698,6 +698,7 @@ const sidebars = { items: [ "integrations/data-ingestion/clickpipes/mongodb/source/atlas", "integrations/data-ingestion/clickpipes/mongodb/source/generic", + "integrations/data-ingestion/clickpipes/mongodb/source/docdb", ], }, ], diff --git a/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-apply-parameter-group.png b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-apply-parameter-group.png new file mode 100644 index 00000000000..9b9d37bbb37 Binary files /dev/null and b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-apply-parameter-group.png differ diff --git a/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-modify-parameter-group.png b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-modify-parameter-group.png new file mode 100644 index 00000000000..f5a14022323 Binary files /dev/null and b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-modify-parameter-group.png differ diff --git a/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-parameter-group-status.png b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-parameter-group-status.png new file mode 100644 index 00000000000..777cd5b2311 Binary files /dev/null and b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-parameter-group-status.png differ diff --git a/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-select-parameter-group.png 
b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-select-parameter-group.png new file mode 100644 index 00000000000..a6948fa196d Binary files /dev/null and b/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-select-parameter-group.png differ