
Commit a2b830e

Merge pull request #4410 from ClickHouse/pg-publication-ignore-delte

Add instruction to Postgres ClickPipes FAQ for not propagating deletes/truncates

2 parents 0d6ae78 + 9d22dfb

1 file changed: docs/integrations/data-ingestion/clickpipes/postgres/faq.md (+21 -11 lines)
@@ -203,23 +203,23 @@ For manually created publications, please add any tables you want to the publica
If you're replicating from a Postgres read replica/hot standby, you will need to create your own publication on the primary instance, which will automatically propagate to the standby. The ClickPipe will not be able to manage the publication in this case as you're unable to create publications on a standby.
:::

-## Recommended `max_slot_wal_keep_size` settings {#recommended-max_slot_wal_keep_size-settings}
+### Recommended `max_slot_wal_keep_size` settings {#recommended-max_slot_wal_keep_size-settings}

- **At Minimum:** Set [`max_slot_wal_keep_size`](https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE) to retain at least **two days' worth** of WAL data.
- **For Large Databases (High Transaction Volume):** Retain at least **2-3 times** the peak WAL generation per day.
- **For Storage-Constrained Environments:** Tune this conservatively to **avoid disk exhaustion** while ensuring replication stability.

-### How to calculate the right value {#how-to-calculate-the-right-value}
+#### How to calculate the right value {#how-to-calculate-the-right-value}

To determine the right setting, measure the WAL generation rate:

-#### For PostgreSQL 10+ {#for-postgresql-10}
+##### For PostgreSQL 10+ {#for-postgresql-10}

```sql
SELECT pg_wal_lsn_diff(pg_current_wal_insert_lsn(), '0/0') / 1024 / 1024 AS wal_generated_mb;
```

-#### For PostgreSQL 9.6 and below: {#for-postgresql-96-and-below}
+##### For PostgreSQL 9.6 and below: {#for-postgresql-96-and-below}

```sql
SELECT pg_xlog_location_diff(pg_current_xlog_insert_location(), '0/0') / 1024 / 1024 AS wal_generated_mb;
@@ -230,7 +230,7 @@ SELECT pg_xlog_location_diff(pg_current_xlog_insert_location(), '0/0') / 1024 /
* Multiply that number by 2 or 3 to provide sufficient retention.
* Set `max_slot_wal_keep_size` to the resulting value in MB or GB.

-#### Example {#example}
+##### Example {#example}

If your database generates 100 GB of WAL per day, set:

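Putting the steps above together: assuming the 100 GB/day figure from the example and a 3x safety factor (the values are illustrative, not a recommendation for any particular workload), the setting could be applied like this:

```sql
-- Illustrative sizing only: ~100 GB of WAL per day × 3 ≈ 300 GB of retention.
ALTER SYSTEM SET max_slot_wal_keep_size = '300GB';
-- The parameter is reloadable, so no restart is required:
SELECT pg_reload_conf();
```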
@@ -257,7 +257,7 @@ The most common cause of replication slot invalidation is a low `max_slot_wal_ke

In rare cases, we have seen this issue occur even when `max_slot_wal_keep_size` is not configured. This could be due to an intricate and rare bug in PostgreSQL, although the cause remains unclear.
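One way to spot a slot at risk before it is invalidated is to poll the `pg_replication_slots` view; a sketch (the `wal_status` and `safe_wal_size` columns are available on PostgreSQL 13 and newer):

```sql
-- wal_status = 'lost' means the slot has been invalidated;
-- safe_wal_size shows how many bytes of WAL can still be written
-- before the slot is in danger of losing required WAL.
SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_replication_slots;
```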

-## I am seeing out of memory (OOMs) on ClickHouse while my ClickPipe is ingesting data. Can you help? {#i-am-seeing-out-of-memory-ooms-on-clickhouse-while-my-clickpipe-is-ingesting-data-can-you-help}
+### I am seeing out of memory (OOMs) on ClickHouse while my ClickPipe is ingesting data. Can you help? {#i-am-seeing-out-of-memory-ooms-on-clickhouse-while-my-clickpipe-is-ingesting-data-can-you-help}

One common reason for OOMs on ClickHouse is that your service is undersized. This means that your current service configuration doesn't have enough resources (e.g., memory or CPU) to handle the ingestion load effectively. We strongly recommend scaling up the service to meet the demands of your ClickPipe data ingestion.

@@ -267,15 +267,15 @@ Another reason we've observed is the presence of downstream Materialized Views w

- Another optimization for JOINs is to explicitly filter the tables through `subqueries` or `CTEs` and then perform the `JOIN` across these subqueries. This provides the planner with hints on how to efficiently filter rows and perform the `JOIN`.
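The JOIN advice above can be illustrated with a sketch (table and column names are hypothetical): each side is filtered inside a CTE first, so the JOIN operates on already-reduced sets rather than full tables.

```sql
-- Filter each table in a CTE before joining, instead of joining the
-- full tables and filtering afterwards.
WITH
    recent_orders AS (
        SELECT order_id, user_id, amount
        FROM orders
        WHERE created_at >= now() - INTERVAL 7 DAY
    ),
    active_users AS (
        SELECT user_id, name
        FROM users
        WHERE is_active = 1
    )
SELECT o.order_id, u.name, o.amount
FROM recent_orders AS o
INNER JOIN active_users AS u ON o.user_id = u.user_id;
```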

-## I am seeing an `invalid snapshot identifier` during the initial load. What should I do? {#i-am-seeing-an-invalid-snapshot-identifier-during-the-initial-load-what-should-i-do}
+### I am seeing an `invalid snapshot identifier` during the initial load. What should I do? {#i-am-seeing-an-invalid-snapshot-identifier-during-the-initial-load-what-should-i-do}

The `invalid snapshot identifier` error occurs when there is a connection drop between ClickPipes and your Postgres database. This can happen due to gateway timeouts, database restarts, or other transient issues.

It is recommended that you do not carry out any disruptive operations, such as upgrades or restarts, on your Postgres database while the Initial Load is in progress, and that you ensure the network connection to your database is stable.

To resolve this issue, you can trigger a resync from the ClickPipes UI. This will restart the initial load process from the beginning.

-## What happens if I drop a publication in Postgres? {#what-happens-if-i-drop-a-publication-in-postgres}
+### What happens if I drop a publication in Postgres? {#what-happens-if-i-drop-a-publication-in-postgres}

Dropping a publication in Postgres will break your ClickPipe connection since the publication is required for the ClickPipe to pull changes from the source. When this happens, you'll typically receive an error alert indicating that the publication no longer exists.
@@ -296,22 +296,32 @@ FOR TABLE <...>, <...>
WITH (publish_via_partition_root = true);
```
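Before recreating a publication, it can help to check how the existing ones are configured; a sketch using the `pg_publication` system catalog (the `pubviaroot` column exists on PostgreSQL 13 and newer):

```sql
-- Which operations does each publication replicate, and does it
-- publish changes via the partition root?
SELECT pubname, pubinsert, pubupdate, pubdelete, pubtruncate, pubviaroot
FROM pg_publication;
```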

-## What if I am seeing `Unexpected Datatype` errors or `Cannot parse type XX ...` {#what-if-i-am-seeing-unexpected-datatype-errors}
+### What if I am seeing `Unexpected Datatype` errors or `Cannot parse type XX ...` {#what-if-i-am-seeing-unexpected-datatype-errors}

This error typically occurs when the source Postgres database has a datatype which cannot be mapped during ingestion.
For more specific issues, refer to the possibilities below.

-## `Cannot parse type Decimal(XX, YY), expected non-empty binary data with size equal to or less than ...` {#cannot-parse-type-decimal-expected-non-empty-binary-data-with-size-equal-to-or-less-than}
+### `Cannot parse type Decimal(XX, YY), expected non-empty binary data with size equal to or less than ...` {#cannot-parse-type-decimal-expected-non-empty-binary-data-with-size-equal-to-or-less-than}

Postgres `NUMERIC`s support very high precision (up to 131072 digits before the decimal point; up to 16383 digits after it), while the ClickHouse `Decimal` type allows a maximum of 76 digits of precision and 39 of scale.
The system assumes that values will _usually_ not grow that large and performs an optimistic cast, since the source table can have a large number of rows or rows can arrive during the CDC phase.

The current workaround is to map the `NUMERIC` type to a string in ClickHouse. To enable this, please raise a ticket with the support team, and it will be enabled for your ClickPipes.

-## I'm seeing errors like `invalid memory alloc request size <XXX>` during replication/slot creation {#postgres-invalid-memalloc-bug}
+### I'm seeing errors like `invalid memory alloc request size <XXX>` during replication/slot creation {#postgres-invalid-memalloc-bug}

There was a bug introduced in Postgres patch versions 17.5/16.9/15.13/14.18/13.21 due to which certain workloads can cause an exponential increase in memory usage, leading to a memory allocation request of >1GB, which Postgres considers invalid. This bug [has been fixed](https://github.com/postgres/postgres/commit/d87d07b7ad3b782cb74566cd771ecdb2823adf6a) and will be in the next Postgres patch series (17.6...). Please check with your Postgres provider when this patch version will be available for upgrade. If an upgrade isn't immediately possible, a resync of the pipe will be needed when it hits the error.

+### I need to maintain a complete historical record in ClickHouse, even when the data is deleted from the source Postgres database. Can I completely ignore DELETE and TRUNCATE operations from Postgres in ClickPipes? {#ignore-delete-truncate}
+
+Yes! Before creating your Postgres ClickPipe, create a publication without DELETE operations. For example:
+```sql
+CREATE PUBLICATION <pub_name> FOR TABLES IN SCHEMA <schema_name> WITH (publish = 'insert,update');
+```
+Then, when [setting up](https://clickhouse.com/docs/integrations/clickpipes/postgres#configuring-the-replication-settings) your Postgres ClickPipe, make sure this publication name is selected.
+
+Note that TRUNCATE operations are ignored by ClickPipes and will not be replicated to ClickHouse.
+
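For an existing publication, the set of published operations can also be changed in place rather than dropping and recreating it; a sketch with a placeholder publication name:

```sql
-- Stop publishing DELETE (and TRUNCATE) changes on an existing
-- publication; only INSERT and UPDATE changes will be replicated.
ALTER PUBLICATION <pub_name> SET (publish = 'insert,update');
```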
### Why can I not replicate my table which has a dot in it? {#replicate-table-dot}
PeerDB currently has a limitation where dots in source table identifiers - that is, in either the schema name or the table name - are not supported for replication, because PeerDB splits on the dot and cannot discern which part is the schema and which is the table.
Effort is being made to support entering the schema and table separately to get around this limitation.
