Commit 9b38a66

Merge pull request #4425 from ClickHouse/db-pipes/more-on-pkey-delete
Postgres Pipes: Add more info in FAQ about replica identity and nulls in DELETEs
2 parents 8abd422 + 8cd83aa commit 9b38a66

1 file changed: +15 -1 lines changed

docs/integrations/data-ingestion/clickpipes/postgres/faq.md

Lines changed: 15 additions & 1 deletion
@@ -24,7 +24,19 @@ Please refer to the [Postgres Generated Columns: Gotchas and Best Practices](./g
 
 ### Do tables need to have primary keys to be part of Postgres CDC? {#do-tables-need-to-have-primary-keys-to-be-part-of-postgres-cdc}
 
-Yes, for CDC, tables must have either a primary key or a [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY). The REPLICA IDENTITY can be set to FULL or configured to use a unique index.
+For a table to be replicated using ClickPipes for Postgres, it must have either a primary key or a [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY) defined.
+
+- **Primary Key**: The most straightforward approach is to define a primary key on the table. This provides a unique identifier for each row, which is crucial for tracking updates and deletions. You can have REPLICA IDENTITY set to `DEFAULT` (the default behavior) in this case.
+- **Replica Identity**: If a table does not have a primary key, you can set a replica identity. The replica identity can be set to `FULL`, which means that the entire row will be used to identify changes. Alternatively, you can set it to use a unique index if one exists on the table, and then set REPLICA IDENTITY to `USING INDEX index_name`.
+To set the replica identity to FULL, you can use the following SQL command:
+```sql
+ALTER TABLE your_table_name REPLICA IDENTITY FULL;
+```
+REPLICA IDENTITY FULL also enables replication of unchanged TOAST columns. More on that [here](./toast).
+
+Note that using `REPLICA IDENTITY FULL` can have performance implications and cause faster WAL growth, especially for tables without a primary key and with frequent updates or deletes, as it requires more data to be logged for each change. If you have any doubts or need assistance with setting up primary keys or replica identities for your tables, please reach out to our support team for guidance.
+
+It's important to note that if neither a primary key nor a replica identity is defined, ClickPipes will not be able to replicate changes for that table, and you may encounter errors during the replication process. Therefore, it's recommended to review your table schemas and ensure that they meet these requirements before setting up your ClickPipe.
 
 ### Do you support partitioned tables as part of Postgres CDC? {#do-you-support-partitioned-tables-as-part-of-postgres-cdc}
 
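The added FAQ text mentions `USING INDEX index_name` and recommends reviewing table schemas, but only shows the `FULL` variant. The following is a minimal sketch of the other two steps on the Postgres side. The table `orders`, the column `external_id`, the index name, and the `public` schema are hypothetical placeholders, not names taken from the commit. Postgres requires the chosen index to be unique, non-partial, non-deferrable, and built only on `NOT NULL` columns.

```sql
-- Hypothetical names throughout: orders, external_id, orders_external_id_idx.

-- 1. Use a unique index as the replica identity instead of FULL.
--    The indexed column must be NOT NULL for this to be accepted.
ALTER TABLE orders ALTER COLUMN external_id SET NOT NULL;
CREATE UNIQUE INDEX orders_external_id_idx ON orders (external_id);
ALTER TABLE orders REPLICA IDENTITY USING INDEX orders_external_id_idx;

-- 2. Review which tables in a schema have a primary key and which
--    replica identity each one currently uses.
SELECT c.relname AS table_name,
       CASE c.relreplident
            WHEN 'd' THEN 'default'   -- primary key, if any
            WHEN 'n' THEN 'nothing'
            WHEN 'f' THEN 'full'
            WHEN 'i' THEN 'index'
       END AS replica_identity,
       EXISTS (
           SELECT 1
           FROM pg_index i
           WHERE i.indrelid = c.oid
             AND i.indisprimary
       ) AS has_primary_key
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p')        -- ordinary and partitioned tables
  AND n.nspname = 'public'           -- assumed schema; adjust as needed
ORDER BY c.relname;
```

Rows where `has_primary_key` is false and `replica_identity` is `default` or `nothing` are the ones that, per the text above, would need a primary key or an explicit replica identity before being added to a pipe.
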
@@ -53,6 +65,8 @@ ClickPipes for Postgres captures both INSERTs and UPDATEs from Postgres as new r
 
 DELETEs from Postgres are propagated as new rows marked as deleted (using the `_peerdb_is_deleted` column). Since the deduplication process is asynchronous, you might temporarily see duplicates. To address this, you need to handle deduplication at the query layer.
 
+Also note that by default, Postgres does not send values for columns that are not part of the primary key or replica identity during DELETE operations. If you want to capture the full row data during DELETEs, you can set the [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY) to FULL.
+
 For more details, refer to:
 
 * [ReplacingMergeTree table engine best practices](https://docs.peerdb.io/bestpractices/clickhouse_datamodeling#replacingmergetree-table-engine)
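
The hunk above says deduplication has to be handled at the query layer and that deleted rows are flagged through `_peerdb_is_deleted`. One common way to do that on the ClickHouse side is sketched below. It assumes the destination is a ReplacingMergeTree table (as the linked best-practices page suggests); the table name `orders` and the column `id` are hypothetical, and only `_peerdb_is_deleted` comes from the text.

```sql
-- Hypothetical destination table: orders.
-- FINAL collapses row versions that have not been merged yet, and the
-- filter hides rows that were deleted in Postgres.
SELECT *
FROM orders FINAL
WHERE _peerdb_is_deleted = 0;

-- Inspecting deleted rows: unless the source table uses
-- REPLICA IDENTITY FULL, only the key / replica identity columns
-- (here the assumed column id) carry the original values.
SELECT id, _peerdb_is_deleted
FROM orders FINAL
WHERE _peerdb_is_deleted = 1;
```

`FINAL` adds work at query time, so whether to use it or an aggregation-based alternative depends on table size and query patterns.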
