
Commit 52e4f0c

Amit Kapila committed
Allow specifying row filters for logical replication of tables.

This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses.

The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future.

If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher.

The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated.
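
The two combination rules above (a NULL result counts as false, and filters for the same operation are OR'ed across publications) can be modeled in a few lines of C. This is a hypothetical sketch, not PostgreSQL code: `FilterResult`, `filter_accepts()` and `row_is_published()` are illustrative names, and the sketch assumes every filter has already been evaluated against the row.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-valued outcome of one row filter, like SQL's
 * TRUE / FALSE / NULL. */
typedef enum
{
    FILTER_FALSE,
    FILTER_TRUE,
    FILTER_NULL
} FilterResult;

/* A NULL outcome is regarded as false: only TRUE accepts the row. */
static bool
filter_accepts(FilterResult r)
{
    return r == FILTER_TRUE;
}

/*
 * Decide whether a row is replicated when the same table appears in
 * npubs publications of one subscription (for the same operation).
 * has_filter[i] == false models a publication with no WHERE clause,
 * which also covers FOR ALL TABLES and FOR ALL TABLES IN SCHEMA; its
 * results[i] entry is ignored.  The filters are OR'ed together.
 */
static bool
row_is_published(const bool *has_filter, const FilterResult *results,
                 size_t npubs)
{
    for (size_t i = 0; i < npubs; i++)
    {
        if (!has_filter[i])
            return true;        /* unfiltered publication: every row */
        if (filter_accepts(results[i]))
            return true;        /* one satisfied filter is enough */
    }
    return false;               /* every filter was false or NULL */
}
```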
Because the expressions are OR'ed together, all the other filters become redundant if (a) one of the publications has no filter at all, (b) one of the publications was created using FOR ALL TABLES, or (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema.

If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter.

The psql commands \dRp+ and \d <table-name> will display any row filters.

Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian
Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang
Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
1 parent ebf6c52, commit 52e4f0c

33 files changed: +3113 -236 lines changed

doc/src/sgml/catalogs.sgml (+9)
@@ -6325,6 +6325,15 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 Reference to relation
 </para></entry>
 </row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>prqual</structfield> <type>pg_node_tree</type>
+</para>
+<para>Expression tree (in <function>nodeToString()</function>
+representation) for the relation's publication qualifying condition. Null
+if there is no publication qualifying condition.</para></entry>
+</row>
 </tbody>
 </tgroup>
 </table>

doc/src/sgml/ref/alter_publication.sgml (+10 -2)
@@ -30,7 +30,7 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 
 <phrase>where <replaceable class="parameter">publication_object</replaceable> is one of:</phrase>
 
-TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ... ]
+TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ WHERE ( <replaceable class="parameter">expression</replaceable> ) ] [, ... ]
 ALL TABLES IN SCHEMA { <replaceable class="parameter">schema_name</replaceable> | CURRENT_SCHEMA } [, ... ]
 </synopsis>
 </refsynopsisdiv>
@@ -52,7 +52,9 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 remove one or more tables/schemas from the publication. Note that adding
 tables/schemas to a publication that is already subscribed to will require an
 <literal>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</literal> action on the
-subscribing side in order to become effective.
+subscribing side in order to become effective. Note also that the combination
+of <literal>DROP</literal> with a <literal>WHERE</literal> clause is not
+allowed.
 </para>
 
 <para>
@@ -110,6 +112,12 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
 specified, the table and all its descendant tables (if any) are
 affected. Optionally, <literal>*</literal> can be specified after the table
 name to explicitly indicate that descendant tables are included.
+If the optional <literal>WHERE</literal> clause is specified, rows for
+which the <replaceable class="parameter">expression</replaceable>
+evaluates to false or null will not be published. Note that parentheses
+are required around the expression. The
+<replaceable class="parameter">expression</replaceable> is evaluated with
+the role used for the replication connection.
 </para>
 </listitem>
 </varlistentry>

doc/src/sgml/ref/alter_subscription.sgml (+5 -2)
@@ -163,8 +163,11 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
 <para>
 Specifies whether to copy pre-existing data in the publications
 that are being subscribed to when the replication starts.
-The default is <literal>true</literal>. (Previously-subscribed
-tables are not copied.)
+The default is <literal>true</literal>.
+</para>
+<para>
+Previously subscribed tables are not copied, even if a table's row
+filter <literal>WHERE</literal> clause has since been modified.
 </para>
 </listitem>
 </varlistentry>

doc/src/sgml/ref/create_publication.sgml (+37 -1)
@@ -28,7 +28,7 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
 
 <phrase>where <replaceable class="parameter">publication_object</replaceable> is one of:</phrase>
 
-TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ... ]
+TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ WHERE ( <replaceable class="parameter">expression</replaceable> ) ] [, ... ]
 ALL TABLES IN SCHEMA { <replaceable class="parameter">schema_name</replaceable> | CURRENT_SCHEMA } [, ... ]
 </synopsis>
 </refsynopsisdiv>
@@ -78,6 +78,14 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
 publication, so they are never explicitly added to the publication.
 </para>
 
+<para>
+If the optional <literal>WHERE</literal> clause is specified, rows for
+which the <replaceable class="parameter">expression</replaceable>
+evaluates to false or null will not be published. Note that parentheses
+are required around the expression. It has no effect on
+<literal>TRUNCATE</literal> commands.
+</para>
+
 <para>
 Only persistent base tables and partitioned tables can be part of a
 publication. Temporary tables, unlogged tables, foreign tables,
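
The row filter only gates row-level changes; as the paragraph above states, it has no effect on TRUNCATE. The per-change decision can be sketched in C as below. `ChangeKind` and `should_send_change()` are hypothetical names, the sketch assumes the publication already publishes the given kind of change, and it treats UPDATE as a single yes/no decision, which is a simplification.

```c
#include <stdbool.h>

/* Hypothetical kinds of change a publication may publish. */
typedef enum
{
    CHANGE_INSERT,
    CHANGE_UPDATE,
    CHANGE_DELETE,
    CHANGE_TRUNCATE
} ChangeKind;

/*
 * Decide whether a change is sent, assuming the publication publishes
 * this kind of change.  filter_matched is the row filter's result with
 * NULL already folded to false; it is ignored when there is no filter.
 */
static bool
should_send_change(ChangeKind kind, bool has_row_filter, bool filter_matched)
{
    if (kind == CHANGE_TRUNCATE)
        return true;            /* row filters have no effect on TRUNCATE */
    if (!has_row_filter)
        return true;            /* no WHERE clause: every row is sent */
    return filter_matched;      /* INSERT/UPDATE/DELETE obey the filter */
}
```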
@@ -225,6 +233,22 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
 disallowed on those tables.
 </para>
 
+<para>
+A <literal>WHERE</literal> (i.e. row filter) expression must contain only
+columns that are covered by the <literal>REPLICA IDENTITY</literal>, in
+order for <command>UPDATE</command> and <command>DELETE</command> operations
+to be published. For publication of <command>INSERT</command> operations,
+any column may be used in the <literal>WHERE</literal> expression. The
+<literal>WHERE</literal> clause allows simple expressions that don't have
+user-defined functions, user-defined operators, user-defined types,
+user-defined collations, non-immutable built-in functions, or references to
+system columns.
+If your publication contains a partitioned table, the publication parameter
+<literal>publish_via_partition_root</literal> determines if it uses the
+partition's row filter (if the parameter is false, the default) or the root
+partitioned table's row filter.
+</para>
+
 <para>
 For an <command>INSERT ... ON CONFLICT</command> command, the publication will
 publish the operation that actually results from the command. So depending
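
The REPLICA IDENTITY rule above amounts to a column-coverage check. In this hypothetical C sketch, columns are identified by attribute number, and `PubActions`, `column_in_set()` and `row_filter_columns_ok()` are illustrative names rather than PostgreSQL APIs. The sketch says nothing about when or where PostgreSQL actually performs this check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of which DML operations a publication publishes. */
typedef struct
{
    bool pubinsert;
    bool pubupdate;
    bool pubdelete;
} PubActions;

/* True if attnum appears in cols[0..ncols-1]. */
static bool
column_in_set(int attnum, const int *cols, size_t ncols)
{
    for (size_t i = 0; i < ncols; i++)
        if (cols[i] == attnum)
            return true;
    return false;
}

/*
 * Check a row filter that references filter_cols[] against the rule in
 * the text: if UPDATE or DELETE is published, every referenced column
 * must be covered by the replica identity (ri_cols[]); for INSERT-only
 * publications any column may be used.
 */
static bool
row_filter_columns_ok(PubActions actions,
                      const int *filter_cols, size_t nfilter,
                      const int *ri_cols, size_t nri)
{
    if (!actions.pubupdate && !actions.pubdelete)
        return true;

    for (size_t i = 0; i < nfilter; i++)
        if (!column_in_set(filter_cols[i], ri_cols, nri))
            return false;       /* column outside REPLICA IDENTITY */
    return true;
}
```

Under this model, a table with REPLICA IDENTITY FULL is represented by passing all of its columns as `ri_cols`, which makes any filter acceptable.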
@@ -247,6 +271,11 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
 <para>
 <acronym>DDL</acronym> operations are not published.
 </para>
+
+<para>
+The <literal>WHERE</literal> clause expression is executed with the role used
+for the replication connection.
+</para>
 </refsect1>
 
 <refsect1>
@@ -259,6 +288,13 @@ CREATE PUBLICATION mypublication FOR TABLE users, departments;
 </programlisting>
 </para>
 
+<para>
+Create a publication that publishes all changes from active departments:
+<programlisting>
+CREATE PUBLICATION active_departments FOR TABLE departments WHERE (active IS TRUE);
+</programlisting>
+</para>
+
 <para>
 Create a publication that publishes all changes in all tables:
 <programlisting>

doc/src/sgml/ref/create_subscription.sgml (+26 -1)
@@ -208,6 +208,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
 that are being subscribed to when the replication starts.
 The default is <literal>true</literal>.
 </para>
+<para>
+If the publications contain <literal>WHERE</literal> clauses, it
+will affect what data is copied. Refer to the
+<xref linkend="sql-createsubscription-notes" /> for details.
+</para>
 </listitem>
 </varlistentry>
 
@@ -293,7 +298,7 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
 </variablelist>
 </refsect1>
 
-<refsect1>
+<refsect1 id="sql-createsubscription-notes" xreflabel="Notes">
 <title>Notes</title>
 
 <para>
@@ -319,6 +324,26 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
 the parameter <literal>create_slot = false</literal>. This is an
 implementation restriction that might be lifted in a future release.
 </para>
+
+<para>
+If any table in the publication has a <literal>WHERE</literal> clause, rows
+for which the <replaceable class="parameter">expression</replaceable>
+evaluates to false or null will not be published. If the subscription has
+several publications in which the same table has been published with
+different <literal>WHERE</literal> clauses, a row will be published if any
+of the expressions (referring to that publish operation) are satisfied. In
+the case of different <literal>WHERE</literal> clauses, if one of the
+publications has no <literal>WHERE</literal> clause (referring to that
+publish operation) or the publication is declared as
+<literal>FOR ALL TABLES</literal> or
+<literal>FOR ALL TABLES IN SCHEMA</literal>, rows are always published
+regardless of the definition of the other expressions.
+If the subscriber is a <productname>PostgreSQL</productname> version before
+15 then any row filtering is ignored during the initial data synchronization
+phase. For this case, the user might want to consider deleting any initially
+copied data that would be incompatible with subsequent filtering.
+</para>
+
 </refsect1>
 
 <refsect1>
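
For the initial data copy, the notes above imply that a version 15 or later subscriber ORs the per-publication filters together and drops filtering entirely when any publication has none. The C sketch below builds such a clause as text. `build_copy_where()` is a hypothetical helper, not PostgreSQL's table-synchronization code; it assumes the filter expressions are already valid SQL and does no quoting or deduplication.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the WHERE clause for copying one table's
 * initial data.  quals[i] is the row filter text from publication i, or
 * NULL if that publication has no filter for the table (this also models
 * FOR ALL TABLES / FOR ALL TABLES IN SCHEMA).  On success buf holds either
 * "" (copy every row) or " WHERE (q1) OR (q2) ...".  Returns false, with
 * buf emptied, if bufsize is too small.
 */
static bool
build_copy_where(const char *const *quals, size_t nquals,
                 char *buf, size_t bufsize)
{
    size_t used = 0;

    if (bufsize == 0)
        return false;
    buf[0] = '\0';

    /* One unfiltered publication makes all other filters redundant. */
    for (size_t i = 0; i < nquals; i++)
        if (quals[i] == NULL)
            return true;

    /* Otherwise OR the filters together, parenthesizing each one. */
    for (size_t i = 0; i < nquals; i++)
    {
        int n = snprintf(buf + used, bufsize - used, "%s(%s)",
                         i == 0 ? " WHERE " : " OR ", quals[i]);

        if (n < 0 || (size_t) n >= bufsize - used)
        {
            buf[0] = '\0';
            return false;
        }
        used += (size_t) n;
    }
    return true;
}
```

A pre-15 subscriber would skip this step entirely and copy every row, which is why the notes suggest cleaning up incompatible rows afterwards.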

src/backend/catalog/pg_publication.c (+55 -4)
@@ -275,18 +275,57 @@ GetPubPartitionOptionRelations(List *result, PublicationPartOpt pub_partopt,
 	return result;
 }
 
+/*
+ * Returns the relid of the topmost ancestor that is published via this
+ * publication if any, otherwise returns InvalidOid.
+ *
+ * Note that the list of ancestors should be ordered such that the topmost
+ * ancestor is at the end of the list.
+ */
+Oid
+GetTopMostAncestorInPublication(Oid puboid, List *ancestors)
+{
+	ListCell *lc;
+	Oid topmost_relid = InvalidOid;
+
+	/*
+	 * Find the "topmost" ancestor that is in this publication.
+	 */
+	foreach(lc, ancestors)
+	{
+		Oid ancestor = lfirst_oid(lc);
+		List *apubids = GetRelationPublications(ancestor);
+		List *aschemaPubids = NIL;
+
+		if (list_member_oid(apubids, puboid))
+			topmost_relid = ancestor;
+		else
+		{
+			aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
+			if (list_member_oid(aschemaPubids, puboid))
+				topmost_relid = ancestor;
+		}
+
+		list_free(apubids);
+		list_free(aschemaPubids);
+	}
+
+	return topmost_relid;
+}
+
 /*
  * Insert new publication / relation mapping.
  */
 ObjectAddress
-publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
+publication_add_relation(Oid pubid, PublicationRelInfo *pri,
 						 bool if_not_exists)
 {
 	Relation rel;
 	HeapTuple tup;
 	Datum values[Natts_pg_publication_rel];
 	bool nulls[Natts_pg_publication_rel];
-	Oid relid = RelationGetRelid(targetrel->relation);
+	Relation targetrel = pri->relation;
+	Oid relid = RelationGetRelid(targetrel);
 	Oid pubreloid;
 	Publication *pub = GetPublication(pubid);
 	ObjectAddress myself,
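
The loop in `GetTopMostAncestorInPublication()` keeps overwriting `topmost_relid`, so with ancestors ordered from the nearest parent to the topmost one, the last published ancestor wins. The C sketch below reproduces that loop over plain arrays, with a hypothetical `in_pub[]` flag standing in for the `GetRelationPublications()` and `GetSchemaPublications()` lookups. It also adds a hypothetical `relation_whose_filter_applies()` to show how `publish_via_partition_root` could select whose row filter is used; taking the filter from the returned ancestor is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;          /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/*
 * Array version of the loop in GetTopMostAncestorInPublication().
 * ancestors[] runs from the immediate parent to the topmost ancestor;
 * in_pub[i] says whether ancestors[i] is in the publication, either
 * directly or through its schema.  Later matches overwrite earlier
 * ones, so the topmost published ancestor is returned.
 */
static Oid
topmost_published_ancestor(const Oid *ancestors, const bool *in_pub,
                           size_t nancestors)
{
    Oid topmost = InvalidOid;

    for (size_t i = 0; i < nancestors; i++)
        if (in_pub[i])
            topmost = ancestors[i];
    return topmost;
}

/*
 * Hypothetical: which relation's row filter applies to a change on a
 * partition.  With publish_via_partition_root, use the topmost
 * published ancestor if there is one; otherwise use the partition.
 */
static Oid
relation_whose_filter_applies(Oid partition, const Oid *ancestors,
                              const bool *in_pub, size_t nancestors,
                              bool publish_via_partition_root)
{
    if (publish_via_partition_root)
    {
        Oid root = topmost_published_ancestor(ancestors, in_pub,
                                              nancestors);

        if (root != InvalidOid)
            return root;
    }
    return partition;
}
```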
@@ -311,10 +350,10 @@ publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
 		ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
 				 errmsg("relation \"%s\" is already member of publication \"%s\"",
-						RelationGetRelationName(targetrel->relation), pub->name)));
+						RelationGetRelationName(targetrel), pub->name)));
 	}
 
-	check_publication_add_relation(targetrel->relation);
+	check_publication_add_relation(targetrel);
 
 	/* Form a tuple. */
 	memset(values, 0, sizeof(values));
@@ -328,6 +367,12 @@ publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
 	values[Anum_pg_publication_rel_prrelid - 1] =
 		ObjectIdGetDatum(relid);
 
+	/* Add qualifications, if available */
+	if (pri->whereClause != NULL)
+		values[Anum_pg_publication_rel_prqual - 1] = CStringGetTextDatum(nodeToString(pri->whereClause));
+	else
+		nulls[Anum_pg_publication_rel_prqual - 1] = true;
+
 	tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
 
 	/* Insert tuple into catalog. */
@@ -345,6 +390,12 @@ publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
 	ObjectAddressSet(referenced, RelationRelationId, relid);
 	recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
 
+	/* Add dependency on the objects mentioned in the qualifications */
+	if (pri->whereClause)
+		recordDependencyOnSingleRelExpr(&myself, pri->whereClause, relid,
+										DEPENDENCY_NORMAL, DEPENDENCY_NORMAL,
+										false);
+
 	/* Close the table. */
 	table_close(rel, RowExclusiveLock);
 
