Commit ddd5f4f

Committed by Amit Kapila
Add a slot synchronization function.

This commit introduces a new SQL function pg_sync_replication_slots() which is used to synchronize the logical replication slots from the primary server to the physical standby so that logical replication can be resumed after a failover or planned switchover.

A new 'synced' flag is introduced in the pg_replication_slots view, indicating whether the slot has been synchronized from the primary server. On a standby, synced slots cannot be dropped or consumed, and any attempt to perform logical decoding on them will result in an error.

The logical replication slots on the primary can be synchronized to the hot standby by using the 'failover' parameter of pg_create_logical_replication_slot(), or by using the 'failover' option of CREATE SUBSCRIPTION during slot creation, and then calling pg_sync_replication_slots() on the standby. For the synchronization to work, it is mandatory to have a physical replication slot between the primary and the standby (that is, 'primary_slot_name' must be configured on the standby), and 'hot_standby_feedback' must be enabled on the standby. It is also necessary to specify a valid 'dbname' in the 'primary_conninfo'.

If a logical slot is invalidated on the primary, then that slot on the standby is also invalidated.

If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped but will be recreated on the standby in the next pg_sync_replication_slots() call provided the slot still exists on the primary server. It is okay to recreate such slots as long as these are not consumable on the standby (which is the case currently). This situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot is removed.

The slot synchronization status on the standby can be monitored using the 'synced' column of the pg_replication_slots view.

Functionality to automatically synchronize slots by a background worker and to allow logical walsenders to wait for the physical standby will be added in subsequent commits.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
1 parent 06bd311 commit ddd5f4f
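
The workflow described in the commit message can be sketched in SQL roughly as follows. This is an illustration rather than part of the commit: the slot, subscription, publication and host names are hypothetical, and it assumes the standby is already configured as the message requires (physical slot, hot_standby_feedback, dbname in primary_conninfo).

```sql
-- On the primary: create a persistent logical slot with failover enabled.
-- Positional arguments: slot_name, plugin, temporary, twophase, failover.
SELECT 'init'
FROM pg_create_logical_replication_slot(
         'failover_demo_slot',   -- hypothetical slot name
         'test_decoding',
         false,                  -- temporary: must be false for a failover slot
         false,                  -- twophase
         true);                  -- failover

-- Alternative, on a subscriber: let CREATE SUBSCRIPTION create the slot
-- with the failover option set (all names here are hypothetical).
CREATE SUBSCRIPTION demo_sub
    CONNECTION 'host=primary.example.com port=5432 dbname=postgres user=repl_user'
    PUBLICATION demo_pub
    WITH (failover = true);

-- On the standby: copy the failover-enabled slots over from the primary.
-- Run this periodically, since this commit adds no background worker.
SELECT pg_sync_replication_slots();
```

Because synchronization at this commit is purely on demand, the standby's copy of a slot is only as fresh as the last call to pg_sync_replication_slots().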

26 files changed: +1522 -37 lines changed


contrib/test_decoding/expected/permissions.out

Lines changed: 3 additions & 0 deletions
@@ -64,6 +64,9 @@ DETAIL: Only roles with the REPLICATION attribute may use replication slots.
 SELECT pg_drop_replication_slot('regression_slot');
 ERROR: permission denied to use replication slots
 DETAIL: Only roles with the REPLICATION attribute may use replication slots.
+SELECT pg_sync_replication_slots();
+ERROR: permission denied to use replication slots
+DETAIL: Only roles with the REPLICATION attribute may use replication slots.
 RESET ROLE;
 -- replication users can drop superuser created slots
 SET ROLE regress_lr_superuser;
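
The expected output above shows that pg_sync_replication_slots() is subject to the same permission check as the other slot functions. A minimal sketch of granting that access, assuming a hypothetical role name and that the role is created on the primary so it reaches the standby through replication:

```sql
-- On the primary (roles cannot be created on a read-only standby).
-- The role name is hypothetical.
CREATE ROLE slot_sync_operator LOGIN REPLICATION;

-- Later, on the standby, connected as that role:
SELECT pg_sync_replication_slots();
```

The REPLICATION attribute only satisfies the local permission check; the connection to the primary still uses whatever credentials are in primary_conninfo.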

contrib/test_decoding/expected/slot.out

Lines changed: 2 additions & 0 deletions
@@ -425,6 +425,8 @@ SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', '
 init
 (1 row)
 
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
+ERROR: cannot enable failover for a temporary replication slot
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 ?column?
 ----------

contrib/test_decoding/sql/permissions.sql

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 INSERT INTO lr_test VALUES('lr_superuser_init');
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 SELECT pg_drop_replication_slot('regression_slot');
+SELECT pg_sync_replication_slots();
 RESET ROLE;
 
 -- replication users can drop superuser created slots

contrib/test_decoding/sql/slot.sql

Lines changed: 1 addition & 0 deletions
@@ -181,6 +181,7 @@ SELECT pg_drop_replication_slot('copied_slot2_notemp');
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'test_decoding', false, false, true);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false);
 SELECT 'init' FROM pg_create_logical_replication_slot('failover_default_slot', 'test_decoding', false, false);
+SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_temp_slot', 'test_decoding', true, false, true);
 SELECT 'init' FROM pg_create_physical_replication_slot('physical_slot');
 
 SELECT slot_name, slot_type, failover FROM pg_replication_slots;

doc/src/sgml/config.sgml

Lines changed: 7 additions & 2 deletions
@@ -4612,8 +4612,13 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
 <varname>primary_conninfo</varname> string, or in a separate
 <filename>~/.pgpass</filename> file on the standby server (use
 <literal>replication</literal> as the database name).
-Do not specify a database name in the
-<varname>primary_conninfo</varname> string.
+</para>
+<para>
+For replication slot synchronization (see
+<xref linkend="logicaldecoding-replication-slots-synchronization"/>),
+it is also necessary to specify a valid <literal>dbname</literal>
+in the <varname>primary_conninfo</varname> string. This will only be
+used for slot synchronization. It is ignored for streaming.
 </para>
 <para>
 This parameter can only be set in the <filename>postgresql.conf</filename>
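
As an illustration of the standby-side prerequisites this documentation change refers to, the settings could be applied roughly as below. This is a sketch under assumptions: the host, user, database and slot names are hypothetical, and it uses ALTER SYSTEM (which writes postgresql.auto.conf) instead of editing postgresql.conf by hand.

```sql
-- On the primary: create the physical slot the standby will stream from.
SELECT pg_create_physical_replication_slot('standby1_slot');  -- hypothetical name

-- On the standby: include a dbname in primary_conninfo. Per the new text,
-- it is used only for slot synchronization and ignored for streaming.
ALTER SYSTEM SET primary_conninfo =
    'host=primary.example.com port=5432 user=repl_user dbname=postgres';

-- On the standby: name the physical slot and enable feedback, both of
-- which the commit message lists as mandatory for synchronization.
ALTER SYSTEM SET primary_slot_name = 'standby1_slot';
ALTER SYSTEM SET hot_standby_feedback = on;

-- Ask the server to re-read its configuration.
SELECT pg_reload_conf();
```

Whether a reload alone is enough for every one of these settings may depend on the server version, so a restart of the standby is the conservative choice if in doubt.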

doc/src/sgml/func.sgml

Lines changed: 34 additions & 1 deletion
@@ -28075,7 +28075,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 </row>
 
 <row>
-<entry role="func_table_entry"><para role="func_signature">
+<entry id="pg-create-logical-replication-slot" role="func_table_entry"><para role="func_signature">
 <indexterm>
 <primary>pg_create_logical_replication_slot</primary>
 </indexterm>
@@ -28444,6 +28444,39 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 record is flushed along with its transaction.
 </para></entry>
 </row>
+
+<row>
+<entry id="pg-sync-replication-slots" role="func_table_entry"><para role="func_signature">
+<indexterm>
+<primary>pg_sync_replication_slots</primary>
+</indexterm>
+<function>pg_sync_replication_slots</function> ()
+<returnvalue>void</returnvalue>
+</para>
+<para>
+Synchronize the logical failover replication slots from the primary
+server to the standby server. This function can only be executed on the
+standby server. Temporary synced slots, if any, cannot be used for
+logical decoding and must be dropped after promotion. See
+<xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
+</para>
+
+<caution>
+<para>
+If, after executing the function,
+<link linkend="guc-hot-standby-feedback">
+<varname>hot_standby_feedback</varname></link> is disabled on
+the standby or the physical slot configured in
+<link linkend="guc-primary-slot-name">
+<varname>primary_slot_name</varname></link> is
+removed, then it is possible that the necessary rows of the
+synchronized slot will be removed by the VACUUM process on the primary
+server, resulting in the synchronized slot becoming invalidated.
+</para>
+</caution>
+</entry>
+</row>
+
 </tbody>
 </tgroup>
 </table>

doc/src/sgml/logicaldecoding.sgml

Lines changed: 56 additions & 0 deletions
@@ -358,6 +358,62 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 So if a slot is no longer required it should be dropped.
 </para>
 </caution>
+
+</sect2>
+
+<sect2 id="logicaldecoding-replication-slots-synchronization">
+<title>Replication Slot Synchronization</title>
+<para>
+The logical replication slots on the primary can be synchronized to
+the hot standby by using the <literal>failover</literal> parameter of
+<link linkend="pg-create-logical-replication-slot">
+<function>pg_create_logical_replication_slot</function></link>, or by
+using the <link linkend="sql-createsubscription-params-with-failover">
+<literal>failover</literal></link> option of
+<command>CREATE SUBSCRIPTION</command> during slot creation, and then calling
+<link linkend="pg-sync-replication-slots">
+<function>pg_sync_replication_slots</function></link>
+on the standby. For the synchronization to work, it is mandatory to
+have a physical replication slot between the primary and the standby aka
+<link linkend="guc-primary-slot-name"><varname>primary_slot_name</varname></link>
+should be configured on the standby, and
+<link linkend="guc-hot-standby-feedback"><varname>hot_standby_feedback</varname></link>
+must be enabled on the standby. It is also necessary to specify a valid
+<literal>dbname</literal> in the
+<link linkend="guc-primary-conninfo"><varname>primary_conninfo</varname></link>.
+</para>
+
+<para>
+The ability to resume logical replication after failover depends upon the
+<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>
+value for the synchronized slots on the standby at the time of failover.
+Only persistent slots that have attained synced state as true on the standby
+before failover can be used for logical replication after failover.
+Temporary synced slots cannot be used for logical decoding, therefore
+logical replication for those slots cannot be resumed. For example, if the
+synchronized slot could not become persistent on the standby due to a
+disabled subscription, then the subscription cannot be resumed after
+failover even when it is enabled.
+</para>
+
+<para>
+To resume logical replication after failover from the synced logical
+slots, the subscription's 'conninfo' must be altered to point to the
+new primary server. This is done using
+<link linkend="sql-altersubscription-params-connection"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>.
+It is recommended that subscriptions are first disabled before promoting
+the standby and are re-enabled after altering the connection string.
+</para>
+<caution>
+<para>
+There is a chance that the old primary is up again during the promotion
+and if subscriptions are not disabled, the logical subscribers may
+continue to receive data from the old primary server even after promotion
+until the connection string is altered. This might result in data
+inconsistency issues, preventing the logical subscribers from being
+able to continue replication from the new primary server.
+</para>
+</caution>
 </sect2>
 
 <sect2 id="logicaldecoding-explanation-output-plugins">
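
The failover procedure recommended in the new section (disable, promote, repoint, re-enable) might look like this in practice. The subscription name and connection string are hypothetical, and the sketch assumes the relevant slot had already reached synced = true as a persistent slot on the standby before promotion.

```sql
-- 1. On the subscriber, before promotion: stop pulling changes so the
--    subscriber cannot keep consuming from the old primary.
ALTER SUBSCRIPTION demo_sub DISABLE;

-- 2. On the standby: promote it to become the new primary.
SELECT pg_promote();

-- 3. On the subscriber: point the subscription at the new primary.
ALTER SUBSCRIPTION demo_sub
    CONNECTION 'host=new-primary.example.com port=5432 dbname=postgres user=repl_user';

-- 4. On the subscriber: resume replication from the synced slot.
ALTER SUBSCRIPTION demo_sub ENABLE;
```

Disabling first is what guards against the situation in the caution above, where the old primary comes back during promotion and the subscriber continues to receive data from it.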

doc/src/sgml/protocol.sgml

Lines changed: 4 additions & 2 deletions
@@ -2062,7 +2062,8 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
 <term><literal>FAILOVER [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
 <listitem>
 <para>
-If true, the slot is enabled to be synced to the standbys.
+If true, the slot is enabled to be synced to the standbys
+so that logical replication can be resumed after failover.
 The default is false.
 </para>
 </listitem>
@@ -2162,7 +2163,8 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
 <term><literal>FAILOVER [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
 <listitem>
 <para>
-If true, the slot is enabled to be synced to the standbys.
+If true, the slot is enabled to be synced to the standbys
+so that logical replication can be resumed after failover.
 </para>
 </listitem>
 </varlistentry>

doc/src/sgml/system-views.sgml

Lines changed: 18 additions & 2 deletions
@@ -2561,10 +2561,26 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 <structfield>failover</structfield> <type>bool</type>
 </para>
 <para>
-True if this is a logical slot enabled to be synced to the standbys.
-Always false for physical slots.
+True if this is a logical slot enabled to be synced to the standbys
+so that logical replication can be resumed from the new primary
+after failover. Always false for physical slots.
 </para></entry>
 </row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>synced</structfield> <type>bool</type>
+</para>
+<para>
+True if this is a logical slot that was synced from a primary server.
+On a hot standby, the slots with the synced column marked as true can
+neither be used for logical decoding nor dropped manually. The value
+of this column has no meaning on the primary server; the column value on
+the primary is default false for all slots but may (if leftover from a
+promoted standby) also be true.
+</para></entry>
+</row>
+
 </tbody>
 </tgroup>
 </table>
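
A monitoring query along these lines could be used on the standby to see which slots are ready for failover. It is a sketch, not taken from the commit; the column list assumes the pg_replication_slots definition as of this commit (later releases may rename or add columns).

```sql
-- On the standby: list logical slots and their synchronization state.
-- A slot is usable after failover only if it is persistent
-- (temporary = false) and synced = true at the time of promotion.
SELECT slot_name,
       database,
       temporary,
       failover,
       synced,
       conflict_reason
FROM pg_replication_slots
WHERE slot_type = 'logical'
ORDER BY slot_name;
```

On a primary the synced column carries no meaning, so this check is only informative when run on a hot standby.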

src/backend/catalog/system_views.sql

Lines changed: 2 additions & 1 deletion
@@ -1024,7 +1024,8 @@ CREATE VIEW pg_replication_slots AS
             L.safe_wal_size,
             L.two_phase,
             L.conflict_reason,
-            L.failover
+            L.failover,
+            L.synced
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 

src/backend/replication/logical/Makefile

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ OBJS = \
 	proto.o \
 	relation.o \
 	reorderbuffer.o \
+	slotsync.o \
 	snapbuild.o \
 	tablesync.o \
 	worker.o

src/backend/replication/logical/logical.c

Lines changed: 12 additions & 0 deletions
@@ -524,6 +524,18 @@ CreateDecodingContext(XLogRecPtr start_lsn,
                  errmsg("replication slot \"%s\" was not created in this database",
                         NameStr(slot->data.name))));
 
+    /*
+     * Do not allow consumption of a "synchronized" slot until the standby
+     * gets promoted.
+     */
+    if (RecoveryInProgress() && slot->data.synced)
+        ereport(ERROR,
+                errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                errmsg("cannot use replication slot \"%s\" for logical decoding",
+                       NameStr(slot->data.name)),
+                errdetail("This slot is being synchronized from the primary server."),
+                errhint("Specify another replication slot."));
+
     /*
      * Check if slot has been invalidated due to max_slot_wal_keep_size. Avoid
      * "cannot get changes" wording in this errmsg because that'd be
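
From the SQL level, the check added to CreateDecodingContext() would presumably surface as shown below when someone tries to decode from a synced slot on a standby that is still in recovery. The slot name is hypothetical, and the sketch assumes the call reaches this check (for instance, that the slot belongs to the current database).

```sql
-- On the standby: confirm the server is still in recovery.
SELECT pg_is_in_recovery();

-- Try to consume changes from a slot that has synced = true.
SELECT data
FROM pg_logical_slot_get_changes('failover_demo_slot', NULL, NULL);
-- Expected to fail with the message built by the ereport() above:
--   ERROR:  cannot use replication slot "failover_demo_slot" for logical decoding
--   DETAIL:  This slot is being synchronized from the primary server.
--   HINT:  Specify another replication slot.
```

Since the condition is RecoveryInProgress() && slot->data.synced, the same slot becomes consumable once the standby has been promoted.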

src/backend/replication/logical/meson.build

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ backend_sources += files(
   'proto.c',
   'relation.c',
   'reorderbuffer.c',
+  'slotsync.c',
   'snapbuild.c',
   'tablesync.c',
   'worker.c',
