
Commit b560a98

Amit Kapila committed
Doc: Add the new section "Logical Replication Failover".

This aids the users to ensure that the failover marked slots are synced to the standby and subscribers can continue replication even when the publisher node goes down.

Author: Hou Zhijie, Shveta Malik, Amit Kapila
Reviewed-by: Peter Smith, Bertrand Drouvot
Discussion: https://postgr.es/m/OS0PR01MB57164D6F53FB4F6AD29AD9C594FB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
1 parent 4b87917 commit b560a98

2 files changed: +103 -0 lines changed


doc/src/sgml/high-availability.sgml

Lines changed: 9 additions & 0 deletions
@@ -1487,6 +1487,15 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 Written administration procedures are advised.
 </para>
 
+<para>
+If you have opted for logical replication slot synchronization (see
+<xref linkend="logicaldecoding-replication-slots-synchronization"/>),
+then before switching to the standby server, it is recommended to check
+if the logical slots synchronized on the standby server are ready
+for failover. This can be done by following the steps described in
+<xref linkend="logical-replication-failover"/>.
+</para>
+
 <para>
 To trigger failover of a log-shipping standby server, run
 <command>pg_ctl promote</command> or call <function>pg_promote()</function>.

doc/src/sgml/logical-replication.sgml

Lines changed: 94 additions & 0 deletions
@@ -687,6 +687,100 @@ ALTER SUBSCRIPTION
 
 </sect1>
 
+<sect1 id="logical-replication-failover">
+<title>Logical Replication Failover</title>
+
+<para>
+To allow subscriber nodes to continue replicating data from the publisher
+node even when the publisher node goes down, there must be a physical standby
+corresponding to the publisher node. The logical slots on the primary server
+corresponding to the subscriptions can be synchronized to the standby server by
+specifying <literal>failover = true</literal> when creating subscriptions. See
+<xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
+Enabling the
+<link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+parameter ensures a seamless transition of those subscriptions after the
+standby is promoted. They can continue subscribing to publications on the
+new primary server without losing data. Note that in the case of
+asynchronous replication, there remains a risk of data loss for transactions
+committed on the former primary server but have yet to be replicated to the new
+primary server.
+</para>
+
+<para>
+Because the slot synchronization logic copies asynchronously, it is
+necessary to confirm that replication slots have been synced to the standby
+server before the failover happens. To ensure a successful failover, the
+standby server must be ahead of the subscriber. This can be achieved by
+configuring
+<link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>.
+</para>
+
+<para>
+To confirm that the standby server is indeed ready for failover, follow these
+steps to verify that all necessary logical replication slots have been
+synchronized to the standby server:
+</para>
+
+<procedure>
+<step performance="required">
+<para>
+On the subscriber node, use the following SQL to identify which slots
+should be synced to the standby that we plan to promote. This query will
+return the relevant replication slots, including the main slots and table
+synchronization slots associated with the failover-enabled subscriptions.
+Note that the table sync slot should be synced to the standby server only
+if the table copy is finished (See <xref linkend="catalog-pg-subscription-rel"/>).
+We don't need to ensure that the table sync slots are synced in other scenarios
+as they will either be dropped or re-created on the new primary server in those
+cases.
+<programlisting>
+test_sub=# SELECT
+array_agg(slot_name) AS slots
+FROM
+((
+SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name
+FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
+WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+) UNION (
+SELECT s.oid AS subid, s.subslotname as slot_name
+FROM pg_subscription s
+WHERE s.subfailover
+))
+WHERE slot_name IS NOT NULL;
+ slots
+-------
+ {sub1,sub2,sub3}
+(1 row)
+</programlisting></para>
+</step>
+<step performance="required">
+<para>
+Check that the logical replication slots identified above exist on
+the standby server and are ready for failover.
+<programlisting>
+test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready
+FROM pg_replication_slots
+WHERE slot_name IN ('sub1','sub2','sub3');
+ slot_name   | failover_ready
+-------------+----------------
+ sub1        | t
+ sub2        | t
+ sub3        | t
+(3 rows)
+</programlisting></para>
+</step>
+</procedure>
+
+<para>
+If all the slots are present on the standby server and the result
+(<literal>failover_ready</literal>) of the above SQL query is true, then
+existing subscriptions can continue subscribing to publications now on the
+new primary server without losing data.
+</para>
+
+</sect1>
+
 <sect1 id="logical-replication-row-filter">
 <title>Row Filters</title>
 

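Note (not part of the commit): the two-step procedure added in logical-replication.sgml could be automated roughly as follows. This is a minimal sketch, assuming the slot names from the subscriber query and the rows from the standby's pg_replication_slots have already been fetched by some client. The names `StandbySlot`, `failover_report`, and `ready_for_failover` are hypothetical and do not come from PostgreSQL.

```python
from typing import Dict, Iterable, NamedTuple


class StandbySlot(NamedTuple):
    """One pg_replication_slots row as read on the standby (hypothetical container)."""
    slot_name: str
    synced: bool
    temporary: bool
    conflicting: bool


def failover_report(expected: Iterable[str],
                    standby_rows: Iterable[StandbySlot]) -> Dict[str, bool]:
    """Map each expected slot name to whether it is ready for failover.

    Uses the same expression as the documented query:
    synced AND NOT temporary AND NOT conflicting.
    A slot that is absent from the standby is reported as not ready.
    """
    by_name = {row.slot_name: row for row in standby_rows}
    report = {}
    for name in expected:
        row = by_name.get(name)
        report[name] = (
            row is not None
            and row.synced
            and not row.temporary
            and not row.conflicting
        )
    return report


def ready_for_failover(expected: Iterable[str],
                       standby_rows: Iterable[StandbySlot]) -> bool:
    """True only if every expected slot exists on the standby and is ready."""
    return all(failover_report(expected, standby_rows).values())


if __name__ == "__main__":
    # Slot names as returned by the subscriber-side query (step 1).
    expected = ["sub1", "sub2", "sub3"]
    # Rows as returned by the standby-side query (step 2); sub3 is missing here.
    rows = [
        StandbySlot("sub1", synced=True, temporary=False, conflicting=False),
        StandbySlot("sub2", synced=True, temporary=True, conflicting=False),
    ]
    print(failover_report(expected, rows))
    print(ready_for_failover(expected, rows))
```

Reporting per-slot results rather than a single boolean makes it easier to see which slot is blocking a planned switchover. Note that an empty `expected` list is treated as trivially ready, which a real tool might prefer to flag instead.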