@@ -687,6 +687,100 @@ ALTER SUBSCRIPTION
</sect1>
+ <sect1 id="logical-replication-failover">
+  <title>Logical Replication Failover</title>
+
+  <para>
+   To allow subscriber nodes to continue replicating data from the publisher
+   node even when the publisher node goes down, there must be a physical standby
+   corresponding to the publisher node. The logical slots on the primary server
+   corresponding to the subscriptions can be synchronized to the standby server by
+   specifying <literal>failover = true</literal> when creating subscriptions. See
+   <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
+   Enabling the
+   <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
+   parameter ensures a seamless transition of those subscriptions after the
+   standby is promoted. They can continue subscribing to publications on the
+   new primary server without losing data. Note that in the case of
+   asynchronous replication, there remains a risk of data loss for transactions
+   that were committed on the former primary server but had not yet been
+   replicated to the new primary server.
+  </para>
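+
+  <para>
+   As a minimal sketch of the above, a subscription can opt in to slot
+   synchronization when it is created. The subscription name
+   <literal>sub1</literal>, the publication name <literal>pub1</literal>, and
+   the connection string are placeholders chosen for illustration, not values
+   taken from a real setup:
+<programlisting>
+-- Hypothetical names and connection string; adjust to the actual environment.
+CREATE SUBSCRIPTION sub1
+    CONNECTION 'host=primary.example.com dbname=postgres user=repl_user'
+    PUBLICATION pub1
+    WITH (failover = true);
+
+-- Confirm which subscriptions have failover enabled.
+SELECT subname, subslotname, subfailover
+FROM pg_subscription;
+</programlisting>
+   The <structfield>subfailover</structfield> column is the same one that the
+   verification query later in this section filters on.
+  </para>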
+
+  <para>
+   Because the slot synchronization logic copies slots asynchronously, it is
+   necessary to confirm that the replication slots have been synced to the
+   standby server before the failover happens. To ensure a successful failover,
+   the standby server must be ahead of the subscriber. This can be achieved by
+   configuring
+   <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>.
+  </para>
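+
+  <para>
+   One possible way to set this on the primary server is sketched below. The
+   physical slot name <literal>sb1_slot</literal> is an assumption; it stands
+   for whichever physical replication slot the standby streams from:
+<programlisting>
+-- Run on the primary. 'sb1_slot' is a hypothetical physical slot name.
+-- Logical walsenders then wait until this standby's slot has confirmed
+-- receipt of WAL before sending the decoded changes to subscribers.
+ALTER SYSTEM SET standby_slot_names = 'sb1_slot';
+SELECT pg_reload_conf();
+
+-- Verify that the setting took effect.
+SHOW standby_slot_names;
+</programlisting>
+  </para>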
+
+  <para>
+   To confirm that the standby server is indeed ready for failover, follow these
+   steps to verify that all necessary logical replication slots have been
+   synchronized to the standby server:
+  </para>
+
+  <procedure>
+   <step performance="required">
+    <para>
+     On the subscriber node, use the following SQL to identify which slots
+     should be synced to the standby that is to be promoted. This query
+     returns the relevant replication slots, including the main slots and table
+     synchronization slots associated with the failover-enabled subscriptions.
+     Note that a table synchronization slot needs to be synced to the standby
+     server only if the table copy is finished
+     (see <xref linkend="catalog-pg-subscription-rel"/>).
+     Table synchronization slots in any other state need not be synced, because
+     they will either be dropped or re-created on the new primary server.
+<programlisting>
+test_sub=# SELECT
+               array_agg(slot_name) AS slots
+           FROM
+           ((
+               SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name
+               FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
+               WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+           ) UNION (
+               SELECT s.oid AS subid, s.subslotname AS slot_name
+               FROM pg_subscription s
+               WHERE s.subfailover
+           ))
+           WHERE slot_name IS NOT NULL;
+ slots
+-------
+ {sub1,sub2,sub3}
+(1 row)
+</programlisting></para>
+   </step>
+   <step performance="required">
+    <para>
+     Check that the logical replication slots identified above exist on
+     the standby server and are ready for failover.
+<programlisting>
+test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready
+               FROM pg_replication_slots
+               WHERE slot_name IN ('sub1','sub2','sub3');
+  slot_name  | failover_ready
+-------------+----------------
+ sub1        | t
+ sub2        | t
+ sub3        | t
+(3 rows)
+</programlisting></para>
+   </step>
+  </procedure>
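+
+  <para>
+   The per-slot check can also be collapsed into a single yes/no answer. The
+   following is a sketch that reuses the example slot names
+   <literal>sub1</literal>, <literal>sub2</literal>, and
+   <literal>sub3</literal>; the column alias
+   <literal>all_failover_ready</literal> is made up for illustration, and the
+   expected count (<literal>3</literal>) must match the number of slots
+   returned on the subscriber:
+<programlisting>
+-- Run on the standby. True only if every expected slot exists and is ready;
+-- a missing slot makes the count comparison false.
+SELECT count(*) = 3
+       AND coalesce(bool_and(synced AND NOT temporary AND NOT conflicting), false)
+       AS all_failover_ready
+FROM pg_replication_slots
+WHERE slot_name IN ('sub1', 'sub2', 'sub3');
+</programlisting>
+  </para>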
+
+  <para>
+   If all the slots are present on the standby server and the result
+   (<literal>failover_ready</literal>) of the above SQL query is true for each
+   of them, then existing subscriptions can continue subscribing to
+   publications on the new primary server without losing data.
+  </para>
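+
+  <para>
+   After the standby has been promoted, each subscription still has to be
+   pointed at the new primary server. One way to do this is sketched below;
+   the subscription name and connection string are placeholders, and the exact
+   sequence may need adjusting for a given deployment:
+<programlisting>
+-- Run on the subscriber. Names and connection string are hypothetical.
+ALTER SUBSCRIPTION sub1 DISABLE;
+ALTER SUBSCRIPTION sub1
+    CONNECTION 'host=new-primary.example.com dbname=postgres user=repl_user';
+ALTER SUBSCRIPTION sub1 ENABLE;
+</programlisting>
+  </para>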
+
+ </sect1>
+
<sect1 id="logical-replication-row-filter">
<title>Row Filters</title>