Commit 47897ac

author
Liudmila Mantrova
committed
DOC: bug fix for excluding nodes
1 parent cfb9b64 commit 47897ac

1 file changed

+48
-14
lines changed

doc/src/sgml/multimaster.sgml

Lines changed: 48 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -77,7 +77,8 @@
7777
<listitem>
7878
<para>
7979
<filename>multimaster</filename> can only replicate one database
80-
per cluster, which is specified in the <varname>multimaster.conn_strings</varname> variable. If you try to connect to a different database, <filename>multimaster</filename> will return a corresponding error message.
80+
per cluster, which is specified in the <varname>multimaster.conn_strings</varname> variable. If you connect to a different database,
81+
all operations will fail with the corresponding error message.
8182
</para>
8283
</listitem>
8384
<listitem>
@@ -126,7 +127,7 @@
126127
</listitem>
127128
</itemizedlist>
128129
<para>If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
129-
<programlisting>SELECT * FROM <function>mtm.make_table_local</function>('table_name') </programlisting>
130+
<programlisting><function>mtm.make_table_local</function>('table_name') </programlisting>
130131
</para>
131132
</sect2>
132133

@@ -252,11 +253,24 @@
252253
<para>
253254
In case of a partial network split when different nodes have
254255
different connectivity, <filename>multimaster</filename> finds a
255-
fully connected subset of nodes and switches off other nodes. For
256+
fully connected subset of nodes and disconnects other nodes. For
256257
example, in a three-node cluster, if node A can access both B and
257258
C, but node B cannot access node C, <filename>multimaster</filename>
258259
isolates node C to ensure data consistency on nodes A and B.
259260
</para>
261+
<note>
262+
<para>
263+
If you try to access a disconnected node, <filename>multimaster</filename> returns an error
264+
message indicating the current status of the node. To prevent stale reads, read-only queries are also forbidden.
265+
Additionally, you can break connections between the disconnected node and the clients using the
266+
<link linkend="mtm-break-connection">multimaster.break_connection</link> variable.
267+
</para>
268+
</note>
269+
<para>
270+
If required, you can override this behavior for one of the nodes using the
271+
<link linkend="mtm-major-node">multimaster.major_node</link> variable.
272+
In this case, the node will continue working even if it is isolated.
273+
</para>
260274
<para>
261275
Each node maintains a data structure that keeps the information about the state of all
262276
nodes in relation to this node. You can get this data in the
@@ -700,7 +714,8 @@ multimaster.conn_strings = 'dbname=mydb user=myuser host=node1,dbname=mydb user=
700714
pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable> start
701715
</programlisting>
702716
<para>
703-
All the cluster nodes get locked for write transactions until the new node retrieves all the updates that happened after you started making a base backup.
717+
When the node gets synchronized up to the minimum recovery lag,
718+
all the cluster nodes get locked for write transactions until the new node retrieves all the updates.
704719
When data recovery is complete, <filename>multimaster</filename> promotes the new node to the online state and includes it into the replication scheme.
705720
</para>
706721
</listitem>
@@ -737,15 +752,33 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
737752
SELECT mtm.stop_node(3);
738753
</programlisting>
739754
<para>
740-
This disables replication slots for node 3 on all cluster nodes and stops replication to
741-
this node.
755+
This excludes node 3 from the cluster and stops replication to
756+
this node. While the WAL lag between the node and the current cluster state
757+
is less than the <varname>multimaster.max_recovery_lag</varname> value,
758+
you can restore the node using the <function>mtm.recover_node</function> function.
759+
For details, see <xref linkend="multimaster-restoring-a-node-manually">.
742760
</para>
761+
<note>
743762
<para>
744-
If you simply shutdown a node, it will be excluded
763+
If you simply shut down a node, it will be excluded
745764
from the cluster as well. However, all transactions in the cluster
746765
will be frozen until other nodes detect the offline state of the node.
747766
This time interval is defined by the <literal>multimaster.heartbeat_recv_timeout</literal> parameter.
748767
</para>
768+
</note>
769+
<para>
770+
If you would like to permanently remove the node from the cluster, run the
771+
<literal>mtm.stop_node()</literal> function with the <literal>drop_slot</literal> parameter
772+
set to <literal>true</literal>:
773+
</para>
774+
<programlisting>
775+
SELECT mtm.stop_node(3, drop_slot true);
776+
</programlisting>
777+
<para>
778+
This disables replication slots for node 3 on all cluster nodes and stops replication to
779+
this node. If you would like to return the node to the cluster, you will have to add it
780+
as a new node. For details, see <xref linkend="multimaster-adding-new-nodes-to-the-cluster">.
781+
</para>
749782
</sect3>
750783
<sect3 id="multimaster-restoring-a-node-manually">
751784
<title>Restoring a Cluster Node</title>
@@ -786,7 +819,8 @@ pg_basebackup -D <replaceable>datadir</replaceable> -h node1 -x
786819
pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable> start
787820
</programlisting>
788821
<para>
789-
All the cluster nodes get locked for write transactions until the restored node retrieves all the updates that happened after you started making a base backup.
822+
When the node gets synchronized up to the minimum recovery lag,
823+
all the cluster nodes get locked for write transactions until the restored node retrieves all the updates.
790824
When data recovery is complete, <filename>multimaster</filename> promotes the new node to the online state and includes it into the replication scheme.
791825
</para>
792826
</listitem>
@@ -882,7 +916,7 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
882916
you define this variable when setting up the cluster, <filename>multimaster</filename> checks that
883917
the cluster name is the same for all the cluster nodes.
884918
</para></listitem></varlistentry>
885-
<varlistentry>
919+
<varlistentry id="mtm-break-connection">
886920
<term><varname>multimaster.break_connection</varname>
887921
<indexterm><primary><varname>multimaster.break_connection</varname></primary>
888922
</indexterm>
@@ -896,7 +930,7 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
896930
</para>
897931
</listitem>
898932
</varlistentry>
899-
<varlistentry>
933+
<varlistentry id="mtm-major-node">
900934
<term><varname>multimaster.major_node</varname>
901935
<indexterm><primary><varname>multimaster.major_node</varname></primary>
902936
</indexterm>
@@ -909,7 +943,7 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
909943
</para>
910944
<important>
911945
<para>This parameter should be used with caution. Only one node in the cluster
912-
can have this parameter set to true. When set to <literal>true</literal> on several
946+
can have this parameter set to <literal>true</literal>. When set to <literal>true</literal> on several
913947
nodes, this parameter can cause the split-brain problem.
914948
</para>
915949
</important>
@@ -1080,12 +1114,12 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
10801114
</indexterm>
10811115
</term>
10821116
<listitem>
1083-
<para>Collects the data returned by the <function>mtm.get_cluster_state()</function> function from all available nodes. For this function to work, in addition to replication connections, <filename>pg_hba.conf</filename> must allow ordinary connections to the node with the specified connection string.
1117+
<para>Collects the data returned by the <link linkend="mtm-get-cluster-state"><function>mtm.get_cluster_state()</function></link> function from all available nodes. For this function to work, in addition to replication connections, <filename>pg_hba.conf</filename> must allow ordinary connections to the node with the specified connection string.
10841118
</para>
10851119
</listitem>
10861120
</varlistentry>
10871121

1088-
<varlistentry>
1122+
<varlistentry id="mtm-get-cluster-state">
10891123
<term>
10901124
<function>mtm.get_cluster_state()</function>
10911125
<indexterm>
@@ -1287,7 +1321,7 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
12871321
</para>
12881322
</listitem>
12891323
</varlistentry>
1290-
<varlistentry>
1324+
<varlistentry id="mtm-recover-node">
12911325
<term>
12921326
<function>mtm.recover_node(<parameter>node</parameter> <type>integer</type>)</function>
12931327
<indexterm>
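
For reference, the node-exclusion workflow that this commit documents might be exercised roughly as follows. This is a minimal sketch and not text from the commit: it assumes a node with ID 3, and it assumes that drop_slot is the second positional parameter of mtm.stop_node, which the diff does not confirm.

```sql
-- Temporarily exclude node 3 from the cluster and stop replication to it.
SELECT mtm.stop_node(3);

-- Restore the node while its WAL lag is still below
-- multimaster.max_recovery_lag.
SELECT mtm.recover_node(3);

-- Permanently remove node 3 by setting drop_slot to true
-- (assumption: drop_slot is the second positional argument).
-- After this, the node can only return to the cluster as a new node.
SELECT mtm.stop_node(3, true);
```

The positional form is used here to sidestep named-argument syntax; check the function signature in the installed multimaster version before relying on it.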
