Commit a1cc9db

Author: Liudmila Mantrova (committed)
Doc fixes for multimaster - PGPRO-552

1 parent 4f2edb0 commit a1cc9db

File tree: 1 file changed (+78, -75 lines)

doc/src/sgml/multimaster.sgml

Lines changed: 78 additions & 75 deletions
@@ -1,9 +1,9 @@
 <sect1 id="multimaster">
 <title>multimaster</title>
 <para><emphasis role="strong">Table of Contents</emphasis></para>
+<para><link linkend="multimaster-usage">Limitations</link></para>
 <para><link linkend="multimaster-architecture">Architecture</link></para>
 <para><link linkend="multimaster-installation">Installation and Setup</link></para>
-<para><link linkend="multimaster-usage">Using multimaster for Data Replication</link></para>
 <para><link linkend="multimaster-administration">Multi-Master Cluster Administration</link></para>
 <para><link linkend="multimaster-reference">Reference</link></para>
 <para><link linkend="multimaster-compatibility">Compatibility</link></para>
@@ -57,17 +57,80 @@
 <filename>multimaster</filename> uses three-phase commit protocol
 and heartbeats for failure discovery. A multi-master cluster of <replaceable>N</replaceable>
 nodes can continue working while the majority of the nodes are
-alive and reachable by other nodes. In most cases, three
-cluster nodes are enough to ensure high availability. When the node
+alive and reachable by other nodes. To be configured with <filename>multimaster</filename>, the cluster must include at least two nodes. In most cases, three
+cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you do not typically need more than five cluster nodes.</para>
+<para>When a failed node
 is reconnected to the cluster, <filename>multimaster</filename> can automatically
 fast-forward the node to the actual state based on the
-Write-Ahead Log (<acronym>WAL</acronym>) data in the corresponding replication slot. If <acronym>WAL</acronym> data is no longer available for the time when the node was excluded from the cluster, you can restore the node using <application>pg_basebackup</application>.
+Write-Ahead Log (<acronym>WAL</acronym>) data in the corresponding replication slot. If <acronym>WAL</acronym> data is no longer available for the time when the node was excluded from the cluster, you can <link linkend="multimaster-restoring-a-node-manually">restore the node using <application>pg_basebackup</application></link>.
 </para>
 <important><para>When using <filename>multimaster</filename>, make sure to take its replication restrictions into account. For details, see <xref linkend="multimaster-usage">.</para></important>
 <para>
 To learn more about the <filename>multimaster</filename> internals, see
 <xref linkend="multimaster-architecture">.
 </para>
+
+<sect2 id="multimaster-usage">
+<title>Limitations</title>
+<para>The <filename>multimaster</filename> extension takes care of the database replication in a fully automated way. You can perform write transactions on any node and work with temporary tables on each cluster node simultaneosly. However, make sure to take the following replication restrictions into account:</para>
+<itemizedlist>
+<listitem>
+<para>
+<filename>multimaster</filename> can only replicate one database
+per cluster, which is specified in the <varname>multimaster.conn_strings</varname> variable. If you try to connect to a different database, <filename>multimaster</filename> will return a corresponding error message.
+</para>
+</listitem>
+<listitem>
+<para>
+The replicated tables must have primary keys or replica identity. Otherwise,
+<filename>multimaster</filename> will not allow replication
+because of the logical replication restrictions. Unlogged tables are not replicated, as in the standard <productname>PostgreSQL</productname>.
+</para>
+<note><para>You can enable replication
+of tables without primary keys by setting the <varname>multimaster.ignore_tables_without_pk</varname> variable to <literal>false</literal>. However, take into account that
+<filename>multimaster</filename> does not allow update operations on such tables.</para></note>
+</listitem>
+<listitem>
+<para>
+Isolation level. The <filename>multimaster</filename> extension
+supports <emphasis><literal>read committed</literal></emphasis> and <emphasis><literal>repeatable read</literal></emphasis> isolation levels. <emphasis><literal>Serializable</literal></emphasis> isolation level is currently not supported.</para>
+<important>
+<para>Using <literal>repeatable read</literal> isolation level increases
+the probability of serialization failure at commit time. Unlike in the standard <productname>PostgreSQL</productname>, <literal>read committed</literal> level can also cause serialization failures on a multi-master cluster.</para>
+<para>When performing a write transaction, <filename>multimaster</filename> blocks the affected objects only on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. In this case, a distributed deadlock occurs. As a result, one of the transactions is automatically rolled back and needs to be repeated. The application must be ready to retry transactions.
+</para>
+<para>If your typical workload has too many rollbacks, it is recommended to use <literal>read committed</literal> isolation level. However, the <literal>read committed</literal> still does not guarantee the absence of deadlocks on a multi-master cluster. If using the <literal>read committed</literal> level does not help, you can try directing all the write transactions to a single node.</para>
+</important>
+</listitem>
+<listitem>
+<para>
+Sequence generation. To avoid conflicts between unique identifiers on different nodes, <filename>multimaster</filename> modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node. If you change the number of nodes in the cluster, the incrementation interval for new IDs is adjusted accordingly. Thus, the generated sequence values are not monotonic.
+</para>
+</listitem>
+<listitem>
+<para>
+<acronym>DDL</acronym> replication. While <filename>multimaster</filename>
+replicates data on the logical level, <acronym>DDL</acronym> is replicated on the
+statement level, which results in distributed commits of the same
+statement on different nodes. As a result, complex <acronym>DDL</acronym>
+scenarios, such as stored procedures and temporary tables, may
+work differently as compared to the standard <productname>PostgreSQL</productname>.
+</para>
+</listitem>
+<listitem>
+<para>
+Commit latency. In the current implementation of logical
+replication, <filename>multimaster</filename> sends data to subscriber nodes only after the
+local commit, so you have to wait for transaction processing twice: first on the local node,
+and then on all the other nodes simultaneously. In the case of a heavy-write transaction, this may result in a noticeable delay.
+</para>
+</listitem>
+</itemizedlist>
+<para>If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
+<programlisting>SELECT * FROM <function>mtm.make_table_local</function>(::regclass::oid) </programlisting>
+where <literal>regclass</literal> is the name of the table and <literal>oid</literal> is the unique table identifier.</para>
+</sect2>
+
 <sect2 id="multimaster-architecture">
 <title>Architecture</title>
 <sect3 id="multimaster-replication">
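
The <programlisting> added in this hunk calls mtm.make_table_local() without a concrete argument. As a hedged illustration (not part of the commit itself), assuming a hypothetical table named local_stats that should stay on one node only, the call described by the surrounding text would look like this:

-- 'local_stats' is a hypothetical table name; the cast resolves it to the table OID,
-- which is what mtm.make_table_local() receives.
SELECT * FROM mtm.make_table_local('local_stats'::regclass::oid);
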
@@ -80,7 +143,7 @@
 <filename>multimaster</filename> uses <link linkend="logicaldecoding-synchronous">logical replication</link> and the <link linkend="multimaster-credits">three-phase E3PC commit protocol</link>.
 </para>
 <para>
-When PostgeSQL loads the <filename>multimaster</filename> shared
+When <productname>&productname;</productname> loads the <filename>multimaster</filename> shared
 library, <filename>multimaster</filename> sets up a logical
 replication producer and consumer for each node, and hooks into
 the transaction commit pipeline. The typical data replication
@@ -125,19 +188,16 @@
 </listitem>
 </orderedlist>
 <important>
-<para>When performing a write transaction, <filename>multimaster</filename> blocks the affected objects only on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. In this case, a distributed deadlock occurs, and one of the transactions needs to be rolled back and repeated.
-</para>
-<para>
-If your typical workload has too many rollbacks, it is recommended to use <literal>read committed</literal> isolation level. If it does not help, you can try directing all the write transactions to a single node.</para>
+<para><filename>multimaster</filename> currently supports the <literal>read committed</literal> and <literal>repeatable read</literal> isolation levels only, which can cause unexpected serialization failures in your workload. For details, see <xref linkend="multimaster-usage">.</para>
 </important>
 <para>
 If a node crashes or gets disconnected from the cluster between
 the <literal>PREPARE</literal> and <literal>COMMIT</literal>
 phases, the <literal>PRECOMMIT</literal> phase ensures that the
 survived nodes have enough information to complete the prepared
-transaction. The <literal>PRECOMMITTED</literal> messages help you
-avoid the situation when the crashed node have already committed
-or aborted the transaction, but have not notified other nodes
+transaction. The <literal>PRECOMMITTED</literal> messages help
+avoid the situation when the crashed node has already committed
+or aborted the transaction, but has not notified other nodes
 about the transaction status. In a two-phase commit (2PC), such a
 transaction would block resources (hold locks) until the recovery
 of the crashed node. Otherwise, you could get data inconsistencies
@@ -242,20 +302,21 @@ If your typical workload has too many rollbacks, it is recommended to use <liter
 <application>pg_basebackup</application>.
 </para></note>
 <para><emphasis role="strong">See Also</emphasis></para>
-<para><link linkend="multimaster-restoring-a-node-manually">Restoring a Node Manually</link></para>
+<para><link linkend="multimaster-restoring-a-node-manually">Restoring a Cluster Node</link></para>
 </sect3>
 </sect2>
 <sect2 id="multimaster-installation">
 <title>Installation and Setup</title>
 <para>
 To use <filename>multimaster</filename>, you need to install
 <productname>&productname;</productname> on all nodes of your cluster. <productname>&productname;</productname> includes all the required dependencies and
-extensions.
+extensions.
 </para>
 <sect3 id="multimaster-setting-up-a-multi-master-cluster">
 <title>Setting up a Multi-Master Cluster</title>
 <para>After installing <productname>&productname;</productname> on all nodes, you need to
-configure the cluster with <filename>multimaster</filename>. Suppose
+configure the cluster with <filename>multimaster</filename>.</para>
+<para>Suppose
 you are setting up a cluster of three nodes, with
 <literal>node1</literal>, <literal>node2</literal>, and
 <literal>node3</literal> domain names. First, set up the database to be replicated, and make sure you have a user with superuser rights to perform replication:
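
The last context line above refers to preparing the replicated database and a user with superuser rights. As a rough sketch of that step (these statements are an assumption rather than lines from the file; mydb and myuser are the names used in the conn_strings example in the next hunk, and the password is a placeholder):

CREATE USER myuser WITH SUPERUSER PASSWORD 'mypassword';  -- placeholder password
CREATE DATABASE mydb OWNER myuser;
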
@@ -365,7 +426,7 @@ multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user
 # of connection strings
 # to neighbor nodes
 </programlisting>
-<para>The <literal>max_nodes</literal> variable defines the cluster size. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you typically do not need more than five cluster nodes.</para>
+<para>The <literal>max_nodes</literal> variable defines the cluster size. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you do not typically need more than five cluster nodes.</para>
 <important><para>The
 <literal>node_id</literal> variable takes natural
 numbers starting from 1, without any gaps in numbering.
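
To make the max_nodes and node_id discussion in this hunk concrete, here is a hedged postgresql.conf sketch for the first node of a three-node cluster; the parameter names come from this file, while the exact values and the shared_preload_libraries line are assumptions:

# Illustrative settings for node1 of a three-node cluster
shared_preload_libraries = 'multimaster'   # assumption: the library must be preloaded
multimaster.max_nodes = 3                  # cluster size
multimaster.node_id = 1                    # natural numbers starting from 1, no gaps
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
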
@@ -507,64 +568,6 @@ SELECT * FROM mtm.get_cluster_state();
 </sect4>
 </sect3>
 </sect2>
-<sect2 id="multimaster-usage">
-<title>Using multimaster for Data Replication</title>
-<para>The <filename>multimaster</filename> extension takes care of the database replication in a fully automated way. You can perform write transactions on any node, and work with temporary tables on each cluster node simultaneosly. However, make sure to take the following replication restrictions into account:</para>
-<itemizedlist>
-<listitem>
-<para>
-<filename>multimaster</filename> can only replicate one database
-per cluster, which is specified in the <varname>multimaster.conn_strings</varname> variable. If you try to connect to a different database, <filename>multimaster</filename> will return a corresponding error message.
-</para>
-</listitem>
-<listitem>
-<para>
-The replicated tables must have primary keys or replica identity. Otherwise,
-<filename>multimaster</filename> will not allow replication
-because of the logical replication restrictions. Unlogged tables are not replicated, as in the standard <productname>PostgreSQL</productname>.
-</para>
-<note><para>You can enable replication
-of tables without primary keys by setting the <varname>multimaster.ignore_tables_without_pk</varname> variable to <literal>false</literal>. However, take into account that
-<filename>multimaster</filename> does not allow update operations on such tables.</para></note>
-</listitem>
-<listitem>
-<para>
-Isolation level. The <filename>multimaster</filename> extension
-supports <emphasis><literal>read committed</literal></emphasis> and <emphasis><literal>repeatable read</literal></emphasis> isolation levels. <emphasis><literal>Serializable</literal></emphasis> isolation level is currently not supported.</para>
-<important>
-<para>When performing a write transaction, <filename>multimaster</filename> blocks the affected objects only on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. In this case, a distributed deadlock occurs, and one of the transactions needs to be rolled back and repeated.
-</para>
-<para>If your typical workload has too many rollbacks, it is recommended to use <literal>read committed</literal> isolation level. If it does not help, you can try directing all the write transactions to a single node.</para>
-</important>
-</listitem>
-<listitem>
-<para>
-Sequence generation. To avoid conflicts between unique identifiers on different nodes, <filename>multimaster</filename> modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node. If you change the number of nodes in the cluster, the incrementation interval for new IDs is adjusted accordingly.
-</para>
-</listitem>
-<listitem>
-<para>
-<acronym>DDL</acronym> replication. While <filename>multimaster</filename>
-replicates data on the logical level, <acronym>DDL</acronym> is replicated on the
-statement level, which results in distributed commits of the same
-statement on different nodes. As a result, complex <acronym>DDL</acronym>
-scenarios, such as stored procedures and temporary tables, may
-work differently as compared to the standard <productname>PostgreSQL</productname>.
-</para>
-</listitem>
-<listitem>
-<para>
-Commit latency. In the current implementation of logical
-replication, <filename>multimaster</filename> sends data to subscriber nodes only after the
-local commit, so you have to wait for transaction processing twice: first on the local node,
-and then on all the other nodes simultaneously. In the case of a heavy-write transaction, this may result in a noticeable delay.
-</para>
-</listitem>
-</itemizedlist>
-<para>If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
-<programlisting>SELECT * FROM <function>mtm.make_table_local</function>(::regclass::oid) </programlisting>
-where <literal>regclass</literal> is the name of the table and <literal>oid</literal> is the unique table identifier.</para>
-</sect2>
 <sect2 id="multimaster-administration"><title>Multi-Master Cluster Administration</title>
 <itemizedlist>
 <listitem>
@@ -801,7 +804,7 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
 <varname>multimaster.arbiter_port</varname> variable.
 </para></listitem></varlistentry>
 <varlistentry><term><varname>multimaster.max_nodes</varname><indexterm><primary><varname>multimaster.max_nodes</varname></primary></indexterm></term><listitem><para>
-The maximum number of nodes allowed in the cluster. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you typically do not need more than five cluster nodes. The maximum possible number of nodes is limited to 64.</para><para>Default: the number of nodes specified in the <varname>multimaster.conn_strings</varname> variable
+The maximum number of nodes allowed in the cluster. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you do not typically need more than five cluster nodes. The maximum possible number of nodes is limited to 64.</para><para>Default: the number of nodes specified in the <varname>multimaster.conn_strings</varname> variable
 </para></listitem></varlistentry>
 <varlistentry><term><varname>multimaster.arbiter_port</varname><indexterm><primary><varname>multimaster.arbiter_port</varname></primary></indexterm></term><listitem><para>
 Port for the arbiter
