isolation cluster-wide, using multiversion concurrency control
- (<acronym>MVCC</acronym>) at the repeatable read isolation level. Any write
+ (<acronym>MVCC</acronym>) at the <link linkend="xact-read-committed">read committed</link> or <link linkend="xact-repeatable-read">repeatable read</link> isolation levels. Any write
transaction is synchronously replicated to all nodes, which
increases commit latency for the time required for
synchronization. Read-only transactions and queries are executed
and heartbeats for failure discovery. A multi-master cluster of <replaceable>N</replaceable>
nodes can continue working while the majority of the nodes are
- alive and reachable by other nodes. When the node is reconnected
- to the cluster, <filename>multimaster</filename> can automatically
+ alive and reachable by other nodes. In most cases, three
+ cluster nodes are enough to ensure high availability. When the node
+ is reconnected to the cluster, <filename>multimaster</filename> can automatically
fast-forward the node to the actual state based on the
- Write-Ahead Log (<acronym>WAL</acronym>) data in the corresponding replication slot. If <acronym>WAL</acronym> data is no longer available for the time when the node was excluded from the cluster, you can restore the node using <filename>pg_basebackup</filename>.
+ Write-Ahead Log (<acronym>WAL</acronym>) data in the corresponding replication slot. If <acronym>WAL</acronym> data is no longer available for the time when the node was excluded from the cluster, you can restore the node using <application>pg_basebackup</application>.
</para>
<important><para>When using <filename>multimaster</filename>, make sure to take its replication restrictions into account. For details, see <xref linkend="multimaster-usage">.</para></important>
<para>You must have superuser rights to set up a multi-master cluster.</para>
- <para>
- After installing <productname>&productname;</productname> on all nodes, you need to
+ <para>After installing <productname>&productname;</productname> on all nodes, you need to
configure the cluster with <filename>multimaster</filename>. Suppose
you are setting up a cluster of three nodes, with
<literal>node1</literal>, <literal>node2</literal>, and
- <literal>node3</literal> domain names. First, set up the database to be replicated with
- <filename>multimaster</filename>:
+ <literal>node3</literal> domain names. First, set up the database to be replicated, and make sure you have a user with superuser rights to perform replication:
</para>
<itemizedlist>
<listitem>
<para>
If you are starting from scratch, initialize a cluster,
create an empty database <literal>mydb</literal> and a new
- user <literal>myuser</literal>, as usual. For details, see <xref linkend="creating-cluster">.
+ user <literal>myuser</literal>, on each node of the cluster. For details, see <xref linkend="creating-cluster">.
</para>
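<para>
As an illustration only, the database and user might be created as follows on each node; the names match the examples in this chapter, and the password is a placeholder you should replace:
</para>
<programlisting>
CREATE USER myuser WITH SUPERUSER PASSWORD 'mypassword';
CREATE DATABASE mydb OWNER myuser;
</programlisting>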
</listitem>
<listitem>
<para>
If you already have a database <literal>mydb</literal>
running on <literal>node1</literal>, initialize
new nodes from the working node using
- <literal>pg_basebackup</literal>. On each cluster node you
- are going to add, run:
+ <application>pg_basebackup</application>. On behalf of <literal>myuser</literal>, run the following command on each node you are going to add:
where <replaceable>datadir</> is the directory containing the database cluster. This directory is specified at the cluster initialization stage, or set in the <envar>PGDATA</envar> environment variable.
</para>
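<para>
For illustration, a typical invocation on a node being added might look like the following, assuming <literal>node1</literal> is the donor node; the exact options depend on your setup:
</para>
<programlisting>
pg_basebackup -D <replaceable>datadir</replaceable> -h node1 -U myuser -X stream
</programlisting>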
<para>
- For details on using <literal>pg_basebackup</literal>, see
+ For details on using <application>pg_basebackup</application>, see
<xref linkend="app-pgbasebackup">.
</para>
</listitem>
</itemizedlist>
- <para>Once the database is set up for replication, complete the following steps on each
+ <para>Once the database is set up, complete the following steps on each
cluster node:
</para>
<orderedlist>
<listitem>
<para>
- Modify the <literal>postgresql.conf</literal> configuration
+ Modify the <filename>postgresql.conf</filename> configuration
- <listitem><para>Change transaction isolation level to <literal>repeatable read</literal>:
+ <listitem><para>Specify the transaction isolation level for your cluster. <filename>multimaster</filename> currently supports <link linkend="xact-read-committed">read committed</link> and <link linkend="xact-repeatable-read">repeatable read</link> isolation levels.
- <filename>multimaster</filename> supports only the <literal>repeatable read</literal> isolation level. You cannot set up <filename>multimaster</filename> with the default <literal>read committed</literal> level.
+ the probability of serialization failure at commit time. If such cases are not handled by your application, it is recommended to use the <literal>read committed</literal> isolation level.</para></important>
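<para>
For example, assuming the isolation level is selected through the standard <varname>default_transaction_isolation</varname> parameter, the corresponding <filename>postgresql.conf</filename> line might look like this:
</para>
<programlisting>
default_transaction_isolation = 'read committed'
</programlisting>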
<para>The <literal>max_nodes</literal> variable defines the cluster size. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you typically do not need more than five cluster nodes.</para>
<important><para>The
<literal>node_id</literal> variable takes natural
numbers starting from 1, without any gaps in numbering.
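<para>
A minimal sketch of these settings for the first node of the three-node cluster used in this chapter might look as follows; adjust <varname>multimaster.node_id</varname> on each node accordingly:
</para>
<programlisting>
multimaster.max_nodes = 3
multimaster.node_id = 1
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
</programlisting>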
WAL for the disconnected node is overwritten. At this
- point, automatic recovery is no longer possible. In this case, you can <link linkend="multimaster-restoring-a-node-manually">restore the node manually</link> by cloning the data from one of the alive nodes using <literal>pg_basebackup</literal>.
+ point, automatic recovery is no longer possible. In this case, you can <link linkend="multimaster-restoring-a-node-manually">restore the node manually</link> by cloning the data from one of the alive nodes using <application>pg_basebackup</application>.
</para>
</listitem>
<listitem>
@@ -342,7 +334,7 @@ SELECT * FROM mtm.get_cluster_state();
<listitem>
<para>
<filename>multimaster</filename> can only replicate one database
- per cluster.
+ per cluster; this database is specified in the <varname>multimaster.conn_strings</varname> variable. If you try to connect to a different database, <filename>multimaster</filename> returns a corresponding error message.
</para>
</listitem>
<listitem>
@@ -352,12 +344,22 @@ SELECT * FROM mtm.get_cluster_state();
because of the logical replication restrictions. Unlogged tables are not replicated, as in the standard <productname>PostgreSQL</productname>.
</para>
<note><para>You can enable replication
- of tables without primary keys using the <varname>multimaster.ignore_tables_without_pk</varname> variable. However, take into account that
+ of tables without primary keys by setting the <varname>multimaster.ignore_tables_without_pk</varname> variable to <literal>false</literal>. However, take into account that
<filename>multimaster</filename> does not allow update operations on such tables.</para></note>
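<para>
For example, to enable replication of such tables, the following <filename>postgresql.conf</filename> line could be used (a sketch based on the variable described above):
</para>
<programlisting>
multimaster.ignore_tables_without_pk = false
</programlisting>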
</listitem>
<listitem>
<para>
- Sequence generation. To avoid conflicts between unique identifiers on different nodes, <filename>multimaster</filename> modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node.
+ Isolation level. The <filename>multimaster</filename> extension
+ supports <emphasis>read committed</emphasis> and <emphasis>repeatable read</emphasis> isolation levels. <emphasis>Serializable</emphasis> isolation level is currently not supported.</para>
+ <important>
+ <para>When performing a write transaction, <filename>multimaster</filename> blocks the affected objects only on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. As a result, a distributed deadlock occurs, and one of the transactions needs to be rolled back and repeated.
+ </para>
+ <para>If your typical workload has too many rollbacks, it is recommended to use the <literal>read committed</literal> isolation level. If this does not help, you can try directing all the write transactions to a single node.</para>
+ </important>
+ </listitem>
+ <listitem>
+ <para>
+ Sequence generation. To avoid conflicts between unique identifiers on different nodes, <filename>multimaster</filename> modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node. If you change the number of nodes in the cluster, the incrementation interval for new IDs is adjusted accordingly.
</para>
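<para>
As an illustration only, the resulting ID pattern on the first node of a three-node cluster is roughly what a manually configured sequence would produce; this sketch shows the equivalent effect, not the actual mechanism:
</para>
<programlisting>
-- Node 1 of a three-node cluster effectively generates 1, 4, 7, ...
CREATE SEQUENCE mytable_id_seq START WITH 1 INCREMENT BY 3;
SELECT nextval('mytable_id_seq');  -- 1
SELECT nextval('mytable_id_seq');  -- 4
-- Node 2 would behave like START WITH 2 INCREMENT BY 3, yielding 2, 5, 8, ...
</programlisting>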
</listitem>
<listitem>
@@ -378,14 +380,6 @@ of tables without primary keys using the <varname>multimaster.ignore_tables_with
and then on all the other nodes simultaneously. In the case of a heavy-write transaction, this may result in a noticeable delay.
</para>
</listitem>
- <listitem>
- <para>
- Isolation level. The <filename>multimaster</filename> extension
- currently supports only the <emphasis>repeatable read</emphasis> isolation level. This is stricter than the default <emphasis>read commited</emphasis> level, but also increases
- the probability of serialization failure at commit time.
- <emphasis>Serializable</emphasis> level is not supported yet.
- </para>
- </listitem>
</itemizedlist>
<para>If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
<programlisting>SELECT * FROM <function>mtm.make_table_local</function>('<replaceable>table_name</replaceable>'::regclass::oid)</programlisting>
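<para>
For example, to keep a hypothetical <literal>local_stats</literal> table on the current node only, you could run:
</para>
<programlisting>SELECT * FROM mtm.make_table_local('local_stats'::regclass::oid);</programlisting>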
@@ -442,7 +436,7 @@ SELECT * FROM mtm.get_cluster_state();
With multimaster, you can add or drop cluster nodes without a
restart. To add a new node, you need to change the cluster
configuration on alive nodes, load all the data to the new node using
- <literal>pg_basebackup</literal>, and start the node.
+ <application>pg_basebackup</application>, and start the node.
</para>
<para>
Suppose we have a working cluster of three nodes, with
@@ -480,14 +474,14 @@ SELECT * FROM mtm.add_node('dbname=mydb user=myuser host=node4');
+ <para>When performing a write transaction, <filename>multimaster</filename> blocks the affected objects only on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. As a result, a distributed deadlock occurs, and one of the transactions needs to be rolled back and repeated.
+ </para>
+ <para>
+ If your typical workload has too many rollbacks, it is recommended to use the <literal>read committed</literal> isolation level. If this does not help, you can try directing all the write transactions to a single node.</para>
+ </important>
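<para>
The following sketch illustrates such a conflict with a hypothetical <literal>accounts</literal> table; the table name and values are placeholders:
</para>
<programlisting>
-- On node1:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Concurrently on node2:
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 1;

-- When both transactions commit, each update must also be replicated to the
-- other node, where the same row is still locked by the concurrent
-- transaction. One of the two transactions is aborted to resolve the
-- distributed deadlock and has to be retried by the application.
</programlisting>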
<para>
If a node crashes or gets disconnected from the cluster between
the <literal>PREPARE</literal> and <literal>COMMIT</literal>
The maximum number of nodes allowed in the cluster. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you typically do not need more than five cluster nodes. The maximum possible number of nodes is limited to 64.</para><para>Default: the number of nodes specified in the <varname>multimaster.conn_strings</varname> variable