
Commit ed8df68
Author: Liudmila Mantrova

    Doc bugfixes in multimaster and release notes

1 parent 7c0e6d5

File tree: 2 files changed, +83 −50 lines


doc/src/sgml/multimaster.sgml

Lines changed: 53 additions & 50 deletions
@@ -13,7 +13,7 @@
 </listitem>
 <listitem>
 <para>
-Synchronous logical replication and DDL Replication
+Synchronous logical replication and DDL replication
 </para>
 </listitem>
 <listitem>
@@ -38,7 +38,7 @@
 on each node. To ensure data consistency in the case of concurrent
 updates, <filename>multimaster</filename> enforces transaction
 isolation cluster-wide, using multiversion concurrency control
-(<acronym>MVCC</acronym>) at the repeatable read isolation level. Any write
+(<acronym>MVCC</acronym>) at the <link linkend="xact-read-committed">read committed</link> or <link linkend="xact-repeatable-read">repeatable read</link> isolation levels. Any write
 transaction is synchronously replicated to all nodes, which
 increases commit latency for the time required for
 synchronization. Read-only transactions and queries are executed
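
Note: both levels can also be chosen per transaction in standard
PostgreSQL, which helps when only part of the workload needs the
stricter guarantee. A minimal illustration (table and column names
hypothetical, not part of this commit):

    -- Run one transaction at repeatable read without changing
    -- default_transaction_isolation cluster-wide.
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;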
@@ -49,10 +49,11 @@
 <filename>multimaster</filename> uses three-phase commit protocol
 and heartbeats for failure discovery. A multi-master cluster of <replaceable>N</replaceable>
 nodes can continue working while the majority of the nodes are
-alive and reachable by other nodes. When the node is reconnected
-to the cluster, <filename>multimaster</filename> can automatically
+alive and reachable by other nodes. In most cases, three
+cluster nodes are enough to ensure high availability. When the node
+is reconnected to the cluster, <filename>multimaster</filename> can automatically
 fast-forward the node to the actual state based on the
-Write-Ahead Log (<acronym>WAL</acronym>) data in the corresponding replication slot. If <acronym>WAL</acronym> data is no longer available for the time when the node was excluded from the cluster, you can restore the node using <filename>pg_basebackup</filename>.
+Write-Ahead Log (<acronym>WAL</acronym>) data in the corresponding replication slot. If <acronym>WAL</acronym> data is no longer available for the time when the node was excluded from the cluster, you can restore the node using <application>pg_basebackup</application>.
 </para>
 <important><para>When using <filename>multimaster</filename>, make sure to take its replication restrictions into account. For details, see <xref linkend="multimaster-usage">.</para></important>
 <para>
@@ -73,47 +74,44 @@
 </para>
 <sect3 id="multimaster-setting-up-a-multi-master-cluster">
 <title>Setting up a Multi-Master Cluster</title>
-<para>You must have superuser rights to set up a multi-master cluster.</para>
-<para>
-After installing <productname>&productname;</productname> on all nodes, you need to
+<para>After installing <productname>&productname;</productname> on all nodes, you need to
 configure the cluster with <filename>multimaster</filename>. Suppose
 you are setting up a cluster of three nodes, with
 <literal>node1</literal>, <literal>node2</literal>, and
-<literal>node3</literal> domain names. First, set up the database to be replicated with
-<filename>multimaster</filename>:
+<literal>node3</literal> domain names. First, set up the database to be replicated, and make sure you have a user with superuser rights to perform replication:
 </para>
 <itemizedlist>
 <listitem>
 <para>
 If you are starting from scratch, initialize a cluster,
 create an empty database <literal>mydb</literal> and a new
-user <literal>myuser</literal>, as usual. For details, see <xref linkend="creating-cluster">.
+user <literal>myuser</literal>, on each node of the cluster. For details, see <xref linkend="creating-cluster">.
 </para>
 </listitem>
 <listitem>
 <para>
 If you already have a database <literal>mydb</literal>
 running on <literal>node1</literal>, initialize
 new nodes from the working node using
-<literal>pg_basebackup</literal>. On each cluster node you
-are going to add, run:
+<application>pg_basebackup</application>. On behalf of <literal>myuser</literal>, run the following command on each node you are going to add:
 <programlisting>
 pg_basebackup -D <replaceable>datadir</> -h node1 mydb
 </programlisting>
+where <replaceable>datadir</> is the directory containing the database cluster. This directory is specified at the cluster initialization stage, or set in the <envar>PGDATA</envar> environment variable.
 </para>
 <para>
-For details on using <literal>pg_basebackup</literal>, see
+For details on using <application>pg_basebackup</application>, see
 <xref linkend="app-pgbasebackup">.
 </para>
 </listitem>
 </itemizedlist>
-<para>Once the database is set up for replication, complete the following steps on each
+<para>Once the database is set up, complete the following steps on each
 cluster node:
 </para>
 <orderedlist>
 <listitem>
 <para>
-Modify the <literal>postgresql.conf</literal> configuration
+Modify the <filename>postgresql.conf</filename> configuration
 file, as follows:
 </para>
 <itemizedlist>
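
Note: for the from-scratch path above, a hedged shell sketch of the
per-node initialization (data directory and names hypothetical; the
linked section is authoritative):

    # On each of node1, node2, node3:
    initdb -D /var/lib/pgpro/data                 # initialize the cluster
    pg_ctl -D /var/lib/pgpro/data -l pg.log start
    createuser --superuser myuser                 # user that performs replication
    createdb -O myuser mydb                       # database to be replicated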
@@ -122,11 +120,12 @@ pg_basebackup -D <replaceable>datadir</> -h node1 mydb
 shared_preload_libraries = 'multimaster'
 </programlisting>
 </listitem>
-<listitem><para>Change transaction isolation level to <literal>repeatable read</literal>:
+<listitem><para>Specify the transaction isolation level for your cluster. <filename>multimaster</filename> currently supports <link linkend="xact-read-committed">read committed</link> and <link linkend="xact-repeatable-read">repeatable read</link> isolation levels.
 <programlisting>
-default_transaction_isolation = 'repeatable read'</programlisting>
-<filename>multimaster</filename> supports only the <literal>repeatable read</literal> isolation level. You cannot set up <filename>multimaster</filename> with the default <literal>read committed</literal> level.
+default_transaction_isolation = 'read committed'</programlisting>
 </para>
+<important><para>Using <literal>repeatable read</literal> isolation level increases
+the probability of serialization failure at commit time. If such cases are not handled by your application, you are recommended to use <literal>read committed</literal> isolation level.</para></important>
 </listitem>
 <listitem>
 <para>
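
Note: under repeatable read, a conflicting commit aborts with
SQLSTATE 40001 (serialization_failure) and the client must retry.
A rough shell sketch of such a retry, with a hypothetical tx.sql:

    # Re-run the transaction script a few times if it aborts.
    for i in 1 2 3; do
        psql -h node1 -d mydb -v ON_ERROR_STOP=1 -f tx.sql && break
    done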
@@ -182,10 +181,11 @@ multimaster.max_nodes = 3 # cluster size
 multimaster.node_id = 1 # the 1-based index of this node
 # in the cluster
 multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
-# comma-separated list
+# comma-separated list
 # of connection strings
 # to neighbor nodes
 </programlisting>
+<para>The <literal>max_nodes</literal> variable defines the cluster size. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you typically do not need more than five cluster nodes.</para>
 <important><para>The
 <literal>node_id</literal> variable takes natural
 numbers starting from 1, without any gaps in numbering.
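
Note: gathered in one place, a plausible postgresql.conf fragment for
node1 combining the settings shown in this section (values
illustrative):

    shared_preload_libraries = 'multimaster'
    default_transaction_isolation = 'read committed'
    multimaster.max_nodes = 3
    multimaster.node_id = 1     # 2 and 3 on the other nodes
    multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'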
@@ -207,16 +207,8 @@ multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user
 </listitem>
 <listitem>
 <para>
-Allow replication in <filename>pg_hba.conf</filename>:
+Modify the <filename>pg_hba.conf</filename> file to allow replication to each cluster node on behalf of <literal>myuser</literal>.
 </para>
-<programlisting>
-host myuser all node1 trust
-host myuser all node2 trust
-host myuser all node3 trust
-host replication all node1 trust
-host replication all node2 trust
-host replication all node3 trust
-</programlisting>
 </listitem>
 <listitem>
 <para>
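
Note: since the explicit trust example is dropped by this commit, a
hedged sketch of what the pg_hba.conf entries might look like with
password authentication instead (addresses and auth method are
assumptions; adapt them to your security policy):

    # TYPE  DATABASE     USER    ADDRESS  METHOD
    host    mydb         myuser  node1    md5
    host    mydb         myuser  node2    md5
    host    mydb         myuser  node3    md5
    host    replication  myuser  node1    md5
    host    replication  myuser  node2    md5
    host    replication  myuser  node3    md5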
@@ -228,15 +220,15 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
 </listitem>
 </orderedlist>
 <para>
-When <productname>PostgreSQL</productname> is started on all nodes, connect to any node
-and create the <filename>multimaster</filename> extension to get access to all the <filename>multimaster</filename> features:
+When <productname>&productname;</productname> is started on all nodes, connect to any node
+and create the <filename>multimaster</filename> extension:
 <programlisting>
 psql -h node1
 CREATE EXTENSION multimaster;</programlisting></para>
 <para>The <command>CREATE EXTENSION</command> query is replicated to all the cluster nodes.</para>
 <para>
 To ensure that <filename>multimaster</filename> is enabled, check
-the <literal>mtm.get_cluster_state()</literal> view:
+the <structname>mtm.get_cluster_state()</structname> view:
 </para>
 <programlisting>
 SELECT * FROM mtm.get_cluster_state();
@@ -306,7 +298,7 @@ SELECT * FROM mtm.get_cluster_state();
 maximum size of WAL. Upon reaching the
 <varname>multimaster.max_recovery_lag</varname> threshold,
 WAL for the disconnected node is overwritten. At this
-point, automatic recovery is no longer possible. In this case, you can <link linkend="multimaster-restoring-a-node-manually">restore the node manually</link> by cloning the data from one of the alive nodes using <literal>pg_basebackup</literal>.
+point, automatic recovery is no longer possible. In this case, you can <link linkend="multimaster-restoring-a-node-manually">restore the node manually</link> by cloning the data from one of the alive nodes using <application>pg_basebackup</application>.
 </para>
 </listitem>
 <listitem>
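
Note: a rough sketch of that manual restore, run on the failed node
(host and data directory hypothetical; the -x flag matches the
pg_basebackup invocations elsewhere in this file):

    pg_ctl -D "$PGDATA" stop
    # after moving the stale contents of $PGDATA out of the way:
    pg_basebackup -D "$PGDATA" -h node1 -x
    pg_ctl -D "$PGDATA" -l pg.log start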
@@ -342,7 +334,7 @@ SELECT * FROM mtm.get_cluster_state();
 <listitem>
 <para>
 <filename>multimaster</filename> can only replicate one database
-per cluster.
+per cluster, which is specified in the <varname>multimaster.conn_strings</varname> variable. If you try to connect to a different database, <filename>multimaster</filename> will return a corresponding error message.
 </para>
 </listitem>
 <listitem>
@@ -352,12 +344,22 @@ SELECT * FROM mtm.get_cluster_state();
 because of the logical replication restrictions. Unlogged tables are not replicated, as in the standard <productname>PostgreSQL</productname>.
 </para>
 <note><para>You can enable replication
-of tables without primary keys using the <varname>multimaster.ignore_tables_without_pk</varname> variable. However, take into account that
+of tables without primary keys by setting the <varname>multimaster.ignore_tables_without_pk</varname> variable to <literal>false</literal>. However, take into account that
 <filename>multimaster</filename> does not allow update operations on such tables.</para></note>
 </listitem>
 <listitem>
 <para>
-Sequence generation. To avoid conflicts between unique identifiers on different nodes, <filename>multimaster</filename> modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node.
+Isolation level. The <filename>multimaster</filename> extension
+supports <emphasis>read committed</emphasis> and <emphasis>repeatable read</emphasis> isolation levels. <emphasis>Serializable</emphasis> isolation level is currently not supported.</para>
+<important>
+<para>When performing a write transaction, <filename>multimaster</filename> blocks the affected objects only on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. In this case, a distributed deadlock occurs, and one of the transactions needs to be rolled back and repeated.
+</para>
+<para>If your typical workload has too many rollbacks, it is recommended to use <literal>read committed</literal> isolation level. If it does not help, you can try directing all the write transactions to a single node.</para>
+</important>
+</listitem>
+<listitem>
+<para>
+Sequence generation. To avoid conflicts between unique identifiers on different nodes, <filename>multimaster</filename> modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node. If you change the number of nodes in the cluster, the incrementation interval for new IDs is adjusted accordingly.
 </para>
 </listitem>
 <listitem>
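
Note: to make the sequence scheme concrete, a hedged SQL illustration
(table name hypothetical; actual IDs depend on which node takes each
insert):

    -- On node 1 of a three-node cluster:
    CREATE TABLE items (id serial PRIMARY KEY, payload text);
    INSERT INTO items (payload) VALUES ('a'), ('b'), ('c');
    SELECT id FROM items;   -- expect 1, 4, 7 here, while node 2
                            -- would hand out 2, 5, 8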
@@ -378,14 +380,6 @@ of tables without primary keys using the <varname>multimaster.ignore_tables_with
 and then on all the other nodes simultaneously. In the case of a heavy-write transaction, this may result in a noticeable delay.
 </para>
 </listitem>
-<listitem>
-<para>
-Isolation level. The <filename>multimaster</filename> extension
-currently supports only the <emphasis>repeatable read</emphasis> isolation level. This is stricter than the default <emphasis>read commited</emphasis> level, but also increases
-the probability of serialization failure at commit time.
-<emphasis>Serializable</emphasis> level is not supported yet.
-</para>
-</listitem>
 </itemizedlist>
 <para>If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
 <programlisting>SELECT * FROM <function>mtm.make_table_local</function>(::regclass::oid) </programlisting>
@@ -442,7 +436,7 @@ SELECT * FROM mtm.get_cluster_state();
 With multimaster, you can add or drop cluster nodes without a
 restart. To add a new node, you need to change the cluster
 configuration on alive nodes, load all the data to the new node using
-<literal>pg_basebackup</literal>, and start the node.
+<application>pg_basebackup</application>, and start the node.
 </para>
 <para>
 Suppose we have a working cluster of three nodes, with
@@ -480,14 +474,14 @@ SELECT * FROM mtm.add_node('dbname=mydb user=myuser host=node4');
 pg_basebackup -D <replaceable>datadir</replaceable> -h node1 -x
 </programlisting>
 <para>
-<literal>pg_basebackup</literal> copies the entire data
+<application>pg_basebackup</application> copies the entire data
 directory from <literal>node1</literal>, together with
 configuration settings.
 </para>
 </listitem>
 <listitem>
 <para>
-Update <literal>postgresql.conf</literal> settings on
+Update <filename>postgresql.conf</filename> settings on
 <literal>node4</literal>:
 </para>
 <programlisting>
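
Note: the add-node walkthrough condensed into one hedged sketch
(names as in the surrounding example; the section itself is
authoritative):

    # On any live node: register node4 in the cluster.
    psql -h node1 -d mydb \
         -c "SELECT * FROM mtm.add_node('dbname=mydb user=myuser host=node4')"
    # On node4: clone the data, adjust postgresql.conf, then start.
    pg_basebackup -D "$PGDATA" -h node1 -x
    pg_ctl -D "$PGDATA" -l pg.log start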
@@ -579,7 +573,7 @@ SELECT * FROM mtm.recover_node(2);
 pg_basebackup -D <replaceable>datadir</replaceable> -h node1 -x
 </programlisting>
 <para>
-<literal>pg_basebackup</literal> copies the entire data
+<application>pg_basebackup</application> copies the entire data
 directory from <literal>node1</literal>, together with
 configuration settings.
 </para>
@@ -658,6 +652,12 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
 </para>
 </listitem>
 </orderedlist>
+<important>
+<para>When performing a write transaction, <filename>multimaster</filename> blocks the affected objects on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. In this case, a distributed deadlock occurs, and one of the transactions needs to be rolled back and repeated.
+</para>
+<para>
+If your typical workload has too many rollbacks, it is recommended to use read committed isolation level. If it does not help, you can try directing all the write transactions to a single node.</para>
+</important>
 <para>
 If a node crashes or gets disconnected from the cluster between
 the <literal>PREPARE</literal> and <literal>COMMIT</literal>
@@ -767,7 +767,7 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
 WAL lag is bigger than
 <varname>multimaster.max_recovery_lag</varname>, you can manually
 restore the node from one of the working nodes using
-<filename>pg_basebackup</filename>.
+<application>pg_basebackup</application>.
 </para></note>
 <para><emphasis role="strong">See Also</emphasis></para>
 <para><xref linkend="multimaster-restoring-a-node-manually"></para>
@@ -798,6 +798,9 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
 custom port for all connection strings using the
 <varname>multimaster.arbiter_port</varname> variable.
 </para></listitem></varlistentry>
+<varlistentry><term><varname>multimaster.max_nodes</varname><indexterm><primary><varname>multimaster.max_nodes</varname></primary></indexterm></term><listitem><para>
+The maximum number of nodes allowed in the cluster. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you typically do not need more than five cluster nodes. The maximum possible number of nodes is limited to 64.</para><para>Default: the number of nodes specified in the <varname>multimaster.conn_strings</varname> variable
+</para></listitem></varlistentry>
 <varlistentry><term><varname>multimaster.arbiter_port</varname><indexterm><primary><varname>multimaster.arbiter_port</varname></primary></indexterm></term><listitem><para>
 Port for the arbiter
 process to listen on. </para><para>Default: 5433
@@ -831,7 +834,7 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
 overflow. At this point, automatic recovery of the node is no longer
 possible. In this case, you can restore the node manually by cloning
 the data from one of the alive nodes using
-<filename>pg_basebackup</filename> or a similar tool. If you set this
+<application>pg_basebackup</application> or a similar tool. If you set this
 variable to zero, replication slot will not be dropped. </para><para>Default:
 10000000
 </para></listitem></varlistentry>
