1. [Adding New Nodes to the Cluster](#adding-new-nodes-to-the-cluster)
1. [Excluding Nodes from the Cluster](#excluding-nodes-from-the-cluster)
## Installation
Multi-master consists of a patched version of PostgreSQL and the `mmts` extension that provide most of the functionality, but depend on several modifications to PostgreSQL core.
### Building multimaster from Source Code
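
The build follows the standard PostgreSQL source workflow. Here is a minimal sketch, assuming a patched source tree with `contrib/mmts` in place (paths and options are illustrative):

```
# from the root of the patched PostgreSQL source tree
./configure --prefix=/usr/local/pgsql
make
make install

# build and install the mmts extension against this installation
cd contrib/mmts
make install
```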
In these commands, ```./configure``` is a standard PostgreSQL autotools script, so you can pass it any of the usual PostgreSQL configure options.

### Docker
The contrib/mmts directory also includes docker-compose.yml that is capable of building `multimaster` and starting a three-node cluster.
First of all, we need to build the PGPro EE docker image (we will remove this step when the _MULTIMASTER branch is merged and we start building packages). In the repo root, run:
```
docker build -t pgproent .
```
Then the following command will start a cluster of three docker nodes listening on ports 15432, 15433, and 15434 (edit docker-compose.yml to change the start parameters):
```
cd contrib/mmts
docker-compose up
```
### PgPro Packages
To use `multimaster`, you need to install Postgres Pro Enterprise on all nodes of your cluster. Postgres Pro Enterprise includes all the required dependencies and extensions.
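
Installation itself depends on your package manager. As a purely hypothetical sketch for an RPM-based system (the package name below is illustrative; check the Postgres Pro Enterprise documentation for the actual one):

```
# hypothetical package name, shown for illustration only
sudo yum install postgrespro-enterprise
```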
## Setting up a Multi-Master Cluster
After installing Postgres Pro Enterprise on all nodes, you need to configure the cluster with `multimaster`. Suppose you are setting up a cluster of three nodes, with ```node1```, ```node2```, and ```node3``` domain names.
To configure your cluster with `multimaster`, complete these steps on each cluster node:
1. Set up the database to be replicated with `multimaster`:
* If you are starting from scratch, initialize a cluster, create an empty database `mydb` and a new user `myuser`, as usual:
```
initdb -D ./datadir
pg_ctl -D ./datadir -l ./pg.log start
createuser myuser -h localhost
createdb mydb -O myuser -h localhost
pg_ctl -D ./datadir -l ./pg.log stop
```
* If you already have a database `mydb` running on the `node1` server, initialize new nodes from the working node using `pg_basebackup`. On each cluster node you are going to add, run:
```
pg_basebackup -D ./datadir -h node1
```
For details on `pg_basebackup`, see [pg_basebackup](https://www.postgresql.org/docs/9.6/static/app-pgbasebackup.html).
1. Modify the ```postgresql.conf``` configuration file, as follows:
* Set up PostgreSQL parameters related to replication.
```
wal_level = logical # multimaster relies on logical replication
max_connections = 100
max_prepared_transactions = 300 # max_connections * N, for a cluster of N nodes
max_wal_senders = 10 # at least the number of nodes
max_replication_slots = 10 # at least the number of nodes
```
You must change the replication level to `logical` as `multimaster` relies on logical replication. For a cluster of N nodes, enable at least N WAL sender processes and replication slots. Since `multimaster` implicitly adds a `PREPARE` phase to each `COMMIT` transaction, make sure to set the number of prepared transactions to N*max_connections. Otherwise, prepared transactions may be queued.
* Make sure you have enough background workers allocated for each node:
```
max_worker_processes = 250
```
For example, for a three-node cluster with `max_connections` = 100, `multimaster` may need up to 206 background workers at peak times: 200 workers for connections from the neighbor nodes, two workers for walsender processes, two workers for walreceiver processes, and two workers for the arbiter sender and receiver processes. When setting this parameter, remember that other modules may also use background workers at the same time.
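
As a rough sizing rule derived from the example above (an estimate, not an official formula), for a cluster of N nodes the peak demand can be computed as follows:

```
# workers ~= (N - 1) * max_connections   # connections from neighbor nodes
#          + 2 * (N - 1)                 # walsender and walreceiver processes
#          + 2                           # arbiter sender and receiver processes
# for N = 3 and max_connections = 100: 200 + 4 + 2 = 206
max_worker_processes = 250               # leave headroom for other modules
```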
* Add `multimaster`-specific options:
```postgres
multimaster.max_nodes = 3 # cluster size
multimaster.node_id = 1 # the 1-based index of the node in the cluster
# comma-separated list of connection strings to neighbor nodes
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
```
> **Important:** The `node_id` variable takes natural numbers starting from 1, without any gaps in numbering. For example, for a cluster of five nodes, set node IDs to 1, 2, 3, 4, and 5. In the `conn_strings` variable, make sure to list the nodes in the order of their IDs. The `conn_strings` variable must be the same on all nodes.
Depending on your network environment and usage patterns, you may want to tune other `multimaster` parameters. For details on all configuration parameters available, see [Tuning Configuration Parameters](#tuning-configuration-parameters).
1. Allow replication in `pg_hba.conf`:

```
host replication all node1 trust
host replication all node2 trust
host replication all node3 trust
```
1. Start PostgreSQL:
```
pg_ctl -D ./datadir -l ./pg.log start
```
1. When PostgreSQL is started on all nodes, connect to any node and create the `multimaster` extension:
```
psql -h node1
> CREATE EXTENSION multimaster;
```
To ensure that `multimaster` is enabled, check the ```mtm.get_cluster_state()``` view:
```
> select * from mtm.get_cluster_state();
```
If `liveNodes` is equal to `allNodes`, your cluster is successfully configured and ready to use.

## Tuning Configuration Parameters

While you can use `multimaster` with the default configuration, you may want to tune several parameters for faster failure detection or more reliable automatic recovery.
### Setting Timeout for Failure Detection
To check the availability of neighbor nodes, `multimaster` periodically sends heartbeat packets to all nodes:
* The ```multimaster.heartbeat_send_timeout``` variable defines the time interval between heartbeats. By default, this variable is set to 1000 ms.
* The ```multimaster.heartbeat_recv_timeout``` variable sets the timeout for receiving heartbeats. If no heartbeats are received during this time, the node is assumed to be disconnected and is excluded from the cluster. By default, this variable is set to 10000 ms.
It is a good idea to set ```multimaster.heartbeat_send_timeout``` based on typical ping latencies between your nodes. A small recv/send ratio decreases the failure detection time, but increases the probability of false-positive failure detection. When setting this parameter, take into account the typical packet loss ratio between your cluster nodes.
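
For example, in a low-latency network you might shorten both timeouts to detect failures faster. The values below are illustrative, not recommendations:

```postgres
multimaster.heartbeat_send_timeout = 250   # in ms; several times the typical ping latency
multimaster.heartbeat_recv_timeout = 2500  # in ms; tolerates several lost heartbeats in a row
```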
### Configuring Automatic Recovery Parameters
If one of your nodes fails, `multimaster` can automatically restore it based on the WAL logs collected on other cluster nodes. To control the recovery, use the following variables:
* ```multimaster.max_recovery_lag``` - sets the maximum size of WAL logs. Upon reaching the ```multimaster.max_recovery_lag``` threshold, WAL logs for the disconnected node are deleted. At this point, automatic recovery is no longer possible. If you need to restore the disconnected node, you have to clone it manually from one of the alive nodes using ```pg_basebackup```.
* ```multimaster.min_recovery_lag``` - sets the difference between the acceptor and donor nodes. When the disconnected node is fast-forwarded up to the ```multimaster.min_recovery_lag``` threshold, `multimaster` stops all new commits to the alive nodes to allow the node to fully catch up with the rest of the cluster. When the data is fully synchronized, the disconnected node is promoted to the online state, and the cluster resumes its work.
By default, ```multimaster.max_recovery_lag``` is set to 1GB. Setting ```multimaster.max_recovery_lag``` to a larger value increases the timeframe for automatic recovery, but requires more disk space for WAL collection.
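
For example, to extend the automatic recovery window on a system with spare disk space (an illustrative value, using the usual postgresql.conf size syntax):

```postgres
multimaster.max_recovery_lag = 10GB  # keep more WAL for disconnected nodes
```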
## Monitoring Cluster Status
`multimaster` provides several views to check the current cluster state.
To check node-specific information, use the ```mtm.get_nodes_state()``` view:
```sql
select * from mtm.get_nodes_state();
```
To check the status of the whole cluster, use the ```mtm.get_cluster_state()``` view:
```sql
select * from mtm.get_cluster_state();
```
For details on all the returned information, see [functions](doc/functions.md).
## Adding New Nodes to the Cluster
With `multimaster`, you can add or drop cluster nodes without a restart. To add a new node, you need to change the cluster configuration on the alive nodes, load data to the new node using ```pg_basebackup```, and start the node.
Suppose we have a working cluster of three nodes, with ```node1```, ```node2```, and ```node3``` domain names. To add ```node4```, follow these steps:
1. Figure out the required connection string that will be used to access the new node. For example, for the database `mydb`, user `myuser`, and the new node `node4`, the connection string is "dbname=mydb user=myuser host=node4".
1. In `psql` connected to any live node, run:
```sql
select * from mtm.add_node('dbname=mydb user=myuser host=node4');
```
This command changes the cluster configuration on all nodes and starts replication slots for a new node.
1. Copy all data from one of the alive nodes to the new node:
```
node4> pg_basebackup -D ./datadir -h node1 -x
```
```pg_basebackup``` copies the entire data directory from ```node1```, together with configuration settings.
1. Update ```postgresql.conf``` settings on ```node4```:
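
For example, assuming the `mydb`/`myuser` connection strings used throughout this document (a sketch; adjust the names to your setup):

```postgres
multimaster.max_nodes = 4
multimaster.node_id = 4 # the ID of the new node
# list all four nodes in the order of their IDs
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3, dbname=mydb user=myuser host=node4'
```

1. Start PostgreSQL on ```node4```:

```
pg_ctl -D ./datadir -l ./pg.log start
```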

When switched on, the node recovers the recent transactions and changes its state to `online`. You can check the node status via the ```mtm.get_nodes_state()``` view on any cluster node. Now the cluster is using the new node.
1. Update configuration settings on all cluster nodes to ensure that the right configuration is loaded in case of PostgreSQL restart:
* Change ```multimaster.conn_strings``` and ```multimaster.max_nodes``` on the old nodes.
* Make sure the `pg_hba.conf` file allows replication to the new node.
**See Also**
[Setting up a Multi-Master Cluster](#setting-up-a-multi-master-cluster)
## Excluding Nodes from the Cluster
To exclude a node from the cluster, use the `mtm.stop_node()` function. For example, to exclude node 3, run the following command on any other node:
```
select mtm.stop_node(3);
```
This disables node 3 on all cluster nodes and stops replication to this node.
In general, if you simply shut down a node, it will be excluded from the cluster as well. However, all transactions in the cluster will be frozen until the other nodes detect that the node is offline. This timeframe is defined by the ```multimaster.heartbeat_recv_timeout``` parameter.