
Commit 13b88d6

Liudmila Mantrova authored and kelvich committed
Liudmila's review for PGPRO EE
1 parent 59f16a7 commit 13b88d6

File tree

1 file changed: +80 −73 lines changed

doc/administration.md

Lines changed: 80 additions & 73 deletions
@@ -1,18 +1,17 @@
# `Administration`

1. [Installation](#installation)
-1. [Setting up empty cluster](#setting-up-empty-cluster)
-1. [Setting up cluster from pre-existing database](#setting-up-cluster-from-pre-existing-database)
-1. [Tuning configuration params](#tuning-configuration-params)
-1. [Monitoring](#monitoring)
-1. [Adding nodes to cluster](#adding-nodes-to-cluster)
-1. [Excluding nodes from cluster](#excluding-nodes-from-cluster)
+1. [Setting up a Multi-Master Cluster](#setting-up-a-multi-master-cluster)
+1. [Tuning Configuration Parameters](#tuning-configuration-parameters)
+1. [Monitoring Cluster Status](#monitoring-cluster-status)
+1. [Adding New Nodes to the Cluster](#adding-new-nodes-to-the-cluster)
+1. [Excluding Nodes from the Cluster](#excluding-nodes-from-the-cluster)



## Installation

-Multi-master consist of patched version of postgres and extension mmts, that provides most of the functionality, but depends on several modifications to postgres core.
+Multi-master consists of a patched version of PostgreSQL and the `mmts` extension that provide most of the functionality, but depend on several modifications to the PostgreSQL core.


### Building multimaster from Source Code
@@ -53,45 +52,47 @@ In this command, ```./configure``` is a standard PostgreSQL autotools script, so

### Docker

-Directory contrib/mmts also includes docker-compose.yml that is capable of building multi-master and starting 3 node cluster.
+The contrib/mmts directory also includes a docker-compose.yml file that can build `multimaster` and start a three-node cluster.

-First of all we need to build PGPro EE docker image (We will remove this step when we'll merge _MULTIMASTER branch and start building packages). In the repo root run:
+First of all, we need to build the PGPro EE docker image (we will remove this step when we merge the _MULTIMASTER branch and start building packages). In the repo root, run:
```
docker build -t pgproent .
```

-Then following command will start cluster of 3 docker nodes listening on port 15432, 15433 and 15434 (edit docker-compose.yml to change start params):
+Then the following command will start a cluster of 3 docker nodes listening on ports 15432, 15433, and 15434 (edit docker-compose.yml to change startup parameters):
```
cd contrib/mmts
docker-compose up
```

### PgPro packages

-PostgresPro Enterprise have all necessary dependencies and extensions included, so it is enough just install packages.
+To use `multimaster`, you need to install Postgres Pro Enterprise on all nodes of your cluster. Postgres Pro Enterprise includes all the required dependencies and extensions.


-## Setting up empty cluster
+## Setting up a Multi-Master Cluster

-After installing software on all cluster nodes we can configure our cluster. Here we describe how to set up multimaster consisting of 3 nodes with empty database. Suppose our nodes accesible via domain names ```node1```, ```node2``` and ```node3```. Perform following steps on each node (sequentially or in parallel – doesn't matter):
+After installing Postgres Pro Enterprise on all nodes, you need to configure the cluster with `multimaster`. Suppose you are setting up a cluster of three nodes, with ```node1```, ```node2```, and ```node3``` domain names.
+To configure your cluster with `multimaster`, complete these steps on each cluster node:

-1. As with usual postgres first of all we need to initialize directiory where postgres will store it files:
-```
-initdb -D ./datadir
-```
-In that directory we are interested in files ```postgresql.conf``` and ```pg_hba.conf``` that are responsible for a general and security configuration consequently.
-
-1. Create database that will be used with multimaster. This will require intermediate launch of postgres.
+1. Set up the database to be replicated with `multimaster`:

+   * If you are starting from scratch, initialize a cluster, create an empty database `mydb` and a new user `myuser`, as usual:
```
+initdb -D ./datadir
pg_ctl -D ./datadir -l ./pg.log start
createuser myuser -h localhost
createdb mydb -O myuser -h localhost
pg_ctl -D ./datadir -l ./pg.log stop
```

-1. Modify the ```postgresql.conf``` configuration file, as follows:
+   * If you already have a database `mydb` running on the `node1` server, initialize new nodes from the working node using `pg_basebackup`. On each cluster node you are going to add, run:
+```
+pg_basebackup -D ./datadir -h node1 -x
+```
+For details on `pg_basebackup`, see [pg_basebackup](https://www.postgresql.org/docs/9.6/static/app-pgbasebackup.html).

+1. Modify the ```postgresql.conf``` configuration file, as follows:
   * Set up PostgreSQL parameters related to replication.

```
@@ -101,27 +102,27 @@ After installing software on all cluster nodes we can configure our cluster. Her
max_wal_senders = 10 # at least the number of nodes
max_replication_slots = 10 # at least the number of nodes
```
-You must change the replication level to `logical` as multimaster relies on logical replication. For a cluster of N nodes,and enable at least N WAL sender processes and replication slots. Since multimaster implicitly adds a PREPARE phase to each COMMIT transaction, make sure to set the number of prepared transactions to N*max_connections. Otherwise, prepared transactions may be queued.
+You must change the replication level to `logical` as `multimaster` relies on logical replication. For a cluster of N nodes, enable at least N WAL sender processes and replication slots. Since `multimaster` implicitly adds a `PREPARE` phase to each `COMMIT` transaction, make sure to set the number of prepared transactions to N*max_connections. Otherwise, prepared transactions may be queued.

* Make sure you have enough background workers allocated for each node:

```
max_worker_processes = 250
```
-For example, for a three-node cluster with max_connections = 100, multimaster may need up to 206 background workers at peak times: 200 workers for connections from the neighbour nodes, two workers for walsender processes, two workers for walreceiver processes, and two workers for the arbiter wender and receiver processes. When setting this parameter, remember that other modules may also use backround workers at the same time.
+For example, for a three-node cluster with `max_connections` = 100, `multimaster` may need up to 206 background workers at peak times: 200 workers for connections from the neighbor nodes, two workers for walsender processes, two workers for walreceiver processes, and two workers for the arbiter sender and receiver processes. When setting this parameter, remember that other modules may also use background workers at the same time.
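The sizing rules above can be sketched as a small helper. This is an illustrative calculation only, not part of `multimaster`; the function name is hypothetical, and the breakdown of worker types follows the example in the text.

```python
def sizing(n_nodes, max_connections):
    """Suggest postgresql.conf sizing values for a multimaster cluster."""
    # PREPARE is implicitly added to every COMMIT, so allow
    # N * max_connections prepared transactions.
    prepared = n_nodes * max_connections
    # Peak background workers: connections from the N-1 neighbor nodes,
    # plus a walsender/walreceiver pair per neighbor, plus the two
    # arbiter (sender and receiver) processes.
    workers = (n_nodes - 1) * max_connections + 2 * (n_nodes - 1) + 2
    return {"max_prepared_transactions": prepared,
            "max_worker_processes": workers}

print(sizing(3, 100))
# For a three-node cluster with max_connections = 100 this reproduces
# the 206-worker figure quoted above.
```

Note that the worker estimate grows with both the node count and `max_connections`, so `max_worker_processes = 250` leaves headroom for other modules in the three-node example.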

-* Add multimaster-specific options:
+* Add `multimaster`-specific options:

-```
+```postgres
multimaster.max_nodes = 3 # cluster size
multimaster.node_id = 1 # the 1-based index of the node in the cluster
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
-# comma-separated list of connection strings to neighbour nodes
+# comma-separated list of connection strings to neighbor nodes
```

-> **Important:** The `node_id` variable takes natural numbers starting from 1, without any gaps in numbering. For example, for a cluster of five nodes, set node IDs to 1, 2, 3, 4, and 5. In the `conn_strings` variable, make sure to list the nodes in the order of their IDs. Thus, the `conn_strings` variable must be the same on all nodes.
+> **Important:** The `node_id` variable takes natural numbers starting from 1, without any gaps in numbering. For example, for a cluster of five nodes, set node IDs to 1, 2, 3, 4, and 5. In the `conn_strings` variable, make sure to list the nodes in the order of their IDs. The `conn_strings` variable must be the same on all nodes.
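Since `multimaster.conn_strings` must be identical on every node, it can help to generate the string once and paste it into each `postgresql.conf`. A minimal sketch — the helper name and defaults are illustrative, not a `multimaster` API:

```python
def conn_strings(hosts, dbname="mydb", user="myuser"):
    """Build the shared multimaster.conn_strings value.

    hosts must be ordered by their 1-based node IDs; the same
    comma-separated string is then used on all nodes.
    """
    return ", ".join(f"dbname={dbname} user={user} host={h}" for h in hosts)

print(conn_strings(["node1", "node2", "node3"]))
```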

-Depending on your network environment and usage patterns, you may want to tune other multimaster parameters. For details on all configuration parameters available, see [Tuning configuration params](#tuning-configuration-params).
+Depending on your network environment and usage patterns, you may want to tune other `multimaster` parameters. For details on all configuration parameters available, see [Tuning Configuration Parameters](#tuning-configuration-parameters).

1. Allow replication in `pg_hba.conf`:

@@ -134,121 +135,127 @@ After installing software on all cluster nodes we can configure our cluster. Her
host replication all node3 trust
```

-1. Finally start postgres:
+1. Start PostgreSQL:

```
pg_ctl -D ./datadir -l ./pg.log start
```

-1. When postgres is started on all nodes you can connect to any node and create multimaster extention to get acces to monitoring functions:
+1. When PostgreSQL is started on all nodes, connect to any node and create the `multimaster` extension:
```
psql -h node1
> CREATE EXTENSION multimaster;
```

-To enshure that everything is working check multimaster view ```mtm.get_cluster_state()```:
-
-```
-> select * from mtm.get_cluster_state();
-```
-
-Check that liveNodes in this view is equal to allNodes.
-
+To ensure that `multimaster` is enabled, check the ```mtm.get_cluster_state()``` view:
+```
+> select * from mtm.get_cluster_state();
+```
+If `liveNodes` is equal to `allNodes`, your cluster is successfully configured and ready to use.
+
+**See Also**
+[Tuning Configuration Parameters](#tuning-configuration-parameters)

-## Setting up cluster from pre-existing database

-In case of preexisting database setup would be slightly different. Suppose we have running database on server ```node1``` and wan't to turn it to a multimaster by adding two new nodes ```node2``` and ```node3```. Instead of initializing new directory and creating database and user through a temporary launch, one need to initialize new node through pg_basebackup from working node.
+## Tuning Configuration Parameters
+
+While you can use `multimaster` with the default configuration, you may want to tune several parameters for faster failure detection or more reliable automatic recovery.

-1. On each new node run:
+### Setting Timeout for Failure Detection
+
+To check availability of neighbor nodes, `multimaster` periodically sends heartbeat packets to all nodes:

-```
-pg_basebackup -D ./datadir -h node1 mydb
-```
+* The ```multimaster.heartbeat_send_timeout``` variable defines the time interval between sending the heartbeats. By default, this variable is set to 1000 ms.
+* The ```multimaster.heartbeat_recv_timeout``` variable sets the timeout for receiving heartbeats: if no heartbeats were received during this time, the node is assumed to be disconnected and is excluded from the cluster. By default, this variable is set to 10000 ms.

-After that configure and atart multimaster from step 3 of previous section. See deteailed description of pg_basebackup options in the official [documentation](https://www.postgresql.org/docs/9.6/static/app-pgbasebackup.html).
+It's a good idea to set ```multimaster.heartbeat_send_timeout``` based on typical ping latencies between your nodes. A small recv/send ratio decreases the time of failure detection, but increases the probability of false-positive failure detection. When setting this parameter, take into account the typical packet loss ratio between your cluster nodes.
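The trade-off above can be made concrete with a rough probability model. This is an illustrative assumption (independent per-heartbeat packet loss), not something `multimaster` computes:

```python
def false_positive_chance(send_ms, recv_ms, packet_loss):
    """Estimate the chance of falsely declaring a healthy node dead.

    A node is declared disconnected after heartbeat_recv_timeout with no
    heartbeats, i.e. after roughly recv/send consecutive heartbeats are
    lost. If each heartbeat is lost independently with probability
    packet_loss, all of them are lost with probability packet_loss**k.
    """
    missed = recv_ms // send_ms  # heartbeats that must all be lost
    return packet_loss ** missed

# Default timeouts (1000 ms send, 10000 ms recv) require 10 consecutive
# losses, so even 1% packet loss makes a false positive vanishingly rare.
print(false_positive_chance(1000, 10000, 0.01))
```

Shrinking the recv/send ratio (say, by raising `send_ms` while keeping `recv_ms`) detects real failures faster but raises this false-positive chance, which is the trade-off described above.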

+### Configuring Automatic Recovery Parameters

-## Tuning configuration params
+If one of your nodes fails, `multimaster` can automatically restore it based on the WAL logs collected on other cluster nodes. To control the recovery, use the following variables:

-While multimaster is usable with default configuration, several params may require tuning.
+* ```multimaster.max_recovery_lag``` - sets the maximum size of WAL logs. Upon reaching the ```multimaster.max_recovery_lag``` threshold, WAL logs for the disconnected node are deleted. At this point, automatic recovery is no longer possible. If you need to restore the disconnected node, you have to clone it manually from one of the alive nodes using ```pg_basebackup```.
+* ```multimaster.min_recovery_lag``` - sets the maximum allowed difference between the acceptor and donor nodes. When the disconnected node is fast-forwarded up to the ```multimaster.min_recovery_lag``` threshold, `multimaster` stops all new commits to the alive nodes to allow the node to fully catch up with the rest of the cluster. When the data is fully synchronized, the disconnected node is promoted to the online state, and the cluster resumes its work.

-* Hearbeat timeouts — multimaster periodically send heartbeat packets to check availability of neighbour nodes. ```multimaster.heartbeat_send_timeout``` defines amount of time between sending heartbeats, while ```multimaster.heartbeat_recv_timeout``` sets amount of time following which node assumed to be disconnected if no hearbeats were received during this time. It's good idea to set ```multimaster.heartbeat_send_timeout``` based on typical ping latencies between you nodes. Small recv/senv ratio decreases time of failure detection, but increases probability of false positive failure detection, so tupical packet loss ratio between nodes should be taken into account.
+By default, ```multimaster.max_recovery_lag``` is set to 1GB. Setting ```multimaster.max_recovery_lag``` to a larger value increases the timeframe for automatic recovery, but requires more disk space for WAL collection.
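As a back-of-the-envelope check, the automatic recovery window is roughly the threshold divided by the WAL generation rate of your workload. The helper below is illustrative only; the 1 MB/s rate is an assumed example, not a measured figure:

```python
def recovery_window_seconds(max_recovery_lag_bytes, wal_bytes_per_second):
    # Once the WAL retained for the disconnected node exceeds
    # max_recovery_lag, the logs are deleted and only manual
    # pg_basebackup recovery remains possible.
    return max_recovery_lag_bytes / wal_bytes_per_second

# Default 1GB threshold with a workload writing ~1 MB/s of WAL:
hours = recovery_window_seconds(1 << 30, 1 << 20) / 3600
print(f"{hours:.2f} hours")  # about 0.28 hours (~17 minutes)
```

If the estimated window is too short for your operational response times, raise `multimaster.max_recovery_lag` and budget the extra disk space accordingly.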

-* Min/max recovery lag — when node is disconnected from the cluster other nodes will keep to collect WAL logs for disconnected node until size of WAL log will grow to ```multimaster.max_recovery_lag```. Upon reaching this threshold WAL logs for disconnected node will be deleted, automatic recovery will be no longer possible and disconnected node should be cloned manually from one of alive nodes by ```pg_basebackup```. Increasing ```multimaster.max_recovery_lag``` increases amount of time while automatic recovery is possible, but also increasing maximum disk usage during WAL collection. On the other hand ```multimaster.min_recovery_lag``` sets difference between acceptor and donor nodes before switching ordanary recovery to exclusive mode, when commits on donor node are stopped. This step is necessary to ensure that no new commits will happend during node promotion from recovery state to online state and nodes will be at sync after that.
+## Monitoring Cluster Status

+`multimaster` provides several views to check the current cluster state.

-## Monitoring

-Multimaster provides several views to check current cluster state. To access this functions ```multimaster``` extension should be created explicitely. Run in psql:
-
-```sql
-CREATE EXTENSION multimaster;
-```
-
-Then it is possible to check nodes specific information via ```mtm.get_nodes_state()```:
+To check node-specific information, use the ```mtm.get_nodes_state()``` view:

```sql
select * from mtm.get_nodes_state();
```

-and status of whole cluster can bee seen through:
+To check the status of the whole cluster, use the ```mtm.get_cluster_state()``` view:

```sql
select * from mtm.get_cluster_state();
```

-Read description of all monitoring functions at [functions](doc/functions.md)
+For details on all the returned information, see [functions](doc/functions.md).
+
+
+## Adding New Nodes to the Cluster

-## Adding nodes to cluster
+With `multimaster`, you can add or drop cluster nodes without a restart. To add a new node, you need to change the cluster configuration on the alive nodes, load data to the new node using ```pg_basebackup```, and start the node.

-Mulmimaster is able to add/drop cluster nodes without restart. To add new node one should change cluster configuration on alive nodes, than load data to a new node using ```pg_basebackup``` and start node.
+Suppose we have a working cluster of three nodes, with ```node1```, ```node2```, and ```node3``` domain names. To add ```node4```, follow these steps:

-Suppose we have working cluster of three nodes (```node1```, ```node2```, ```node3```) and want to add new ```node4``` to the cluster.
+1. Figure out the required connection string that will be used to access the new node. For example, for the database `mydb`, user `myuser`, and the new node `node4`, the connection string is "dbname=mydb user=myuser host=node4".

-1. First we need to figure out connection string that will be used to access new server. Let's assume that in our case that will be "dbname=mydb user=myuser host=node4". Run in psql connected to any live node:
+1. In `psql` connected to any live node, run:

```sql
select * from mtm.add_node('dbname=mydb user=myuser host=node4');
```

-this will change cluster configuration on all nodes and start replication slots for a new node.
+This command changes the cluster configuration on all nodes and starts replication slots for the new node.

-1. After calling ```mtm.add_node()``` we can copy data from alive node on new node:
+1. Copy all data from one of the alive nodes to the new node:

```
node4> pg_basebackup -D ./datadir -h node1 -x
```

-1. ```pg_basebackup``` will copy entire data directory from ```node1``` among with configs. So we need to change ```postgresql.conf``` for ```node4```:
+```pg_basebackup``` copies the entire data directory from ```node1```, together with configuration settings.
+
+1. Update ```postgresql.conf``` settings on ```node4```:

```
multimaster.max_nodes = 4
multimaster.node_id = 4
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3, dbname=mydb user=myuser host=node4'
```

-1. Now we can just start postgres on new node:
+1. Start PostgreSQL on the new node:

```
node4> pg_ctl -D ./datadir -l ./pg.log start
```

-After switching on node will recover recent transaction and change state to ONLINE. Node status can be checked via ```mtm.get_nodes_state()``` view on any cluster node.
+When switched on, the node recovers the recent transactions and changes its state to `online`. Now the cluster is using the new node.

-1. Now cluster is using new node, but we also should change ```multimaster.conn_strings``` and ```multimaster.max_nodes``` on old nodes to ensure that right configuration will be loaded in case of postgres restart.
+1. Update configuration settings on all cluster nodes to ensure that the right configuration is loaded in case of PostgreSQL restart:
+* Change ```multimaster.conn_strings``` and ```multimaster.max_nodes``` on the old nodes.
+* Make sure the `pg_hba.conf` file allows replication to the new node.

+**See Also**
+[Setting up a Multi-Master Cluster](#setting-up-a-multi-master-cluster)
+[Monitoring Cluster Status](#monitoring-cluster-status)

-## Excluding nodes from cluster

-Generally it is okay to just shutdown cluster node but all transaction in cluster will be freezed while other nodes will detect that node is gone (this period is controlled by the ```multimaster.heartbeat_recv_timeout``` parameter). To avoid such situation it is possible to exclude node from cluster in advance.
+## Excluding Nodes from the Cluster

-For example to stop node 3 run on any other node:
+To exclude a node from the cluster, use the `mtm.stop_node()` function. For example, to exclude node 3, run the following command on any other node:

```
select mtm.stop_node(3);
```

-This will disable node 3 on all cluster nodes.
+This disables node 3 on all cluster nodes and stops replication to this node.
+
+In general, if you simply shut down a node, it will be excluded from the cluster as well. However, all transactions in the cluster will be frozen until the other nodes detect that the node is offline. This timeframe is defined by the ```multimaster.heartbeat_recv_timeout``` parameter.

