doc/administration.md
## Tuning configuration params
While multimaster is usable with the default configuration, several parameters may require tuning.
* Heartbeat timeouts — multimaster periodically sends heartbeat packets to check the availability of neighbour nodes. ```multimaster.heartbeat_send_timeout``` defines the amount of time between sending heartbeats, while ```multimaster.heartbeat_recv_timeout``` sets the amount of time after which a node is assumed to be disconnected if no heartbeats were received during that interval. It is a good idea to set ```multimaster.heartbeat_send_timeout``` based on typical ping latencies between your nodes. A small recv/send ratio decreases failure detection time, but increases the probability of false positive failure detection, so the typical packet loss ratio between nodes should be taken into account.
* Min/max recovery lag — when a node is disconnected from the cluster, the other nodes keep collecting WAL for the disconnected node until its size grows to ```multimaster.max_recovery_lag```. Upon reaching this threshold, the WAL kept for the disconnected node is deleted, automatic recovery is no longer possible, and the disconnected node has to be cloned manually from one of the alive nodes with ```pg_basebackup```. Increasing ```multimaster.max_recovery_lag``` increases the amount of time during which automatic recovery remains possible, but also increases the maximum disk usage during WAL collection. On the other hand, ```multimaster.min_recovery_lag``` sets the difference between the acceptor and donor nodes at which ordinary recovery switches to exclusive mode, in which commits on the donor node are stopped. This step is necessary to ensure that no new commits happen during node promotion from recovery state to online state, so the nodes are in sync after that. A hedged configuration sketch covering both groups of parameters follows this list.
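
A minimal ```postgresql.conf``` sketch with these four parameters; the names come from the text above, but the values and units are illustrative assumptions, not recommendations:

```
# illustrative values only; units and magnitudes are assumptions
multimaster.heartbeat_send_timeout = 200      # assumed milliseconds between heartbeats
multimaster.heartbeat_recv_timeout = 1000     # assumed milliseconds of silence before a node is considered disconnected
multimaster.min_recovery_lag = 100000         # assumed bytes of lag at which recovery switches to exclusive mode
multimaster.max_recovery_lag = 1000000000     # assumed bytes of lag at which WAL kept for a disconnected node is dropped
```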
## Monitoring
Multimaster provides several views to check the current cluster state. To access these functions the ```multimaster``` extension should be created explicitly. Run in psql:
```sql
CREATE EXTENSION multimaster;
```

Then it is possible to check node-specific information via ```mtm.get_nodes_state()```:
```sql
select * from mtm.get_nodes_state();
```

and the status of the whole cluster can be seen through:
```sql
select * from mtm.get_cluster_state();
```

Read the description of all monitoring functions at [functions](doc/functions.md).
## Adding nodes to cluster
Multimaster is able to add/drop cluster nodes without a restart. To add a new node one should change the cluster configuration on the alive nodes, then load data into the new node using ```pg_basebackup``` and start the node.
Suppose we have a working cluster of three nodes (```node1```, ```node2```, ```node3```) and want to add a new ```node4``` to the cluster.
1. First we need to figure out the connection string that will be used to access the new server. Let's assume that in our case it will be "dbname=mydb user=myuser host=node4". Run in psql connected to any live node:
```sql
select * from mtm.add_node('dbname=mydb user=myuser host=node4');
```

This will change the cluster configuration on all nodes and start replication slots for the new node.
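
As a hedged sanity check (not part of this document's procedure), the standard PostgreSQL catalog view can be queried on any existing node to confirm that new replication slots have appeared; the slot names are internal to multimaster, so only their presence matters here:

```sql
select slot_name, active from pg_replication_slots;
```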
1. After calling ```mtm.add_node()``` we can copy data from an alive node to the new node:
```
node4> pg_basebackup -D ./datadir -h node1 -x
```
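
Note that the ```-x``` flag exists only in PostgreSQL 9.x; from PostgreSQL 10 onwards an equivalent WAL-including invocation would be (same hostname and directory assumptions as above):

```
node4> pg_basebackup -D ./datadir -h node1 -X stream
```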
1. ```pg_basebackup``` will copy the entire data directory from ```node1```, along with the configs. So we need to change ```postgresql.conf``` for ```node4```:

After switching on, the node will recover recent transactions and change its state to ONLINE. Node status can be checked via the ```mtm.get_nodes_state()``` view on any cluster node.
1. Now the cluster is using the new node, but we should also change ```multimaster.conn_strings``` and ```multimaster.max_nodes``` on the old nodes to ensure that the right configuration will be loaded in case of a postgres restart.
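
A hedged illustration of that last step for ```postgresql.conf``` on each old node, reusing the connection-string assumptions from step 1 (the comma-separated format of ```multimaster.conn_strings``` is also an assumption):

```
multimaster.max_nodes = 4
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1,dbname=mydb user=myuser host=node2,dbname=mydb user=myuser host=node3,dbname=mydb user=myuser host=node4'
```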