
Commit 6b8e8fe

mmts readme to main repo #24
1 parent 2739552 commit 6b8e8fe


README.md

Lines changed: 181 additions & 26 deletions
# PostgreSQL multi-master

Multi-master is an extension and a set of patches to Postgres that turn it into a synchronous shared-nothing cluster, providing OLTP scalability and high availability with automatic disaster recovery.

## Overview

Multi-master replicates the same database to all nodes of the cluster and allows writes to each node. Transaction isolation is enforced cluster-wide, so in the case of concurrent updates on different nodes the database applies the same conflict resolution rules (MVCC with the repeatable read isolation level) that a single node applies to concurrent backends, and it always stays in a consistent state. Any writing transaction writes to all nodes, which increases commit latency by an amount proportional to the round trip between nodes needed for synchronization. Read-only transactions and queries are executed locally, without measurable overhead. The replication mechanism itself is based on logical decoding and an earlier version of the pglogical extension provided to the community by the 2ndQuadrant team.

Several changes were made in the Postgres core to implement this functionality:

* Transaction manager API (eXtensible Transaction Manager, xtm): a generic interface for plugging in distributed transaction engines. More info on the [postgres wiki](https://wiki.postgresql.org/wiki/DTM) and in [the email thread](http://www.postgresql.org/message-id/flat/F2766B97-555D-424F-B29F-E0CA0F6D1D74@postgrespro.ru).
* Distributed deadlock detection API.
* Logical decoding of transactions.

A cluster of N nodes can continue to work while a majority of the initial nodes are alive and reachable by the other nodes; for example, a 3-node cluster survives the loss of any single node. This is achieved by using a three-phase commit protocol and heartbeats for failure discovery. A node that is brought back into the cluster can be fast-forwarded to the actual state automatically, as long as the transaction log still exists from the time when the node was excluded from the cluster (this depends on the checkpoint configuration in Postgres).

Read more about the internals on the [Architecture](/Architechture) page.
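To illustrate the write-anywhere behavior, here is a minimal session sketch against a running cluster (the ports and the table are hypothetical; adjust them to your setup):

```sh
# A row committed through node 1 is visible on node 2 immediately after commit,
# because every writing transaction commits on all nodes.
psql -p 5432 -c "CREATE TABLE t (id int PRIMARY KEY, v text);"   # DDL is replicated too
psql -p 5432 -c "INSERT INTO t VALUES (1, 'written via node 1');"
psql -p 5433 -c "SELECT * FROM t;"                               # served locally by node 2
```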
## Features

* Cluster-wide transaction isolation
* Synchronous logical replication
* DDL replication
* Distributed sequences
* Fault tolerance
* Automatic node recovery
## Limitations

* Commit latency.
The current implementation of logical replication sends data to subscriber nodes only after the local commit, so with a write-heavy transaction the user waits for the transaction to be processed twice: on the local node and on all the other nodes (simultaneously). We plan to address this issue in the future.

* DDL replication.
While data is replicated at the logical level, DDL is replicated by statements, performing a distributed commit of the same statement on each node. Some complex DDL scenarios, including stored procedures and temp tables, don't work properly yet. We are currently working on full compatibility with ordinary Postgres; at the moment we pass 141 of the 164 Postgres regression tests.

* Isolation level.
Multimaster currently supports only the _repeatable read_ isolation level. This is stricter than the default _read committed_, but it also increases the probability of serialization failures at commit time; applications should be prepared to retry (see the sketch after this list). The _serializable_ level isn't supported yet.

* One database per cluster.
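Because commits can fail with a serialization error under cluster-wide repeatable read, client code should retry failed transactions. A minimal shell sketch (the `accounts` table and the statement are hypothetical):

```sh
# Retry a write until it commits; a serialization failure (SQLSTATE 40001)
# makes psql exit non-zero, which triggers another attempt.
until psql -v ON_ERROR_STOP=1 -c "UPDATE accounts SET balance = balance - 100 WHERE id = 1;"
do
  echo "commit failed, retrying..."
  sleep 0.1
done
```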
## Installation

(Existing db?)

Multi-master consists of a patched version of Postgres and the mmts extension, which provides most of the functionality but doesn't require changes to the Postgres core. To run multimaster, one needs to install Postgres and several extensions on all nodes of the cluster.

### Sources

Ensure that the following prerequisites are installed.

Debian-based Linux:

```sh
apt-get install -y git make gcc libreadline-dev bison flex zlib1g-dev
```

Red Hat-based Linux:

```sh
yum groupinstall 'Development Tools'
yum install git automake libtool bison flex readline-devel
```

After that, everything is ready to install Postgres along with the extensions:

```sh
git clone https://github.com/postgrespro/postgres_cluster.git
cd postgres_cluster
./configure && make && make -j 4 install
cd ./contrib/raftable && make install
cd ../../contrib/mmts && make install
```
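Each node then needs a data directory before the cluster can be configured and started. This is standard Postgres tooling, not multimaster-specific; the paths below are hypothetical:

```sh
# Run on every node: initialize a data directory, apply the settings from
# the Configuration section below, then start the server.
initdb -D /var/lib/pgsql/node1
# ... edit postgresql.conf and pg_hba.conf as described below ...
pg_ctl -D /var/lib/pgsql/node1 -l node1.log start
```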
### Docker

The contrib/mmts directory also includes a Dockerfile that is capable of building multi-master and starting a 3-node cluster:

```sh
cd contrib/mmts
docker-compose build
docker-compose up
```
### PgPro packages

Once things get more stable, we will release prebuilt packages for major platforms.
## Configuration

1. Add these required options to the `postgresql.conf` of each instance in the cluster.

```sh
max_prepared_transactions = 200 # should be > 0, because all
                                # transactions are implicitly two-phase
max_connections = 200
max_worker_processes = 100 # at least (2 * n + p + 1), where n is the number of
                           # nodes and p is the size of the worker pool:
                           # 1 raftable worker
                           # n-1 receivers
                           # n-1 senders
                           # 1 mtm-sender
                           # 1 mtm-receiver
                           # p workers in the pool
max_parallel_degree = 0
wal_level = logical # multimaster is built on top of
                    # logical replication and will not work otherwise
max_wal_senders = 10 # at least the number of nodes
wal_sender_timeout = 0
default_transaction_isolation = 'repeatable read'
max_replication_slots = 10 # at least the number of nodes
shared_preload_libraries = 'raftable,multimaster'
multimaster.workers = 10
multimaster.queue_size = 10485760 # 10 MB
multimaster.node_id = 1 # the 1-based index of the node in the cluster
multimaster.conn_strings = 'dbname=... host=....0.0.1 port=... raftport=..., ...'
                           # comma-separated list of connection strings
multimaster.use_raftable = true
multimaster.heartbeat_recv_timeout = 1000
multimaster.heartbeat_send_timeout = 250
multimaster.ignore_tables_without_pk = true
multimaster.twopc_min_timeout = 2000
```
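As a worked example (the database and host names are hypothetical): for n = 3 nodes and a pool of p = 10 workers, the formula gives 2 * 3 + 10 + 1 = 17 worker processes at minimum, so `max_worker_processes = 100` leaves ample headroom, and the node-specific settings on the first node might look like:

```sh
multimaster.node_id = 1   # 2 and 3, respectively, on the other two nodes
multimaster.conn_strings = 'dbname=mydb host=node1 port=5432 raftport=6666, dbname=mydb host=node2 port=5432 raftport=6666, dbname=mydb host=node3 port=5432 raftport=6666'
```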
2. Allow replication connections between the nodes in `pg_hba.conf`.
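A sketch of the entries this might require (the addresses and auth method are placeholders; choose ones appropriate for your network):

```sh
# pg_hba.conf on every node: allow the other nodes to connect both as regular
# clients and for replication.
host all         all 10.0.0.0/8 trust
host replication all 10.0.0.0/8 trust
```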
(link to full doc on config params)
## Management

`create extension mmts;` to gain access to these functions:

* `mtm.get_nodes_state()` -- show the status of the nodes in the cluster
* `mtm.get_cluster_state()` -- show the status of the whole cluster
* `mtm.get_cluster_info()` -- print some debug info
* `mtm.make_table_local(relation regclass)` -- stop replication for a given table

(link to full doc on functions)
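For example, a quick health check from any node (the table name in the last call is hypothetical):

```sh
psql -c "SELECT * FROM mtm.get_cluster_state();"             # whole-cluster status
psql -c "SELECT * FROM mtm.get_nodes_state();"               # per-node status
psql -c "SELECT mtm.make_table_local('scratch'::regclass);"  # keep a table node-local
```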
## Tests

### Performance

(Show TPC-C here on 3 nodes)

### Fault tolerance

(Link to test/failure matrix)
### Postgres compatibility

Regression: 141 of 164
Isolation: n/a

To run the tests:
* `make -C contrib/mmts check` to run TAP tests.
* `make -C contrib/mmts xcheck` to run blockade tests. The blockade tests require `docker`, `blockade`, and some other packages; see [requirements.txt](tests2/requirements.txt) for the list. You might also need superuser privileges to run these tests successfully.
docs:

## Architecture

## Configuration params

## Management functions
