Commit c275885

DOC: reviewed the latest changes in pg_shardman docs
Author: Liudmila Mantrova
1 parent a87aa22

2 files changed: 86 additions, 74 deletions

doc/src/sgml/pg_shardman.sgml

Lines changed: 83 additions & 68 deletions
@@ -11,18 +11,18 @@
that enables sharding — splitting large tables into separate partitions, or shards,
 distributed between different servers. This extension aims for
 scalability and fault tolerance with ACID transactions support, targeting mainly
-<acronym>OLTP</acronym> workloads. <filename>pg_shardman</filename>, supports:
+<acronym>OLTP</acronym> workloads. <filename>pg_shardman</filename> offers the following features:
 <itemizedlist spacing="compact">
 <listitem>
 <para>
 Splitting tables into shards using hash partitioning provided
-by <link linkend="pg-pathman">pg_pathman</link> and move the shards
+by <link linkend="pg-pathman">pg_pathman</link> and moving the shards
 across cluster nodes to balance read/write load.
 </para>
 </listitem>
 <listitem>
 <para>
-Running read-write queries on any node, regardless of the actual data
+Running read/write queries on any node, regardless of the actual data
 location. Queries will be automatically redirected to the node
 holding the required data via <xref linkend="postgres-fdw">.
 </para>
@@ -33,24 +33,24 @@
 number of replicas to create for each
 shard. <filename>pg_shardman</filename> leverages logical
 replication to keep replicas up-to-date. Synchronous and
-asynchronous replication is supported.
+asynchronous replication types are supported.
 </para>
 </listitem>
 <listitem>
 <para>
-Atomic modification of data kept on multiple nodes in one
+Atomic modification of data stored on multiple nodes in a single
 transaction using 2PC commit.
 </para>
 </listitem>
 <listitem>
 <para>
-Cluster-wide REPEATABLE READ (as in Postgres) transaction isolation
-level.
+Support for cluster-wide <literal>Repeatable Read</literal>
+transaction isolation level.
 </para>
 </listitem>
 <listitem>
 <para>
-Manual failover with replica promotion
+Manual failover with replica promotion.
 </para>
 </listitem>
 </itemizedlist>
@@ -76,8 +76,9 @@
 </listitem>
 <listitem>
 <para>
-DDL support is very limited, see below. DCL support is basically
-absent, only one user is supported.
+<filename>pg_shardman</filename> provides only limited DDL support,
+as explained in <xref linkend="working-with-cluster-data">. DCL support is virtually
+absent, with only one user being supported.
 </para>
 </listitem>
 <listitem>
@@ -103,8 +104,8 @@
 </listitem>
 <listitem>
 <para>
-Two-phase commit doesn't involve replicas; data inconsistencies are
-possible if a node fails permanently.
+Two-phase commit does not involve replicas, so a permanent node failure
+may result in data inconsistencies.
 </para>
 </listitem>
 <listitem>
@@ -158,50 +159,58 @@
 <productname>&productname;</productname> as usual. However, to handle
 distributed transactions, <filename>pg_shardman</filename> relies on
 two-phase commit (<acronym>2PC</acronym>) protocol for atomicity and
-Clock-SI algorithm for transaction isolation, which are enabled with the
-following GUC variables:
+Clock-SI algorithm for transaction isolation.
 </para>

-<itemizedlist>
-<listitem>
 <para>
-<varname>postgres_fdw.use_twophase</varname> variable turns on
-<acronym>2PC</acronym> support. With <acronym>2PC</acronym> enabled,
-each transaction will be <firstterm>prepared</firstterm> on all
-nodes before the commit. A successful <literal>PREPARE</literal>
-on the node means that this node is ready to commit the transaction,
-but it will only be committed if all other nodes can commit this
-transaction as well. Otherwise, the transaction will be aborted.
-This approach allows to ensure transaction atomicity across multiple nodes.
-2PC is never used if transaction did writes only on single node.
+To turn on <acronym>2PC</acronym> support, use the
+<varname>postgres_fdw.use_twophase</varname> variable.
+With <acronym>2PC</acronym> enabled, each transaction is
+<firstterm>prepared</firstterm> on all nodes before the commit.
+A successful <literal>PREPARE</literal> on the node means that
+this node is ready to commit the transaction, but it will only be
+committed if all the other nodes can commit this transaction as
+well. Otherwise, the transaction will be aborted. This approach
+allows to ensure transaction atomicity across multiple nodes.
 </para>

 <para>
-This parameter can be changed at any time; the behavior for any one
-transaction is determined by the setting in effect when it commits.
-It is therefore possible to have some transactions commit with 2PC and
-some without.
+This parameter can be changed at any time. Since the transaction
+behavior is determined by the <varname>postgres_fdw.use_twophase</varname>
+setting at commit time, some distributed transactions may use
+2PC while others may not. Regardless of this setting, 2PC is never used
+for write transactions only affect a single node.
 </para>

 <para>
-The well-known shortcoming of 2PC is that it is a blocking protocol: if
-transaction coordinator has failed, some transactions might hang until
-the resolution in PREPAREd state. <filename>pg_shardman</filename>
-provides <function>shardman.recover_xacts()</> function to resolve them.
+A well-known shortcoming of 2PC is that it is a blocking protocol: if
+the transaction coordinator has failed, some transactions might hang
+in the <literal>PREPARE</literal> state. <filename>pg_shardman</filename>
+provides <xref linkend="shardman-recover-xacts"> function to resolve them.
 </para>
+
+<para>To ensure cluster-wide transaction isolation, use the following variables:
+</para>
+
+<itemizedlist>
+<listitem>
+<para>
+<varname>track_global_snapshots</varname> variable enables a
+distributed transaction manager based on the Clock-SI
+algorithm, which provides cluster-wide transaction isolation at
+the <literal>Repeatable Read</literal> level. Changing this parameter
+requires a server restart.
+</para>
 </listitem>
 <listitem>
 <para>
-<varname>track_global_snapshots</varname> and
-<varname>postgres_fdw.use_global_snapshots</varname> variable control
-the behaviour of distributed transaction manager based on the Clock-SI
-algorithm, providing cluster-wide transaction isolation at
-the <literal>REPEATABLE READ</literal> level. To use
-it, <varname>track_global_snapshots</varname> must be set
-to <literal>on</>; change of this parameter requires server restart.
-<varname>postgres_fdw.use_global_snapshots</varname> defines whether to
-use global snapshot for current transaction. It can by changed at any
-time, its value is consulted during COMMIT.
+<varname>postgres_fdw.use_global_snapshots</varname> variable
+defines whether to use global snapshots for the current transaction.
+If you set this variable to <literal>on</literal>, you must also
+enable the distributed transaction manager using the
+<varname>track_global_snapshots</varname> variable. You can change
+the <varname>postgres_fdw.use_global_snapshots</varname> setting
+at any time; its value is consulted during transaction commit.
 </para>
 </listitem>
 </itemizedlist>
@@ -212,8 +221,8 @@
 replication, which can cause data inconsistencies if a node fails
 permanently. A part of a distributed transaction might get lost and cause
 a non-atomic result if the coordinator has prepared the transaction
-everywhere, started commiting it and then some node failed before
-committing the transaction on replica.
+everywhere, started commiting it, but one of the nodes failed before
+committing the transaction on a replica.
 </para>

 <para>
@@ -430,10 +439,10 @@ max_prepared_transactions = 1000
 </listitem>
 <listitem>
 <para>
-Adjust the <xref linkend="guc-synchronous-commit"> variable to match your
-choice of replication mode: for synchronous replication, it must be set to
-on. Note that you can set it on per-transaction basis, making some
-transactions commit synchronously and some not.
+Make sure the <xref linkend="guc-synchronous-commit"> variable specifies the
+replication type of your choice. For synchronous replication, it must be set to
+<literal>on</literal>. You can change this setting at any time, so
+different transactions may use different replication modes.

 <programlisting>
 synchronous_commit = on
@@ -481,7 +490,7 @@ CREATE EXTENSION pg_shardman CASCADE;
 is mandatory, as superuser privileges are required to configure logical
 replication between the nodes. The optional <varname>conn_string</varname>
 parameter is used for configuring FDWs; this allows to access the data
-without being superuser. If you omit this parameter, <filename>pg_shardman</filename>
+without superuser rights. If you omit this parameter, <filename>pg_shardman</filename>
 uses <varname>super_conn_string</varname> for all purposes.
 The <replaceable>repl_group</replaceable> parameter defines the
 replication group to add the node to. If you omit this, the node is
@@ -755,7 +764,7 @@ SELECT create_hash_partitions('films', 'id', 30, redundancy = 1);
 Once a table is sharded and filled with data, you can execute
 <acronym>DML</acronym> statements on this table from any <filename>pg_shardman</filename> worker
 node. <acronym>DML</acronym> statements can access more than one shard, remote or local.
-It is implemented using the standard <productname>&productname;</productname>
+It is implemented using the standard <productname>&project;</productname>
 inheritance mechanism: all partitions are derived from the parent table.
 If the required partition is located on another node, it will be
 accessed using <filename>postgres_fdw</filename>.
@@ -776,17 +785,18 @@ SELECT create_hash_partitions('films', 'id', 30, redundancy = 1);
 </note>

 <para>
-DDL support is very limited. Function <xref linkend="shardman-alter-table">
-alters root table at all nodes and updates its definition in metadata; it
-can be used for column addition, removal and renaming. However, changing
+DDL support is very limited. Function <xref linkend="shardman-alter-table">
+alters root table on all nodes and updates its definition in cluster metadata; it
+can be used to add, remove, or rename columns. However, changing
 column used as sharding key is not supported. NOT NULL constraint can also
 be created using this function. Foreign keys pointing to sharded tables are
 not supported. Though it is possible to create UNIQUE constraint, it will be
 enforced only on per-partition basis. Indexes which were created on table
 before sharding it will be included in table definiton and created on
 partitions and replicas too. <xref linkend="shardman-forall"> executes SQL
-statement on all nodes; it can be used to create indexes later. For that it
-is necessary to create an index on each partition separately, like described in <ulink url="https://github.com/postgrespro/pg_pathman/wiki/How-do-I-create-indexes"><filename>pg_pathman wiki</filename></ulink>.
+statement on all nodes; it can be used to create indexes later. For that it
+is necessary to create an index on each partition separately, as described
+in <ulink url="https://github.com/postgrespro/pg_pathman/wiki/How-do-I-create-indexes"><filename>pg_pathman wiki</filename></ulink>.

 </para>

@@ -1149,9 +1159,9 @@ excludes this node from the cluster, as follows:
 <listitem>
 <para>
 When set to <literal>on</literal>, <filename>pg_shardman</filename>
-adds replicas to list of <xref linkend="guc-synchronous-standby-names">,
-enabling synchronous replication. This parameter should not be changed
-if any replicas exist.
+adds replicas to the list of <xref linkend="guc-synchronous-standby-names">,
+enabling synchronous replication. This parameter should not be changed
+if any replicas exist.
 </para>
 <para>
 Default: <literal>off</literal>
@@ -1174,9 +1184,10 @@ excludes this node from the cluster, as follows:
 Connection string for the shardlord. You can use all the options that
 libpq accepts in connection strings, as described in <xref linkend="libpq-paramkeywords">.
 You must ensure that the node is accessed on behalf of a superuser.
-This GUC must be set on shardlord itself; you can also optionally set
-it on worker nodes, in that case you can execute cluster management
-functions on workers and they will be redirected to shardlord automatically.
+This variable must be set on the shardlord itself. You can also optionally set
+it on worker nodes if you would like to execute cluster management
+functions on worker nodes. In this case, these commands will be automatically
+redirected to the shardlord.
 </para>
 <para>
 This parameter can only be set in the <filename>postgresql.conf</>
@@ -1721,24 +1732,28 @@ excludes this node from the cluster, as follows:
 </term>
 <listitem>
 <para>
-Execute SQL statement on all nodes.
+Execute an SQL statement on all nodes.
 </para>
 <para>Arguments:
 </para>
 <itemizedlist spacing="compact">
 <listitem>
 <para>
-<parameter>sql</parameter> &mdash; Statement to execute.
+<parameter>sql</parameter> &mdash; the statement to execute.
 </para>
 </listitem>
 <listitem>
 <para>
-<parameter>use_2pc</parameter> &mdash; use two-phase commit?
+<parameter>use_2pc</parameter> &mdash; defines whether to use
+two-phase commit. The default value is inherited from
+the <xref linkend="guc-postgres-fdw-use-twophase"> setting.
 </para>
 </listitem>
 <listitem>
 <para>
-<parameter>including_shardlord</parameter> &mdash; run statement on shardlord too?
+<parameter>including_shardlord</parameter> &mdash; defines whether
+to run the SQL statement on the shardlord. By default, the statement
+is executed on worker nodes only.
 </para>
 </listitem>
 </itemizedlist>
@@ -1750,7 +1765,7 @@ excludes this node from the cluster, as follows:
 <function>shardman.alter_table(<parameter>relation</parameter> <type>regclass</type>, <parameter>alter_clause</parameter> <type>text</type>)
 </function>
 <indexterm>
-<primary><function>shardman.forall()</></primary>
+<primary><function>shardman.alter_table()</></primary>
 </indexterm>
 </term>
 <listitem>
@@ -1774,7 +1789,7 @@ excludes this node from the cluster, as follows:
 <para>
 Example:
 <programlisting>
-SELECT shardman.alter_table('films', 'add column author text');
+SELECT shardman.alter_table('films', 'ADD COLUMN author text');
 </programlisting>

 </para>

doc/src/sgml/postgres-fdw.sgml

Lines changed: 3 additions & 6 deletions
@@ -635,12 +635,9 @@
 </term>
 <listitem>
 <para>
-Enables/disables distributed transaction manager based on
-Clock-SI algorithm. When enabled, provides cluster-wide transaction
-isolation at the <literal>REPEATABLE READ</literal> level.
-<varname>track_global_snapshots</varname> must be set to
-<literal>true</literal> when enabling this. It can by changed at any
-time, its value is consulted during COMMIT.
+Enables/disables global snapshots for the current transaction
+if <xref linkend="guc-track-global-snapshots"> is set to <literal>true</literal>.
+This setting can be changed at any time, its value is consulted during transaction commit.
 </para>
 <para>
 Default: <literal>false</literal>
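
Illustration (not part of the commit): the settings described in the changed documentation might be combined roughly as sketched below. The variable names come from the diff above. The chosen values, and the assumption that the two postgres_fdw variables are set per session with SET, are illustrative guesses and should be checked against the product documentation.

```sql
-- In postgresql.conf (per the docs above, changing this requires a server restart):
--   track_global_snapshots = on

-- Assumed per-session settings; per the docs, both values are consulted at commit time.
SET postgres_fdw.use_twophase = on;          -- prepare on all nodes before commit
SET postgres_fdw.use_global_snapshots = on;  -- needs track_global_snapshots enabled
```

As the revised text notes, 2PC is not used for write transactions that affect only a single node, regardless of the postgres_fdw.use_twophase setting.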
