Commit 86989a4

Antonin Houska authored and Commitfest Bot committed
Add REPACK command.
The existing CLUSTER command as well as VACUUM with the FULL option both reclaim unused space by rewriting the table. Now that we want to enhance this functionality (in particular, by adding a new option CONCURRENTLY), we should enhance both commands because they are both implemented by the same function (cluster.c:cluster_rel). However, adding the same option to two different commands is not very user-friendly. Therefore it was decided to create a new command and to declare both the CLUSTER command and the FULL option of VACUUM deprecated. Future enhancements to this rewriting code will only affect the new command.

Like CLUSTER, the REPACK command reorders the table according to the specified index. Unlike CLUSTER, REPACK does not require the index: if only the table is specified, the command acts as VACUUM FULL. As we don't want to remove CLUSTER and VACUUM FULL yet, there are three callers of the cluster_rel() function now: REPACK, CLUSTER and VACUUM FULL. When we need to distinguish who is calling this function (mostly for logging, but also for progress reporting), we can no longer use the OID of the clustering index: both REPACK and VACUUM FULL can pass InvalidOid. Therefore, this patch introduces a new enumeration type ClusterCommand, and adds an argument of this type to the cluster_rel() function and to all the functions that need to distinguish the caller.

Like CLUSTER and VACUUM FULL, the REPACK command without arguments processes all the tables on which the current user has the MAINTAIN privilege.

A new view, pg_stat_progress_repack, is added to monitor the progress of REPACK. Currently it displays the same information as pg_stat_progress_cluster (except that column names might differ), but it'll also display the status of the REPACK CONCURRENTLY command in the future, so the view definitions will eventually diverge.

Regarding user documentation, the patch moves the information on clustering from cluster.sgml to the new file repack.sgml. cluster.sgml now contains a link that points to the related section of repack.sgml. A note on deprecation and a link to repack.sgml are added to both cluster.sgml and vacuum.sgml.
1 parent 8a51027 commit 86989a4
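
The commit message says callers of cluster_rel() are now told apart by a ClusterCommand value rather than by the index OID. The following is a minimal sketch of that idea, not the patch's actual code: the enumerator names, the Oid stand-in, and the helper cluster_command_name() are all assumptions made for illustration, since the message only names the type.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PostgreSQL's Oid and InvalidOid, so the sketch compiles alone. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * Hypothetical enumerator names; the commit message only states that an
 * enumeration type called ClusterCommand is introduced.
 */
typedef enum ClusterCommand
{
    CLUSTER_COMMAND_CLUSTER,
    CLUSTER_COMMAND_REPACK,
    CLUSTER_COMMAND_VACUUM
} ClusterCommand;

/*
 * Hypothetical helper: map the caller to the command name used in log and
 * progress messages. The index OID cannot serve this purpose, because both
 * REPACK and VACUUM FULL may pass InvalidOid.
 */
const char *
cluster_command_name(ClusterCommand cmd)
{
    switch (cmd)
    {
        case CLUSTER_COMMAND_CLUSTER:
            return "CLUSTER";
        case CLUSTER_COMMAND_REPACK:
            return "REPACK";
        case CLUSTER_COMMAND_VACUUM:
            return "VACUUM FULL";
    }
    /* Not reached for valid enum values. */
    assert(0 && "unknown ClusterCommand");
    return NULL;
}
```

With an explicit argument of this kind, a REPACK without an index and a VACUUM FULL can be reported differently even though both pass InvalidOid as the index.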

25 files changed: +1259 -207 lines changed

doc/src/sgml/monitoring.sgml

Lines changed: 230 additions & 0 deletions
@@ -400,6 +400,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 </entry>
 </row>
 
+<row>
+<entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+<entry>One row for each backend running
+<command>REPACK</command>, showing current progress. See
+<xref linkend="repack-progress-reporting"/>.
+</entry>
+</row>
+
 <row>
 <entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
 <entry>One row for each WAL sender process streaming a base backup,
@@ -5943,6 +5951,228 @@ FROM pg_stat_get_backend_idset() AS backendid;
 </table>
 </sect2>
 
+<sect2 id="repack-progress-reporting">
+<title>REPACK Progress Reporting</title>
+
+<indexterm>
+<primary>pg_stat_progress_repack</primary>
+</indexterm>
+
+<para>
+Whenever <command>REPACK</command> is running,
+the <structname>pg_stat_progress_repack</structname> view will contain a
+row for each backend that is currently running the command. The tables
+below describe the information that will be reported and provide
+information about how to interpret it.
+</para>
+
+<table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+<title><structname>pg_stat_progress_repack</structname> View</title>
+<tgroup cols="1">
+<thead>
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+Column Type
+</para>
+<para>
+Description
+</para></entry>
+</row>
+</thead>
+
+<tbody>
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>pid</structfield> <type>integer</type>
+</para>
+<para>
+Process ID of backend.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>datid</structfield> <type>oid</type>
+</para>
+<para>
+OID of the database to which this backend is connected.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>datname</structfield> <type>name</type>
+</para>
+<para>
+Name of the database to which this backend is connected.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>relid</structfield> <type>oid</type>
+</para>
+<para>
+OID of the table being repacked.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>command</structfield> <type>text</type>
+</para>
+<para>
+The command that is running. Currently, the only value
+is <literal>REPACK</literal>.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>phase</structfield> <type>text</type>
+</para>
+<para>
+Current processing phase. See <xref linkend="repack-phases"/>.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>repack_index_relid</structfield> <type>oid</type>
+</para>
+<para>
+If the table is being scanned using an index, this is the OID of the
+index being used; otherwise, it is zero.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+</para>
+<para>
+Number of heap tuples scanned.
+This counter only advances when the phase is
+<literal>seq scanning heap</literal>,
+<literal>index scanning heap</literal>
+or <literal>writing new heap</literal>.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>heap_tuples_written</structfield> <type>bigint</type>
+</para>
+<para>
+Number of heap tuples written.
+This counter only advances when the phase is
+<literal>seq scanning heap</literal>,
+<literal>index scanning heap</literal>
+or <literal>writing new heap</literal>.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>heap_blks_total</structfield> <type>bigint</type>
+</para>
+<para>
+Total number of heap blocks in the table. This number is reported
+as of the beginning of <literal>seq scanning heap</literal>.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>heap_blks_scanned</structfield> <type>bigint</type>
+</para>
+<para>
+Number of heap blocks scanned. This counter only advances when the
+phase is <literal>seq scanning heap</literal>.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>index_rebuild_count</structfield> <type>bigint</type>
+</para>
+<para>
+Number of indexes rebuilt. This counter only advances when the phase
+is <literal>rebuilding index</literal>.
+</para></entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+
+<table id="repack-phases">
+<title>REPACK Phases</title>
+<tgroup cols="2">
+<colspec colname="col1" colwidth="1*"/>
+<colspec colname="col2" colwidth="2*"/>
+<thead>
+<row>
+<entry>Phase</entry>
+<entry>Description</entry>
+</row>
+</thead>
+
+<tbody>
+<row>
+<entry><literal>initializing</literal></entry>
+<entry>
+The command is preparing to begin scanning the heap. This phase is
+expected to be very brief.
+</entry>
+</row>
+<row>
+<entry><literal>seq scanning heap</literal></entry>
+<entry>
+The command is currently scanning the table using a sequential scan.
+</entry>
+</row>
+<row>
+<entry><literal>index scanning heap</literal></entry>
+<entry>
+<command>REPACK</command> is currently scanning the table using an index scan.
+</entry>
+</row>
+<row>
+<entry><literal>sorting tuples</literal></entry>
+<entry>
+<command>REPACK</command> is currently sorting tuples.
+</entry>
+</row>
+<row>
+<entry><literal>writing new heap</literal></entry>
+<entry>
+<command>REPACK</command> is currently writing the new heap.
+</entry>
+</row>
+<row>
+<entry><literal>swapping relation files</literal></entry>
+<entry>
+The command is currently swapping newly-built files into place.
+</entry>
+</row>
+<row>
+<entry><literal>rebuilding index</literal></entry>
+<entry>
+The command is currently rebuilding an index.
+</entry>
+</row>
+<row>
+<entry><literal>performing final cleanup</literal></entry>
+<entry>
+The command is performing final cleanup. When this phase is
+completed, <command>REPACK</command> will end.
+</entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+</sect2>
+
 <sect2 id="copy-progress-reporting">
 <title>COPY Progress Reporting</title>
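
The column descriptions in this hunk say that heap_blks_total is reported as of the beginning of the "seq scanning heap" phase and that heap_blks_scanned only advances during that phase. As a rough sketch of how a monitoring client might use the two columns, the function below computes a completion percentage and declines to answer in any other phase. The struct RepackProgressRow and the function name are hypothetical; only the column and phase names come from the documentation above, and restricting the ratio to that one phase is a conservative reading rather than something the patch prescribes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical client-side copy of three pg_stat_progress_repack columns. */
typedef struct RepackProgressRow
{
    const char *phase;              /* e.g. "seq scanning heap" */
    int64_t     heap_blks_total;
    int64_t     heap_blks_scanned;
} RepackProgressRow;

/*
 * Return the percentage of heap blocks scanned, or -1.0 when the row is not
 * in the "seq scanning heap" phase or no total has been reported.
 */
double
repack_seq_scan_percent(const RepackProgressRow *row)
{
    assert(row != NULL && row->phase != NULL);

    if (strcmp(row->phase, "seq scanning heap") != 0)
        return -1.0;
    if (row->heap_blks_total <= 0)
        return -1.0;

    return 100.0 * (double) row->heap_blks_scanned
                 / (double) row->heap_blks_total;
}
```

For a row in the "seq scanning heap" phase with 200 total blocks and 50 scanned, the function returns 25.0. For a row in "index scanning heap" it returns -1.0, since the block counters are not documented to advance there.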

doc/src/sgml/ref/allfiles.sgml

Lines changed: 1 addition & 0 deletions
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
 <!ENTITY reindex SYSTEM "reindex.sgml">
 <!ENTITY releaseSavepoint SYSTEM "release_savepoint.sgml">
+<!ENTITY repack SYSTEM "repack.sgml">
 <!ENTITY reset SYSTEM "reset.sgml">
 <!ENTITY revoke SYSTEM "revoke.sgml">
 <!ENTITY rollback SYSTEM "rollback.sgml">

doc/src/sgml/ref/cluster.sgml

Lines changed: 17 additions & 62 deletions
@@ -42,17 +42,23 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
 <replaceable class="parameter">table_name</replaceable>.
 </para>
 
-<para>
-When a table is clustered, it is physically reordered
-based on the index information. Clustering is a one-time operation:
-when the table is subsequently updated, the changes are
-not clustered. That is, no attempt is made to store new or
-updated rows according to their index order. (If one wishes, one can
-periodically recluster by issuing the command again. Also, setting
-the table's <literal>fillfactor</literal> storage parameter to less than
-100% can aid in preserving cluster ordering during updates, since updated
-rows are kept on the same page if enough space is available there.)
-</para>
+<warning>
+<para>
+The <command>CLUSTER</command> command is deprecated in favor of
+<xref linkend="sql-repack"/>.
+</para>
+</warning>
+
+<note>
+<para>
+<xref linkend="sql-repack-notes-on-clustering"/> explain how clustering
+works, whether it is initiated by <command>CLUSTER</command> or
+by <command>REPACK</command>. The notable difference between the two is
+that <command>REPACK</command> does not remember the index used last
+time. Thus if you don't specify an index, <command>REPACK</command>
+rewrites the table but does not try to cluster it.
+</para>
+</note>
 
 <para>
 When a table is clustered, <productname>PostgreSQL</productname>
@@ -136,63 +142,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
 on the table.
 </para>
 
-<para>
-In cases where you are accessing single rows randomly
-within a table, the actual order of the data in the
-table is unimportant. However, if you tend to access some
-data more than others, and there is an index that groups
-them together, you will benefit from using <command>CLUSTER</command>.
-If you are requesting a range of indexed values from a table, or a
-single indexed value that has multiple rows that match,
-<command>CLUSTER</command> will help because once the index identifies the
-table page for the first row that matches, all other rows
-that match are probably already on the same table page,
-and so you save disk accesses and speed up the query.
-</para>
-
-<para>
-<command>CLUSTER</command> can re-sort the table using either an index scan
-on the specified index, or (if the index is a b-tree) a sequential
-scan followed by sorting. It will attempt to choose the method that
-will be faster, based on planner cost parameters and available statistical
-information.
-</para>
-
 <para>
 While <command>CLUSTER</command> is running, the <xref
 linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
 pg_temp</literal>.
 </para>
 
-<para>
-When an index scan is used, a temporary copy of the table is created that
-contains the table data in the index order. Temporary copies of each
-index on the table are created as well. Therefore, you need free space on
-disk at least equal to the sum of the table size and the index sizes.
-</para>
-
-<para>
-When a sequential scan and sort is used, a temporary sort file is
-also created, so that the peak temporary space requirement is as much
-as double the table size, plus the index sizes. This method is often
-faster than the index scan method, but if the disk space requirement is
-intolerable, you can disable this choice by temporarily setting <xref
-linkend="guc-enable-sort"/> to <literal>off</literal>.
-</para>
-
-<para>
-It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
-a reasonably large value (but not more than the amount of RAM you can
-dedicate to the <command>CLUSTER</command> operation) before clustering.
-</para>
-
-<para>
-Because the planner records statistics about the ordering of
-tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
-on the newly clustered table.
-Otherwise, the planner might make poor choices of query plans.
-</para>
-
 <para>
 Because <command>CLUSTER</command> remembers which indexes are clustered,
 one can cluster the tables one wants clustered manually the first time,
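
The note added to cluster.sgml states that REPACK, unlike CLUSTER, does not remember the index used last time, so a REPACK without an index only rewrites the table. The sketch below restates that rule as a small decision function. It is an illustration under assumptions, not code from the patch: the enum RewriteCaller, its members, and index_to_cluster_on() are hypothetical names, and error handling (for example, CLUSTER on a table with no remembered index) is not modeled.

```c
#include <assert.h>

/* Stand-ins for PostgreSQL's Oid and InvalidOid, so the sketch compiles alone. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical names for the three commands that rewrite a table. */
typedef enum RewriteCaller
{
    CALLER_CLUSTER,
    CALLER_REPACK,
    CALLER_VACUUM_FULL
} RewriteCaller;

/*
 * Decide which index, if any, the rewrite should order the table by.
 * "requested" is the index named in the command (InvalidOid if none);
 * "remembered" is the index recorded by an earlier CLUSTER (InvalidOid if none).
 * A result of InvalidOid means a plain rewrite without clustering.
 */
Oid
index_to_cluster_on(RewriteCaller caller, Oid requested, Oid remembered)
{
    switch (caller)
    {
        case CALLER_CLUSTER:
            /* CLUSTER falls back to the index it remembers. */
            return (requested != InvalidOid) ? requested : remembered;
        case CALLER_REPACK:
            /* REPACK ignores any remembered index. */
            return requested;
        case CALLER_VACUUM_FULL:
            /* VACUUM FULL never clusters. */
            return InvalidOid;
    }
    assert(0 && "unknown RewriteCaller");
    return InvalidOid;
}
```

Under this reading, a table previously clustered on some index is reordered again by a bare CLUSTER, while a bare REPACK of the same table behaves like VACUUM FULL.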
