
Commit 6f44213

michail-nikolaev authored and Commitfest Bot committed
Use auxiliary indexes for concurrent index operations
Replace the second full table scan in concurrent index builds with an auxiliary index approach:

- create a STIR auxiliary index with the same predicate (if any) as the main index
- use it to track tuples inserted during the first phase
- merge the auxiliary index with the main index during validation, catching the new index up with any tuples missed during the first phase
- automatically drop the auxiliary index once the main index is ready

To merge the main and auxiliary indexes:

- index_bulk_delete is called for both, and the TIDs are put into tuplesorts
- both tuplesorts are sorted
- both tuplesorts are scanned with two pointers, looking for TIDs present in the auxiliary index but absent from the main one
- all such TIDs are put into a tuplestore
- all TIDs in the tuplestore are fetched using a read stream; the tuplestore is used in heapam_index_validate_scan_read_stream_next to provide the next page to prefetch
- if a fetched tuple is alive, it is inserted into the main index

This eliminates the need for a second full table scan during validation, improving performance especially for large tables.

Affects both CREATE INDEX CONCURRENTLY and REINDEX INDEX CONCURRENTLY operations.
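The merge step described above boils down to a two-pointer walk over two sorted TID streams. The following standalone C sketch illustrates just that walk using plain sorted arrays instead of the tuplesort/tuplestore/read-stream machinery the commit actually uses; the Tid struct and merge_missing function are illustrative names, not code from the patch.

/*
 * Simplified, self-contained sketch of the two-pointer merge described
 * above.  The real patch operates on tuplesorts produced by
 * index_bulk_delete and feeds the result through a tuplestore and a read
 * stream; here the TID lists are plain sorted arrays.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Tid { uint32_t block; uint16_t offset; } Tid;

static int
tid_cmp(const void *a, const void *b)
{
    const Tid *x = a, *y = b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/* Report every TID present in aux[] but absent from main_[]; both sorted. */
static void
merge_missing(const Tid *aux, size_t naux, const Tid *main_, size_t nmain)
{
    size_t  i = 0, j = 0;

    while (i < naux)
    {
        int c = (j < nmain) ? tid_cmp(&aux[i], &main_[j]) : -1;

        if (c == 0)             /* present in both: already indexed */
        {
            i++;
            j++;
        }
        else if (c > 0)         /* main pointer is behind: advance it */
            j++;
        else                    /* missing from the main index */
        {
            /* the real code fetches the heap tuple here and, if it is
             * still alive, inserts it into the main index */
            printf("missing: (%u,%u)\n",
                   (unsigned) aux[i].block, (unsigned) aux[i].offset);
            i++;
        }
    }
}

int
main(void)
{
    Tid aux[]   = {{1, 3}, {2, 1}, {2, 5}, {7, 2}};
    Tid main_[] = {{1, 3}, {2, 5}};

    qsort(aux, 4, sizeof(Tid), tid_cmp);
    qsort(main_, 2, sizeof(Tid), tid_cmp);
    merge_missing(aux, 4, main_, 2);
    return 0;
}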
1 parent 3d30a78 commit 6f44213

File tree

19 files changed: +1101 -348 lines

doc/src/sgml/monitoring.sgml

Lines changed: 19 additions & 7 deletions

@@ -6314,6 +6314,18 @@ FROM pg_stat_get_backend_idset() AS backendid;
        information for this phase.
       </entry>
      </row>
+     <row>
+      <entry><literal>waiting for writers to use auxiliary index</literal></entry>
+      <entry>
+       <command>CREATE INDEX CONCURRENTLY</command> or <command>REINDEX CONCURRENTLY</command> is waiting for transactions
+       with write locks that can potentially see the table to finish, to ensure the auxiliary index is used for new
+       tuples in future transactions.
+       This phase is skipped when not in concurrent mode.
+       Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
+       and <structname>current_locker_pid</structname> contain the progress
+       information for this phase.
+      </entry>
+     </row>
      <row>
       <entry><literal>building index</literal></entry>
       <entry>
@@ -6354,13 +6366,12 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </entry>
      </row>
      <row>
-      <entry><literal>index validation: scanning table</literal></entry>
+      <entry><literal>index validation: merging indexes</literal></entry>
       <entry>
-       <command>CREATE INDEX CONCURRENTLY</command> is scanning the table
-       to validate the index tuples collected in the previous two phases.
+       <command>CREATE INDEX CONCURRENTLY</command> is merging the content of the auxiliary index into the target index.
        This phase is skipped when not in concurrent mode.
-       Columns <structname>blocks_total</structname> (set to the total size of the table)
-       and <structname>blocks_done</structname> contain the progress information for this phase.
+       Columns <structname>tuples_total</structname> (set to the number of tuples to be merged)
+       and <structname>tuples_done</structname> contain the progress information for this phase.
       </entry>
      </row>
      <row>
@@ -6377,8 +6388,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
      <row>
       <entry><literal>waiting for readers before marking dead</literal></entry>
       <entry>
-       <command>REINDEX CONCURRENTLY</command> is waiting for transactions
-       with read locks on the table to finish, before marking the old index dead.
+       <command>CREATE INDEX CONCURRENTLY</command> is waiting for transactions
+       with read locks on the table to finish, before marking the auxiliary index as dead.
+       <command>REINDEX CONCURRENTLY</command> is also waiting before marking the old index as dead.
        This phase is skipped when not in concurrent mode.
        Columns <structname>lockers_total</structname>, <structname>lockers_done</structname>
        and <structname>current_locker_pid</structname> contain the progress
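The new and renamed phases above surface through the pg_stat_progress_create_index view. As a rough illustration (not part of the patch), a client could poll that view while a concurrent build runs in another session; the sketch below uses libpq, and the connection string and polling interval are placeholders.

/*
 * Minimal libpq sketch that polls pg_stat_progress_create_index once per
 * second while a concurrent index build runs elsewhere.  Adjust the
 * connection string for your environment.
 */
#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");   /* placeholder */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    for (;;)
    {
        PGresult *res = PQexec(conn,
            "SELECT phase, lockers_done, lockers_total, "
            "       tuples_done, tuples_total "
            "FROM pg_stat_progress_create_index");

        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            PQclear(res);
            break;
        }

        /* One row per backend currently running CREATE INDEX / REINDEX. */
        for (int i = 0; i < PQntuples(res); i++)
            printf("phase=%s lockers=%s/%s tuples=%s/%s\n",
                   PQgetvalue(res, i, 0),
                   PQgetvalue(res, i, 1), PQgetvalue(res, i, 2),
                   PQgetvalue(res, i, 3), PQgetvalue(res, i, 4));

        PQclear(res);
        sleep(1);
    }

    PQfinish(conn);
    return 0;
}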

doc/src/sgml/ref/create_index.sgml

Lines changed: 18 additions & 16 deletions

@@ -620,25 +620,25 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    out writes.  This method is invoked by specifying the
    <literal>CONCURRENTLY</literal> option of <command>CREATE INDEX</command>.
    When this option is used,
-   <productname>PostgreSQL</productname> must perform two scans of the table, and in
-   addition it must wait for all existing transactions that could potentially
-   modify or use the index to terminate.  Thus
-   this method requires more total work than a standard index build and takes
+   <productname>PostgreSQL</productname> must perform a table scan followed by
+   a validation phase, and in addition it must wait for all existing transactions
+   that could potentially modify or use the index to terminate.  Thus
+   this method requires more total work than a standard index build and may take
    significantly longer to complete.  However, since it allows normal
    operations to continue while the index is built, this method is useful for
    adding new indexes in a production environment.  Of course, the extra CPU
    and I/O load imposed by the index creation might slow other operations.
   </para>

   <para>
-   In a concurrent index build, the index is actually entered as an
-   <quote>invalid</quote> index into
-   the system catalogs in one transaction, then two table scans occur in
-   two more transactions.  Before each table scan, the index build must
+   In a concurrent index build, the main and auxiliary indexes are actually
+   entered as <quote>invalid</quote> indexes into
+   the system catalogs in one transaction, then two phases occur in
+   multiple transactions.  Before each phase, the index build must
    wait for existing transactions that have modified the table to terminate.
-   After the second scan, the index build must wait for any transactions
+   After the second phase, the index build must wait for any transactions
    that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-   scan to terminate, including transactions used by any phase of concurrent
+   phase to terminate, including transactions used by any phase of concurrent
    index builds on other tables, if the indexes involved are partial or have
    columns that are not simple column references.
    Then finally the index can be marked <quote>valid</quote> and ready for use,
@@ -651,10 +651,11 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
   <para>
    If a problem arises while scanning the table, such as a deadlock or a
    uniqueness violation in a unique index, the <command>CREATE INDEX</command>
-   command will fail but leave behind an <quote>invalid</quote> index. This index
-   will be ignored for querying purposes because it might be incomplete;
-   however it will still consume update overhead. The <application>psql</application>
-   <command>\d</command> command will report such an index as <literal>INVALID</literal>:
+   command will fail but leave behind an <quote>invalid</quote> index and its
+   associated auxiliary index.  These indexes
+   will be ignored for querying purposes because they might be incomplete;
+   however they will still consume update overhead.  The <application>psql</application>
+   <command>\d</command> command will report such indexes as <literal>INVALID</literal>:

 <programlisting>
 postgres=# \d tab
@@ -664,11 +665,12 @@ postgres=# \d tab
  col | integer |           |          |
 Indexes:
     "idx" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
 </programlisting>

    The recommended recovery
-   method in such cases is to drop the index and try again to perform
-   <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
+   method in such cases is to drop these indexes and try again to perform
+   <command>CREATE INDEX CONCURRENTLY</command>.  (Another possibility is
    to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>).
   </para>
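The recovery procedure recommended above (drop the leftover invalid indexes, including the _ccaux auxiliary one, then retry) could be scripted roughly as follows. This is a hedged sketch, not part of the patch: the table and index names ("tab", "col", "idx") follow the documentation example, the connection string is a placeholder, and error handling is minimal. The catalog query relies only on the existing pg_index.indisvalid flag.

/*
 * Sketch of recovering from a failed concurrent build: find invalid
 * leftover indexes on the table, drop them, and retry the build.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn   *conn = PQconnectdb("dbname=postgres");   /* placeholder */
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* Invalid indexes left behind by a failed concurrent build,
     * e.g. "idx" and its auxiliary "idx_ccaux". */
    res = PQexec(conn,
        "SELECT indexrelid::regclass "
        "FROM pg_index "
        "WHERE indrelid = 'tab'::regclass AND NOT indisvalid");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (int i = 0; i < PQntuples(res); i++)
        {
            char sql[256];

            snprintf(sql, sizeof(sql), "DROP INDEX %s",
                     PQgetvalue(res, i, 0));
            PQclear(PQexec(conn, sql));
        }
    }
    PQclear(res);

    /* Retry; CONCURRENTLY cannot run in a transaction block, which is
     * fine here because each PQexec statement autocommits. */
    PQclear(PQexec(conn, "CREATE INDEX CONCURRENTLY idx ON tab (col)"));

    PQfinish(conn);
    return 0;
}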

doc/src/sgml/ref/reindex.sgml

Lines changed: 25 additions & 16 deletions

@@ -368,9 +368,8 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
    <productname>PostgreSQL</productname> supports rebuilding indexes with minimum locking
    of writes.  This method is invoked by specifying the
    <literal>CONCURRENTLY</literal> option of <command>REINDEX</command>.  When this option
-   is used, <productname>PostgreSQL</productname> must perform two scans of the table
-   for each index that needs to be rebuilt and wait for termination of
-   all existing transactions that could potentially use the index.
+   is used, <productname>PostgreSQL</productname> must perform several steps to ensure data
+   consistency while allowing normal operations to continue.
    This method requires more total work than a standard index
    rebuild and takes significantly longer to complete as it needs to wait
    for unfinished transactions that might modify the index.  However, since
@@ -388,7 +387,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
   <orderedlist>
    <listitem>
     <para>
-     A new transient index definition is added to the catalog
+     A new transient index definition and an auxiliary index are added to the catalog
      <literal>pg_index</literal>.  This definition will be used to replace
      the old index.  A <literal>SHARE UPDATE EXCLUSIVE</literal> lock at
      session level is taken on the indexes being reindexed as well as their
@@ -398,7 +397,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA

    <listitem>
     <para>
-     A first pass to build the index is done for each new index.  Once the
+     The auxiliary index is marked as "ready for inserts", making
+     it visible to other sessions.  This index efficiently tracks all new
+     tuples during the reindex process.
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     The new main index is built by scanning the table.  Once the
      index is built, its flag <literal>pg_index.indisready</literal> is
      switched to <quote>true</quote> to make it ready for inserts, making it
      visible to other sessions once the transaction that performed the build
@@ -409,9 +416,9 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA

    <listitem>
     <para>
-     Then a second pass is performed to add tuples that were added while the
-     first pass was running.  This step is also done in a separate
-     transaction for each index.
+     A validation phase merges any missing entries from the auxiliary index
+     into the main index, ensuring all concurrent changes are captured.
+     This step is also done in a separate transaction for each index.
     </para>
    </listitem>

@@ -428,15 +435,15 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA

    <listitem>
     <para>
-     The old indexes have <literal>pg_index.indisready</literal> switched to
+     The old and auxiliary indexes have <literal>pg_index.indisready</literal> switched to
      <quote>false</quote> to prevent any new tuple insertions, after waiting
      for running queries that might reference the old index to complete.
     </para>
    </listitem>

    <listitem>
     <para>
-     The old indexes are dropped.  The <literal>SHARE UPDATE
+     The old and auxiliary indexes are dropped.  The <literal>SHARE UPDATE
      EXCLUSIVE</literal> session locks for the indexes and the table are
      released.
     </para>
@@ -447,11 +454,11 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { DA
   <para>
    If a problem arises while rebuilding the indexes, such as a
    uniqueness violation in a unique index, the <command>REINDEX</command>
-   command will fail but leave behind an <quote>invalid</quote> new index in addition to
-   the pre-existing one.  This index will be ignored for querying purposes
-   because it might be incomplete; however it will still consume update
+   command will fail but leave behind an <quote>invalid</quote> new index and its auxiliary index in addition to
+   the pre-existing one.  These indexes will be ignored for querying purposes
+   because they might be incomplete; however they will still consume update
    overhead.  The <application>psql</application> <command>\d</command> command will report
-   such an index as <literal>INVALID</literal>:
+   such indexes as <literal>INVALID</literal>:

 <programlisting>
 postgres=# \d tab
@@ -462,12 +469,14 @@ postgres=# \d tab
 Indexes:
     "idx" btree (col)
     "idx_ccnew" btree (col) INVALID
+    "idx_ccaux" stir (col) INVALID
+
 </programlisting>

    If the index marked <literal>INVALID</literal> is suffixed
-   <literal>_ccnew</literal>, then it corresponds to the transient
+   <literal>_ccnew</literal> or <literal>_ccaux</literal>, then it corresponds to the transient or auxiliary
    index created during the concurrent operation, and the recommended
-   recovery method is to drop it using <literal>DROP INDEX</literal>,
+   recovery method is to drop these indexes using <literal>DROP INDEX</literal>,
    then attempt <command>REINDEX CONCURRENTLY</command> again.
    If the invalid index is instead suffixed <literal>_ccold</literal>,
    it corresponds to the original index which could not be dropped;

src/backend/access/heap/README.HOT

Lines changed: 9 additions & 4 deletions

@@ -375,6 +375,11 @@ constraint on which updates can be HOT. Other transactions must include
 such an index when determining HOT-safety of updates, even though they
 must ignore it for both insertion and searching purposes.

+Also, a special auxiliary index is created the same way.  It is marked as
+"ready for inserts" without any actual table scan.  Its purpose is to collect
+new tuples inserted into the table while our target index is still "not ready
+for inserts".
+
 We must do this to avoid making incorrect index entries. For example,
 suppose we are building an index on column X and we make an index entry for
 a non-HOT tuple with X=1. Then some other backend, unaware that X is an
@@ -394,10 +399,10 @@ As above, we point the index entry at the root of the HOT-update chain but we
 use the key value from the live tuple.

 We mark the index open for inserts (but still not ready for reads) then
-we again wait for transactions which have the table open.  Then we take
-a second reference snapshot and validate the index.  This searches for
-tuples missing from the index, and inserts any missing ones.  Again,
-the index entries have to have TIDs equal to HOT-chain root TIDs, but
+we again wait for transactions which have the table open.  Then we validate
+the index.  This uses the auxiliary index to find tuples missing from the
+main index, and inserts any missing ones that are visible to the reference snapshot.
+Again, the index entries have to have TIDs equal to HOT-chain root TIDs, but
 the value to be inserted is the one from the live tuple.

 Then we wait until every transaction that could have a snapshot older than
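The requirement that index entries carry HOT-chain root TIDs is the subtle part of the validation step above. The toy program below models a single page's line pointers and shows how a heap-only tuple's offset is mapped back to its chain root before an index entry would be made; it is a conceptual illustration with made-up data, not the server's actual heap_get_root_tuples() code.

/*
 * Toy model of mapping heap-only tuples back to their HOT-chain root.
 * Validation must insert missing entries under the root TID, not under
 * the heap-only member's own TID.
 */
#include <stdio.h>

#define NITEMS  6
#define INVALID 0

typedef struct Item
{
    int next;        /* offset of the next chain member, or INVALID */
    int heap_only;   /* 1 if this tuple has no index entries of its own */
} Item;

int
main(void)
{
    /* offsets 1..5; offset 2 is a root whose chain is 2 -> 4 -> 5 */
    Item page[NITEMS] = {
        {0, 0},            /* unused slot 0 */
        {INVALID, 0},      /* 1: ordinary tuple */
        {4, 0},            /* 2: HOT-chain root */
        {INVALID, 0},      /* 3: ordinary tuple */
        {5, 1},            /* 4: heap-only, updated again */
        {INVALID, 1},      /* 5: heap-only, live version */
    };
    int root_of[NITEMS] = {0};

    /* Walk each chain from its root, recording the root for every member. */
    for (int off = 1; off < NITEMS; off++)
    {
        if (page[off].heap_only)
            continue;           /* not a chain root */
        for (int m = off; m != INVALID; m = page[m].next)
            root_of[m] = off;
    }

    /* A missing tuple found at offset 5 must be indexed under offset 2. */
    printf("index entry for offset 5 points at root offset %d\n", root_of[5]);
    return 0;
}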
