Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit e093dcd

Browse files
committed
Add the ability to create indexes 'concurrently', that is, without
blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.
1 parent 8f91e2b commit e093dcd

34 files changed

+1025
-138
lines changed

doc/src/sgml/catalogs.sgml

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.129 2006/07/31 20:08:55 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.130 2006/08/25 04:06:44 tgl Exp $ -->
22
<!--
33
Documentation of the system catalogs, directed toward PostgreSQL developers
44
-->
@@ -2420,6 +2420,17 @@
24202420
<entry>If true, the table was last clustered on this index</entry>
24212421
</row>
24222422

2423+
<row>
2424+
<entry><structfield>indisvalid</structfield></entry>
2425+
<entry><type>bool</type></entry>
2426+
<entry></entry>
2427+
<entry>If true, the index is currently valid for queries.
2428+
False means the index is possibly incomplete: it must still be
2429+
inserted into by INSERT/UPDATE operations, but it cannot safely be
2430+
used for queries, and if it is unique, the uniqueness shouldn't be
2431+
relied on either.</entry>
2432+
</row>
2433+
24232434
<row>
24242435
<entry><structfield>indkey</structfield></entry>
24252436
<entry><type>int2vector</type></entry>

doc/src/sgml/indexam.sgml

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.16 2006/07/31 20:08:59 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.17 2006/08/25 04:06:44 tgl Exp $ -->
22

33
<chapter id="indexam">
44
<title>Index Access Method Interface Definition</title>
@@ -648,6 +648,16 @@ amrestrpos (IndexScanDesc scan);
648648
</itemizedlist>
649649
</para>
650650

651+
<para>
652+
Furthermore, immediately before raising a uniqueness violation
653+
according to the above rules, the access method must recheck the
654+
liveness of the row being inserted. If it is committed dead then
655+
no error should be raised. (This case cannot occur during the
656+
ordinary scenario of inserting a row that's just been created by
657+
the current transaction. It can happen during
658+
<command>CREATE UNIQUE INDEX CONCURRENTLY</>, however.)
659+
</para>
660+
651661
<para>
652662
We require the index access method to apply these tests itself, which
653663
means that it must reach into the heap to check the commit status of

doc/src/sgml/indices.sgml

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.57 2006/05/24 11:01:39 teodor Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.58 2006/08/25 04:06:44 tgl Exp $ -->
22

33
<chapter id="indexes">
44
<title id="indexes-title">Indexes</title>
@@ -90,6 +90,17 @@ CREATE INDEX test1_id_index ON test1 (id);
9090
significantly speed up queries with joins.
9191
</para>
9292

93+
<para>
94+
Creating an index on a large table can take a long time. By default,
95+
<productname>PostgreSQL</productname> allows reads (selects) to occur
96+
on the table in parallel with index creation, but writes (inserts,
97+
updates, deletes) are blocked until the index build is finished.
98+
It is possible to allow writes to occur in parallel with index
99+
creation, but there are several caveats to be aware of &mdash;
100+
for more information see <xref linkend="SQL-CREATEINDEX-CONCURRENTLY"
101+
endterm="SQL-CREATEINDEX-CONCURRENTLY-title">.
102+
</para>
103+
93104
<para>
94105
After an index is created, the system has to keep it synchronized with the
95106
table. This adds overhead to data manipulation operations.

doc/src/sgml/mvcc.sgml

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.56 2006/04/23 03:39:52 momjian Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.57 2006/08/25 04:06:45 tgl Exp $ -->
22

33
<chapter id="mvcc">
44
<title>Concurrency Control</title>
@@ -622,7 +622,8 @@ SELECT SUM(value) FROM mytab WHERE class = 2;
622622
</para>
623623

624624
<para>
625-
Acquired by <command>VACUUM</command> (without <option>FULL</option>).
625+
Acquired by <command>VACUUM</command> (without <option>FULL</option>)
626+
and by <command>CREATE INDEX CONCURRENTLY</>.
626627
</para>
627628
</listitem>
628629
</varlistentry>
@@ -641,7 +642,8 @@ SELECT SUM(value) FROM mytab WHERE class = 2;
641642
</para>
642643

643644
<para>
644-
Acquired by <command>CREATE INDEX</command>.
645+
Acquired by <command>CREATE INDEX</command>
646+
(without <option>CONCURRENTLY</option>).
645647
</para>
646648
</listitem>
647649
</varlistentry>

doc/src/sgml/ref/create_index.sgml

Lines changed: 101 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.55 2006/07/11 21:05:57 tgl Exp $
2+
$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.56 2006/08/25 04:06:45 tgl Exp $
33
PostgreSQL documentation
44
-->
55

@@ -20,7 +20,7 @@ PostgreSQL documentation
2020

2121
<refsynopsisdiv>
2222
<synopsis>
23-
CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <replaceable class="parameter">table</replaceable> [ USING <replaceable class="parameter">method</replaceable> ]
23+
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] <replaceable class="parameter">name</replaceable> ON <replaceable class="parameter">table</replaceable> [ USING <replaceable class="parameter">method</replaceable> ]
2424
( { <replaceable class="parameter">column</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ <replaceable class="parameter">opclass</replaceable> ] [, ...] )
2525
[ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] ) ]
2626
[ TABLESPACE <replaceable class="parameter">tablespace</replaceable> ]
@@ -110,6 +110,21 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
110110
</listitem>
111111
</varlistentry>
112112

113+
<varlistentry>
114+
<term><literal>CONCURRENTLY</literal></term>
115+
<listitem>
116+
<para>
117+
When this option is used, <productname>PostgreSQL</> will build the
118+
index without taking any locks that prevent concurrent inserts,
119+
updates, or deletes on the table; whereas a standard index build
120+
locks out writes (but not reads) on the table until it's done.
121+
There are several caveats to be aware of when using this option
122+
&mdash; see <xref linkend="SQL-CREATEINDEX-CONCURRENTLY"
123+
endterm="SQL-CREATEINDEX-CONCURRENTLY-title">.
124+
</para>
125+
</listitem>
126+
</varlistentry>
127+
113128
<varlistentry>
114129
<term><replaceable class="parameter">name</replaceable></term>
115130
<listitem>
@@ -239,6 +254,82 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
239254
</variablelist>
240255

241256
</refsect2>
257+
258+
<refsect2 id="SQL-CREATEINDEX-CONCURRENTLY">
259+
<title id="SQL-CREATEINDEX-CONCURRENTLY-title">Building Indexes Concurrently</title>
260+
261+
<indexterm zone="SQL-CREATEINDEX-CONCURRENTLY">
262+
<primary>index</primary>
263+
<secondary>building concurrently</secondary>
264+
</indexterm>
265+
266+
<para>
267+
Creating an index for a large table can be a long operation. In large data
268+
warehousing applications it can easily take hours or even days to build
269+
indexes. It's important to understand the impact creating indexes has on a
270+
system.
271+
</para>
272+
273+
<para>
274+
Normally <productname>PostgreSQL</> locks the table to be indexed against
275+
writes and performs the entire index build with a single scan of the
276+
table. Other transactions can still read the table, but if they try to
277+
insert, update, or delete rows in the table they will block until the
278+
index build is finished.
279+
</para>
280+
281+
<para>
282+
<productname>PostgreSQL</> also supports building indexes without locking
283+
out writes. This method is invoked by specifying the
284+
<literal>CONCURRENTLY</> option of <command>CREATE INDEX</>.
285+
When this option is used,
286+
<productname>PostgreSQL</> must perform two scans of the table, and in
287+
addition it must wait for all existing transactions to terminate. Thus
288+
this method requires more total work than a standard index build and takes
289+
significantly longer to complete. However, since it allows normal
290+
operations to continue while the index is built, this method is useful for
291+
adding new indexes in a production environment. Of course, the extra CPU
292+
and I/O load imposed by the index creation may slow other operations.
293+
</para>
294+
295+
<para>
296+
If a problem arises during the second scan of the table, such as a
297+
uniqueness violation in a unique index, the <command>CREATE INDEX</>
298+
command will fail but leave behind an <quote>invalid</> index. This index
299+
will be ignored for querying purposes because it may be incomplete;
300+
however it will still consume update overhead. The recommended recovery
301+
method in such cases is to drop the index and try again to perform
302+
<command>CREATE INDEX CONCURRENTLY</>. (Another possibility is to rebuild
303+
the index with <command>REINDEX</>. However, since <command>REINDEX</>
304+
does not support concurrent builds, this option is unlikely to seem
305+
attractive.)
306+
</para>
307+
308+
<para>
309+
Another caveat when building a unique index concurrently is that the
310+
uniqueness constraint is already being enforced against other transactions
311+
when the second table scan begins. This means that constraint violations
312+
could be reported in other queries prior to the index becoming available
313+
for use, or even in cases where the index build eventually fails. Also,
314+
if a failure does occur in the second scan, the <quote>invalid</> index
315+
continues to enforce its uniqueness constraint afterwards.
316+
</para>
317+
318+
<para>
319+
Concurrent builds of expression indexes and partial indexes are supported.
320+
Errors occurring in the evaluation of these expressions could cause
321+
behavior similar to that described above for unique constraint violations.
322+
</para>
323+
324+
<para>
325+
Regular index builds permit other regular index builds on the
326+
same table to occur in parallel, but only one concurrent index build
327+
can occur on a table at a time. In both cases, no other types of schema
328+
modification on the table are allowed meanwhile. Another difference
329+
is that a regular <command>CREATE INDEX</> command can be performed within
330+
a transaction block, but <command>CREATE INDEX CONCURRENTLY</> cannot.
331+
</para>
332+
</refsect2>
242333
</refsect1>
243334

244335
<refsect1>
@@ -339,15 +430,22 @@ Is this example correct?
339430
To create a GiST index on a point attribute so that we
340431
can efficiently use box operators on the result of the
341432
conversion function:
342-
</para>
343433
<programlisting>
344434
CREATE INDEX pointloc
345435
ON points USING GIST (point2box(location) box_ops);
346436
SELECT * FROM points
347437
WHERE point2box(points.pointloc) = boxes.box;
348438
</programlisting>
439+
</para>
349440
-->
350441

442+
<para>
443+
To create an index without locking out writes to the table:
444+
<programlisting>
445+
CREATE INDEX CONCURRENTLY sales_quantity_index ON sales_table (quantity);
446+
</programlisting>
447+
</para>
448+
351449
</refsect1>
352450

353451
<refsect1>

doc/src/sgml/ref/reindex.sgml

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$PostgreSQL: pgsql/doc/src/sgml/ref/reindex.sgml,v 1.31 2006/07/04 18:07:24 tgl Exp $
2+
$PostgreSQL: pgsql/doc/src/sgml/ref/reindex.sgml,v 1.32 2006/08/25 04:06:45 tgl Exp $
33
PostgreSQL documentation
44
-->
55

@@ -30,7 +30,7 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
3030
<para>
3131
<command>REINDEX</command> rebuilds an index using the data
3232
stored in the index's table, replacing the old copy of the index. There are
33-
three main reasons to use <command>REINDEX</command>:
33+
several scenarios in which to use <command>REINDEX</command>:
3434

3535
<itemizedlist>
3636
<listitem>
@@ -61,6 +61,18 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
6161
for an index, and wish to ensure that the change has taken full effect.
6262
</para>
6363
</listitem>
64+
65+
<listitem>
66+
<para>
67+
An index build with the <literal>CONCURRENTLY</> option failed, leaving
68+
an <quote>invalid</> index. Such indexes are useless but it can be
69+
convenient to use <command>REINDEX</> to rebuild them. Note that
70+
<command>REINDEX</> will not perform a concurrent build. To build the
71+
index without interfering with production you should drop the index and
72+
reissue the <command>CREATE INDEX CONCURRENTLY</> command.
73+
</para>
74+
</listitem>
75+
6476
</itemizedlist>
6577
</para>
6678
</refsect1>

src/backend/access/nbtree/nbtinsert.c

Lines changed: 27 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.142 2006/07/25 19:13:00 tgl Exp $
11+
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.143 2006/08/25 04:06:46 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -244,8 +244,33 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
244244
}
245245

246246
/*
247-
* Otherwise we have a definite conflict.
247+
* Otherwise we have a definite conflict. But before
248+
* complaining, look to see if the tuple we want to insert
249+
* is itself now committed dead --- if so, don't complain.
250+
* This is a waste of time in normal scenarios but we must
251+
* do it to support CREATE INDEX CONCURRENTLY.
248252
*/
253+
htup.t_self = itup->t_tid;
254+
if (heap_fetch(heapRel, SnapshotSelf, &htup, &hbuffer,
255+
false, NULL))
256+
{
257+
/* Normal case --- it's still live */
258+
ReleaseBuffer(hbuffer);
259+
}
260+
else if (htup.t_data != NULL)
261+
{
262+
/*
263+
* It's been deleted, so no error, and no need to
264+
* continue searching
265+
*/
266+
break;
267+
}
268+
else
269+
{
270+
/* couldn't find the tuple?? */
271+
elog(ERROR, "failed to fetch tuple being inserted");
272+
}
273+
249274
ereport(ERROR,
250275
(errcode(ERRCODE_UNIQUE_VIOLATION),
251276
errmsg("duplicate key violates unique constraint \"%s\"",

src/backend/bootstrap/bootparse.y

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
*
1010
*
1111
* IDENTIFICATION
12-
* $PostgreSQL: pgsql/src/backend/bootstrap/bootparse.y,v 1.83 2006/07/31 01:16:36 tgl Exp $
12+
* $PostgreSQL: pgsql/src/backend/bootstrap/bootparse.y,v 1.84 2006/08/25 04:06:46 tgl Exp $
1313
*
1414
*-------------------------------------------------------------------------
1515
*/
@@ -257,7 +257,7 @@ Boot_DeclareIndexStmt:
257257
$10,
258258
NULL, NIL, NIL,
259259
false, false, false,
260-
false, false, true, false);
260+
false, false, true, false, false);
261261
do_end();
262262
}
263263
;
@@ -275,7 +275,7 @@ Boot_DeclareUniqueIndexStmt:
275275
$11,
276276
NULL, NIL, NIL,
277277
true, false, false,
278-
false, false, true, false);
278+
false, false, true, false, false);
279279
do_end();
280280
}
281281
;

0 commit comments

Comments
 (0)