Commit b375974

Author: Commitfest Bot
[CF 5004] v5 - CREATE INDEX CONCURRENTLY for partitioned tables
This branch was automatically generated by a robot using patches from an
email thread registered at:

https://commitfest.postgresql.org/patch/5004

The branch will be overwritten each time a new patch version is posted to
the thread, and also periodically to check for bitrot caused by changes
on the master branch.

Patch(es): https://www.postgresql.org/message-id/428008d6-911d-4a14-986f-0fbb74db6fd8@gmail.com
Author(s): Justin Pryzby, Ilya Gladyshev
2 parents 65db396 + 361a760 commit b375974
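
As a reading aid for the indexcmds.c changes, the per-partition loop that this commit adds to DefineIndex() can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: ChildIndex, CicSummary and simulate_cic_loop are hypothetical names that do not exist in PostgreSQL, each boolean stands in for a catalog lookup made by the patch (IndexGetRelation/try_table_open, get_index_isvalid, RELKIND_HAS_STORAGE), and no locking, transactions or snapshots are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one index returned by find_all_inheritors(). */
typedef struct ChildIndex
{
    bool table_exists;  /* false models a partition dropped after lookup */
    bool is_valid;      /* models get_index_isvalid(indrelid) */
    bool has_storage;   /* false models a partitioned (non-leaf) index */
    bool is_toplevel;   /* models indrelid == indexRelationId */
} ChildIndex;

/* Hypothetical summary of what the loop did. */
typedef struct CicSummary
{
    int partitions_done; /* increments of PROGRESS_CREATEIDX_PARTITIONS_DONE */
    int built;           /* leaf indexes handed to the concurrent build */
    int to_set_valid;    /* partitioned indexes queued in "tosetvalid" */
} CicSummary;

/* Walk the children in the same order of checks as the loop in the patch. */
static CicSummary
simulate_cic_loop(const ChildIndex *childs, size_t n)
{
    CicSummary s = {0, 0, 0};

    assert(childs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        const ChildIndex *c = &childs[i];

        /* Dropped partition: count it as done and move on. */
        if (!c->table_exists)
        {
            s.partitions_done++;
            continue;
        }

        /* Already-valid (pre-attached) index: counted earlier, skip. */
        if (c->is_valid)
            continue;

        /* Partitioned index: nothing to build, mark valid at the end. */
        if (!c->has_storage)
        {
            if (!c->is_toplevel)
                s.partitions_done++;
            s.to_set_valid++;
            continue;
        }

        /* Leaf index: built concurrently, then counted as done. */
        s.built++;
        s.partitions_done++;
    }

    return s;
}
```

Calling simulate_cic_loop() on an array that contains the top-level index, one leaf still to build, one already-valid leaf, one dropped partition and one intermediate partitioned index shows how the progress counter and the final "set valid" list diverge: the top-level index is queued for validation but never counted as a partition.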

8 files changed: +520 -78 lines changed

doc/src/sgml/ddl.sgml

Lines changed: 4 additions & 6 deletions
@@ -4410,14 +4410,12 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
  As mentioned earlier, it is possible to create indexes on partitioned
  tables so that they are applied automatically to the entire hierarchy.
  This can be very convenient as not only will all existing partitions be
- indexed, but any future partitions will be as well. However, one
- limitation when creating new indexes on partitioned tables is that it
- is not possible to use the <literal>CONCURRENTLY</literal>
- qualifier, which could lead to long lock times. To avoid this, you can
- use <command>CREATE INDEX ON ONLY</command> the partitioned table, which
+ indexed, but any future partitions will be as well. For more control over
+ locking of the partitions you can use <command>CREATE INDEX ON ONLY</command>
+ on the partitioned table, which
  creates the new index marked as invalid, preventing automatic application
  to existing partitions. Instead, indexes can then be created individually
- on each partition using <literal>CONCURRENTLY</literal> and
+ on each partition and
  <firstterm>attached</firstterm> to the partitioned index on the parent
  using <command>ALTER INDEX ... ATTACH PARTITION</command>. Once indexes for
  all the partitions are attached to the parent index, the parent index will

doc/src/sgml/ref/create_index.sgml

Lines changed: 4 additions & 10 deletions
@@ -645,7 +645,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
  <para>
  If a problem arises while scanning the table, such as a deadlock or a
  uniqueness violation in a unique index, the <command>CREATE INDEX</command>
- command will fail but leave behind an <quote>invalid</quote> index. This index
+ command will fail but leave behind an <quote>invalid</quote> index.
+ If this happens while build an index concurrently on a partitioned
+ table, the command can also leave behind <quote>valid</quote> or
+ <quote>invalid</quote> indexes on table partitions. The invalid index
  will be ignored for querying purposes because it might be incomplete;
  however it will still consume update overhead. The <application>psql</application>
  <command>\d</command> command will report such an index as <literal>INVALID</literal>:
@@ -692,15 +695,6 @@ Indexes:
  cannot.
  </para>

- <para>
- Concurrent builds for indexes on partitioned tables are currently not
- supported. However, you may concurrently build the index on each
- partition individually and then finally create the partitioned index
- non-concurrently in order to reduce the time where writes to the
- partitioned table will be locked out. In this case, building the
- partitioned index is a metadata only operation.
- </para>
-
  </refsect2>
  </refsect1>

src/backend/commands/indexcmds.c

Lines changed: 173 additions & 55 deletions
@@ -98,6 +98,11 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
                               bool primary, bool isconstraint);
  static char *ChooseIndexNameAddition(const List *colnames);
  static List *ChooseIndexColumnNames(const List *indexElems);
+ static void DefineIndexConcurrentInternal(Oid relationId,
+                                           Oid indexRelationId,
+                                           IndexInfo *indexInfo,
+                                           LOCKTAG heaplocktag,
+                                           LockRelId heaprelid);
  static void ReindexIndex(const ReindexStmt *stmt, const ReindexParams *params,
                           bool isTopLevel);
  static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -573,20 +578,17 @@ DefineIndex(Oid tableId,
      amoptions_function amoptions;
      bool exclusion;
      bool partitioned;
-     bool safe_index;
      Datum reloptions;
      int16 *coloptions;
      IndexInfo *indexInfo;
      bits16 flags;
      bits16 constr_flags;
      int numberOfAttributes;
      int numberOfKeyAttributes;
-     TransactionId limitXmin;
      ObjectAddress address;
      LockRelId heaprelid;
      LOCKTAG heaplocktag;
      LOCKMODE lockmode;
-     Snapshot snapshot;
      Oid root_save_userid;
      int root_save_sec_context;
      int root_save_nestlevel;
@@ -724,20 +726,6 @@ DefineIndex(Oid tableId,
       * partition.
       */
      partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
-     if (partitioned)
-     {
-         /*
-          * Note: we check 'stmt->concurrent' rather than 'concurrent', so that
-          * the error is thrown also for temporary tables. Seems better to be
-          * consistent, even though we could do it on temporary table because
-          * we're not actually doing it concurrently.
-          */
-         if (stmt->concurrent)
-             ereport(ERROR,
-                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                      errmsg("cannot create index on partitioned table \"%s\" concurrently",
-                             RelationGetRelationName(rel))));
-     }

      /*
       * Don't try to CREATE INDEX on temp tables of other backends.
@@ -1172,10 +1160,6 @@ DefineIndex(Oid tableId,
          }
      }

-     /* Is index safe for others to ignore? See set_indexsafe_procflags() */
-     safe_index = indexInfo->ii_Expressions == NIL &&
-         indexInfo->ii_Predicate == NIL;
-
      /*
       * Report index creation if appropriate (delay this till after most of the
       * error checks)
@@ -1240,6 +1224,11 @@ DefineIndex(Oid tableId,
          if (pd->nparts != 0)
              flags |= INDEX_CREATE_INVALID;
      }
+     else if (concurrent && OidIsValid(parentIndexId))
+     {
+         /* If concurrent, initially build index partitions as "invalid" */
+         flags |= INDEX_CREATE_INVALID;
+     }

      if (stmt->deferrable)
          constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE;
@@ -1557,21 +1546,7 @@ DefineIndex(Oid tableId,
           */
          if (invalidate_parent)
          {
-             Relation pg_index = table_open(IndexRelationId, RowExclusiveLock);
-             HeapTuple tup,
-                       newtup;
-
-             tup = SearchSysCache1(INDEXRELID,
-                                   ObjectIdGetDatum(indexRelationId));
-             if (!HeapTupleIsValid(tup))
-                 elog(ERROR, "cache lookup failed for index %u",
-                      indexRelationId);
-             newtup = heap_copytuple(tup);
-             ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false;
-             CatalogTupleUpdate(pg_index, &tup->t_self, newtup);
-             ReleaseSysCache(tup);
-             table_close(pg_index, RowExclusiveLock);
-             heap_freetuple(newtup);
+             index_set_state_flags(indexRelationId, INDEX_DROP_CLEAR_VALID);

              /*
               * CCI here to make this update visible, in case this recurses
@@ -1583,37 +1558,49 @@ DefineIndex(Oid tableId,

          /*
           * Indexes on partitioned tables are not themselves built, so we're
-          * done here.
+          * done here in the non-concurrent case.
           */
-         AtEOXact_GUC(false, root_save_nestlevel);
-         SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
-         table_close(rel, NoLock);
-         if (!OidIsValid(parentIndexId))
-             pgstat_progress_end_command();
-         else
+         if (!concurrent)
          {
-             /* Update progress for an intermediate partitioned index itself */
-             pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
-         }
+             AtEOXact_GUC(false, root_save_nestlevel);
+             SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
+             table_close(rel, NoLock);

-         return address;
+             if (!OidIsValid(parentIndexId))
+                 pgstat_progress_end_command();
+             else
+             {
+                 /*
+                  * Update progress for an intermediate partitioned index
+                  * itself
+                  */
+                 pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+             }
+
+             return address;
+         }
      }

      AtEOXact_GUC(false, root_save_nestlevel);
      SetUserIdAndSecContext(root_save_userid, root_save_sec_context);

-     if (!concurrent)
+     /*
+      * All done in the non-concurrent case, and when building catalog entries
+      * of partitions for CIC.
+      */
+     if (!concurrent || OidIsValid(parentIndexId))
      {
-         /* Close the heap and we're done, in the non-concurrent case */
          table_close(rel, NoLock);

          /*
           * If this is the top-level index, the command is done overall;
-          * otherwise, increment progress to report one child index is done.
+          * otherwise (when being called recursively), increment progress to
+          * report that one child index is done. Except in the concurrent
+          * (catalog-only) case, which is handled later.
           */
          if (!OidIsValid(parentIndexId))
              pgstat_progress_end_command();
-         else
+         else if (!concurrent)
              pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);

          return address;
@@ -1624,6 +1611,141 @@ DefineIndex(Oid tableId,
      SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
      table_close(rel, NoLock);

+     if (!partitioned)
+     {
+         /* CREATE INDEX CONCURRENTLY on a nonpartitioned table */
+         DefineIndexConcurrentInternal(tableId, indexRelationId,
+                                       indexInfo, heaplocktag, heaprelid);
+         pgstat_progress_end_command();
+         return address;
+     }
+     else
+     {
+         /*
+          * For CIC on a partitioned table, finish by building indexes on
+          * partitions
+          */
+
+         ListCell *lc;
+         List *childs;
+         List *tosetvalid = NIL;
+         MemoryContext cic_context,
+                       old_context;
+
+         /* Create special memory context for cross-transaction storage */
+         cic_context = AllocSetContextCreate(PortalContext,
+                                             "Create index concurrently",
+                                             ALLOCSET_DEFAULT_SIZES);
+
+         old_context = MemoryContextSwitchTo(cic_context);
+         childs = find_all_inheritors(indexRelationId, ShareUpdateExclusiveLock, NULL);
+         MemoryContextSwitchTo(old_context);
+
+         foreach(lc, childs)
+         {
+             Oid indrelid = lfirst_oid(lc);
+             Oid tabrelid;
+             char relkind;
+
+             /*
+              * Partition could have been dropped, since we looked it up. In
+              * this case consider it done and go to the next one.
+              */
+             tabrelid = IndexGetRelation(indrelid, true);
+             if (!tabrelid)
+             {
+                 pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+                 continue;
+             }
+             rel = try_table_open(tabrelid, ShareUpdateExclusiveLock);
+             if (!rel)
+             {
+                 pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+                 continue;
+             }
+
+             /*
+              * Pre-existing partitions which were ATTACHED were already
+              * counted in the progress report.
+              */
+             if (get_index_isvalid(indrelid))
+             {
+                 table_close(rel, ShareUpdateExclusiveLock);
+                 continue;
+             }
+
+             /*
+              * Partitioned indexes are counted in the progress report, but
+              * don't need to be further processed.
+              */
+             relkind = get_rel_relkind(indrelid);
+             if (!RELKIND_HAS_STORAGE(relkind))
+             {
+                 /* The toplevel index doesn't count towards "partitions done" */
+                 if (indrelid != indexRelationId)
+                     pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+
+                 /*
+                  * Build up a list of all the intermediate partitioned tables
+                  * which will later need to be set valid.
+                  */
+                 old_context = MemoryContextSwitchTo(cic_context);
+                 tosetvalid = lappend_oid(tosetvalid, indrelid);
+                 MemoryContextSwitchTo(old_context);
+                 table_close(rel, ShareUpdateExclusiveLock);
+                 continue;
+             }
+
+             heaprelid = rel->rd_lockInfo.lockRelId;
+
+             /*
+              * Close the table but retain the lock, that should be extended to
+              * session level in DefineIndexConcurrentInternal.
+              */
+             table_close(rel, NoLock);
+             SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
+
+             /* Process each partition in a separate transaction */
+             DefineIndexConcurrentInternal(tabrelid, indrelid, indexInfo,
+                                           heaplocktag, heaprelid);
+
+             PushActiveSnapshot(GetTransactionSnapshot());
+             pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+         }
+
+         /* Set as valid all partitioned indexes, including the parent */
+         foreach(lc, tosetvalid)
+         {
+             Oid indrelid = lfirst_oid(lc);
+             Relation indrel = try_index_open(indrelid, ShareUpdateExclusiveLock);
+
+             if (!indrel)
+                 continue;
+             index_set_state_flags(indrelid, INDEX_CREATE_SET_READY);
+             CommandCounterIncrement();
+             index_set_state_flags(indrelid, INDEX_CREATE_SET_VALID);
+             index_close(indrel, ShareUpdateExclusiveLock);
+         }
+
+         MemoryContextDelete(cic_context);
+         pgstat_progress_end_command();
+         PopActiveSnapshot();
+         return address;
+     }
+ }
+
+
+ static void
+ DefineIndexConcurrentInternal(Oid tableId, Oid indexRelationId, IndexInfo *indexInfo,
+                               LOCKTAG heaplocktag, LockRelId heaprelid)
+ {
+     TransactionId limitXmin;
+     Snapshot snapshot;
+
+     /* Is index safe for others to ignore? See set_indexsafe_procflags() */
+     bool safe_index = indexInfo->ii_Expressions == NIL &&
+         indexInfo->ii_Predicate == NIL;
+
      /*
       * For a concurrent build, it's important to make the catalog entries
       * visible to other transactions before we start to build the index. That
@@ -1827,10 +1949,6 @@ DefineIndex(Oid tableId,
       * Last thing to do is release the session-level lock on the parent table.
       */
      UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
-
-     pgstat_progress_end_command();
-
-     return address;
  }

