
Commit dc7420c

snapshot scalability: Don't compute global horizons while building snapshots.
To make GetSnapshotData() more scalable, it cannot look at each proc's xmin: While snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems.

Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: The horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built.

The trick this commit introduces is to delay computation of accurate horizons until their use, and to use horizon boundaries to determine whether accurate horizons need to be computed.

The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaced with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned:
1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible.
2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed.
GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot.

When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further tests can benefit from an earlier computation of accurate horizons.

To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) are now only done when necessary to remove tuples.

This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore.

Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
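The two-threshold test described in the commit message can be illustrated with a small standalone sketch. This is not the code from procarray.c: the struct, the function names, and the horizon callback below are hypothetical stand-ins, XIDs are modeled as plain 64-bit counters so that wraparound can be ignored, and the rate limiting of recomputations mentioned in the message is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the commit's visibility test state. */
typedef struct VisBounds
{
    uint64_t definitely_needed; /* XIDs >= this may still be visible */
    uint64_t maybe_needed;      /* XIDs < this can definitely be removed */
    int      recomputations;    /* how often the expensive path ran */
} VisBounds;

/* Hypothetical callback standing in for the expensive, accurate computation. */
typedef uint64_t (*horizon_fn)(void);

/* Sample horizon source used only for illustration. */
static uint64_t
sample_horizon(void)
{
    return 150;
}

/*
 * Return true if rows deleted by xid can be removed.  The accurate horizon
 * is computed only when xid falls between the two boundaries.
 */
static bool
vis_test_is_removable(VisBounds *b, uint64_t xid, horizon_fn accurate_horizon)
{
    assert(b->maybe_needed <= b->definitely_needed);

    if (xid < b->maybe_needed)
        return true;            /* cheap: definitely removable */
    if (xid >= b->definitely_needed)
        return false;           /* cheap: definitely still needed */

    /* In between: tighten the lower boundary with an accurate horizon. */
    uint64_t horizon = accurate_horizon();

    b->recomputations++;
    if (horizon > b->maybe_needed)
        b->maybe_needed = horizon;
    if (b->definitely_needed < b->maybe_needed)
        b->definitely_needed = b->maybe_needed;

    return xid < b->maybe_needed;
}
```

With boundaries 100 and 200, a test for XID 50 or 250 is answered without calling the callback, while a test for XID 120 triggers one recomputation. Because the boundaries are kept afterwards, a later test for XID 140 is again answered cheaply.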
1 parent 1f42d35 commit dc7420c

38 files changed, +1462 -566 lines changed

contrib/amcheck/verify_nbtree.c

+4 -4
@@ -434,10 +434,10 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
                 RelationGetRelationName(rel));
 
     /*
-     * RecentGlobalXmin assertion matches index_getnext_tid(). See note on
-     * RecentGlobalXmin/B-Tree page deletion.
+     * This assertion matches the one in index_getnext_tid(). See page
+     * recycling/"visible to everyone" notes in nbtree README.
      */
-    Assert(TransactionIdIsValid(RecentGlobalXmin));
+    Assert(TransactionIdIsValid(RecentXmin));
 
     /*
      * Initialize state for entire verification operation
@@ -1581,7 +1581,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
      * does not occur until no possible index scan could land on the page.
      * Index scans can follow links with nothing more than their snapshot as
      * an interlock and be sure of at least that much. (See page
-     * recycling/RecentGlobalXmin notes in nbtree README.)
+     * recycling/"visible to everyone" notes in nbtree README.)
      *
      * Furthermore, it's okay if we follow a rightlink and find a half-dead or
      * dead (ignorable) page one or more times. There will either be a

contrib/pg_visibility/pg_visibility.c

+8 -10
@@ -563,17 +563,14 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
     BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
     TransactionId OldestXmin = InvalidTransactionId;
 
-    if (all_visible)
-    {
-        /* Don't pass rel; that will fail in recovery. */
-        OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
-    }
-
     rel = relation_open(relid, AccessShareLock);
 
     /* Only some relkinds have a visibility map */
     check_relation_relkind(rel);
 
+    if (all_visible)
+        OldestXmin = GetOldestNonRemovableTransactionId(rel);
+
     nblocks = RelationGetNumberOfBlocks(rel);
 
     /*
@@ -679,11 +676,12 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
                  * From a concurrency point of view, it sort of sucks to
                  * retake ProcArrayLock here while we're holding the buffer
                  * exclusively locked, but it should be safe against
-                 * deadlocks, because surely GetOldestXmin() should never take
-                 * a buffer lock. And this shouldn't happen often, so it's
-                 * worth being careful so as to avoid false positives.
+                 * deadlocks, because surely
+                 * GetOldestNonRemovableTransactionId() should never take a
+                 * buffer lock. And this shouldn't happen often, so it's worth
+                 * being careful so as to avoid false positives.
                  */
-                RecomputedOldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+                RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);
 
                 if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
                     record_corrupt_item(items, &tuple.t_self);
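The recheck in this hunk compares two horizons with TransactionIdPrecedes(), which orders 32-bit XIDs modulo 2^32 rather than numerically. The sketch below shows that kind of circular comparison in isolation. The type and function names are hypothetical, and it deliberately ignores the special, non-normal XIDs that PostgreSQL treats separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t Xid;

/*
 * Return true if a logically precedes b on the 32-bit XID circle.
 *
 * The unsigned difference is reinterpreted as signed, so values up to 2^31
 * "behind" b count as older, even when the counter has wrapped around.
 * (The conversion to int32_t assumes the usual two's complement behavior.)
 */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Small usage demonstration, including a wrapped-around pair. */
static void
xid_precedes_demo(void)
{
    assert(xid_precedes(100, 200));
    assert(!xid_precedes(200, 100));
    assert(xid_precedes(4294967000u, 50));  /* 50 is "after" the wrap */
}
```

Read with this ordering, the condition above records the item as corrupt only when the recomputed horizon has not moved past the original one.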

contrib/pgstattuple/pgstatapprox.c

+1 -1
@@ -71,7 +71,7 @@ statapprox_heap(Relation rel, output_type *stat)
     BufferAccessStrategy bstrategy;
     TransactionId OldestXmin;
 
-    OldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);
+    OldestXmin = GetOldestNonRemovableTransactionId(rel);
     bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
     nblocks = RelationGetNumberOfBlocks(rel);

src/backend/access/gin/ginvacuum.c

+26 -0
@@ -793,3 +793,29 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 
     return stats;
 }
+
+/*
+ * Return whether Page can safely be recycled.
+ */
+bool
+GinPageIsRecyclable(Page page)
+{
+    TransactionId delete_xid;
+
+    if (PageIsNew(page))
+        return true;
+
+    if (!GinPageIsDeleted(page))
+        return false;
+
+    delete_xid = GinPageGetDeleteXid(page);
+
+    if (!TransactionIdIsValid(delete_xid))
+        return true;
+
+    /*
+     * If no backend still could view delete_xid as in running, all scans
+     * concurrent with ginDeletePage() must have finished.
+     */
+    return GlobalVisCheckRemovableXid(NULL, delete_xid);
+}

src/backend/access/gist/gistutil.c

+3 -5
@@ -891,15 +891,13 @@ gistPageRecyclable(Page page)
          * As long as that can happen, we must keep the deleted page around as
          * a tombstone.
          *
-         * Compare the deletion XID with RecentGlobalXmin. If deleteXid <
-         * RecentGlobalXmin, then no scan that's still in progress could have
+         * For that check if the deletion XID could still be visible to
+         * anyone. If not, then no scan that's still in progress could have
          * seen its downlink, and we can recycle it.
          */
         FullTransactionId deletexid_full = GistPageGetDeleteXid(page);
-        FullTransactionId recentxmin_full = GetFullRecentGlobalXmin();
 
-        if (FullTransactionIdPrecedes(deletexid_full, recentxmin_full))
-            return true;
+        return GlobalVisIsRemovableFullXid(NULL, deletexid_full);
     }
     return false;
 }
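GiST stores the page deletion XID as a FullTransactionId, a 64-bit value that carries the epoch in its upper 32 bits, which is why the removed code could compare it with a plain "precedes" test. The following sketch illustrates the idea with hypothetical names (FullXid, full_xid_from_epoch_and_xid, full_xid_precedes); it is a toy model, not the PostgreSQL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit, epoch-qualified transaction ID. */
typedef struct FullXid
{
    uint64_t value;             /* epoch in the high 32 bits, XID in the low */
} FullXid;

/* Combine an epoch and a 32-bit XID into one 64-bit value. */
static FullXid
full_xid_from_epoch_and_xid(uint32_t epoch, uint32_t xid)
{
    FullXid result = { ((uint64_t) epoch << 32) | xid };

    return result;
}

/* With the epoch included, ordering is an ordinary integer comparison. */
static bool
full_xid_precedes(FullXid a, FullXid b)
{
    return a.value < b.value;
}

/* Usage demonstration: an XID just before a wrap precedes one just after. */
static void
full_xid_demo(void)
{
    FullXid before_wrap = full_xid_from_epoch_and_xid(0, 4294967000u);
    FullXid after_wrap = full_xid_from_epoch_and_xid(1, 50);

    assert(full_xid_precedes(before_wrap, after_wrap));
}
```

Unlike the 32-bit circular comparison, this ordering stays valid no matter how far apart the two values are, which suits deleted pages that may sit unrecycled for a long time.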

src/backend/access/gist/gistxlog.c

+5 -5
@@ -387,11 +387,11 @@ gistRedoPageReuse(XLogReaderState *record)
      * PAGE_REUSE records exist to provide a conflict point when we reuse
      * pages in the index via the FSM. That's all they do though.
      *
-     * latestRemovedXid was the page's deleteXid. The deleteXid <
-     * RecentGlobalXmin test in gistPageRecyclable() conceptually mirrors the
-     * pgxact->xmin > limitXmin test in GetConflictingVirtualXIDs().
-     * Consequently, one XID value achieves the same exclusion effect on
-     * primary and standby.
+     * latestRemovedXid was the page's deleteXid. The
+     * GlobalVisIsRemovableFullXid(deleteXid) test in gistPageRecyclable()
+     * conceptually mirrors the pgxact->xmin > limitXmin test in
+     * GetConflictingVirtualXIDs(). Consequently, one XID value achieves the
+     * same exclusion effect on primary and standby.
      */
     if (InHotStandby)
     {

src/backend/access/heap/heapam.c

+11 -4
@@ -1517,6 +1517,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
     bool at_chain_start;
     bool valid;
     bool skip;
+    GlobalVisState *vistest = NULL;
 
     /* If this is not the first call, previous call returned a (live!) tuple */
     if (all_dead)
@@ -1527,7 +1528,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
     at_chain_start = first_call;
     skip = !first_call;
 
-    Assert(TransactionIdIsValid(RecentGlobalXmin));
+    /* XXX: we should assert that a snapshot is pushed or registered */
+    Assert(TransactionIdIsValid(RecentXmin));
     Assert(BufferGetBlockNumber(buffer) == blkno);
 
     /* Scan through possible multiple members of HOT-chain */
@@ -1616,9 +1618,14 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
          * Note: if you change the criterion here for what is "dead", fix the
          * planner's get_actual_variable_range() function to match.
          */
-        if (all_dead && *all_dead &&
-            !HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
-            *all_dead = false;
+        if (all_dead && *all_dead)
+        {
+            if (!vistest)
+                vistest = GlobalVisTestFor(relation);
+
+            if (!HeapTupleIsSurelyDead(heapTuple, vistest))
+                *all_dead = false;
+        }
 
         /*
          * Check to see if HOT chain continues past this tuple; if so fetch

src/backend/access/heap/heapam_handler.c

+12 -12
@@ -1203,7 +1203,7 @@ heapam_index_build_range_scan(Relation heapRelation,
 
     /* okay to ignore lazy VACUUMs here */
     if (!IsBootstrapProcessingMode() && !indexInfo->ii_Concurrent)
-        OldestXmin = GetOldestXmin(heapRelation, PROCARRAY_FLAGS_VACUUM);
+        OldestXmin = GetOldestNonRemovableTransactionId(heapRelation);
 
     if (!scan)
     {
@@ -1244,6 +1244,17 @@ heapam_index_build_range_scan(Relation heapRelation,
 
     hscan = (HeapScanDesc) scan;
 
+    /*
+     * Must have called GetOldestNonRemovableTransactionId() if using
+     * SnapshotAny. Shouldn't have for an MVCC snapshot. (It's especially
+     * worth checking this for parallel builds, since ambuild routines that
+     * support parallel builds must work these details out for themselves.)
+     */
+    Assert(snapshot == SnapshotAny || IsMVCCSnapshot(snapshot));
+    Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
+           !TransactionIdIsValid(OldestXmin));
+    Assert(snapshot == SnapshotAny || !anyvisible);
+
     /* Publish number of blocks to scan */
     if (progress)
     {
@@ -1263,17 +1274,6 @@ heapam_index_build_range_scan(Relation heapRelation,
                                      nblocks);
     }
 
-    /*
-     * Must call GetOldestXmin() with SnapshotAny. Should never call
-     * GetOldestXmin() with MVCC snapshot. (It's especially worth checking
-     * this for parallel builds, since ambuild routines that support parallel
-     * builds must work these details out for themselves.)
-     */
-    Assert(snapshot == SnapshotAny || IsMVCCSnapshot(snapshot));
-    Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
-           !TransactionIdIsValid(OldestXmin));
-    Assert(snapshot == SnapshotAny || !anyvisible);
-
     /* set our scan endpoints */
     if (!allow_sync)
         heap_setscanlimits(scan, start_blockno, numblocks);

src/backend/access/heap/heapam_visibility.c

+73 -26
@@ -1154,19 +1154,56 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
  * we mainly want to know is if a tuple is potentially visible to *any*
  * running transaction. If so, it can't be removed yet by VACUUM.
  *
- * OldestXmin is a cutoff XID (obtained from GetOldestXmin()). Tuples
- * deleted by XIDs >= OldestXmin are deemed "recently dead"; they might
- * still be visible to some open transaction, so we can't remove them,
- * even if we see that the deleting transaction has committed.
+ * OldestXmin is a cutoff XID (obtained from
+ * GetOldestNonRemovableTransactionId()). Tuples deleted by XIDs >=
+ * OldestXmin are deemed "recently dead"; they might still be visible to some
+ * open transaction, so we can't remove them, even if we see that the deleting
+ * transaction has committed.
  */
 HTSV_Result
 HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
                          Buffer buffer)
+{
+    TransactionId dead_after = InvalidTransactionId;
+    HTSV_Result res;
+
+    res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+    if (res == HEAPTUPLE_RECENTLY_DEAD)
+    {
+        Assert(TransactionIdIsValid(dead_after));
+
+        if (TransactionIdPrecedes(dead_after, OldestXmin))
+            res = HEAPTUPLE_DEAD;
+    }
+    else
+        Assert(!TransactionIdIsValid(dead_after));
+
+    return res;
+}
+
+/*
+ * Work horse for HeapTupleSatisfiesVacuum and similar routines.
+ *
+ * In contrast to HeapTupleSatisfiesVacuum this routine, when encountering a
+ * tuple that could still be visible to some backend, stores the xid that
+ * needs to be compared with the horizon in *dead_after, and returns
+ * HEAPTUPLE_RECENTLY_DEAD. The caller then can perform the comparison with
+ * the horizon. This is e.g. useful when comparing with different horizons.
+ *
+ * Note: HEAPTUPLE_DEAD can still be returned here, e.g. if the inserting
+ * transaction aborted.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer, TransactionId *dead_after)
 {
     HeapTupleHeader tuple = htup->t_data;
 
     Assert(ItemPointerIsValid(&htup->t_self));
     Assert(htup->t_tableOid != InvalidOid);
+    Assert(dead_after != NULL);
+
+    *dead_after = InvalidTransactionId;
 
     /*
      * Has inserting transaction committed?
@@ -1323,17 +1360,15 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
         else if (TransactionIdDidCommit(xmax))
         {
             /*
-             * The multixact might still be running due to lockers. If the
-             * updater is below the xid horizon, we have to return DEAD
-             * regardless -- otherwise we could end up with a tuple where the
-             * updater has to be removed due to the horizon, but is not pruned
-             * away. It's not a problem to prune that tuple, because any
-             * remaining lockers will also be present in newer tuple versions.
+             * The multixact might still be running due to lockers. Need to
+             * allow for pruning if below the xid horizon regardless --
+             * otherwise we could end up with a tuple where the updater has to
+             * be removed due to the horizon, but is not pruned away. It's
+             * not a problem to prune that tuple, because any remaining
+             * lockers will also be present in newer tuple versions.
              */
-            if (!TransactionIdPrecedes(xmax, OldestXmin))
-                return HEAPTUPLE_RECENTLY_DEAD;
-
-            return HEAPTUPLE_DEAD;
+            *dead_after = xmax;
+            return HEAPTUPLE_RECENTLY_DEAD;
         }
         else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
         {
@@ -1372,14 +1407,11 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
     }
 
     /*
-     * Deleter committed, but perhaps it was recent enough that some open
-     * transactions could still see the tuple.
+     * Deleter committed, allow caller to check if it was recent enough that
+     * some open transactions could still see the tuple.
      */
-    if (!TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin))
-        return HEAPTUPLE_RECENTLY_DEAD;
-
-    /* Otherwise, it's dead and removable */
-    return HEAPTUPLE_DEAD;
+    *dead_after = HeapTupleHeaderGetRawXmax(tuple);
+    return HEAPTUPLE_RECENTLY_DEAD;
 }
 
 
@@ -1393,14 +1425,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
  *
  * This is an interface to HeapTupleSatisfiesVacuum that's callable via
  * HeapTupleSatisfiesSnapshot, so it can be used through a Snapshot.
- * snapshot->xmin must have been set up with the xmin horizon to use.
+ * snapshot->vistest must have been set up with the horizon to use.
  */
 static bool
 HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
                                 Buffer buffer)
 {
-    return HeapTupleSatisfiesVacuum(htup, snapshot->xmin, buffer)
-        != HEAPTUPLE_DEAD;
+    TransactionId dead_after = InvalidTransactionId;
+    HTSV_Result res;
+
+    res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+    if (res == HEAPTUPLE_RECENTLY_DEAD)
+    {
+        Assert(TransactionIdIsValid(dead_after));
+
+        if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+            res = HEAPTUPLE_DEAD;
+    }
+    else
+        Assert(!TransactionIdIsValid(dead_after));
+
+    return res != HEAPTUPLE_DEAD;
 }
 
 
@@ -1418,7 +1464,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
  * if the tuple is removable.
  */
 bool
-HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
+HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 {
     HeapTupleHeader tuple = htup->t_data;
 
@@ -1459,7 +1505,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
         return false;
 
     /* Deleter committed, so tuple is dead if the XID is old enough. */
-    return TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin);
+    return GlobalVisTestIsRemovableXid(vistest,
+                                       HeapTupleHeaderGetRawXmax(tuple));
 }
 
 /*
