Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 2834855

Browse files
committed
Fix BRIN to use SnapshotAny during summarization
For correctness of summarization results, it is critical that the snapshot used during the summarization scan is able to see all tuples that are live to all transactions -- including tuples inserted or deleted by in-progress transactions. Otherwise, it would be possible for a transaction to insert a tuple, then idle for a long time while a concurrent transaction executes summarization of the range: this would result in the inserted value not being considered in the summary. Previously we were trying to use a MVCC snapshot in conjunction with adding a "placeholder" tuple in the index: the snapshot would see all committed tuples, and the placeholder tuple would catch insertions by any new inserters. The hole is that prior insertions by transactions that are still in progress by the time the MVCC snapshot was taken were ignored. Kevin Grittner reported this as a bogus error message during vacuum with default transaction isolation mode set to repeatable read (because the error report mentioned a function name not being invoked during), but the problem is larger than that. To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves the way we need using SnapshotAny visibility rules. This change simplifies the BRIN code a bit, mainly by removing large comments that were mistaken. Instead, rely on the SnapshotAny semantics to provide what it needs. (The business about a placeholder tuple needs to remain: that covers the case that a transaction inserts a a tuple in a page that summarization already scanned.) Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org In passing, remove a couple of unused declarations from brin.h and reword a comment to be proper English. This part submitted by Kevin Grittner. Backpatch to 9.5, where BRIN was introduced.
1 parent 6af9ee4 commit 2834855

File tree

7 files changed

+129
-35
lines changed

7 files changed

+129
-35
lines changed

src/backend/access/brin/brin.c

+9-32
Original file line numberDiff line numberDiff line change
@@ -696,7 +696,7 @@ brinbuildempty(PG_FUNCTION_ARGS)
696696
*
697697
* XXX we could mark item tuples as "dirty" (when a minimum or maximum heap
698698
* tuple is deleted), meaning the need to re-run summarization on the affected
699-
* range. Need to an extra flag in brintuples for that.
699+
* range. Would need to add an extra flag in brintuples for that.
700700
*/
701701
Datum
702702
brinbulkdelete(PG_FUNCTION_ARGS)
@@ -951,9 +951,13 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
951951
* Execute the partial heap scan covering the heap blocks in the specified
952952
* page range, summarizing the heap tuples in it. This scan stops just
953953
* short of brinbuildCallback creating the new index entry.
954+
*
955+
* Note that it is critical we use the "any visible" mode of
956+
* IndexBuildHeapRangeScan here: otherwise, we would miss tuples inserted
957+
* by transactions that are still in progress, among other corner cases.
954958
*/
955959
state->bs_currRangeStart = heapBlk;
956-
IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false,
960+
IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true,
957961
heapBlk, state->bs_pagesPerRange,
958962
brinbuildCallback, (void *) state);
959963

@@ -1058,36 +1062,6 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
10581062
state = initialize_brin_buildstate(index, revmap,
10591063
pagesPerRange);
10601064
indexInfo = BuildIndexInfo(index);
1061-
1062-
/*
1063-
* We only have ShareUpdateExclusiveLock on the table, and
1064-
* therefore other sessions may insert tuples into the range
1065-
* we're going to scan. This is okay, because we take
1066-
* additional precautions to avoid losing the additional
1067-
* tuples; see comments in summarize_range. Set the
1068-
* concurrent flag, which causes IndexBuildHeapRangeScan to
1069-
* use a snapshot other than SnapshotAny, and silences
1070-
* warnings emitted there.
1071-
*/
1072-
indexInfo->ii_Concurrent = true;
1073-
1074-
/*
1075-
* If using transaction-snapshot mode, it would be possible
1076-
* for another transaction to insert a tuple that's not
1077-
* visible to our snapshot if we have already acquired one,
1078-
* when in snapshot-isolation mode; therefore, disallow this
1079-
* from running in such a transaction unless a snapshot hasn't
1080-
* been acquired yet.
1081-
*
1082-
* This code is called by VACUUM and
1083-
* brin_summarize_new_values. Have the error message mention
1084-
* the latter because VACUUM cannot run in a transaction and
1085-
* thus cannot cause this issue.
1086-
*/
1087-
if (IsolationUsesXactSnapshot() && FirstSnapshotSet)
1088-
ereport(ERROR,
1089-
(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
1090-
errmsg("brin_summarize_new_values() cannot run in a transaction that has already obtained a snapshot")));
10911065
}
10921066
summarize_range(indexInfo, state, heapRel, heapBlk);
10931067

@@ -1111,7 +1085,10 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
11111085
/* free resources */
11121086
brinRevmapTerminate(revmap);
11131087
if (state)
1088+
{
11141089
terminate_brin_buildstate(state);
1090+
pfree(indexInfo);
1091+
}
11151092
}
11161093

11171094
/*

src/backend/catalog/index.c

+35-1
Original file line numberDiff line numberDiff line change
@@ -2161,6 +2161,7 @@ IndexBuildHeapScan(Relation heapRelation,
21612161
{
21622162
return IndexBuildHeapRangeScan(heapRelation, indexRelation,
21632163
indexInfo, allow_sync,
2164+
false,
21642165
0, InvalidBlockNumber,
21652166
callback, callback_state);
21662167
}
@@ -2170,12 +2171,17 @@ IndexBuildHeapScan(Relation heapRelation,
21702171
* number of blocks are scanned. Scan to end-of-rel can be signalled by
21712172
* passing InvalidBlockNumber as numblocks. Note that restricting the range
21722173
* to scan cannot be done when requesting syncscan.
2174+
*
2175+
* When "anyvisible" mode is requested, all tuples visible to any transaction
2176+
* are considered, including those inserted or deleted by transactions that are
2177+
* still in progress.
21732178
*/
21742179
double
21752180
IndexBuildHeapRangeScan(Relation heapRelation,
21762181
Relation indexRelation,
21772182
IndexInfo *indexInfo,
21782183
bool allow_sync,
2184+
bool anyvisible,
21792185
BlockNumber start_blockno,
21802186
BlockNumber numblocks,
21812187
IndexBuildCallback callback,
@@ -2209,6 +2215,12 @@ IndexBuildHeapRangeScan(Relation heapRelation,
22092215
checking_uniqueness = (indexInfo->ii_Unique ||
22102216
indexInfo->ii_ExclusionOps != NULL);
22112217

2218+
/*
2219+
* "Any visible" mode is not compatible with uniqueness checks; make sure
2220+
* only one of those is requested.
2221+
*/
2222+
Assert(!(anyvisible && checking_uniqueness));
2223+
22122224
/*
22132225
* Need an EState for evaluation of index expressions and partial-index
22142226
* predicates. Also a slot to hold the current tuple.
@@ -2236,6 +2248,9 @@ IndexBuildHeapRangeScan(Relation heapRelation,
22362248
{
22372249
snapshot = RegisterSnapshot(GetTransactionSnapshot());
22382250
OldestXmin = InvalidTransactionId; /* not used */
2251+
2252+
/* "any visible" mode is not compatible with this */
2253+
Assert(!anyvisible);
22392254
}
22402255
else
22412256
{
@@ -2363,6 +2378,17 @@ IndexBuildHeapRangeScan(Relation heapRelation,
23632378
break;
23642379
case HEAPTUPLE_INSERT_IN_PROGRESS:
23652380

2381+
/*
2382+
* In "anyvisible" mode, this tuple is visible and we don't
2383+
* need any further checks.
2384+
*/
2385+
if (anyvisible)
2386+
{
2387+
indexIt = true;
2388+
tupleIsAlive = true;
2389+
break;
2390+
}
2391+
23662392
/*
23672393
* Since caller should hold ShareLock or better, normally
23682394
* the only way to see this is if it was inserted earlier
@@ -2409,8 +2435,16 @@ IndexBuildHeapRangeScan(Relation heapRelation,
24092435

24102436
/*
24112437
* As with INSERT_IN_PROGRESS case, this is unexpected
2412-
* unless it's our own deletion or a system catalog.
2438+
* unless it's our own deletion or a system catalog;
2439+
* but in anyvisible mode, this tuple is visible.
24132440
*/
2441+
if (anyvisible)
2442+
{
2443+
indexIt = true;
2444+
tupleIsAlive = false;
2445+
break;
2446+
}
2447+
24142448
xwait = HeapTupleHeaderGetUpdateXid(heapTuple->t_data);
24152449
if (!TransactionIdIsCurrentTransactionId(xwait))
24162450
{

src/include/access/brin.h

-2
Original file line numberDiff line numberDiff line change
@@ -22,15 +22,13 @@ extern Datum brinbuild(PG_FUNCTION_ARGS);
2222
extern Datum brinbuildempty(PG_FUNCTION_ARGS);
2323
extern Datum brininsert(PG_FUNCTION_ARGS);
2424
extern Datum brinbeginscan(PG_FUNCTION_ARGS);
25-
extern Datum bringettuple(PG_FUNCTION_ARGS);
2625
extern Datum bringetbitmap(PG_FUNCTION_ARGS);
2726
extern Datum brinrescan(PG_FUNCTION_ARGS);
2827
extern Datum brinendscan(PG_FUNCTION_ARGS);
2928
extern Datum brinmarkpos(PG_FUNCTION_ARGS);
3029
extern Datum brinrestrpos(PG_FUNCTION_ARGS);
3130
extern Datum brinbulkdelete(PG_FUNCTION_ARGS);
3231
extern Datum brinvacuumcleanup(PG_FUNCTION_ARGS);
33-
extern Datum brincanreturn(PG_FUNCTION_ARGS);
3432
extern Datum brincostestimate(PG_FUNCTION_ARGS);
3533
extern Datum brinoptions(PG_FUNCTION_ARGS);
3634

src/include/catalog/index.h

+1
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,7 @@ extern double IndexBuildHeapRangeScan(Relation heapRelation,
105105
Relation indexRelation,
106106
IndexInfo *indexInfo,
107107
bool allow_sync,
108+
bool anyvisible,
108109
BlockNumber start_blockno,
109110
BlockNumber end_blockno,
110111
IndexBuildCallback callback,
+39
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
Parsed test spec with 2 sessions
2+
3+
starting permutation: s2check s1b s2b s1i s2summ s1c s2c s2check
4+
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
5+
itemoffset blknum attnum allnulls hasnulls placeholder value
6+
7+
1 0 1 f f f {1 .. 1}
8+
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
9+
step s2b: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;
10+
?column?
11+
12+
1
13+
step s1i: INSERT INTO brin_iso VALUES (1000);
14+
step s2summ: SELECT brin_summarize_new_values('brinidx'::regclass);
15+
brin_summarize_new_values
16+
17+
1
18+
step s1c: COMMIT;
19+
step s2c: COMMIT;
20+
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
21+
itemoffset blknum attnum allnulls hasnulls placeholder value
22+
23+
1 0 1 f f f {1 .. 1}
24+
2 1 1 f f f {1 .. 1000}
25+
26+
starting permutation: s2check s1b s1i s2vacuum s1c s2check
27+
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
28+
itemoffset blknum attnum allnulls hasnulls placeholder value
29+
30+
1 0 1 f f f {1 .. 1}
31+
step s1b: BEGIN ISOLATION LEVEL REPEATABLE READ;
32+
step s1i: INSERT INTO brin_iso VALUES (1000);
33+
step s2vacuum: VACUUM brin_iso;
34+
step s1c: COMMIT;
35+
step s2check: SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass);
36+
itemoffset blknum attnum allnulls hasnulls placeholder value
37+
38+
1 0 1 f f f {1 .. 1}
39+
2 1 1 f f f {1 .. 1000}

src/test/isolation/isolation_schedule

+1
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ test: skip-locked
3636
test: skip-locked-2
3737
test: skip-locked-3
3838
test: skip-locked-4
39+
test: brin-1
3940
test: drop-index-concurrently-1
4041
test: alter-table-1
4142
test: alter-table-2

src/test/isolation/specs/brin-1.spec

+44
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
# This test verifies that values inserted in transactions still in progress
2+
# are considered during concurrent range summarization (either using the
3+
# brin_summarize_new_values function or regular VACUUM).
4+
5+
setup
6+
{
7+
CREATE TABLE brin_iso (
8+
value int
9+
) WITH (fillfactor=10);
10+
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);
11+
-- this fills the first page
12+
DO $$
13+
DECLARE curtid tid;
14+
BEGIN
15+
LOOP
16+
INSERT INTO brin_iso VALUES (1) RETURNING ctid INTO curtid;
17+
EXIT WHEN curtid > tid '(1, 0)';
18+
END LOOP;
19+
END;
20+
$$;
21+
CREATE EXTENSION IF NOT EXISTS pageinspect;
22+
}
23+
24+
teardown
25+
{
26+
DROP TABLE brin_iso;
27+
}
28+
29+
session "s1"
30+
step "s1b" { BEGIN ISOLATION LEVEL REPEATABLE READ; }
31+
step "s1i" { INSERT INTO brin_iso VALUES (1000); }
32+
step "s1c" { COMMIT; }
33+
34+
session "s2"
35+
step "s2b" { BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; }
36+
step "s2summ" { SELECT brin_summarize_new_values('brinidx'::regclass); }
37+
step "s2c" { COMMIT; }
38+
39+
step "s2vacuum" { VACUUM brin_iso; }
40+
41+
step "s2check" { SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'::regclass); }
42+
43+
permutation "s2check" "s1b" "s2b" "s1i" "s2summ" "s1c" "s2c" "s2check"
44+
permutation "s2check" "s1b" "s1i" "s2vacuum" "s1c" "s2check"

0 commit comments

Comments
 (0)