Commit c34787f

Harden nbtree page deletion.

Add some additional defensive checks in the second phase of index deletion to detect and report index corruption during VACUUM, and to avoid having VACUUM become stuck in more cases. The code is still not robust in the presence of a circular chain of sibling links, though it's not clear whether that really matters. This is follow-up work to commit 3a01f68.

The new defensive checks rely on the assumption that there can be no more than one VACUUM operation running for an index at any given time. Remove an old comment suggesting that multiple concurrent VACUUMs need to be considered here. This concern now seems highly unlikely to have any real validity, since we clearly rely on the same assumption in several other places. For example, there are much more recent comments that appear in the same function (added by commit efada2b) that make the same assumption.

Also add a CHECK_FOR_INTERRUPTS() to the relevant code path. Contrary to comments added by commit 3a01f68, it is actually possible to handle interrupts here, at least in the common case where processing takes place at the leaf level. We only hold a pin on leafbuf/target page when stepping right at the leaf level.

No backpatch due to the lack of complaints following hardening added to the same area by commit 3a01f68.

1 parent 2f86ab3 commit c34787f

1 file changed, +38 -26 lines changed

src/backend/access/nbtree/nbtpage.c

@@ -1978,9 +1978,6 @@ _bt_pagedel(Relation rel, Buffer leafbuf, TransactionId *oldestBtpoXact)
 		 * Then unlink it from its siblings. Each call to
 		 * _bt_unlink_halfdead_page unlinks the topmost page from the subtree,
 		 * making it shallower. Iterate until the leafbuf page is deleted.
-		 *
-		 * _bt_unlink_halfdead_page should never fail, since we established
-		 * that deletion is generally safe in _bt_mark_page_halfdead.
 		 */
 		rightsib_empty = false;
 		Assert(P_ISLEAF(opaque) && P_ISHALFDEAD(opaque));
@@ -1991,7 +1988,15 @@ _bt_pagedel(Relation rel, Buffer leafbuf, TransactionId *oldestBtpoXact)
 										 &rightsib_empty, oldestBtpoXact,
 										 &ndeleted))
 			{
-				/* _bt_unlink_halfdead_page failed, released buffer */
+				/*
+				 * _bt_unlink_halfdead_page should never fail, since we
+				 * established that deletion is generally safe in
+				 * _bt_mark_page_halfdead -- index must be corrupt.
+				 *
+				 * Note that _bt_unlink_halfdead_page already released the
+				 * lock and pin on leafbuf for us.
+				 */
+				Assert(false);
 				return ndeleted;
 			}
 		}
@@ -2355,11 +2360,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 	 * So, first lock the leaf page, if it's not the target. Then find and
 	 * write-lock the current left sibling of the target page. The sibling
 	 * that was current a moment ago could have split, so we may have to move
-	 * right. This search could fail if either the sibling or the target page
-	 * was deleted by someone else meanwhile; if so, give up. (Right now,
-	 * that should never happen, since page deletion is only done in VACUUM
-	 * and there shouldn't be multiple VACUUMs concurrently on the same
-	 * table.)
+	 * right.
 	 */
 	if (target != leafblkno)
 		_bt_lockbuf(rel, leafbuf, BT_WRITE);
@@ -2370,23 +2371,26 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		while (P_ISDELETED(opaque) || opaque->btpo_next != target)
 		{
-			/* step right one page */
-			leftsib = opaque->btpo_next;
-			_bt_relbuf(rel, lbuf);
+			bool leftsibvalid = true;
 
 			/*
-			 * It'd be good to check for interrupts here, but it's not easy to
-			 * do so because a lock is always held. This block isn't
-			 * frequently reached, so hopefully the consequences of not
-			 * checking interrupts aren't too bad.
+			 * Before we follow the link from the page that was the left
+			 * sibling mere moments ago, validate its right link. This
+			 * reduces the opportunities for loop to fail to ever make any
+			 * progress in the presence of index corruption.
+			 *
+			 * Note: we rely on the assumption that there can only be one
+			 * vacuum process running at a time (against the same index).
 			 */
+			if (P_RIGHTMOST(opaque) || P_ISDELETED(opaque) ||
+				leftsib == opaque->btpo_next)
+				leftsibvalid = false;
+
+			leftsib = opaque->btpo_next;
+			_bt_relbuf(rel, lbuf);
 
-			if (leftsib == P_NONE)
+			if (!leftsibvalid)
 			{
-				ereport(LOG,
-						(errmsg("no left sibling (concurrent deletion?) of block %u in \"%s\"",
-								target,
-								RelationGetRelationName(rel))));
 				if (target != leafblkno)
 				{
 					/* we have only a pin on target, but pin+lock on leafbuf */
@@ -2398,8 +2402,20 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 					/* we have only a pin on leafbuf */
 					ReleaseBuffer(leafbuf);
 				}
+
+				ereport(LOG,
+						(errcode(ERRCODE_INDEX_CORRUPTED),
+						 errmsg_internal("valid left sibling for deletion target could not be located: "
+										 "left sibling %u of target %u with leafblkno %u and scanblkno %u in index \"%s\"",
+										 leftsib, target, leafblkno, scanblkno,
+										 RelationGetRelationName(rel))));
+
 				return false;
 			}
+
+			CHECK_FOR_INTERRUPTS();
+
+			/* step right one page */
 			lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
 			page = BufferGetPage(lbuf);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
@@ -2408,11 +2424,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 	else
 		lbuf = InvalidBuffer;
 
-	/*
-	 * Next write-lock the target page itself. It's okay to take a write lock
-	 * rather than a superexclusive lock, since no scan will stop on an empty
-	 * page.
-	 */
+	/* Next write-lock the target page itself */
 	_bt_lockbuf(rel, buf, BT_WRITE);
 	page = BufferGetPage(buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
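
The right-link validation added to the move-right loop in _bt_unlink_halfdead_page can be modeled outside PostgreSQL as a rough sketch. Everything below is hypothetical and is not part of the commit: ToyPage, TOY_P_NONE, and toy_find_left_sibling are illustrative stand-ins, pages live in a plain array instead of shared buffers, and no locking, pinning, or error reporting is modeled. The sketch only mirrors the three conditions the patch checks before following a right link: the page is rightmost, the page is deleted, or the link points back at the page itself.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_P_NONE 0u	/* hypothetical stand-in for P_NONE ("no right sibling") */

/* Hypothetical in-memory stand-in for a page's sibling metadata. */
typedef struct ToyPage
{
	unsigned	next;		/* right sibling's block number, or TOY_P_NONE */
	bool		deleted;	/* stand-in for P_ISDELETED() */
} ToyPage;

/*
 * Starting from a possibly stale left sibling, move right until a live page
 * whose right link is "target" is found.  Returns true and sets *found on
 * success.  Returns false when the page about to be left behind fails
 * validation, which the real code treats as index corruption.
 */
static bool
toy_find_left_sibling(const ToyPage *pages, unsigned npages,
					  unsigned leftsib, unsigned target,
					  unsigned *found)
{
	assert(leftsib < npages);
	const ToyPage *cur = &pages[leftsib];

	while (cur->deleted || cur->next != target)
	{
		bool		valid = true;

		/* Same three checks as the patch: rightmost, deleted, self-link */
		if (cur->next == TOY_P_NONE || cur->deleted || leftsib == cur->next)
			valid = false;

		leftsib = cur->next;
		if (!valid)
			return false;

		/* Toy-only bounds check; the real code reads the block via a buffer */
		assert(leftsib < npages);
		cur = &pages[leftsib];
	}

	*found = leftsib;
	return true;
}
```

With a chain 1 -> 2 -> 3 -> 4 and target 4, a stale starting point of block 1 steps right twice and settles on block 3. If block 2 is instead made to link to itself, the walk gives up rather than spinning on that page. As the commit message notes, a cycle spanning two or more pages that never reaches the target is not caught by these checks, and the sketch would loop forever on such input just as the real loop would, apart from the CHECK_FOR_INTERRUPTS() call the patch adds.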
