Commit e89526d

In B-tree page deletion, clean up properly after page deletion failure.

In _bt_unlink_halfdead_page(), we might fail to find an immediate left sibling of the target page, perhaps because of corruption of the page sibling links. The code intends to cope with this by just abandoning the deletion attempt; but what actually happens is that it fails outright due to releasing the same buffer lock twice. (And error recovery masks a second problem, which is possible leakage of a pin on another page.) Seems to have been introduced by careless refactoring in commit efada2b. Since there are multiple cases to consider, let's make releasing the buffer lock in the failure case the responsibility of _bt_unlink_halfdead_page() not its caller.

Also, avoid fetching the leaf page's left-link again after we've dropped lock on the page. This is probably harmless, but it's not exactly good coding practice.

Per report from Kyotaro Horiguchi. Back-patch to 9.4 where the faulty code was introduced.

Discussion: <20160803.173116.111915228.horiguchi.kyotaro@lab.ntt.co.jp>
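
The ownership rule described above (on failure the callee releases the buffer, so the caller must not) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyBuffer`, `toy_relbuf`, `toy_unlink`, and `toy_pagedel` are hypothetical stand-ins that count pins and locks, not PostgreSQL APIs, and the success path is simplified to a single release at the end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer: counts the pins and locks we hold. */
typedef struct ToyBuffer
{
	int			pins;
	int			locks;
} ToyBuffer;

/* Toy analogue of dropping lock and pin together. */
static void
toy_relbuf(ToyBuffer *b)
{
	/* A second release of the same buffer would trip this assertion. */
	assert(b->pins > 0 && b->locks > 0);
	b->locks--;
	b->pins--;
}

/*
 * Toy callee.  Contract: caller holds pin+lock at entry.  On success the
 * caller still holds both; on failure the callee has released both.
 */
static bool
toy_unlink(ToyBuffer *leaf, bool can_unlink)
{
	if (!can_unlink)
	{
		toy_relbuf(leaf);		/* failure: callee cleans up */
		return false;
	}
	return true;
}

/* Toy caller: must not release again after a failed unlink. */
static int
toy_pagedel(ToyBuffer *leaf, bool can_unlink)
{
	int			ndeleted = 0;

	if (!toy_unlink(leaf, can_unlink))
	{
		/* callee already released the buffer */
		return ndeleted;
	}
	ndeleted++;
	toy_relbuf(leaf);			/* simplified success path */
	return ndeleted;
}
```

In this model, adding a `toy_relbuf(leaf)` call to the failure branch of `toy_pagedel` would trigger the assertion in `toy_relbuf`, which mirrors the double-release failure the commit message describes.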
1 parent 69dc5ae commit e89526d

1 file changed: +23 -4 lines changed

src/backend/access/nbtree/nbtpage.c

Lines changed: 23 additions & 4 deletions
@@ -1284,7 +1284,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 			{
 				if (!_bt_unlink_halfdead_page(rel, buf, &rightsib_empty))
 				{
-					_bt_relbuf(rel, buf);
+					/* _bt_unlink_halfdead_page already released buffer */
 					return ndeleted;
 				}
 				ndeleted++;
@@ -1501,6 +1501,11 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
  * Returns 'false' if the page could not be unlinked (shouldn't happen).
  * If the (new) right sibling of the page is empty, *rightsib_empty is set
  * to true.
+ *
+ * Must hold pin and lock on leafbuf at entry (read or write doesn't matter).
+ * On success exit, we'll be holding pin and write lock. On failure exit,
+ * we'll release both pin and lock before returning (we define it that way
+ * to avoid having to reacquire a lock we already released).
  */
 static bool
 _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
@@ -1543,11 +1548,13 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	/*
 	 * If the leaf page still has a parent pointing to it (or a chain of
 	 * parents), we don't unlink the leaf page yet, but the topmost remaining
-	 * parent in the branch.
+	 * parent in the branch. Set 'target' and 'buf' to reference the page
+	 * actually being unlinked.
 	 */
 	if (ItemPointerIsValid(leafhikey))
 	{
 		target = ItemPointerGetBlockNumber(leafhikey);
+		Assert(target != leafblkno);

 		/* fetch the block number of the topmost parent's left sibling */
 		buf = _bt_getbuf(rel, target, BT_READ);
@@ -1567,7 +1574,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 		target = leafblkno;

 		buf = leafbuf;
-		leftsib = opaque->btpo_prev;
+		leftsib = leafleftsib;
 		targetlevel = 0;
 	}

@@ -1598,8 +1605,20 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 			_bt_relbuf(rel, lbuf);
 			if (leftsib == P_NONE)
 			{
-				elog(LOG, "no left sibling (concurrent deletion?) in \"%s\"",
+				elog(LOG, "no left sibling (concurrent deletion?) of block %u in \"%s\"",
+					 target,
 					 RelationGetRelationName(rel));
+				if (target != leafblkno)
+				{
+					/* we have only a pin on target, but pin+lock on leafbuf */
+					ReleaseBuffer(buf);
+					_bt_relbuf(rel, leafbuf);
+				}
+				else
+				{
+					/* we have only a pin on leafbuf */
+					ReleaseBuffer(leafbuf);
+				}
 				return false;
 			}
 			lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
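
The two cleanup branches added in the last hunk can be modelled with simple counters to show that each one releases exactly what is held at that point. This is a hedged sketch only: `PinLock`, `toy_release_pin`, `toy_unlock_and_release`, and `toy_cleanup_on_failure` are hypothetical names, and the held-resource states are taken from the comments in the diff rather than from running PostgreSQL.

```c
#include <assert.h>

/* Hypothetical stand-in for a buffer: counts the pins and locks we hold. */
typedef struct PinLock
{
	int			pins;
	int			locks;
} PinLock;

/* Toy analogue of dropping a pin only. */
static void
toy_release_pin(PinLock *b)
{
	assert(b->pins > 0);
	b->pins--;
}

/* Toy analogue of dropping lock and pin together. */
static void
toy_unlock_and_release(PinLock *b)
{
	assert(b->pins > 0 && b->locks > 0);
	b->locks--;
	b->pins--;
}

/*
 * Toy version of the failure-path cleanup.  Assumed state at entry, per the
 * diff's comments: if target differs from the leaf, we hold only a pin on
 * target and pin+lock on the leaf; otherwise we hold only a pin on the leaf.
 */
static void
toy_cleanup_on_failure(PinLock *target, PinLock *leaf)
{
	if (target != leaf)
	{
		toy_release_pin(target);
		toy_unlock_and_release(leaf);
	}
	else
	{
		toy_release_pin(leaf);
	}
}
```

Starting from either assumed state, every counter ends at zero after `toy_cleanup_on_failure`, so in this model nothing is leaked and nothing is released twice.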
