Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit ffafacc

Browse files
committed
Repair potential deadlock created by recent changes to recycle btree
index pages: when _bt_getbuf asks the FSM for a free index page, it is possible (and, in some cases, even moderately likely) that the answer will be the same page that _bt_split is trying to split. _bt_getbuf already knew that the returned page might not be free, but it wasn't prepared for the possibility that even trying to lock the page could be problematic. Fix by doing a conditional rather than unconditional grab of the page lock.
1 parent 18c1087 commit ffafacc

File tree

3 files changed

+67
-11
lines changed

3 files changed

+67
-11
lines changed

src/backend/access/nbtree/nbtpage.c

+33-9
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
*
1010
*
1111
* IDENTIFICATION
12-
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtpage.c,v 1.69 2003/08/08 21:41:27 momjian Exp $
12+
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtpage.c,v 1.70 2003/08/10 19:48:08 tgl Exp $
1313
*
1414
* NOTES
1515
* Postgres btree pages look like ordinary relation pages. The opaque
@@ -409,23 +409,47 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
409409
* that the page is still free. (For example, an already-free
410410
* page could have been re-used between the time the last VACUUM
411411
* scanned it and the time the VACUUM made its FSM updates.)
412+
*
413+
* In fact, it's worse than that: we can't even assume that it's
414+
* safe to take a lock on the reported page. If somebody else
415+
* has a lock on it, or even worse our own caller does, we could
416+
* deadlock. (The own-caller scenario is actually not improbable.
417+
* Consider an index on a serial or timestamp column. Nearly all
418+
* splits will be at the rightmost page, so it's entirely likely
419+
* that _bt_split will call us while holding a lock on the page most
420+
* recently acquired from FSM. A VACUUM running concurrently with
421+
* the previous split could well have placed that page back in FSM.)
422+
*
423+
* To get around that, we ask for only a conditional lock on the
424+
* reported page. If we fail, then someone else is using the page,
425+
* and we may reasonably assume it's not free. (If we happen to be
426+
* wrong, the worst consequence is the page will be lost to use till
427+
* the next VACUUM, which is no big problem.)
412428
*/
413429
for (;;)
414430
{
415431
blkno = GetFreeIndexPage(&rel->rd_node);
416432
if (blkno == InvalidBlockNumber)
417433
break;
418434
buf = ReadBuffer(rel, blkno);
419-
LockBuffer(buf, access);
420-
page = BufferGetPage(buf);
421-
if (_bt_page_recyclable(page))
435+
if (ConditionalLockBuffer(buf))
422436
{
423-
/* Okay to use page. Re-initialize and return it */
424-
_bt_pageinit(page, BufferGetPageSize(buf));
425-
return buf;
437+
page = BufferGetPage(buf);
438+
if (_bt_page_recyclable(page))
439+
{
440+
/* Okay to use page. Re-initialize and return it */
441+
_bt_pageinit(page, BufferGetPageSize(buf));
442+
return buf;
443+
}
444+
elog(DEBUG2, "FSM returned nonrecyclable page");
445+
_bt_relbuf(rel, buf);
446+
}
447+
else
448+
{
449+
elog(DEBUG2, "FSM returned nonlockable page");
450+
/* couldn't get lock, so just drop pin */
451+
ReleaseBuffer(buf);
426452
}
427-
elog(DEBUG2, "FSM returned nonrecyclable page");
428-
_bt_relbuf(rel, buf);
429453
}
430454

431455
/*

src/backend/storage/buffer/bufmgr.c

+32-1
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v 1.139 2003/08/04 02:40:03 momjian Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v 1.140 2003/08/10 19:48:08 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -1936,6 +1936,37 @@ LockBuffer(Buffer buffer, int mode)
19361936
elog(ERROR, "unrecognized buffer lock mode: %d", mode);
19371937
}
19381938

1939+
/*
1940+
* Acquire the cntx_lock for the buffer, but only if we don't have to wait.
1941+
*
1942+
* This assumes the caller wants BUFFER_LOCK_EXCLUSIVE mode.
1943+
*/
1944+
bool
1945+
ConditionalLockBuffer(Buffer buffer)
1946+
{
1947+
BufferDesc *buf;
1948+
1949+
Assert(BufferIsValid(buffer));
1950+
if (BufferIsLocal(buffer))
1951+
return true; /* act as though we got it */
1952+
1953+
buf = &(BufferDescriptors[buffer - 1]);
1954+
1955+
if (LWLockConditionalAcquire(buf->cntx_lock, LW_EXCLUSIVE))
1956+
{
1957+
/*
1958+
* This is not the best place to set cntxDirty flag (eg indices do
1959+
* not always change buffer they lock in excl mode). But please
1960+
* remember that it's critical to set cntxDirty *before* logging
1961+
* changes with XLogInsert() - see comments in BufferSync().
1962+
*/
1963+
buf->cntxDirty = true;
1964+
1965+
return true;
1966+
}
1967+
return false;
1968+
}
1969+
19391970
/*
19401971
* LockBufferForCleanup - lock a buffer in preparation for deleting items
19411972
*

src/include/storage/bufmgr.h

+2-1
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
* Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
88
* Portions Copyright (c) 1994, Regents of the University of California
99
*
10-
* $Id: bufmgr.h,v 1.69 2003/08/04 02:40:14 momjian Exp $
10+
* $Id: bufmgr.h,v 1.70 2003/08/10 19:48:08 tgl Exp $
1111
*
1212
*-------------------------------------------------------------------------
1313
*/
@@ -180,6 +180,7 @@ extern void SetBufferCommitInfoNeedsSave(Buffer buffer);
180180

181181
extern void UnlockBuffers(void);
182182
extern void LockBuffer(Buffer buffer, int mode);
183+
extern bool ConditionalLockBuffer(Buffer buffer);
183184
extern void LockBufferForCleanup(Buffer buffer);
184185

185186
extern void AbortBufferIO(void);

0 commit comments

Comments
 (0)