Commit 55432fe

Implement LockBufferForCleanup(), which will allow concurrent VACUUM
to wait until it's safe to remove tuples and compact free space in a shared buffer page. Miscellaneous small code cleanups in bufmgr, too.
1 parent 1e9e5de commit 55432fe

11 files changed: +418 -254 lines changed

src/backend/access/transam/xact.c

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *    $Header: /cvsroot/pgsql/src/backend/access/transam/xact.c,v 1.104 2001/06/22 19:16:21 wieck Exp $
+ *    $Header: /cvsroot/pgsql/src/backend/access/transam/xact.c,v 1.105 2001/07/06 21:04:25 tgl Exp $
  *
  * NOTES
  *    Transaction aborts can now occur two ways:
@@ -653,7 +653,7 @@ void
 RecordTransactionCommit()
 {
     TransactionId xid;
-    int leak;
+    bool leak;

     xid = GetCurrentTransactionId();

src/backend/storage/buffer/README

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+$Header: /cvsroot/pgsql/src/backend/storage/buffer/README,v 1.1 2001/07/06 21:04:25 tgl Exp $
+
+Notes about shared buffer access rules
+--------------------------------------
+
+There are two separate access control mechanisms for shared disk buffers:
+reference counts (a/k/a pin counts) and buffer locks. (Actually, there's
+a third level of access control: one must hold the appropriate kind of
+lock on a relation before one can legally access any page belonging to
+the relation. Relation-level locks are not discussed here.)
+
+Pins: one must "hold a pin on" a buffer (increment its reference count)
+before being allowed to do anything at all with it. An unpinned buffer is
+subject to being reclaimed and reused for a different page at any instant,
+so touching it is unsafe. Typically a pin is acquired via ReadBuffer and
+released via WriteBuffer (if one modified the page) or ReleaseBuffer (if not).
+It is OK and indeed common for a single backend to pin a page more than
+once concurrently; the buffer manager handles this efficiently. It is
+considered OK to hold a pin for long intervals --- for example, sequential
+scans hold a pin on the current page until done processing all the tuples
+on the page, which could be quite a while if the scan is the outer scan of
+a join. Similarly, btree index scans hold a pin on the current index page.
+This is OK because normal operations never wait for a page's pin count to
+drop to zero. (Anything that might need to do such a wait is instead
+handled by waiting to obtain the relation-level lock, which is why you'd
+better hold one first.) Pins may not be held across transaction
+boundaries, however.
+
+Buffer locks: there are two kinds of buffer locks, shared and exclusive,
+which act just as you'd expect: multiple backends can hold shared locks on
+the same buffer, but an exclusive lock prevents anyone else from holding
+either shared or exclusive lock. (These can alternatively be called READ
+and WRITE locks.) These locks are short-term: they should not be held for
+long. They are implemented as per-buffer spinlocks, so another backend
+trying to acquire a competing lock will spin as long as you hold yours!
+Buffer locks are acquired and released by LockBuffer(). It will *not* work
+for a single backend to try to acquire multiple locks on the same buffer.
+One must pin a buffer before trying to lock it.
+
+Buffer access rules:
+
+1. To scan a page for tuples, one must hold a pin and either shared or
+exclusive lock. To examine the commit status (XIDs and status bits) of
+a tuple in a shared buffer, one must likewise hold a pin and either shared
+or exclusive lock.
+
+2. Once one has determined that a tuple is interesting (visible to the
+current transaction) one may drop the buffer lock, yet continue to access
+the tuple's data for as long as one holds the buffer pin. This is what is
+typically done by heap scans, since the tuple returned by heap_fetch
+contains a pointer to tuple data in the shared buffer. Therefore the
+tuple cannot go away while the pin is held (see rule #5). Its state could
+change, but that is assumed not to matter after the initial determination
+of visibility is made.
+
+3. To add a tuple or change the xmin/xmax fields of an existing tuple,
+one must hold a pin and an exclusive lock on the containing buffer.
+This ensures that no one else might see a partially-updated state of the
+tuple.
+
+4. It is considered OK to update tuple commit status bits (ie, OR the
+values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
+HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock and
+pin on a buffer. This is OK because another backend looking at the tuple
+at about the same time would OR the same bits into the field, so there
+is little or no risk of conflicting update; what's more, if there did
+manage to be a conflict it would merely mean that one bit-update would
+be lost and need to be done again later. These four bits are only hints
+(they cache the results of transaction status lookups in pg_log), so no
+great harm is done if they get reset to zero by conflicting updates.
+
+5. To physically remove a tuple or compact free space on a page, one
+must hold a pin and an exclusive lock, *and* observe while holding the
+exclusive lock that the buffer's shared reference count is one (ie,
+no other backend holds a pin). If these conditions are met then no other
+backend can perform a page scan until the exclusive lock is dropped, and
+no other backend can be holding a reference to an existing tuple that it
+might expect to examine again. Note that another backend might pin the
+buffer (increment the refcount) while one is performing the cleanup, but
+it won't be able to actually examine the page until it acquires shared
+or exclusive lock.
+
+
+As of 7.1, the only operation that removes tuples or compacts free space is
+(oldstyle) VACUUM. It does not have to implement rule #5 directly, because
+it instead acquires exclusive lock at the relation level, which ensures
+indirectly that no one else is accessing pages of the relation at all.
+
+To implement concurrent VACUUM we will need to make it obey rule #5 fully.
+To do this, we'll create a new buffer manager operation
+LockBufferForCleanup() that gets an exclusive lock and then checks to see
+if the shared pin count is currently 1. If not, it releases the exclusive
+lock (but not the caller's pin) and waits until signaled by another backend,
+whereupon it tries again. The signal will occur when UnpinBuffer
+decrements the shared pin count to 1. As indicated above, this operation
+might have to wait a good while before it acquires lock, but that shouldn't
+matter much for concurrent VACUUM. The current implementation only
+supports a single waiter for pin-count-1 on any particular shared buffer.
+This is enough for VACUUM's use, since we don't allow multiple VACUUMs
+concurrently on a single relation anyway.
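
The wait-and-retry protocol that the README describes for LockBufferForCleanup() can be sketched in standalone C. This is only an illustration under stated assumptions, not the code from this commit: every name here (`SketchBuffer`, `sketch_pin`, `sketch_unpin`, `sketch_lock_for_cleanup`, `sketch_demo`) is hypothetical, and a POSIX mutex and condition variable stand in for PostgreSQL's buffer locks and inter-backend signaling. Threads play the role of backends.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared buffer header (not PostgreSQL's own). */
typedef struct SketchBuffer
{
    pthread_mutex_t hdr;            /* protects pin_count and cleanup_waiter */
    pthread_cond_t  pin_is_one;     /* signaled when pin_count drops to 1 */
    pthread_mutex_t content;        /* stand-in for the exclusive buffer lock */
    int             pin_count;      /* shared reference count */
    bool            cleanup_waiter; /* at most one waiter, as in the README */
} SketchBuffer;

void sketch_init(SketchBuffer *b)
{
    pthread_mutex_init(&b->hdr, NULL);
    pthread_cond_init(&b->pin_is_one, NULL);
    pthread_mutex_init(&b->content, NULL);
    b->pin_count = 0;
    b->cleanup_waiter = false;
}

void sketch_pin(SketchBuffer *b)
{
    pthread_mutex_lock(&b->hdr);
    b->pin_count++;
    pthread_mutex_unlock(&b->hdr);
}

void sketch_unpin(SketchBuffer *b)
{
    pthread_mutex_lock(&b->hdr);
    b->pin_count--;
    /* Wake the cleanup waiter once only its own pin remains. */
    if (b->pin_count == 1 && b->cleanup_waiter)
        pthread_cond_signal(&b->pin_is_one);
    pthread_mutex_unlock(&b->hdr);
}

/* Caller must already hold one pin. Returns with b->content locked. */
void sketch_lock_for_cleanup(SketchBuffer *b)
{
    for (;;)
    {
        pthread_mutex_lock(&b->content);    /* take the exclusive lock */
        pthread_mutex_lock(&b->hdr);
        if (b->pin_count == 1)              /* only our own pin is left */
        {
            pthread_mutex_unlock(&b->hdr);
            return;                         /* exclusive lock stays held */
        }
        /* Someone else holds a pin: drop the lock, keep our pin, wait. */
        pthread_mutex_unlock(&b->content);
        b->cleanup_waiter = true;
        while (b->pin_count != 1)
            pthread_cond_wait(&b->pin_is_one, &b->hdr);
        b->cleanup_waiter = false;
        pthread_mutex_unlock(&b->hdr);      /* then loop and try again */
    }
}

/* Demo helper: a second "backend" that releases its pin. */
static void *sketch_other_backend(void *arg)
{
    sketch_unpin((SketchBuffer *) arg);
    return NULL;
}

/* Returns the pin count seen while holding the cleanup lock. */
int sketch_demo(void)
{
    SketchBuffer buf;
    pthread_t    other;
    int          observed;

    sketch_init(&buf);
    sketch_pin(&buf);                       /* our pin */
    sketch_pin(&buf);                       /* pin owned by the other backend */
    pthread_create(&other, NULL, sketch_other_backend, &buf);
    sketch_lock_for_cleanup(&buf);          /* returns once only our pin remains */
    pthread_mutex_lock(&buf.hdr);
    observed = buf.pin_count;
    pthread_mutex_unlock(&buf.hdr);
    pthread_mutex_unlock(&buf.content);     /* release the exclusive lock */
    pthread_join(other, NULL);
    sketch_unpin(&buf);
    return observed;
}
```

Calling `sketch_demo()` exercises both paths: depending on scheduling, the cleanup caller either finds the pin count already at 1 or waits for the signal and retries. On systems where the threads library is linked separately, build with something like `cc -pthread`. Like the implementation described in the README, the sketch tracks a single waiter per buffer.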

src/backend/storage/buffer/buf_init.c

Lines changed: 1 addition & 3 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *    $Header: /cvsroot/pgsql/src/backend/storage/buffer/buf_init.c,v 1.42 2001/03/22 03:59:44 momjian Exp $
+ *    $Header: /cvsroot/pgsql/src/backend/storage/buffer/buf_init.c,v 1.43 2001/07/06 21:04:25 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -63,7 +63,6 @@ long *PrivateRefCount; /* also used in freelist.c */
 bits8 *BufferLocks; /* flag bits showing locks I have set */
 BufferTag *BufferTagLastDirtied; /* tag buffer had when last
  * dirtied by me */
-BufferBlindId *BufferBlindLastDirtied;
 bool *BufferDirtiedByMe; /* T if buf has been dirtied in cur xact */


@@ -237,7 +236,6 @@ InitBufferPoolAccess(void)
     PrivateRefCount = (long *) calloc(NBuffers, sizeof(long));
     BufferLocks = (bits8 *) calloc(NBuffers, sizeof(bits8));
     BufferTagLastDirtied = (BufferTag *) calloc(NBuffers, sizeof(BufferTag));
-    BufferBlindLastDirtied = (BufferBlindId *) calloc(NBuffers, sizeof(BufferBlindId));
     BufferDirtiedByMe = (bool *) calloc(NBuffers, sizeof(bool));

     /*
