
Commit 6bd4f40

Replace the former method of determining snapshot xmax --- to wit, calling ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid" variable that is updated during transaction commit or abort.

Since latestCompletedXid is written only in places that had to lock ProcArrayLock exclusively anyway, and is read only in places that had to lock ProcArrayLock shared anyway, it adds no new locking requirements to the system despite being cluster-wide. Moreover, removing ReadNewTransactionId from snapshot acquisition eliminates the need to take both XidGenLock and ProcArrayLock at the same time. Since XidGenLock is sometimes held across I/O this can be a significant win.

Some preliminary benchmarking suggested that this patch has no effect on average throughput but can significantly improve the worst-case transaction times seen in pgbench.

Concept by Florian Pflug, implementation by Tom Lane.
1 parent 0a51e70 commit 6bd4f40
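
The bookkeeping described in the commit message can be sketched as a small single-threaded model. This is an illustration under stated assumptions, not the committed code: `ToyProcArray`, `ToySnapshot`, `toy_end_transaction` and `toy_get_snapshot` are hypothetical names, XIDs are compared as plain integers (wraparound is ignored), and the locking is only indicated in comments.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define TOY_MAX_PROCS 8

/* Hypothetical stand-in for the shared ProcArray; not PostgreSQL's layout. */
typedef struct ToyProcArray
{
    TransactionId xids[TOY_MAX_PROCS];  /* 0 means the slot has no XID */
    TransactionId latestCompletedXid;
} ToyProcArray;

typedef struct ToySnapshot
{
    TransactionId xmin;
    TransactionId xmax;
    int           nrunning;
    TransactionId running[TOY_MAX_PROCS];
} ToySnapshot;

/*
 * Transaction end (commit or abort).  The real code would hold
 * ProcArrayLock in exclusive mode around both updates.
 */
void
toy_end_transaction(ToyProcArray *arr, int slot)
{
    TransactionId xid;

    assert(slot >= 0 && slot < TOY_MAX_PROCS);
    xid = arr->xids[slot];
    arr->xids[slot] = InvalidTransactionId;
    if (xid > arr->latestCompletedXid)      /* plain compare: no wraparound */
        arr->latestCompletedXid = xid;
}

/*
 * Snapshot acquisition.  The real code would hold ProcArrayLock in shared
 * mode; note that nothing here needs the XID-generation lock.
 */
ToySnapshot
toy_get_snapshot(const ToyProcArray *arr)
{
    ToySnapshot snap = {0};
    int         i;

    snap.xmax = arr->latestCompletedXid + 1;
    snap.xmin = snap.xmax;
    for (i = 0; i < TOY_MAX_PROCS; i++)
    {
        TransactionId xid = arr->xids[i];   /* fetch each slot once */

        /* XIDs at or beyond xmax count as running without being listed */
        if (xid == InvalidTransactionId || xid >= snap.xmax)
            continue;
        snap.running[snap.nrunning++] = xid;
        if (xid < snap.xmin)
            snap.xmin = xid;
    }
    return snap;
}
```

In this model, with XIDs 100, 101 and 102 in progress, ending 101 moves latestCompletedXid to 101; a snapshot taken afterwards has xmax 102 and lists only 100 as running, while 102 is treated as running simply because it is not below xmax.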

14 files changed: +330 -212 lines changed


src/backend/access/transam/README

Lines changed: 42 additions & 39 deletions
@@ -1,4 +1,4 @@
-$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.8 2007/09/07 20:59:26 tgl Exp $
+$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.9 2007/09/08 20:31:14 tgl Exp $
 
 The Transaction System
 ----------------------
@@ -238,8 +238,10 @@ reason why this would be bad is that C would see (in the row inserted by A)
 earlier changes by B, and it would be inconsistent for C not to see any
 of B's changes elsewhere in the database.
 
-Formally, the correctness requirement is "if A sees B as committed,
-and B sees C as committed, then A must see C as committed".
+Formally, the correctness requirement is "if a snapshot A considers
+transaction X as committed, and any of transaction X's snapshots considered
+transaction Y as committed, then snapshot A must consider transaction Y as
+committed".
 
 What we actually enforce is strict serialization of commits and rollbacks
 with snapshot-taking: we do not allow any transaction to exit the set of
@@ -248,42 +250,45 @@ stronger than necessary for consistency, but is relatively simple to
 enforce, and it assists with some other issues as explained below.) The
 implementation of this is that GetSnapshotData takes the ProcArrayLock in
 shared mode (so that multiple backends can take snapshots in parallel),
-but xact.c must take the ProcArrayLock in exclusive mode while clearing
-MyProc->xid at transaction end (either commit or abort).
+but ProcArrayEndTransaction must take the ProcArrayLock in exclusive mode
+while clearing MyProc->xid at transaction end (either commit or abort).
 
-GetSnapshotData must in fact acquire ProcArrayLock before it calls
-ReadNewTransactionId. Otherwise it would be possible for a transaction A
-postdating the xmax to commit, and then an existing transaction B that saw
-A as committed to commit, before GetSnapshotData is able to acquire
-ProcArrayLock and finish taking its snapshot. This would violate the
-consistency requirement, because A would be still running and B not
-according to this snapshot.
+ProcArrayEndTransaction also holds the lock while advancing the shared
+latestCompletedXid variable. This allows GetSnapshotData to use
+latestCompletedXid + 1 as xmax for its snapshot: there can be no
+transaction >= this xid value that the snapshot needs to consider as
+completed.
 
 In short, then, the rule is that no transaction may exit the set of
-currently-running transactions between the time we fetch xmax and the time
-we finish building our snapshot. However, this restriction only applies
-to transactions that have an XID --- read-only transactions can end without
-acquiring ProcArrayLock, since they don't affect anyone else's snapshot.
+currently-running transactions between the time we fetch latestCompletedXid
+and the time we finish building our snapshot. However, this restriction
+only applies to transactions that have an XID --- read-only transactions
+can end without acquiring ProcArrayLock, since they don't affect anyone
+else's snapshot nor latestCompletedXid.
 
 Transaction start, per se, doesn't have any interlocking with these
 considerations, since we no longer assign an XID immediately at transaction
-start. But when we do decide to allocate an XID, we must require
-GetNewTransactionId to store the new XID into the shared ProcArray before
-releasing XidGenLock. This ensures that when GetSnapshotData calls
-ReadNewTransactionId (which also takes XidGenLock), all active XIDs before
-the returned value of nextXid are already present in the ProcArray and
-can't be missed by GetSnapshotData. Unfortunately, we can't have
-GetNewTransactionId take ProcArrayLock to do this, else it could deadlock
-against GetSnapshotData. Therefore, we simply let GetNewTransactionId
-store into MyProc->xid without any lock. We are thereby relying on
-fetch/store of an XID to be atomic, else other backends might see a
-partially-set XID. (NOTE: for multiprocessors that need explicit memory
-access fence instructions, this means that acquiring/releasing XidGenLock
-is just as necessary as acquiring/releasing ProcArrayLock for
-GetSnapshotData to ensure it sees up-to-date xid fields.) This also means
-that readers of the ProcArray xid fields must be careful to fetch a value
-only once, rather than assume they can read it multiple times and get the
-same answer each time.
+start. But when we do decide to allocate an XID, GetNewTransactionId must
+store the new XID into the shared ProcArray before releasing XidGenLock.
+This ensures that all top-level XIDs <= latestCompletedXid are either
+present in the ProcArray, or not running anymore. (This guarantee doesn't
+apply to subtransaction XIDs, because of the possibility that there's not
+room for them in the subxid array; instead we guarantee that they are
+present or the overflow flag is set.) If a backend released XidGenLock
+before storing its XID into MyProc, then it would be possible for another
+backend to allocate and commit a later XID, causing latestCompletedXid to
+pass the first backend's XID, before that value became visible in the
+ProcArray. That would break GetOldestXmin, as discussed below.
+
+We allow GetNewTransactionId to store the XID into MyProc->xid (or the
+subxid array) without taking ProcArrayLock. This was once necessary to
+avoid deadlock; while that is no longer the case, it's still beneficial for
+performance. We are thereby relying on fetch/store of an XID to be atomic,
+else other backends might see a partially-set XID. This also means that
+readers of the ProcArray xid fields must be careful to fetch a value only
+once, rather than assume they can read it multiple times and get the same
+answer each time. (Use volatile-qualified pointers when doing this, to
+ensure that the C compiler does exactly what you tell it to.)
 
 Another important activity that uses the shared ProcArray is GetOldestXmin,
 which must determine a lower bound for the oldest xmin of any active MVCC
@@ -303,12 +308,10 @@ currently-active XIDs: no xact, in particular not the oldest, can exit
 while we hold shared ProcArrayLock. So GetOldestXmin's view of the minimum
 active XID will be the same as that of any concurrent GetSnapshotData, and
 so it can't produce an overestimate. If there is no active transaction at
-all, GetOldestXmin returns the result of ReadNewTransactionId. Note that
-two concurrent executions of GetOldestXmin might not see the same result
-from ReadNewTransactionId --- but if there is a difference, the intervening
-execution(s) of GetNewTransactionId must have stored their XIDs into the
-ProcArray, so the later execution of GetOldestXmin will see them and
-compute the same global xmin anyway.
+all, GetOldestXmin returns latestCompletedXid + 1, which is a lower bound
+for the xmin that might be computed by concurrent or later GetSnapshotData
+calls. (We know that no XID less than this could be about to appear in
+the ProcArray, because of the XidGenLock interlock discussed above.)
 
 GetSnapshotData also performs an oldest-xmin calculation (which had better
 match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used
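
The GetOldestXmin fallback and the fetch-once rule from the README text above can be sketched together. This is a simplified model under stated assumptions: `ToyProc` and `toy_get_oldest_xmin` are hypothetical names, only each backend's own XID is examined (the real function has more to consider), XIDs are compared as plain integers, and the shared ProcArrayLock that the real code holds is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical per-backend slot; 0 means "no XID assigned". */
typedef struct ToyProc
{
    TransactionId xid;
} ToyProc;

/*
 * Lower bound on the oldest active XID.  With no active XID at all, the
 * answer is latestCompletedXid + 1, as the README describes.
 */
TransactionId
toy_get_oldest_xmin(volatile ToyProc *procs, size_t nprocs,
                    TransactionId latestCompletedXid)
{
    TransactionId result = latestCompletedXid + 1;
    size_t        i;

    assert(procs != NULL || nprocs == 0);
    for (i = 0; i < nprocs; i++)
    {
        /*
         * Fetch the shared field exactly once through the volatile
         * pointer; both the test and the assignment below use the local
         * copy, so a concurrent store cannot change the answer midway.
         */
        TransactionId xid = procs[i].xid;

        if (xid != InvalidTransactionId && xid < result)
            result = xid;
    }
    return result;
}
```

With every slot empty and latestCompletedXid at 500 the function returns 501; once slots hold 480 and 495 it returns 480.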

src/backend/access/transam/transam.c

Lines changed: 28 additions & 1 deletion
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/access/transam/transam.c,v 1.70 2007/08/01 22:45:07 tgl Exp $
+ *    $PostgreSQL: pgsql/src/backend/access/transam/transam.c,v 1.71 2007/09/08 20:31:14 tgl Exp $
  *
  * NOTES
  *    This file contains the high level access-method interface to the
@@ -432,6 +432,33 @@ TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
     return (diff >= 0);
 }
 
+
+/*
+ * TransactionIdLatest --- get latest XID among a main xact and its children
+ */
+TransactionId
+TransactionIdLatest(TransactionId mainxid,
+                    int nxids, const TransactionId *xids)
+{
+    TransactionId result;
+
+    /*
+     * In practice it is highly likely that the xids[] array is sorted, and
+     * so we could save some cycles by just taking the last child XID, but
+     * this probably isn't so performance-critical that it's worth depending
+     * on that assumption. But just to show we're not totally stupid, scan
+     * the array back-to-front to avoid useless assignments.
+     */
+    result = mainxid;
+    while (--nxids >= 0)
+    {
+        if (TransactionIdPrecedes(result, xids[nxids]))
+            result = xids[nxids];
+    }
+    return result;
+}
+
+
 /*
  * TransactionIdGetCommitLSN
  *
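
To try TransactionIdLatest outside the backend, a standalone sketch might look like the following. The `TransactionIdLatest` body mirrors the hunk above apart from an added `assert`; the `TransactionIdPrecedes` definition and the `FirstNormalTransactionId` value are reproduced here as assumptions about the surrounding source (modulo-2^32 comparison for normal XIDs), since they are not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumed value: XIDs below this are reserved special XIDs. */
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Assumed definition: is id1 logically older than id2, modulo 2^32? */
bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    /* Special XIDs compare as plain unsigned values. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return (id1 < id2);

    diff = (int32_t) (id1 - id2);
    return (diff < 0);
}

/* Latest XID among a main transaction and its children. */
TransactionId
TransactionIdLatest(TransactionId mainxid,
                    int nxids, const TransactionId *xids)
{
    TransactionId result = mainxid;

    assert(nxids <= 0 || xids != NULL);

    /* Scan back-to-front: for a sorted array the first hit is the maximum. */
    while (--nxids >= 0)
    {
        if (TransactionIdPrecedes(result, xids[nxids]))
            result = xids[nxids];
    }
    return result;
}
```

For a main XID of 100 with children 101, 102 and 105 the result is 105. Because the comparison is modular, a main XID of 4294967290 with children 4294967295 and 5 yields 5: the counter has wrapped, so 5 is the most recently assigned of the three.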

src/backend/access/transam/twophase.c

Lines changed: 6 additions & 2 deletions
@@ -7,7 +7,7 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/access/transam/twophase.c,v 1.34 2007/09/05 20:53:17 tgl Exp $
+ *    $PostgreSQL: pgsql/src/backend/access/transam/twophase.c,v 1.35 2007/09/08 20:31:14 tgl Exp $
  *
  * NOTES
  *    Each global transaction is associated with a global transaction
@@ -1127,6 +1127,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
     char       *buf;
     char       *bufptr;
     TwoPhaseFileHeader *hdr;
+    TransactionId latestXid;
     TransactionId *children;
     RelFileNode *commitrels;
     RelFileNode *abortrels;
@@ -1162,6 +1163,9 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
     abortrels = (RelFileNode *) bufptr;
     bufptr += MAXALIGN(hdr->nabortrels * sizeof(RelFileNode));
 
+    /* compute latestXid among all children */
+    latestXid = TransactionIdLatest(xid, hdr->nsubxacts, children);
+
     /*
      * The order of operations here is critical: make the XLOG entry for
      * commit or abort, then mark the transaction committed or aborted in
@@ -1179,7 +1183,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
                                        hdr->nsubxacts, children,
                                        hdr->nabortrels, abortrels);
 
-    ProcArrayRemove(&gxact->proc);
+    ProcArrayRemove(&gxact->proc, latestXid);
 
     /*
      * In case we fail while running the callbacks, mark the gxact invalid so

src/backend/access/transam/varsup.c

Lines changed: 18 additions & 13 deletions
@@ -6,7 +6,7 @@
  * Copyright (c) 2000-2007, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/access/transam/varsup.c,v 1.78 2007/02/15 23:23:22 alvherre Exp $
+ *    $PostgreSQL: pgsql/src/backend/access/transam/varsup.c,v 1.79 2007/09/08 20:31:14 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -31,7 +31,9 @@ VariableCache ShmemVariableCache = NULL;
 
 
 /*
- * Allocate the next XID for my new transaction.
+ * Allocate the next XID for my new transaction or subtransaction.
+ *
+ * The new XID is also stored into MyProc before returning.
  */
 TransactionId
 GetNewTransactionId(bool isSubXact)
@@ -43,7 +45,11 @@ GetNewTransactionId(bool isSubXact)
      * transaction id.
      */
     if (IsBootstrapProcessingMode())
+    {
+        Assert(!isSubXact);
+        MyProc->xid = BootstrapTransactionId;
         return BootstrapTransactionId;
+    }
 
     LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
 
@@ -112,19 +118,19 @@ GetNewTransactionId(bool isSubXact)
     TransactionIdAdvance(ShmemVariableCache->nextXid);
 
     /*
-     * We must store the new XID into the shared PGPROC array before releasing
-     * XidGenLock. This ensures that when GetSnapshotData calls
-     * ReadNewTransactionId, all active XIDs before the returned value of
-     * nextXid are already present in PGPROC. Else we have a race condition.
+     * We must store the new XID into the shared ProcArray before releasing
+     * XidGenLock. This ensures that every active XID older than
+     * latestCompletedXid is present in the ProcArray, which is essential
+     * for correct OldestXmin tracking; see src/backend/access/transam/README.
      *
      * XXX by storing xid into MyProc without acquiring ProcArrayLock, we are
      * relying on fetch/store of an xid to be atomic, else other backends
      * might see a partially-set xid here. But holding both locks at once
-     * would be a nasty concurrency hit (and in fact could cause a deadlock
-     * against GetSnapshotData). So for now, assume atomicity. Note that
-     * readers of PGPROC xid field should be careful to fetch the value only
-     * once, rather than assume they can read it multiple times and get the
-     * same answer each time.
+     * would be a nasty concurrency hit. So for now, assume atomicity.
+     *
+     * Note that readers of PGPROC xid fields should be careful to fetch the
+     * value only once, rather than assume they can read a value multiple
+     * times and get the same answer each time.
      *
      * The same comments apply to the subxact xid count and overflow fields.
      *
@@ -138,11 +144,10 @@ GetNewTransactionId(bool isSubXact)
      * race-condition window, in that the new XID will not appear as running
      * until its parent link has been placed into pg_subtrans. However, that
      * will happen before anyone could possibly have a reason to inquire about
-     * the status of the XID, so it seems OK. (Snapshots taken during this
+     * the status of the XID, so it seems OK. (Snapshots taken during this
      * window *will* include the parent XID, so they will deliver the correct
      * answer later on when someone does have a reason to inquire.)
      */
-    if (MyProc != NULL)
     {
         /*
          * Use volatile pointer to prevent code rearrangement; other backends
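
The ordering rule in the comments above (publish the XID before XidGenLock is released) can be sketched as a single-threaded model. Everything named `Toy...` or `toy_...` is hypothetical, the lock calls appear only as comments, and the wraparound step is an assumption about what TransactionIdAdvance does (skip the reserved XIDs below 3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumed value: XIDs below this are reserved special XIDs. */
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-ins for the shared XID counter and a PGPROC slot. */
typedef struct ToyVariableCache
{
    TransactionId nextXid;
} ToyVariableCache;

typedef struct ToyProc
{
    TransactionId xid;
} ToyProc;

TransactionId
toy_get_new_xid(ToyVariableCache *cache, ToyProc *myproc)
{
    /* Volatile pointer so the store below is emitted where it is written. */
    volatile ToyProc *proc = myproc;
    TransactionId     xid;

    assert(cache != NULL && myproc != NULL);

    /* Real code: LWLockAcquire(XidGenLock, LW_EXCLUSIVE) happens here. */
    xid = cache->nextXid;

    /* Advance the counter, skipping the reserved XIDs after wraparound. */
    cache->nextXid++;
    if (cache->nextXid < FirstNormalTransactionId)
        cache->nextXid = FirstNormalTransactionId;

    /*
     * Publish the XID while the generation lock would still be held, so no
     * later XID can be allocated and completed before this one is visible.
     */
    proc->xid = xid;

    /* Real code: LWLockRelease(XidGenLock) happens here. */
    return xid;
}
```

A single-threaded model cannot demonstrate the race itself; it only fixes the order of the three steps (read counter, advance counter, publish) relative to where the lock would be released.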
