Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit ed5c4e4

Browse files
committed
Improve documentation about reasoning behind the order of operations
in GetSnapshotData, GetNewTransactionId, CommitTransaction, AbortTransaction, etc. Correct race condition in transaction status testing in HeapTupleSatisfiesVacuum --- this wasn't important for old VACUUM with exclusive lock on its table, but it sure is important now. All per pghackers discussion 7/11/01 and 7/12/01.
1 parent ffbd97c commit ed5c4e4

File tree

4 files changed

+110
-49
lines changed

4 files changed

+110
-49
lines changed

src/backend/access/transam/varsup.c

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
* Copyright (c) 2000, PostgreSQL Global Development Group
77
*
88
* IDENTIFICATION
9-
* $Header: /cvsroot/pgsql/src/backend/access/transam/varsup.c,v 1.41 2001/07/12 04:11:13 tgl Exp $
9+
* $Header: /cvsroot/pgsql/src/backend/access/transam/varsup.c,v 1.42 2001/07/16 22:43:33 tgl Exp $
1010
*
1111
*-------------------------------------------------------------------------
1212
*/
@@ -60,8 +60,16 @@ GetNewTransactionId(TransactionId *xid)
6060
* XXX by storing xid into MyProc without acquiring SInvalLock, we are
6161
* relying on fetch/store of an xid to be atomic, else other backends
6262
* might see a partially-set xid here. But holding both locks at once
63-
* would be a nasty concurrency hit (and at this writing, could cause a
64-
* deadlock against GetSnapshotData). So for now, assume atomicity.
63+
* would be a nasty concurrency hit (and in fact could cause a deadlock
64+
* against GetSnapshotData). So for now, assume atomicity. Note that
65+
* readers of PROC xid field should be careful to fetch the value only
66+
* once, rather than assume they can read it multiple times and get the
67+
* same answer each time.
68+
*
69+
* A solution to the atomic-store problem would be to give each PROC
70+
* its own spinlock used only for fetching/storing that PROC's xid.
71+
* (SInvalLock would then mean primarily that PROCs couldn't be added/
72+
* removed while holding the lock.)
6573
*/
6674
if (MyProc != (PROC *) NULL)
6775
MyProc->xid = *xid;

src/backend/access/transam/xact.c

Lines changed: 32 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/transam/xact.c,v 1.107 2001/07/15 22:48:16 tgl Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/transam/xact.c,v 1.108 2001/07/16 22:43:33 tgl Exp $
1212
*
1313
* NOTES
1414
* Transaction aborts can now occur two ways:
@@ -1018,16 +1018,21 @@ CommitTransaction(void)
10181018

10191019
CloseSequences();
10201020
AtEOXact_portals();
1021+
1022+
/* Here is where we really truly commit. */
10211023
RecordTransactionCommit();
10221024

10231025
/*
10241026
* Let others know about no transaction in progress by me. Note that
1025-
* this must be done _before_ releasing locks we hold and
1027+
* this must be done _before_ releasing locks we hold and _after_
1028+
* RecordTransactionCommit.
1029+
*
10261030
* SpinAcquire(SInvalLock) is required: UPDATE with xid 0 is blocked
10271031
* by xid 1' UPDATE, xid 1 is doing commit while xid 2 gets snapshot -
10281032
* if xid 2' GetSnapshotData sees xid 1 as running then it must see
10291033
* xid 0 as running as well or it will see two tuple versions - one
1030-
* deleted by xid 1 and one inserted by xid 0.
1034+
* deleted by xid 1 and one inserted by xid 0. See notes in
1035+
* GetSnapshotData.
10311036
*/
10321037
if (MyProc != (PROC *) NULL)
10331038
{
@@ -1038,6 +1043,12 @@ CommitTransaction(void)
10381043
SpinRelease(SInvalLock);
10391044
}
10401045

1046+
/*
1047+
* This is all post-commit cleanup. Note that if an error is raised
1048+
* here, it's too late to abort the transaction. This should be just
1049+
* noncritical resource releasing.
1050+
*/
1051+
10411052
RelationPurgeLocalRelation(true);
10421053
AtEOXact_temp_relations(true);
10431054
smgrDoPendingDeletes(true);
@@ -1080,23 +1091,6 @@ AbortTransaction(void)
10801091
/* Prevent cancel/die interrupt while cleaning up */
10811092
HOLD_INTERRUPTS();
10821093

1083-
/*
1084-
* Let others to know about no transaction in progress - vadim
1085-
* 11/26/96
1086-
*
1087-
* XXX it'd be nice to acquire SInvalLock for this, but too much risk of
1088-
* lockup: what if we were holding SInvalLock when we elog'd? Net effect
1089-
* is that we are relying on fetch/store of an xid to be atomic, else
1090-
* other backends might see a partially-zeroed xid here. Would it be
1091-
* safe to release spins before we reset xid/xmin? But see also
1092-
* GetNewTransactionId, which does the same thing.
1093-
*/
1094-
if (MyProc != (PROC *) NULL)
1095-
{
1096-
MyProc->xid = InvalidTransactionId;
1097-
MyProc->xmin = InvalidTransactionId;
1098-
}
1099-
11001094
/*
11011095
* Release any spinlocks or buffer context locks we might be holding
11021096
* as quickly as possible. (Real locks, however, must be held till we
@@ -1143,10 +1137,23 @@ AbortTransaction(void)
11431137
AtAbort_Notify();
11441138
CloseSequences();
11451139
AtEOXact_portals();
1140+
1141+
/* Advertise the fact that we aborted in pg_log. */
11461142
RecordTransactionAbort();
11471143

1148-
/* Count transaction abort in statistics collector */
1149-
pgstat_count_xact_rollback();
1144+
/*
1145+
* Let others know about no transaction in progress by me. Note that
1146+
* this must be done _before_ releasing locks we hold and _after_
1147+
* RecordTransactionAbort.
1148+
*/
1149+
if (MyProc != (PROC *) NULL)
1150+
{
1151+
/* Lock SInvalLock because that's what GetSnapshotData uses. */
1152+
SpinAcquire(SInvalLock);
1153+
MyProc->xid = InvalidTransactionId;
1154+
MyProc->xmin = InvalidTransactionId;
1155+
SpinRelease(SInvalLock);
1156+
}
11501157

11511158
RelationPurgeLocalRelation(false);
11521159
AtEOXact_temp_relations(false);
@@ -1165,6 +1172,9 @@ AbortTransaction(void)
11651172

11661173
SharedBufferChanged = false;/* safest place to do it */
11671174

1175+
/* Count transaction abort in statistics collector */
1176+
pgstat_count_xact_rollback();
1177+
11681178
/*
11691179
* State remains TRANS_ABORT until CleanupTransaction().
11701180
*/

src/backend/storage/ipc/sinval.c

Lines changed: 48 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/storage/ipc/sinval.c,v 1.36 2001/07/12 04:11:13 tgl Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/storage/ipc/sinval.c,v 1.37 2001/07/16 22:43:34 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -193,8 +193,10 @@ TransactionIdIsInProgress(TransactionId xid)
193193
if (pOffset != INVALID_OFFSET)
194194
{
195195
PROC *proc = (PROC *) MAKE_PTR(pOffset);
196+
/* Fetch xid just once - see GetNewTransactionId */
197+
TransactionId pxid = proc->xid;
196198

197-
if (TransactionIdEquals(proc->xid, xid))
199+
if (TransactionIdEquals(pxid, xid))
198200
{
199201
result = true;
200202
break;
@@ -236,9 +238,9 @@ GetXmaxRecent(TransactionId *XmaxRecent)
236238
if (pOffset != INVALID_OFFSET)
237239
{
238240
PROC *proc = (PROC *) MAKE_PTR(pOffset);
239-
TransactionId xid;
241+
/* Fetch xid just once - see GetNewTransactionId */
242+
TransactionId xid = proc->xid;
240243

241-
xid = proc->xid;
242244
if (! TransactionIdIsSpecial(xid))
243245
{
244246
if (TransactionIdPrecedes(xid, result))
@@ -256,8 +258,19 @@ GetXmaxRecent(TransactionId *XmaxRecent)
256258
*XmaxRecent = result;
257259
}
258260

259-
/*
261+
/*----------
260262
* GetSnapshotData -- returns information about running transactions.
263+
*
264+
* The returned snapshot includes xmin (lowest still-running xact ID),
265+
* xmax (next xact ID to be assigned), and a list of running xact IDs
266+
* in the range xmin <= xid < xmax. It is used as follows:
267+
* All xact IDs < xmin are considered finished.
268+
* All xact IDs >= xmax are considered still running.
269+
* For an xact ID xmin <= xid < xmax, consult list to see whether
270+
* it is considered running or not.
271+
* This ensures that the set of transactions seen as "running" by the
272+
* current xact will not change after it takes the snapshot.
273+
*----------
261274
*/
262275
Snapshot
263276
GetSnapshotData(bool serializable)
@@ -287,12 +300,33 @@ GetSnapshotData(bool serializable)
287300
elog(ERROR, "Memory exhausted in GetSnapshotData");
288301
}
289302

290-
/*
291-
* Unfortunately, we have to call ReadNewTransactionId() after
292-
* acquiring SInvalLock above. It's not good because
293-
* ReadNewTransactionId() does SpinAcquire(XidGenLockId) but
294-
* _necessary_.
303+
/*--------------------
304+
* Unfortunately, we have to call ReadNewTransactionId() after acquiring
305+
* SInvalLock above. It's not good because ReadNewTransactionId() does
306+
* SpinAcquire(XidGenLockId), but *necessary*. We need to be sure that
307+
* no transactions exit the set of currently-running transactions
308+
* between the time we fetch xmax and the time we finish building our
309+
* snapshot. Otherwise we could have a situation like this:
310+
*
311+
* 1. Tx Old is running (in Read Committed mode).
312+
* 2. Tx S reads new transaction ID into xmax, then
313+
* is swapped out before acquiring SInvalLock.
314+
* 3. Tx New gets new transaction ID (>= S' xmax),
315+
* makes changes and commits.
316+
* 4. Tx Old changes some row R changed by Tx New and commits.
317+
* 5. Tx S finishes getting its snapshot data. It sees Tx Old as
318+
* done, but sees Tx New as still running (since New >= xmax).
319+
*
320+
* Now S will see R changed by both Tx Old and Tx New, *but* does not
321+
* see other changes made by Tx New. If S is supposed to be in
322+
* Serializable mode, this is wrong.
323+
*
324+
* By locking SInvalLock before we read xmax, we ensure that TX Old
325+
* cannot exit the set of running transactions seen by Tx S. Therefore
326+
* both Old and New will be seen as still running => no inconsistency.
327+
*--------------------
295328
*/
329+
296330
ReadNewTransactionId(&(snapshot->xmax));
297331

298332
for (index = 0; index < segP->lastBackend; index++)
@@ -302,6 +336,7 @@ GetSnapshotData(bool serializable)
302336
if (pOffset != INVALID_OFFSET)
303337
{
304338
PROC *proc = (PROC *) MAKE_PTR(pOffset);
339+
/* Fetch xid just once - see GetNewTransactionId */
305340
TransactionId xid = proc->xid;
306341

307342
/*
@@ -325,11 +360,12 @@ GetSnapshotData(bool serializable)
325360

326361
if (serializable)
327362
MyProc->xmin = snapshot->xmin;
328-
/* Serializable snapshot must be computed before any other... */
329-
Assert(MyProc->xmin != InvalidTransactionId);
330363

331364
SpinRelease(SInvalLock);
332365

366+
/* Serializable snapshot must be computed before any other... */
367+
Assert(MyProc->xmin != InvalidTransactionId);
368+
333369
snapshot->xcnt = count;
334370
return snapshot;
335371
}

src/backend/utils/time/tqual.c

Lines changed: 19 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v 1.38 2001/07/12 04:11:13 tgl Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v 1.39 2001/07/16 22:43:34 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -610,6 +610,13 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId XmaxRecent)
610610
*
611611
* If the inserting transaction aborted, then the tuple was never visible
612612
* to any other transaction, so we can delete it immediately.
613+
*
614+
* NOTE: must check TransactionIdIsInProgress (which looks in shared mem)
615+
* before TransactionIdDidCommit/TransactionIdDidAbort (which look in
616+
* pg_log). Otherwise we have a race condition where we might decide
617+
* that a just-committed transaction crashed, because none of the tests
618+
* succeed. xact.c is careful to record commit/abort in pg_log before
619+
* it unsets MyProc->xid in shared memory.
613620
*/
614621
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
615622
{
@@ -636,19 +643,19 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId XmaxRecent)
636643
}
637644
tuple->t_infomask |= HEAP_XMIN_COMMITTED;
638645
}
646+
else if (TransactionIdIsInProgress(tuple->t_xmin))
647+
return HEAPTUPLE_INSERT_IN_PROGRESS;
648+
else if (TransactionIdDidCommit(tuple->t_xmin))
649+
tuple->t_infomask |= HEAP_XMIN_COMMITTED;
639650
else if (TransactionIdDidAbort(tuple->t_xmin))
640651
{
641652
tuple->t_infomask |= HEAP_XMIN_INVALID;
642653
return HEAPTUPLE_DEAD;
643654
}
644-
else if (TransactionIdDidCommit(tuple->t_xmin))
645-
tuple->t_infomask |= HEAP_XMIN_COMMITTED;
646-
else if (TransactionIdIsInProgress(tuple->t_xmin))
647-
return HEAPTUPLE_INSERT_IN_PROGRESS;
648655
else
649656
{
650657
/*
651-
* Not Aborted, Not Committed, Not in Progress -
658+
* Not in Progress, Not Committed, Not Aborted -
652659
* so it's from crashed process. - vadim 11/26/96
653660
*/
654661
tuple->t_infomask |= HEAP_XMIN_INVALID;
@@ -667,19 +674,19 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId XmaxRecent)
667674

668675
if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
669676
{
670-
if (TransactionIdDidAbort(tuple->t_xmax))
677+
if (TransactionIdIsInProgress(tuple->t_xmax))
678+
return HEAPTUPLE_DELETE_IN_PROGRESS;
679+
else if (TransactionIdDidCommit(tuple->t_xmax))
680+
tuple->t_infomask |= HEAP_XMAX_COMMITTED;
681+
else if (TransactionIdDidAbort(tuple->t_xmax))
671682
{
672683
tuple->t_infomask |= HEAP_XMAX_INVALID;
673684
return HEAPTUPLE_LIVE;
674685
}
675-
else if (TransactionIdDidCommit(tuple->t_xmax))
676-
tuple->t_infomask |= HEAP_XMAX_COMMITTED;
677-
else if (TransactionIdIsInProgress(tuple->t_xmax))
678-
return HEAPTUPLE_DELETE_IN_PROGRESS;
679686
else
680687
{
681688
/*
682-
* Not Aborted, Not Committed, Not in Progress -
689+
* Not in Progress, Not Committed, Not Aborted -
683690
* so it's from crashed process. - vadim 06/02/97
684691
*/
685692
tuple->t_infomask |= HEAP_XMAX_INVALID;

0 commit comments

Comments
 (0)