Commit cb02cbc

Fix race between GetNewTransactionId and GetOldestActiveTransactionId.
The race condition goes like this:

1. GetNewTransactionId advances nextXid e.g. from 100 to 101
2. GetOldestActiveTransactionId reads the new nextXid, 101
3. GetOldestActiveTransactionId loops through the proc array. There are no active XIDs there, so it returns 101 as the oldest active XID.
4. GetNewTransactionId stores XID 100 to MyPgXact->xid

So, GetOldestActiveTransactionId returned XID 101, even though 100 only just started and is surely still running.

This would be hard to hit in practice, and even harder to spot any ill effect if it happens. GetOldestActiveTransactionId is only used when creating a checkpoint in a master server, and the race condition can only happen on an online checkpoint, as there are no backends running during a shutdown checkpoint. The oldestActiveXid value of an online checkpoint is only used when starting up a hot standby server, to determine the starting point where pg_subtrans is initialized from. For the race condition to happen, there must be no other XIDs in the proc array that would hold back the oldest-active XID value, which means that the missed XID must be a top transaction's XID. However, pg_subtrans is not used for top XIDs, so I believe an off-by-one error is in fact inconsequential. Nevertheless, let's fix it, as it's clearly wrong and the fix is simple.

This has been wrong ever since hot standby was introduced, so backport to all supported versions.

Discussion: https://www.postgresql.org/message-id/e7258662-82b6-7a45-56d4-99b337a32bf7@iki.fi

1 parent ff2d537 commit cb02cbc

1 file changed: +8 -8 lines changed

src/backend/storage/ipc/procarray.c

Lines changed: 8 additions & 8 deletions
@@ -2095,20 +2095,21 @@ GetOldestActiveTransactionId(void)
 
 	Assert(!RecoveryInProgress());
 
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
-
 	/*
-	 * It's okay to read nextXid without acquiring XidGenLock because (1) we
-	 * assume TransactionIds can be read atomically and (2) we don't care if
-	 * we get a slightly stale value. It can't be very stale anyway, because
-	 * the LWLockAcquire above will have done any necessary memory
-	 * interlocking.
+	 * Read nextXid, as the upper bound of what's still active.
+	 *
+	 * Reading a TransactionId is atomic, but we must grab the lock to make
+	 * sure that all XIDs < nextXid are already present in the proc array (or
+	 * have already completed), when we spin over it.
 	 */
+	LWLockAcquire(XidGenLock, LW_SHARED);
 	oldestRunningXid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
 
 	/*
 	 * Spin over procArray collecting all xids and subxids.
 	 */
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
 		int pgprocno = arrayP->pgprocnos[index];
@@ -2130,7 +2131,6 @@ GetOldestActiveTransactionId(void)
 		 * smaller than oldestRunningXid
 		 */
 	}
-
 	LWLockRelease(ProcArrayLock);
 
 	return oldestRunningXid;
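
The interleaving described in the commit message can be replayed deterministically in a small single-threaded toy model. This is only an illustrative sketch, not PostgreSQL code: `ToyShmem`, `toy_advance_next_xid`, `toy_publish_xid`, `toy_oldest_active_xid`, `toy_replay_race` and `toy_replay_serialized` are all hypothetical names. It assumes, as the new comment in the diff implies, that GetNewTransactionId holds XidGenLock across both advancing nextXid and storing the XID, so a reader that takes the same lock can only run entirely before or entirely after those two steps. XID wraparound is ignored (plain `<` comparison).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in PostgreSQL. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID 0u
#define TOY_MAX_PROCS 4

typedef struct ToyShmem
{
    ToyXid next_xid;                 /* models ShmemVariableCache->nextXid */
    ToyXid proc_xid[TOY_MAX_PROCS];  /* models each backend's MyPgXact->xid */
    int    num_procs;
} ToyShmem;

/* First half of XID assignment: hand out next_xid and advance it. */
static ToyXid toy_advance_next_xid(ToyShmem *shm)
{
    return shm->next_xid++;
}

/* Second half of XID assignment: make the XID visible in the proc array. */
static void toy_publish_xid(ToyShmem *shm, int procno, ToyXid xid)
{
    assert(procno >= 0 && procno < shm->num_procs);
    shm->proc_xid[procno] = xid;
}

/* Reader: start from next_xid, then lower it to the smallest published XID. */
static ToyXid toy_oldest_active_xid(const ToyShmem *shm)
{
    ToyXid oldest = shm->next_xid;

    for (int i = 0; i < shm->num_procs; i++)
    {
        ToyXid xid = shm->proc_xid[i];

        /* Wraparound is ignored in this sketch. */
        if (xid != TOY_INVALID_XID && xid < oldest)
            oldest = xid;
    }
    return oldest;
}

/* Replays steps 1-4: the reader runs between "advance" and "publish". */
static ToyXid toy_replay_race(void)
{
    ToyShmem shm = {.next_xid = 100, .num_procs = 1};
    ToyXid   mine = toy_advance_next_xid(&shm);  /* step 1: 100 -> 101 */
    ToyXid   seen = toy_oldest_active_xid(&shm); /* steps 2-3: empty array */

    toy_publish_xid(&shm, 0, mine);              /* step 4: too late */
    return seen;
}

/*
 * Models the fixed ordering: "advance" and "publish" form one indivisible
 * unit, so the reader runs either wholly before or wholly after it.
 */
static ToyXid toy_replay_serialized(bool reader_first)
{
    ToyShmem shm = {.next_xid = 100, .num_procs = 1};
    ToyXid   seen = TOY_INVALID_XID;

    if (reader_first)
        seen = toy_oldest_active_xid(&shm);
    toy_publish_xid(&shm, 0, toy_advance_next_xid(&shm));
    if (!reader_first)
        seen = toy_oldest_active_xid(&shm);
    return seen;
}
```

In this model `toy_replay_race()` returns 101, skipping the still-running XID 100, while `toy_replay_serialized()` returns 100 for either ordering.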
