Commit c34f807

Ensure correct minimum consistent point on standbys
The startup process improved its calculation of the minimum consistent point in 8d68ee6, which ensures that all available WAL gets replayed when doing crash recovery. That commit also introduced an incorrect calculation of the minimum recovery point for non-startup processes. This can cause incorrect page references on a standby when, for example, the background writer has flushed a couple of pages to disk but has not updated the control file to let a subsequent crash recovery replay to where it should have.

The only case where this has been reported to be a problem is when a standby needs to calculate the latest removed xid when replaying a btree deletion record, so one would need connections on a standby that happen just after recovery thinks it has reached a consistent point. Using a background worker which is started after the consistent point is reached would be the easiest way to get into problems if it connects to a database. Having clients which attempt to connect periodically could also be a problem, but the odds of seeing this problem are much lower.

The fix is pretty simple: give non-startup processes access to the minimum recovery point written in the control file so that they use it as a reference, while the startup process still initializes its own reference of the minimum consistent point so that the original problem, incorrect page references after a crash following promotion, does not show up.

Reported-by: Alexander Kukushkin
Diagnosed-by: Alexander Kukushkin
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Alexander Kukushkin
Discussion: https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org
Backpatch-through: 9.3
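The decision flow described above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code in xlog.c: `ProcState`, `update_min_recovery_point`, and the `control_min` parameter are hypothetical names, the `ControlFileLock` handling is omitted, and the new value is simply set to `lsn` rather than computed the way the real function does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumption, not the real definitions). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical per-process state mirroring the variables named in the diff. */
typedef struct ProcState
{
    bool       in_recovery;     /* models InRecovery, set in the startup process */
    bool       update_enabled;  /* models updateMinRecoveryPoint */
    XLogRecPtr local_min;       /* models the process-local minRecoveryPoint */
} ProcState;

/*
 * Returns true if the control file value (*control_min) was advanced.
 * Locking and timeline handling are left out of this sketch.
 */
bool
update_min_recovery_point(ProcState *p, XLogRecPtr *control_min,
                          XLogRecPtr lsn, bool force)
{
    /* Startup process doing crash recovery: never touch the control file. */
    if (p->local_min == InvalidXLogRecPtr && p->in_recovery)
    {
        p->update_enabled = false;
        return false;
    }

    /* Any other process refreshes its local copy from the control file. */
    p->local_min = *control_min;

    if (p->local_min == InvalidXLogRecPtr)
    {
        /* Crash recovery is still running: do not update. */
        p->update_enabled = false;
        return false;
    }

    if (force || p->local_min < lsn)
    {
        /* Simplification: the sketch stores lsn directly. */
        *control_min = lsn;
        p->local_min = lsn;
        return true;
    }
    return false;
}
```

In this model, a process with `in_recovery` unset and an invalid local value reads the control file instead of giving up, which is the behavior the commit message describes for non-startup processes.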
1 parent d787af7 commit c34f807

1 file changed: +25 -6 lines changed

src/backend/access/transam/xlog.c

Lines changed: 25 additions & 6 deletions
@@ -2722,9 +2722,13 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
      * i.e., we're doing crash recovery. We never modify the control file's
      * value in that case, so we can short-circuit future checks here too. The
      * local values of minRecoveryPoint and minRecoveryPointTLI should not be
-     * updated until crash recovery finishes.
+     * updated until crash recovery finishes. We only do this for the startup
+     * process as it should not update its own reference of minRecoveryPoint
+     * until it has finished crash recovery to make sure that all WAL
+     * available is replayed in this case. This also saves from extra locks
+     * taken on the control file from the startup process.
      */
-    if (XLogRecPtrIsInvalid(minRecoveryPoint))
+    if (XLogRecPtrIsInvalid(minRecoveryPoint) && InRecovery)
     {
         updateMinRecoveryPoint = false;
         return;
@@ -2736,7 +2740,9 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
     minRecoveryPoint = ControlFile->minRecoveryPoint;
     minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 
-    if (force || minRecoveryPoint < lsn)
+    if (XLogRecPtrIsInvalid(minRecoveryPoint))
+        updateMinRecoveryPoint = false;
+    else if (force || minRecoveryPoint < lsn)
     {
         XLogRecPtr newMinRecoveryPoint;
         TimeLineID newMinRecoveryPointTLI;
@@ -3126,9 +3132,11 @@ XLogNeedsFlush(XLogRecPtr record)
          * An invalid minRecoveryPoint means that we need to recover all the
          * WAL, i.e., we're doing crash recovery. We never modify the control
          * file's value in that case, so we can short-circuit future checks
-         * here too.
+         * here too. This triggers a quick exit path for the startup process,
+         * which cannot update its local copy of minRecoveryPoint as long as
+         * it has not replayed all WAL available when doing crash recovery.
          */
-        if (XLogRecPtrIsInvalid(minRecoveryPoint))
+        if (XLogRecPtrIsInvalid(minRecoveryPoint) && InRecovery)
             updateMinRecoveryPoint = false;
 
         /* Quick exit if already known to be updated or cannot be updated */
@@ -3145,8 +3153,19 @@ XLogNeedsFlush(XLogRecPtr record)
         minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
         LWLockRelease(ControlFileLock);
 
+        /*
+         * Check minRecoveryPoint for any other process than the startup
+         * process doing crash recovery, which should not update the control
+         * file value if crash recovery is still running.
+         */
+        if (XLogRecPtrIsInvalid(minRecoveryPoint))
+            updateMinRecoveryPoint = false;
+
         /* check again */
-        return record > minRecoveryPoint;
+        if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
+            return false;
+        else
+            return true;
     }
 
     /* Quick exit if already known flushed */
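
The recovery branch of XLogNeedsFlush after this change can be modeled roughly as follows. This is a sketch under assumptions rather than the actual function: `FlushCheckState` and `needs_flush_in_recovery` are hypothetical names, the control file value is passed in as a plain argument instead of being read under `ControlFileLock`, and the condition of the first quick exit is inferred from its comment in the diff, since only the comment is visible there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumption, not the real definitions). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical per-process state mirroring the variables named in the diff. */
typedef struct FlushCheckState
{
    bool       in_recovery;     /* models InRecovery, set in the startup process */
    bool       update_enabled;  /* models updateMinRecoveryPoint */
    XLogRecPtr local_min;       /* models the process-local minRecoveryPoint */
} FlushCheckState;

/* Returns true if "record" lies beyond the known minimum recovery point. */
bool
needs_flush_in_recovery(FlushCheckState *s, XLogRecPtr control_min,
                        XLogRecPtr record)
{
    /* Startup process in crash recovery takes the quick exit path. */
    if (s->local_min == InvalidXLogRecPtr && s->in_recovery)
        s->update_enabled = false;

    /* Quick exit; this condition is assumed from the comment in the diff. */
    if (record <= s->local_min || !s->update_enabled)
        return false;

    /* Refresh the local copy from the control file (lock omitted here). */
    s->local_min = control_min;

    /* Other processes must not rely on an invalid control file value. */
    if (s->local_min == InvalidXLogRecPtr)
        s->update_enabled = false;

    /* Check again with the refreshed value. */
    if (record <= s->local_min || !s->update_enabled)
        return false;
    return true;
}
```

The final check replaces the former `return record > minRecoveryPoint;` so that a process which has just learned that crash recovery is still running reports that no flush is needed.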
