Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 2e9559b

Browse files
committed
Fix control file update done in restartpoints still running after promotion
If a cluster is promoted (aka the control file shows a state different than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still processing, this function could miss an update of the control file for "checkPoint" and "checkPointCopy" but still do the recycling and/or removal of the past WAL segments, assuming that the to-be-updated LSN values should be used as reference points for the cleanup. This causes a follow-up restart attempting crash recovery to fail with a PANIC on a missing checkpoint record if the end-of-recovery checkpoint triggered by the promotion did not complete while the cluster abruptly stopped or crashed before the completion of this checkpoint. The PANIC would be caused by the redo LSN referred in the control file as located in a segment already gone, recycled by the previous restartpoint with "checkPoint" out-of-sync in the control file. This commit fixes the update of the control file during restartpoints so as "checkPoint" and "checkPointCopy" are updated even if the cluster has been promoted while a restartpoint is running, to be on par with the set of WAL segments actually recycled in the end of CreateRestartPoint(). 7863ee4 has fixed this problem already on master, but the release timing of the latest point versions did not let me enough time to study and fix that on all the stable branches. Reported-by: Fujii Masao, Rui Zhao Author: Kyotaro Horiguchi Reviewed-by: Nathan Bossart, Michael Paquier Discussion: https://postgr.es/m/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com Backpatch-through: 10
1 parent b7579b2 commit 2e9559b

File tree

1 file changed

+29
-15
lines changed
  • src/backend/access/transam

1 file changed

+29
-15
lines changed

src/backend/access/transam/xlog.c

Lines changed: 29 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -9481,21 +9481,26 @@ CreateRestartPoint(int flags)
94819481
PriorRedoPtr = ControlFile->checkPointCopy.redo;
94829482

94839483
/*
9484-
* Update pg_control, using current time. Check that it still shows
9485-
* DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
9486-
* this is a quick hack to make sure nothing really bad happens if somehow
9487-
* we get here after the end-of-recovery checkpoint.
9484+
* Update pg_control, using current time. Check that it still shows an
9485+
* older checkpoint, else do nothing; this is a quick hack to make sure
9486+
* nothing really bad happens if somehow we get here after the
9487+
* end-of-recovery checkpoint.
94889488
*/
94899489
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
9490-
if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
9491-
ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
9490+
if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
94929491
{
9492+
/*
9493+
* Update the checkpoint information. We do this even if the cluster
9494+
* does not show DB_IN_ARCHIVE_RECOVERY to match with the set of WAL
9495+
* segments recycled below.
9496+
*/
94939497
ControlFile->checkPoint = lastCheckPointRecPtr;
94949498
ControlFile->checkPointCopy = lastCheckPoint;
94959499
ControlFile->time = (pg_time_t) time(NULL);
94969500

94979501
/*
9498-
* Ensure minRecoveryPoint is past the checkpoint record. Normally,
9502+
* Ensure minRecoveryPoint is past the checkpoint record and update it
9503+
* if the control file still shows DB_IN_ARCHIVE_RECOVERY. Normally,
94999504
* this will have happened already while writing out dirty buffers,
95009505
* but not necessarily - e.g. because no buffers were dirtied. We do
95019506
* this because a non-exclusive base backup uses minRecoveryPoint to
@@ -9504,18 +9509,27 @@ CreateRestartPoint(int flags)
95049509
* at a minimum. Note that for an ordinary restart of recovery there's
95059510
* no value in having the minimum recovery point any earlier than this
95069511
* anyway, because redo will begin just after the checkpoint record.
9512+
* this because a non-exclusive base backup uses minRecoveryPoint to
9513+
* determine which WAL files must be included in the backup, and the
9514+
* file (or files) containing the checkpoint record must be included,
9515+
* at a minimum. Note that for an ordinary restart of recovery there's
9516+
* no value in having the minimum recovery point any earlier than this
9517+
* anyway, because redo will begin just after the checkpoint record.
95079518
*/
9508-
if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
9519+
if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
95099520
{
9510-
ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
9511-
ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
9521+
if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
9522+
{
9523+
ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
9524+
ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
95129525

9513-
/* update local copy */
9514-
minRecoveryPoint = ControlFile->minRecoveryPoint;
9515-
minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
9526+
/* update local copy */
9527+
minRecoveryPoint = ControlFile->minRecoveryPoint;
9528+
minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
9529+
}
9530+
if (flags & CHECKPOINT_IS_SHUTDOWN)
9531+
ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
95169532
}
9517-
if (flags & CHECKPOINT_IS_SHUTDOWN)
9518-
ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
95199533
UpdateControlFile();
95209534
}
95219535
LWLockRelease(ControlFileLock);

0 commit comments

Comments
 (0)