Commit 05e2293

Fix possible pg_basebackup failure on standby with "include WAL".

If a restartpoint flushed no dirty buffers, it could fail to update the minimum recovery point, leading to a minimum recovery point prior to the starting REDO location. perform_base_backup() would interpret that as meaning that no WAL files at all needed to be included in the backup, failing an internal sanity check. To fix, have restartpoints always update the minimum recovery point to just after the checkpoint record itself, so that the file (or files) containing the checkpoint record will always be included in the backup.

Code by Amit Kapila, per a design suggestion by me, with some additional work on the code comment by me. Test case by Michael Paquier.

Report by Kyotaro Horiguchi.

1 parent 445035a commit 05e2293
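
To make the failure mode in the commit message concrete, here is a minimal sketch of how a backup's WAL range can come out empty when the end position (taken from the minimum recovery point) falls before the start position (the REDO location). This is a simplified model, not PostgreSQL's actual perform_base_backup() logic: `WalPtr`, `WAL_SEG_SIZE`, and `segments_in_range()` are hypothetical names introduced for illustration, and the 16 MB segment size is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr: a 64-bit byte position in the WAL. */
typedef uint64_t WalPtr;

/* Assumed WAL segment size (16 MB); illustrative only. */
#define WAL_SEG_SIZE ((WalPtr) 16 * 1024 * 1024)

/*
 * Hypothetical helper: count the WAL segments a backup would need to
 * cover the positions from 'start' (the REDO location) through 'end'
 * (the minimum recovery point).
 *
 * Returns 0 when 'end' lies in an earlier segment than 'start', which
 * models the "no WAL files needed" situation described in the commit
 * message.
 */
uint64_t
segments_in_range(WalPtr start, WalPtr end)
{
    uint64_t first_seg = start / WAL_SEG_SIZE;
    uint64_t last_seg = end / WAL_SEG_SIZE;

    if (last_seg < first_seg)
        return 0;               /* empty range: nothing to include */

    return last_seg - first_seg + 1;
}
```

In this model, a stale minimum recovery point in segment 2 combined with a REDO location in segment 3 yields zero segments. Once the end position is moved past the checkpoint record in segment 3, the range contains at least that one segment.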

2 files changed, +33 -1 lines changed


src/backend/access/transam/xlog.c

+28 -1
@@ -612,11 +612,14 @@ typedef struct XLogCtlData

 	/*
 	 * During recovery, we keep a copy of the latest checkpoint record here.
-	 * Used by the background writer when it wants to create a restartpoint.
+	 * lastCheckPointRecPtr points to start of checkpoint record and
+	 * lastCheckPointEndPtr points to end+1 of checkpoint record. Used by the
+	 * background writer when it wants to create a restartpoint.
 	 *
 	 * Protected by info_lck.
 	 */
 	XLogRecPtr lastCheckPointRecPtr;
+	XLogRecPtr lastCheckPointEndPtr;
 	CheckPoint lastCheckPoint;

 	/*
@@ -8691,6 +8694,7 @@ RecoveryRestartPoint(const CheckPoint *checkPoint)
 	 */
 	SpinLockAcquire(&XLogCtl->info_lck);
 	XLogCtl->lastCheckPointRecPtr = ReadRecPtr;
+	XLogCtl->lastCheckPointEndPtr = EndRecPtr;
 	XLogCtl->lastCheckPoint = *checkPoint;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
@@ -8710,6 +8714,7 @@ bool
 CreateRestartPoint(int flags)
 {
 	XLogRecPtr lastCheckPointRecPtr;
+	XLogRecPtr lastCheckPointEndPtr;
 	CheckPoint lastCheckPoint;
 	XLogRecPtr PriorRedoPtr;
 	TimestampTz xtime;
@@ -8723,6 +8728,7 @@ CreateRestartPoint(int flags)
 	/* Get a local copy of the last safe checkpoint record. */
 	SpinLockAcquire(&XLogCtl->info_lck);
 	lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
+	lastCheckPointEndPtr = XLogCtl->lastCheckPointEndPtr;
 	lastCheckPoint = XLogCtl->lastCheckPoint;
 	SpinLockRelease(&XLogCtl->info_lck);

@@ -8826,6 +8832,27 @@ CreateRestartPoint(int flags)
 		ControlFile->checkPoint = lastCheckPointRecPtr;
 		ControlFile->checkPointCopy = lastCheckPoint;
 		ControlFile->time = (pg_time_t) time(NULL);
+
+		/*
+		 * Ensure minRecoveryPoint is past the checkpoint record. Normally,
+		 * this will have happened already while writing out dirty buffers,
+		 * but not necessarily - e.g. because no buffers were dirtied. We do
+		 * this because a non-exclusive base backup uses minRecoveryPoint to
+		 * determine which WAL files must be included in the backup, and the
+		 * file (or files) containing the checkpoint record must be included,
+		 * at a minimum. Note that for an ordinary restart of recovery there's
+		 * no value in having the minimum recovery point any earlier than this
+		 * anyway, because redo will begin just after the checkpoint record.
+		 */
+		if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+		{
+			ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
+			ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
+
+			/* update local copy */
+			minRecoveryPoint = ControlFile->minRecoveryPoint;
+			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+		}
 		if (flags & CHECKPOINT_IS_SHUTDOWN)
 			ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
 		UpdateControlFile();
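
The guarded update that this file's final hunk adds to CreateRestartPoint() can be tried in isolation. The sketch below is a reduced model under stated assumptions: `MockControlFile`, `WalPtr`, `TimelineId`, and `advance_min_recovery_point()` are hypothetical names, not PostgreSQL definitions, and locking and the write to disk are omitted. It only shows the "advance if behind, never move backwards" behavior, with the timeline ID updated together with the position.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WalPtr;        /* hypothetical stand-in for XLogRecPtr */
typedef uint32_t TimelineId;    /* hypothetical stand-in for TimeLineID */

/* Hypothetical, reduced model of the two control-file fields involved. */
typedef struct MockControlFile
{
    WalPtr      minRecoveryPoint;
    TimelineId  minRecoveryPointTLI;
} MockControlFile;

/*
 * Move the minimum recovery point up to the end of the checkpoint record
 * if it is currently behind it.  A value already at or past that position
 * is left untouched.  Returns true if the fields were changed.
 */
bool
advance_min_recovery_point(MockControlFile *cf,
                           WalPtr checkpoint_end,
                           TimelineId checkpoint_tli)
{
    if (cf->minRecoveryPoint < checkpoint_end)
    {
        cf->minRecoveryPoint = checkpoint_end;
        cf->minRecoveryPointTLI = checkpoint_tli;
        return true;
    }
    return false;
}
```

Because the update is conditional, a minimum recovery point that buffer flushing has already pushed further ahead is preserved. Only the case where no buffers were dirtied, described in the code comment, takes the new branch.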

src/test/recovery/t/001_stream_rep.pl

+5
@@ -24,6 +24,11 @@
 # pg_basebackup works on a standby).
 $node_standby_1->backup($backup_name);

+# Take a second backup of the standby while the master is offline.
+$node_master->stop;
+$node_standby_1->backup('my_backup_2');
+$node_master->start;
+
 # Create second standby node linking to standby 1
 my $node_standby_2 = get_new_node('standby_2');
 $node_standby_2->init_from_backup($node_standby_1, $backup_name,
