
Commit 6df1543

Refactor some end-of-recovery code out of StartupXLOG().

Create a new function PerformRecoveryXLogAction() and move the code which either writes an end-of-recovery record or requests a checkpoint there.

Also create a new function CleanupAfterArchiveRecovery() to perform a few tasks that we want to do after we've actually exited archive recovery but before we start accepting new WAL writes.

More refactoring of this file is planned, but this commit is just straightforward code movement to make StartupXLOG() a little bit shorter and a little bit easier to understand.

Robert Haas and Amul Sul

Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com

1 parent 8c7be86 commit 6df1543
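
Reviewer's note: the decision that PerformRecoveryXLogAction() now encapsulates can be modelled in isolation. The following is a minimal sketch, not the PostgreSQL code: the RecoveryFlags struct, the EndOfRedoAction enum, and choose_end_of_redo_action() are hypothetical names introduced here, and the three flags only stand in for the ArchiveRecoveryRequested, IsUnderPostmaster, and LocalPromoteIsTriggered variables tested in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the xlog.c variables tested in the diff. */
typedef struct RecoveryFlags
{
	bool		archive_recovery_requested; /* models ArchiveRecoveryRequested */
	bool		is_under_postmaster;	/* models IsUnderPostmaster */
	bool		promote_is_triggered;	/* models LocalPromoteIsTriggered */
} RecoveryFlags;

/* The two outcomes the real function can pick between. */
typedef enum EndOfRedoAction
{
	WRITE_END_OF_RECOVERY_RECORD,	/* lightweight record, used on promotion */
	REQUEST_END_OF_RECOVERY_CHECKPOINT	/* full checkpoint otherwise */
} EndOfRedoAction;

/*
 * Mirror the branch in PerformRecoveryXLogAction(): all three flags must be
 * set for the promotion path; *promoted receives the value the real function
 * would return.
 */
EndOfRedoAction
choose_end_of_redo_action(const RecoveryFlags *flags, bool *promoted)
{
	assert(flags != NULL && promoted != NULL);

	*promoted = flags->archive_recovery_requested &&
		flags->is_under_postmaster &&
		flags->promote_is_triggered;

	return *promoted ? WRITE_END_OF_RECOVERY_RECORD
		: REQUEST_END_OF_RECOVERY_CHECKPOINT;
}
```

Under this model, a standby that is promoted while running under the postmaster takes the end-of-recovery-record path, and every other combination falls back to the checkpoint request, which matches the if/else moved by this commit.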

1 file changed: +143 -118 lines changed

src/backend/access/transam/xlog.c

@@ -889,6 +889,8 @@ static MemoryContext walDebugCxt = NULL;
 static void readRecoverySignalFile(void);
 static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
+static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
+										XLogRecPtr EndOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static char *getRecoveryStopReason(void);
@@ -937,6 +939,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
+static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
@@ -5731,6 +5734,88 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 			(errmsg("archive recovery complete")));
 }
 
+/*
+ * Perform cleanup actions at the conclusion of archive recovery.
+ */
+static void
+CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+{
+	/*
+	 * Execute the recovery_end_command, if any.
+	 */
+	if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
+		ExecuteRecoveryCommand(recoveryEndCommand,
+							   "recovery_end_command",
+							   true);
+
+	/*
+	 * We switched to a new timeline. Clean up segments on the old timeline.
+	 *
+	 * If there are any higher-numbered segments on the old timeline, remove
+	 * them. They might contain valid WAL, but they might also be pre-allocated
+	 * files containing garbage. In any case, they are not part of the new
+	 * timeline's history so we don't need them.
+	 */
+	RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
+
+	/*
+	 * If the switch happened in the middle of a segment, what to do with the
+	 * last, partial segment on the old timeline? If we don't archive it, and
+	 * the server that created the WAL never archives it either (e.g. because it
+	 * was hit by a meteor), it will never make it to the archive. That's OK
+	 * from our point of view, because the new segment that we created with the
+	 * new TLI contains all the WAL from the old timeline up to the switch
+	 * point. But if you later try to do PITR to the "missing" WAL on the old
+	 * timeline, recovery won't find it in the archive. It's physically present
+	 * in the new file with new TLI, but recovery won't look there when it's
+	 * recovering to the older timeline. On the other hand, if we archive the
+	 * partial segment, and the original server on that timeline is still
+	 * running and archives the completed version of the same segment later, it
+	 * will fail. (We used to do that in 9.4 and below, and it caused such
+	 * problems).
+	 *
+	 * As a compromise, we rename the last segment with the .partial suffix, and
+	 * archive it. Archive recovery will never try to read .partial segments, so
+	 * they will normally go unused. But in the odd PITR case, the administrator
+	 * can copy them manually to the pg_wal directory (removing the suffix).
+	 * They can be useful in debugging, too.
+	 *
+	 * If a .done or .ready file already exists for the old timeline, however,
+	 * we had already determined that the segment is complete, so we can let it
+	 * be archived normally. (In particular, if it was restored from the archive
+	 * to begin with, it's expected to have a .done file).
+	 */
+	if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
+		XLogArchivingActive())
+	{
+		char origfname[MAXFNAMELEN];
+		XLogSegNo endLogSegNo;
+
+		XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
+		XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
+
+		if (!XLogArchiveIsReadyOrDone(origfname))
+		{
+			char origpath[MAXPGPATH];
+			char partialfname[MAXFNAMELEN];
+			char partialpath[MAXPGPATH];
+
+			XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
+			snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
+			snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
+
+			/*
+			 * Make sure there's no .done or .ready file for the .partial
+			 * file.
+			 */
+			XLogArchiveCleanup(partialfname);
+
+			durable_rename(origpath, partialpath, ERROR);
+			XLogArchiveNotify(partialfname);
+		}
+	}
+}
+
 /*
  * Extract timestamp from WAL record.
  *
@@ -7953,127 +8038,13 @@ StartupXLOG(void)
 	UpdateFullPageWrites();
 	LocalXLogInsertAllowed = -1;
 
+	/* Emit checkpoint or end-of-recovery record in XLOG, if required. */
 	if (InRecovery)
-	{
-		/*
-		 * Perform a checkpoint to update all our recovery activity to disk.
-		 *
-		 * Note that we write a shutdown checkpoint rather than an on-line
-		 * one. This is not particularly critical, but since we may be
-		 * assigning a new TLI, using a shutdown checkpoint allows us to have
-		 * the rule that TLI only changes in shutdown checkpoints, which
-		 * allows some extra error checking in xlog_redo.
-		 *
-		 * In promotion, only create a lightweight end-of-recovery record
-		 * instead of a full checkpoint. A checkpoint is requested later,
-		 * after we're fully out of recovery mode and already accepting
-		 * queries.
-		 */
-		if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-			LocalPromoteIsTriggered)
-		{
-			promoted = true;
-
-			/*
-			 * Insert a special WAL record to mark the end of recovery, since
-			 * we aren't doing a checkpoint. That means that the checkpointer
-			 * process may likely be in the middle of a time-smoothed
-			 * restartpoint and could continue to be for minutes after this.
-			 * That sounds strange, but the effect is roughly the same and it
-			 * would be stranger to try to come out of the restartpoint and
-			 * then checkpoint. We request a checkpoint later anyway, just for
-			 * safety.
-			 */
-			CreateEndOfRecoveryRecord();
-		}
-		else
-		{
-			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-							  CHECKPOINT_IMMEDIATE |
-							  CHECKPOINT_WAIT);
-		}
-	}
+		promoted = PerformRecoveryXLogAction();
 
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
 	if (ArchiveRecoveryRequested)
-	{
-		/*
-		 * And finally, execute the recovery_end_command, if any.
-		 */
-		if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
-			ExecuteRecoveryCommand(recoveryEndCommand,
-								   "recovery_end_command",
-								   true);
-
-		/*
-		 * We switched to a new timeline. Clean up segments on the old
-		 * timeline.
-		 *
-		 * If there are any higher-numbered segments on the old timeline,
-		 * remove them. They might contain valid WAL, but they might also be
-		 * pre-allocated files containing garbage. In any case, they are not
-		 * part of the new timeline's history so we don't need them.
-		 */
-		RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
-
-		/*
-		 * If the switch happened in the middle of a segment, what to do with
-		 * the last, partial segment on the old timeline? If we don't archive
-		 * it, and the server that created the WAL never archives it either
-		 * (e.g. because it was hit by a meteor), it will never make it to the
-		 * archive. That's OK from our point of view, because the new segment
-		 * that we created with the new TLI contains all the WAL from the old
-		 * timeline up to the switch point. But if you later try to do PITR to
-		 * the "missing" WAL on the old timeline, recovery won't find it in
-		 * the archive. It's physically present in the new file with new TLI,
-		 * but recovery won't look there when it's recovering to the older
-		 * timeline. On the other hand, if we archive the partial segment, and
-		 * the original server on that timeline is still running and archives
-		 * the completed version of the same segment later, it will fail. (We
-		 * used to do that in 9.4 and below, and it caused such problems).
-		 *
-		 * As a compromise, we rename the last segment with the .partial
-		 * suffix, and archive it. Archive recovery will never try to read
-		 * .partial segments, so they will normally go unused. But in the odd
-		 * PITR case, the administrator can copy them manually to the pg_wal
-		 * directory (removing the suffix). They can be useful in debugging,
-		 * too.
-		 *
-		 * If a .done or .ready file already exists for the old timeline,
-		 * however, we had already determined that the segment is complete, so
-		 * we can let it be archived normally. (In particular, if it was
-		 * restored from the archive to begin with, it's expected to have a
-		 * .done file).
-		 */
-		if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
-			XLogArchivingActive())
-		{
-			char origfname[MAXFNAMELEN];
-			XLogSegNo endLogSegNo;
-
-			XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
-			XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
-
-			if (!XLogArchiveIsReadyOrDone(origfname))
-			{
-				char origpath[MAXPGPATH];
-				char partialfname[MAXFNAMELEN];
-				char partialpath[MAXPGPATH];
-
-				XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
-				snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
-				snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
-
-				/*
-				 * Make sure there's no .done or .ready file for the .partial
-				 * file.
-				 */
-				XLogArchiveCleanup(partialfname);
-
-				durable_rename(origpath, partialpath, ERROR);
-				XLogArchiveNotify(partialfname);
-			}
-		}
-	}
+		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
 
 	/*
 	 * Preallocate additional log files, if wanted.
@@ -8282,6 +8253,60 @@ CheckRecoveryConsistency(void)
 	}
 }
 
+/*
+ * Perform whatever XLOG actions are necessary at end of REDO.
+ *
+ * The goal here is to make sure that we'll be able to recover properly if
+ * we crash again. If we choose to write a checkpoint, we'll write a shutdown
+ * checkpoint rather than an on-line one. This is not particularly critical,
+ * but since we may be assigning a new TLI, using a shutdown checkpoint allows
+ * us to have the rule that TLI only changes in shutdown checkpoints, which
+ * allows some extra error checking in xlog_redo.
+ */
+static bool
+PerformRecoveryXLogAction(void)
+{
+	bool promoted = false;
+
+	/*
+	 * Perform a checkpoint to update all our recovery activity to disk.
+	 *
+	 * Note that we write a shutdown checkpoint rather than an on-line one. This
+	 * is not particularly critical, but since we may be assigning a new TLI,
+	 * using a shutdown checkpoint allows us to have the rule that TLI only
+	 * changes in shutdown checkpoints, which allows some extra error checking
+	 * in xlog_redo.
+	 *
+	 * In promotion, only create a lightweight end-of-recovery record instead of
+	 * a full checkpoint. A checkpoint is requested later, after we're fully out
+	 * of recovery mode and already accepting queries.
+	 */
+	if (ArchiveRecoveryRequested && IsUnderPostmaster &&
+		LocalPromoteIsTriggered)
+	{
+		promoted = true;
+
+		/*
+		 * Insert a special WAL record to mark the end of recovery, since we
+		 * aren't doing a checkpoint. That means that the checkpointer process
+		 * may likely be in the middle of a time-smoothed restartpoint and could
+		 * continue to be for minutes after this. That sounds strange, but the
+		 * effect is roughly the same and it would be stranger to try to come
+		 * out of the restartpoint and then checkpoint. We request a checkpoint
+		 * later anyway, just for safety.
+		 */
+		CreateEndOfRecoveryRecord();
+	}
+	else
+	{
+		RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+						  CHECKPOINT_IMMEDIATE |
+						  CHECKPOINT_WAIT);
+	}
+
+	return promoted;
+}
+
 /*
  * Is the system still in recovery?
  *
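
Reviewer's note: the .partial handling that moved into CleanupAfterArchiveRecovery() depends on three small computations: whether EndOfLog falls in the middle of a segment, which segment holds the last byte before EndOfLog, and what that segment's file is called. The sketch below re-implements them outside PostgreSQL as I understand the XLogSegmentOffset, XLByteToPrevSeg, and XLogFileName macros to behave. The helper names are hypothetical, plain integer types replace XLogRecPtr, XLogSegNo, and TimeLineID, and the segment size is assumed to be a power of two.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: true when end_of_log is not on a segment boundary. */
bool
ends_mid_segment(uint64_t end_of_log, uint64_t seg_size)
{
	/* Assumption: the segment size is a nonzero power of two. */
	assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);
	return (end_of_log & (seg_size - 1)) != 0;
}

/* Hypothetical helper: segment holding the byte just before end_of_log. */
uint64_t
prev_segment(uint64_t end_of_log, uint64_t seg_size)
{
	assert(end_of_log > 0 && seg_size != 0);
	return (end_of_log - 1) / seg_size;
}

/*
 * Hypothetical helper: write "<tli><high><low>.partial" into buf, each part
 * as eight upper-case hex digits, and return snprintf's result.
 */
int
partial_segment_name(char *buf, size_t buflen, uint32_t tli,
					 uint64_t segno, uint64_t seg_size)
{
	uint64_t	segs_per_id;

	assert(seg_size != 0 && seg_size <= UINT64_C(0x100000000));
	segs_per_id = UINT64_C(0x100000000) / seg_size;

	return snprintf(buf, buflen,
					"%08" PRIX32 "%08" PRIX32 "%08" PRIX32 ".partial",
					tli,
					(uint32_t) (segno / segs_per_id),
					(uint32_t) (segno % segs_per_id));
}
```

With 16 MB segments, an EndOfLog of 0x3000028 on timeline 1 is mid-segment, its previous-segment number is 3, and the resulting name is 000000010000000000000003.partial. An EndOfLog of exactly 0x3000000 sits on a boundary, so in the real code the rename block would be skipped.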
