Commit 7cbee7c

At promotion, don't leave behind a partial segment on the old timeline.
With commit de76884, a copy of the partial segment was archived with the .partial suffix, but the original file was still left in pg_xlog, so it didn't actually solve the problems with archiving the partial segment that it was supposed to solve. With this patch, the partial segment is renamed rather than copied, so we only archive it with the .partial suffix.

Also be more robust in detecting if the last segment is already being archived. Previously I used XLogArchiveIsBusy() for that, but that's not quite right. With archive_mode='always', there might be a .ready file for it, and we don't want to rename it to .partial in that case.

The old segment is needed until we're fully committed to the new timeline, i.e. until we've written the end-of-recovery WAL record and updated the min recovery point and timeline in the control file. So move the renaming later in the startup sequence, after all that's been done.
1 parent c5dd8ea commit 7cbee7c
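
The difference between copying and renaming can be illustrated outside the server. The following is a minimal standalone sketch, not PostgreSQL code: the helper names `sketch_rename_to_partial` and `sketch_file_exists` are hypothetical, and only the C standard library `rename()` call is assumed. It shows that after a rename the file under its original name is gone, which is the property the commit message relies on.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not PostgreSQL code: rename segpath to
 * "<segpath>.partial" and write the new path into partialpath.
 * Returns 0 on success, -1 if the buffer is too small or rename() fails.
 */
int
sketch_rename_to_partial(const char *segpath, char *partialpath, size_t len)
{
    int n;

    assert(segpath != NULL && partialpath != NULL);

    n = snprintf(partialpath, len, "%s.partial", segpath);
    if (n < 0 || (size_t) n >= len)
        return -1;              /* path would have been truncated */

    /* Unlike a copy, rename() leaves nothing behind under the old name. */
    return rename(segpath, partialpath) == 0 ? 0 : -1;
}

/* Hypothetical helper: 1 if the file can be opened for reading, else 0. */
int
sketch_file_exists(const char *path)
{
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

Calling `sketch_rename_to_partial()` on an existing file and then checking both names with `sketch_file_exists()` should report that only the `.partial` name remains.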

3 files changed, +120 -54 lines changed

src/backend/access/transam/xlog.c

+84 -54
@@ -5224,31 +5224,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 	 * happens in the middle of a segment, copy data from the last WAL segment
 	 * of the old timeline up to the switch point, to the starting WAL segment
 	 * on the new timeline.
-	 *
-	 * What to do with the partial segment on the old timeline? If we don't
-	 * archive it, and the server that created the WAL never archives it
-	 * either (e.g. because it was hit by a meteor), it will never make it to
-	 * the archive. That's OK from our point of view, because the new segment
-	 * that we created with the new TLI contains all the WAL from the old
-	 * timeline up to the switch point. But if you later try to do PITR to the
-	 * "missing" WAL on the old timeline, recovery won't find it in the
-	 * archive. It's physically present in the new file with new TLI, but
-	 * recovery won't look there when it's recovering to the older timeline.
-	 * On the other hand, if we archive the partial segment, and the original
-	 * server on that timeline is still running and archives the completed
-	 * version of the same segment later, it will fail. (We used to do that in
-	 * 9.4 and below, and it caused such problems).
-	 *
-	 * As a compromise, we archive the last segment with the .partial suffix.
-	 * Archive recovery will never try to read .partial segments, so they will
-	 * normally go unused. But in the odd PITR case, the administrator can
-	 * copy them manually to the pg_xlog directory (removing the suffix). They
-	 * can be useful in debugging, too.
-	 *
-	 * If a .done file already exists for the old timeline, however, there is
-	 * already a complete copy of the file in the archive, and there is no
-	 * need to archive the partial one. (In particular, if it was restored
-	 * from the archive to begin with, it's expected to have .done file).
 	 */
 	if (endLogSegNo == startLogSegNo)
 	{
@@ -5266,31 +5241,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		tmpfname = XLogFileCopy(NULL, xlogfname, endOfLog % XLOG_SEG_SIZE);
 		if (!InstallXLogFileSegment(&endLogSegNo, tmpfname, false, 0, false))
 			elog(ERROR, "InstallXLogFileSegment should not have failed");
-
-		/*
-		 * Make a .partial copy for the archive (unless the original file was
-		 * already archived)
-		 */
-		if (XLogArchivingActive() && XLogArchiveIsBusy(xlogfname))
-		{
-			char		partialfname[MAXFNAMELEN];
-
-			snprintf(partialfname, MAXFNAMELEN, "%s.partial", xlogfname);
-
-			/* Make sure there's no .done or .ready file for it. */
-			XLogArchiveCleanup(partialfname);
-
-			/*
-			 * We copy the whole segment, not just upto the switch point.
-			 * The portion after the switch point might be garbage, but it
-			 * might also be valid WAL, if we stopped recovery at user's
-			 * request before reaching the end. Better to preserve the
-			 * file as it is, garbage and all, than lose the evidence if
-			 * something goes wrong.
-			 */
-			(void) XLogFileCopy(partialfname, xlogfname, XLOG_SEG_SIZE);
-			XLogArchiveNotify(partialfname);
-		}
 	}
 	else
 	{
@@ -5942,6 +5892,7 @@ StartupXLOG(void)
 	XLogRecPtr	RecPtr,
 				checkPointLoc,
 				EndOfLog;
+	TimeLineID	EndOfLogTLI;
 	TimeLineID	PrevTimeLineID;
 	XLogRecord *record;
 	TransactionId oldestActiveXID;
@@ -7032,6 +6983,15 @@ StartupXLOG(void)
 	record = ReadRecord(xlogreader, LastRec, PANIC, false);
 	EndOfLog = EndRecPtr;
 
+	/*
+	 * EndOfLogTLI is the TLI in the filename of the XLOG segment containing
+	 * the end-of-log. It could be different from the timeline that EndOfLog
+	 * nominally belongs to, if there was a timeline switch in that segment,
+	 * and we were reading the old WAL from a segment belonging to a higher
+	 * timeline.
+	 */
+	EndOfLogTLI = xlogreader->readPageTLI;
+
 	/*
 	 * Complain if we did not roll forward far enough to render the backup
 	 * dump consistent. Note: it is indeed okay to look at the local variable
@@ -7131,7 +7091,7 @@ StartupXLOG(void)
 	 * we will use that below.)
 	 */
 	if (ArchiveRecoveryRequested)
-		exitArchiveRecovery(xlogreader->readPageTLI, EndOfLog);
+		exitArchiveRecovery(EndOfLogTLI, EndOfLog);
 
 	/*
 	 * Prepare to write WAL starting at EndOfLog position, and init xlog
@@ -7262,12 +7222,82 @@ StartupXLOG(void)
 								   true);
 	}
 
-	/*
-	 * Clean up any (possibly bogus) future WAL segments on the old timeline.
-	 */
 	if (ArchiveRecoveryRequested)
+	{
+		/*
+		 * We switched to a new timeline. Clean up segments on the old
+		 * timeline.
+		 *
+		 * If there are any higher-numbered segments on the old timeline,
+		 * remove them. They might contain valid WAL, but they might also be
+		 * pre-allocated files containing garbage. In any case, they are not
+		 * part of the new timeline's history so we don't need them.
+		 */
 		RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
 
+		/*
+		 * If the switch happened in the middle of a segment, what to do with
+		 * the last, partial segment on the old timeline? If we don't archive
+		 * it, and the server that created the WAL never archives it either
+		 * (e.g. because it was hit by a meteor), it will never make it to the
+		 * archive. That's OK from our point of view, because the new segment
+		 * that we created with the new TLI contains all the WAL from the old
+		 * timeline up to the switch point. But if you later try to do PITR to
+		 * the "missing" WAL on the old timeline, recovery won't find it in
+		 * the archive. It's physically present in the new file with new TLI,
+		 * but recovery won't look there when it's recovering to the older
+		 * timeline. On the other hand, if we archive the partial segment, and
+		 * the original server on that timeline is still running and archives
+		 * the completed version of the same segment later, it will fail. (We
+		 * used to do that in 9.4 and below, and it caused such problems).
+		 *
+		 * As a compromise, we rename the last segment with the .partial
+		 * suffix, and archive it. Archive recovery will never try to read
+		 * .partial segments, so they will normally go unused. But in the odd
+		 * PITR case, the administrator can copy them manually to the pg_xlog
+		 * directory (removing the suffix). They can be useful in debugging,
+		 * too.
+		 *
+		 * If a .done or .ready file already exists for the old timeline,
+		 * however, we had already determined that the segment is complete,
+		 * so we can let it be archived normally. (In particular, if it was
+		 * restored from the archive to begin with, it's expected to have a
+		 * .done file).
+		 */
+		if (EndOfLog % XLOG_SEG_SIZE != 0 && XLogArchivingActive())
+		{
+			char		origfname[MAXFNAMELEN];
+			XLogSegNo	endLogSegNo;
+
+			XLByteToPrevSeg(EndOfLog, endLogSegNo);
+			XLogFileName(origfname, EndOfLogTLI, endLogSegNo);
+
+			if (!XLogArchiveIsReadyOrDone(origfname))
+			{
+				char		origpath[MAXPGPATH];
+				char		partialfname[MAXFNAMELEN];
+				char		partialpath[MAXPGPATH];
+
+				XLogFilePath(origpath, EndOfLogTLI, endLogSegNo);
+				snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
+				snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
+
+				/*
+				 * Make sure there's no .done or .ready file for the .partial
+				 * file.
+				 */
+				XLogArchiveCleanup(partialfname);
+
+				if (rename(origpath, partialpath) != 0)
+					ereport(ERROR,
+							(errcode_for_file_access(),
+							 errmsg("could not rename file \"%s\" to \"%s\": %m",
+									origpath, partialpath)));
+				XLogArchiveNotify(partialfname);
+			}
+		}
+	}
+
 	/*
 	 * Preallocate additional log files, if wanted.
 	 */
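
The new code above decides whether the switch point falls mid-segment and then derives the name of the last old-timeline segment from `EndOfLog` and `EndOfLogTLI`. The arithmetic can be sketched on its own as follows. This is an illustration rather than the server's macros: the `sketch_*` names are hypothetical, and the 16 MB segment size and the 24-hex-digit file name layout are assumptions matching a default build of that era.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16 MB WAL segments. */
#define SKETCH_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)
/* Segments per 4 GB of WAL address space (256 with 16 MB segments). */
#define SKETCH_SEGS_PER_XLOGID (UINT64_C(0x100000000) / SKETCH_SEG_SIZE)
#define SKETCH_FNAME_LEN 64

/* Nonzero if the position lies inside a segment, not on a boundary. */
int
sketch_ends_mid_segment(uint64_t lsn)
{
    return lsn % SKETCH_SEG_SIZE != 0;
}

/* Segment holding the byte just before lsn (the idea behind XLByteToPrevSeg). */
uint64_t
sketch_prev_segno(uint64_t lsn)
{
    assert(lsn > 0);
    return (lsn - 1) / SKETCH_SEG_SIZE;
}

/* Build the 24-character segment file name: timeline, high part, low part. */
void
sketch_wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
    assert(buflen >= 25);
    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / SKETCH_SEGS_PER_XLOGID),
             (uint32_t) (segno % SKETCH_SEGS_PER_XLOGID));
    assert(strlen(buf) == 24);
}
```

Under these assumptions, an end-of-log position of 0/3000028 read from a timeline 1 segment is mid-segment, maps to segment number 3, and yields the file name 000000010000000000000003, which is the file that would be renamed with the .partial suffix.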

src/backend/access/transam/xlogarchive.c

+35
@@ -697,6 +697,41 @@ XLogArchiveIsBusy(const char *xlog)
 	return true;
 }
 
+/*
+ * XLogArchiveIsReadyOrDone
+ *
+ * Check to see if an XLOG segment file has a .ready or .done file.
+ * This is similar to XLogArchiveIsBusy(), but returns true if the file
+ * is already archived or is about to be archived.
+ *
+ * This is currently only used at recovery. During normal operation this
+ * would be racy: the file might get removed or marked with .ready as we're
+ * checking it, or immediately after we return.
+ */
+bool
+XLogArchiveIsReadyOrDone(const char *xlog)
+{
+	char		archiveStatusPath[MAXPGPATH];
+	struct stat stat_buf;
+
+	/* First check for .done --- this means archiver is done with it */
+	StatusFilePath(archiveStatusPath, xlog, ".done");
+	if (stat(archiveStatusPath, &stat_buf) == 0)
+		return true;
+
+	/* check for .ready --- this means archiver is still busy with it */
+	StatusFilePath(archiveStatusPath, xlog, ".ready");
+	if (stat(archiveStatusPath, &stat_buf) == 0)
+		return true;
+
+	/* Race condition --- maybe archiver just finished, so recheck */
+	StatusFilePath(archiveStatusPath, xlog, ".done");
+	if (stat(archiveStatusPath, &stat_buf) == 0)
+		return true;
+
+	return false;
+}
+
 /*
  * XLogArchiveIsReady
  *
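
The status-file check added here can be tried in isolation. The following standalone sketch mirrors the .done, .ready, .done sequence using plain `stat()`. It is an approximation, not the server function: the names `sketch_status_exists` and `sketch_is_ready_or_done` are hypothetical, and the status directory is passed in explicitly instead of being derived by `StatusFilePath()`.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if <status_dir>/<segname><suffix> exists, else 0. */
int
sketch_status_exists(const char *status_dir, const char *segname,
                     const char *suffix)
{
    char path[1024];
    struct stat st;
    int n;

    assert(status_dir != NULL && segname != NULL && suffix != NULL);

    n = snprintf(path, sizeof(path), "%s/%s%s", status_dir, segname, suffix);
    if (n < 0 || (size_t) n >= sizeof(path))
        return 0;               /* path too long, treat as absent */

    return stat(path, &st) == 0;
}

/* 1 if the segment has a .done or .ready status file, else 0. */
int
sketch_is_ready_or_done(const char *status_dir, const char *segname)
{
    /* .done: already archived */
    if (sketch_status_exists(status_dir, segname, ".done"))
        return 1;

    /* .ready: queued for archiving */
    if (sketch_status_exists(status_dir, segname, ".ready"))
        return 1;

    /* Recheck .done in case .ready became .done between the two checks. */
    return sketch_status_exists(status_dir, segname, ".done");
}
```

The second .done check matters because an archiver that finishes between the first two checks replaces .ready with .done, so a single pass could miss both files.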

src/include/access/xlog_internal.h

+1
@@ -305,6 +305,7 @@ extern void XLogArchiveForceDone(const char *xlog);
 extern bool XLogArchiveCheckDone(const char *xlog);
 extern bool XLogArchiveIsBusy(const char *xlog);
 extern bool XLogArchiveIsReady(const char *xlog);
+extern bool XLogArchiveIsReadyOrDone(const char *xlog);
 extern void XLogArchiveCleanup(const char *xlog);
 
 #endif   /* XLOG_INTERNAL_H */
