
Commit 0668c84

Fix bug in cancellation of non-exclusive backup to avoid assertion failure.
Previously an assertion failure occurred when pg_stop_backup() for a non-exclusive backup was aborted while it was waiting for WAL files to be archived. This assertion failure happened in do_pg_abort_backup(), which was called when a non-exclusive backup was canceled. do_pg_abort_backup() assumes that there is at least one non-exclusive backup running when it's called. But pg_stop_backup() can be canceled even after it marks the end of the non-exclusive backup (e.g., while waiting for WAL archiving). This broke the assumption that do_pg_abort_backup() relies on, which caused an assertion failure.

This commit changes do_pg_abort_backup() so that it does nothing when the non-exclusive backup has already been marked as completed. That is, the assumption is also changed, and do_pg_abort_backup() can now handle even the case where it's called when there is no running backup.

Backpatch to 9.6 where SQL-callable non-exclusive backup was added.

Author: Masahiko Sawada and Michael Paquier
Reviewed-By: Robert Haas and Fujii Masao
Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
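The failure mode described in the commit message can be reproduced in a small standalone model. The following is only a sketch under stated assumptions: the `model_*` functions are hypothetical stand-ins, the two static variables replace the shared-memory counter and the session-level state, no locking is modelled, and resetting the session state inside the abort path is a simplification of this model rather than something the commit itself shows.

```c
#include <assert.h>

/* Simplified stand-in for the session-level backup state (assumption). */
typedef enum
{
	SESSION_BACKUP_NONE,
	SESSION_BACKUP_NON_EXCLUSIVE
} SessionBackupState;

/* Models XLogCtl->Insert.nonExclusiveBackups; no locking in this sketch. */
static int	nonExclusiveBackups = 0;
static SessionBackupState sessionBackupState = SESSION_BACKUP_NONE;

/* Hypothetical helper: models the start of a non-exclusive backup. */
static void
model_start_backup(void)
{
	nonExclusiveBackups++;
	sessionBackupState = SESSION_BACKUP_NON_EXCLUSIVE;
}

/*
 * Hypothetical helper: models the point where pg_stop_backup() marks the
 * end of the backup, before it starts waiting for WAL archiving.
 */
static void
model_mark_backup_end(void)
{
	assert(nonExclusiveBackups > 0);
	nonExclusiveBackups--;
	sessionBackupState = SESSION_BACKUP_NONE;
}

/* Hypothetical helper: models do_pg_abort_backup() with the quick exit. */
static void
model_abort_backup(void)
{
	/* Nothing to undo if the backup was already marked as completed. */
	if (sessionBackupState == SESSION_BACKUP_NONE)
		return;

	assert(nonExclusiveBackups > 0);
	assert(sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
	nonExclusiveBackups--;

	/* Model-only simplification: reset the session state here as well. */
	sessionBackupState = SESSION_BACKUP_NONE;
}
```

Calling `model_start_backup()`, then `model_mark_backup_end()`, then `model_abort_backup()` corresponds to a cancel that arrives during the archive wait. Without the early return, the first `assert` in `model_abort_backup()` would fire because the counter is already zero.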
1 parent 986a915 commit 0668c84

2 files changed: 35 additions and 6 deletions

src/backend/access/transam/xlog.c

Lines changed: 32 additions & 4 deletions
@@ -10275,13 +10275,20 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	/*
 	 * Mark that start phase has correctly finished for an exclusive backup.
 	 * Session-level locks are updated as well to reflect that state.
+	 *
+	 * Note that CHECK_FOR_INTERRUPTS() must not occur while updating
+	 * backup counters and session-level lock. Otherwise they can be
+	 * updated inconsistently, and which might cause do_pg_abort_backup()
+	 * to fail.
 	 */
 	if (exclusive)
 	{
 		WALInsertLockAcquireExclusive();
 		XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS;
-		WALInsertLockRelease();
+
+		/* Set session-level lock */
 		sessionBackupState = SESSION_BACKUP_EXCLUSIVE;
+		WALInsertLockRelease();
 	}
 	else
 		sessionBackupState = SESSION_BACKUP_NON_EXCLUSIVE;
@@ -10489,7 +10496,11 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	}
 
 	/*
-	 * OK to update backup counters and forcePageWrites
+	 * OK to update backup counters, forcePageWrites and session-level lock.
+	 *
+	 * Note that CHECK_FOR_INTERRUPTS() must not occur while updating them.
+	 * Otherwise they can be updated inconsistently, and which might cause
+	 * do_pg_abort_backup() to fail.
 	 */
 	WALInsertLockAcquireExclusive();
 	if (exclusive)
@@ -10513,11 +10524,20 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	{
 		XLogCtl->Insert.forcePageWrites = false;
 	}
-	WALInsertLockRelease();
 
-	/* Clean up session-level lock */
+	/*
+	 * Clean up session-level lock.
+	 *
+	 * You might think that WALInsertLockRelease() can be called
+	 * before cleaning up session-level lock because session-level
+	 * lock doesn't need to be protected with WAL insertion lock.
+	 * But since CHECK_FOR_INTERRUPTS() can occur in it,
+	 * session-level lock must be cleaned up before it.
+	 */
 	sessionBackupState = SESSION_BACKUP_NONE;
 
+	WALInsertLockRelease();
+
 	/*
 	 * Read and parse the START WAL LOCATION line (this code is pretty crude,
 	 * but we are not expecting any variability in the file format).
@@ -10750,8 +10770,16 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 void
 do_pg_abort_backup(void)
 {
+	/*
+	 * Quick exit if session is not keeping around a non-exclusive backup
+	 * already started.
+	 */
+	if (sessionBackupState == SESSION_BACKUP_NONE)
+		return;
+
 	WALInsertLockAcquireExclusive();
 	Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
+	Assert(sessionBackupState == SESSION_BACKUP_NON_EXCLUSIVE);
 	XLogCtl->Insert.nonExclusiveBackups--;
 
 	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&

src/backend/replication/basebackup.c

Lines changed: 3 additions & 2 deletions
@@ -140,7 +140,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 	 * Once do_pg_start_backup has been called, ensure that any failure causes
 	 * us to abort the backup so we don't "leak" a backup counter. For this
 	 * reason, *all* functionality between do_pg_start_backup() and
-	 * do_pg_stop_backup() should be inside the error cleanup block!
+	 * the end of do_pg_stop_backup() should be inside the error cleanup block!
 	 */
 
 	PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
@@ -249,10 +249,11 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 			else
 				pq_putemptymessage('c');	/* CopyDone */
 		}
+
+		endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
 	}
 	PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
 
-	endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli);
 
 	if (opt->includewal)
 	{
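
The basebackup.c hunks move the stop call inside the error cleanup block, so the cleanup callback can now run after the backup end has already been marked. The sketch below models that interaction with standard `setjmp`/`longjmp` in place of PostgreSQL's error handling. It is an illustration under assumptions, not the real macro expansion: every `model_*` name is hypothetical, an "error" is simulated by `longjmp`, and the two boolean parameters only choose where the simulated cancel happens.

```c
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf error_context;			/* stands in for the error-recovery point */
static int	backup_counter = 0;			/* models the non-exclusive backup counter */
static bool backup_active = false;		/* models the session-level state */

/* Hypothetical helper: models do_pg_start_backup(). */
static void
model_start(void)
{
	backup_counter++;
	backup_active = true;
}

/*
 * Hypothetical helper: models do_pg_stop_backup(). The end of the backup
 * is marked first; a cancel may then arrive during the archive wait.
 */
static void
model_stop(bool cancel_during_wait)
{
	backup_counter--;
	backup_active = false;
	if (cancel_during_wait)
		longjmp(error_context, 1);		/* simulated cancel while waiting */
}

/* Hypothetical helper: models the cleanup callback with the quick exit. */
static void
model_cleanup(void)
{
	if (!backup_active)
		return;							/* end already marked, nothing to undo */
	backup_counter--;
	backup_active = false;
}

/* Runs one modelled base backup and returns the counter afterwards. */
static int
model_base_backup(bool cancel_before_stop, bool cancel_during_wait)
{
	if (setjmp(error_context) == 0)
	{
		/* Everything from start to the end of stop is inside the block. */
		model_start();
		if (cancel_before_stop)
			longjmp(error_context, 1);	/* simulated cancel mid-backup */
		model_stop(cancel_during_wait);
	}
	else
	{
		/* Reached only through longjmp: the error cleanup path. */
		model_cleanup();
	}
	return backup_counter;
}
```

In this model the counter returns to zero on all three paths (no cancel, cancel before the stop call, cancel during the archive wait). The last path stays balanced only because of the early return in `model_cleanup()`.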
