Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 15c5409

Browse files
dwsteeleCommitfest Bot
authored and
Commitfest Bot
committed
Return pg_control from pg_backup_stop().
Harden recovery by returning a copy of pg_control from pg_backup_stop() that has a flag set to prevent recovery if the backup_label file is missing. Instead of backup software copying pg_control from PGDATA, it stores an updated version that is returned from pg_backup_stop(). This is better for the following reasons: * The user can no longer remove backup_label and get what looks like a successful recovery (while almost certainly causing corruption). If backup_label is removed the cluster will not start. The user may try pg_resetwal, but that tool makes it pretty clear that corruption will result from its use. * We don't need to worry about backup software seeing a torn copy of pg_control, since Postgres can safely read it out of memory and provide a valid copy via pg_backup_stop(). This solves torn reads without needing to write pg_control via a temp file, which may affect performance on a standby. * For backup from standby, we no longer need to instruct the backup software to copy pg_control last. In fact the backup software should not copy pg_control from PGDATA at all. These changes have no impact on current backup software and they are free to use the pg_control available from pg_stop_backup() or continue to use pg_control from PGDATA. Of course they will miss the benefits of getting a consistent copy of pg_control and the backup_label checking, but will be no worse off than before. Catalog version bump is required.
1 parent bb87edc commit 15c5409

File tree

6 files changed

+101
-15
lines changed

6 files changed

+101
-15
lines changed

doc/src/sgml/backup.sgml

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1027,9 +1027,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
10271027
values. The second of these fields should be written to a file named
10281028
<filename>backup_label</filename> in the root directory of the backup. The
10291029
third field should be written to a file named
1030-
<filename>tablespace_map</filename> unless the field is empty. These files are
1030+
<filename>tablespace_map</filename> unless the field is empty. The fourth
1031+
field should be written into a file named
1032+
<filename>global/pg_control</filename> (overwriting the existing file when
1033+
present). These files are
10311034
vital to the backup working and must be written byte for byte without
1032-
modification, which may require opening the file in binary mode.
1035+
modification, which will require opening the file in binary mode.
10331036
</para>
10341037
</listitem>
10351038
<listitem>
@@ -1101,7 +1104,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
11011104
</para>
11021105

11031106
<para>
1104-
You should, however, omit from the backup the files within the
1107+
You should exclude <filename>global/pg_control</filename> from your backup
1108+
and put the contents of the <parameter>controlfile</parameter> column
1109+
returned from <function>pg_backup_stop</function> in your backup at
1110+
<filename>global/pg_control</filename>. This version of pg_control contains
1111+
safeguards against recovery without backup_label present and is guaranteed
1112+
not to be torn.
1113+
</para>
1114+
1115+
<para>
1116+
You should also omit from the backup the files within the
11051117
cluster's <filename>pg_wal/</filename> subdirectory. This
11061118
slight adjustment is worthwhile because it reduces the risk
11071119
of mistakes when restoring. This is easy to arrange if

doc/src/sgml/func.sgml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29156,8 +29156,9 @@ stats_timestamp | 2025-03-24 13:55:47.796698+01
2915629156
The result of the function is a single record.
2915729157
The <parameter>lsn</parameter> column holds the backup's ending
2915829158
write-ahead log location (which again can be ignored). The second
29159-
column returns the contents of the backup label file, and the third
29160-
column returns the contents of the tablespace map file. These must be
29159+
column returns the contents of the backup label file, the third column
29160+
returns the contents of the tablespace map file, and the fourth column
29161+
returns the contents of pg_control. These must be
2916129162
stored as part of the backup and are required as part of the restore
2916229163
process.
2916329164
</para>

src/backend/access/transam/xlogfuncs.c

Lines changed: 16 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -113,22 +113,26 @@ pg_backup_start(PG_FUNCTION_ARGS)
113113
*
114114
* The backup_label contains the user-supplied label string (typically this
115115
* would be used to tell where the backup dump will be stored), the starting
116-
* time, starting WAL location for the dump and so on. It is the caller's
117-
* responsibility to write the backup_label and tablespace_map files in the
118-
* data folder that will be restored from this backup.
116+
* time, starting WAL location for the dump and so on. The pg_control file
117+
* contains a consistent copy of pg_control that also has a safeguard against
118+
* being used without backup_label. It is the caller's responsibility to write
119+
* the backup_label, pg_control, and tablespace_map files in the data folder
120+
* that will be restored from this backup.
119121
*
120122
* Permission checking for this function is managed through the normal
121123
* GRANT system.
122124
*/
123125
Datum
124126
pg_backup_stop(PG_FUNCTION_ARGS)
125127
{
126-
#define PG_BACKUP_STOP_V2_COLS 3
128+
#define PG_BACKUP_STOP_V2_COLS 4
127129
TupleDesc tupdesc;
128130
Datum values[PG_BACKUP_STOP_V2_COLS] = {0};
129131
bool nulls[PG_BACKUP_STOP_V2_COLS] = {0};
130132
bool waitforarchive = PG_GETARG_BOOL(0);
131133
char *backup_label;
134+
bytea *pg_control_bytea;
135+
uint8_t pg_control[PG_CONTROL_FILE_SIZE];
132136
SessionBackupState status = get_backup_status();
133137

134138
/* Initialize attributes information in the tuple descriptor */
@@ -150,9 +154,17 @@ pg_backup_stop(PG_FUNCTION_ARGS)
150154
/* Build the contents of backup_label */
151155
backup_label = build_backup_content(backup_state, false);
152156

157+
/* Build the contents of pg_control */
158+
backup_control_file(pg_control);
159+
160+
pg_control_bytea = (bytea *) palloc(PG_CONTROL_FILE_SIZE + VARHDRSZ);
161+
SET_VARSIZE(pg_control_bytea, PG_CONTROL_FILE_SIZE + VARHDRSZ);
162+
memcpy(VARDATA(pg_control_bytea), pg_control, PG_CONTROL_FILE_SIZE);
163+
153164
values[0] = LSNGetDatum(backup_state->stoppoint);
154165
values[1] = CStringGetTextDatum(backup_label);
155166
values[2] = CStringGetTextDatum(tablespace_map->data);
167+
values[3] = PointerGetDatum(pg_control_bytea);
156168

157169
/* Deallocate backup-related variables */
158170
pfree(backup_label);

src/backend/catalog/system_functions.sql

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -390,7 +390,7 @@ CREATE OR REPLACE FUNCTION
390390

391391
CREATE OR REPLACE FUNCTION pg_backup_stop (
392392
wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn,
393-
OUT labelfile text, OUT spcmapfile text)
393+
OUT labelfile text, OUT spcmapfile text, OUT controlfile bytea)
394394
RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop'
395395
PARALLEL RESTRICTED;
396396

src/include/catalog/pg_proc.dat

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6699,8 +6699,8 @@
66996699
{ oid => '2739', descr => 'finish taking an online backup',
67006700
proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r',
67016701
prorettype => 'record', proargtypes => 'bool',
6702-
proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}',
6703-
proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}',
6702+
proallargtypes => '{bool,pg_lsn,text,text,bytea}', proargmodes => '{i,o,o,o,o}',
6703+
proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile,controlfile}',
67046704
prosrc => 'pg_backup_stop' },
67056705
{ oid => '3436', descr => 'promote standby server',
67066706
proname => 'pg_promote', provolatile => 'v', prorettype => 'bool',

src/test/recovery/t/042_low_level_backup.pl

Lines changed: 64 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,42 @@
1313
use PostgreSQL::Test::Utils;
1414
use Test::More;
1515

16+
# Decode hex to binary
17+
sub decode_hex
18+
{
19+
my ($encoded) = @_;
20+
my $decoded;
21+
22+
$encoded =~ s/^\s+|\s+$//g;
23+
24+
for (my $idx = 0; $idx < length($encoded); $idx += 2)
25+
{
26+
$decoded .= pack('C', hex(substr($encoded, $idx, 2)));
27+
}
28+
29+
return $decoded;
30+
}
31+
32+
# Get backup_label/pg_control from pg_stop_backup()
33+
sub stop_backup_result
34+
{
35+
my ($psql) = @_;
36+
37+
my $encoded = $psql->query_safe(
38+
"select encode(labelfile::bytea, 'hex') || ',' || " .
39+
" encode(controlfile, 'hex')" .
40+
" from pg_backup_stop()");
41+
42+
my @result;
43+
44+
foreach my $column (split(',', $encoded))
45+
{
46+
push(@result, decode_hex($column));
47+
}
48+
49+
return @result;
50+
}
51+
1652
# Start primary node with archiving.
1753
my $node_primary = PostgreSQL::Test::Cluster->new('primary');
1854
$node_primary->init(has_archiving => 1, allows_streaming => 1);
@@ -80,8 +116,7 @@
80116
'SELECT pg_walfile_name(pg_current_wal_lsn())');
81117

82118
# Stop backup and get backup_label, the last segment is archived.
83-
my $backup_label =
84-
$psql->query_safe("select labelfile from pg_backup_stop()");
119+
(my $backup_label, my $pg_control) = stop_backup_result($psql);
85120

86121
$psql->quit;
87122

@@ -118,10 +153,36 @@
118153
$node_replica->teardown_node;
119154
$node_replica->clean_node;
120155

156+
# Save only pg_control into the backup to demonstrate that pg_control returned
157+
# from pg_stop_backup() will only perform recovery when backup_label is present.
158+
open(my $fh, ">", "$backup_dir/global/pg_control")
159+
or die "could not open pg_control";
160+
binmode($fh);
161+
syswrite($fh, $pg_control);
162+
close($fh);
163+
164+
$node_replica = PostgreSQL::Test::Cluster->new('replica_fail2');
165+
$node_replica->init_from_backup($node_primary, $backup_name,
166+
has_restoring => 1);
167+
168+
my $res = run_log(
169+
[
170+
'pg_ctl', '-D', $node_replica->data_dir, '-l',
171+
$node_replica->logfile, 'start'
172+
]);
173+
ok(!$res, 'invalid recovery startup fails');
174+
175+
my $logfile = slurp_file($node_replica->logfile());
176+
ok($logfile =~ qr/could not find backup_label required for recovery/,
177+
'could not find backup_label required for recovery');
178+
179+
$node_replica->teardown_node;
180+
$node_replica->clean_node;
181+
121182
# Save backup_label into the backup directory and recover using the primary's
122183
# archive. This time recovery will succeed and the canary table will be
123184
# present.
124-
open my $fh, ">>", "$backup_dir/backup_label"
185+
open $fh, ">>", "$backup_dir/backup_label"
125186
or die "could not open backup_label";
126187
# Binary mode is required for Windows, as the backup_label parsing is not
127188
# able to cope with CRLFs.

0 commit comments

Comments
 (0)