
Commit fef5b47

Ensure that a standby is able to follow a primary on a newer timeline.
Commit 709d003 refactored the WAL-reading code, but accidentally caused WalSndSegmentOpen() to fail to follow a timeline switch while reading from a historic timeline. This caused a standby to fail to follow a primary on a newer timeline when WAL archiving is enabled.

If there is a timeline switch within the segment, WalSndSegmentOpen() should read from the WAL segment belonging to the new timeline. Previously, because it failed to follow the timeline switch, it tried to read the WAL segment with the old timeline. When WAL archiving is enabled, that WAL segment with the old timeline doesn't exist because it has been renamed to .partial. As a result, the primary tried to read a non-existent WAL segment, which caused replication to fail with the error "ERROR: requested WAL segment ... has already been removed".

This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline switch, to ensure that a standby is able to follow a primary on a newer timeline even when WAL archiving is enabled.

This commit also adds a regression test to check whether a standby is able to follow a primary on a newer timeline when WAL archiving is enabled.

Back-patch to v13, where the bug was introduced.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, tweaked by Fujii Masao
Reviewed-by: Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
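
The failure described above hinges on WAL segment file names embedding the timeline ID, so the same segment number maps to a different file on each timeline. A minimal sketch in C, assuming the 24-hex-digit naming scheme of PostgreSQL's XLogFileName() macro; `wal_file_name` and the typedefs below are illustrative stand-ins, not the server's own definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for PostgreSQL's types (assumption: not the real headers). */
typedef uint32_t TimeLineID;
typedef uint64_t XLogSegNo;

/*
 * Build a WAL segment file name: 8 hex digits of timeline ID followed by
 * the segment number split into two 8-hex-digit halves.
 * Hypothetical helper modeled on XLogFileName(); buf should hold at least
 * 25 bytes.
 */
static void
wal_file_name(char *buf, size_t buflen, TimeLineID tli,
              XLogSegNo segno, uint64_t segsize)
{
    /* Number of segments that fit in one 4 GB "xlogid" range. */
    uint64_t segs_per_xlogid = UINT64_C(0x100000000) / segsize;

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / segs_per_xlogid),
             (unsigned int) (segno % segs_per_xlogid));
}
```

With a 16 MB segment size, segment 3 is named 000000010000000000000003 on timeline 1 and 000000020000000000000003 on timeline 2. If the timeline-1 file has been renamed to .partial, only the timeline-2 name can be opened, which is why choosing the wrong timeline ID produces the "has already been removed" error.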
1 parent aef8948 commit fef5b47

2 files changed, +40 -4 lines changed


src/backend/replication/walsender.c

+1-1
@@ -2491,7 +2491,7 @@ WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo,
 		XLogSegNo	endSegNo;

 		XLByteToSeg(sendTimeLineValidUpto, endSegNo, state->segcxt.ws_segsize);
-		if (state->seg.ws_segno == endSegNo)
+		if (nextSegNo == endSegNo)
 			*tli_p = sendTimeLineNextTLI;
 	}
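
The one-line change above compares the segment about to be opened (nextSegNo), rather than the segment recorded in the reader state, against the segment containing the switch point. A simplified, self-contained sketch of that decision follows. It assumes the surrounding walsender logic defaults to the current send timeline and only applies this check on a historic timeline; the `SendTimeline` struct, `lsn_to_segno`, and `choose_tli` are hypothetical names for illustration, not code from walsender.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's types (assumption). */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* Hypothetical bundle of the walsender's timeline-tracking state. */
typedef struct
{
    bool        is_historic;  /* are we sending from an old timeline? */
    TimeLineID  tli;          /* timeline currently being sent */
    TimeLineID  next_tli;     /* timeline that follows it */
    XLogRecPtr  valid_upto;   /* switch point: end of the old timeline */
} SendTimeline;

/* Segment number containing a given LSN, as XLByteToSeg() computes it. */
static XLogSegNo
lsn_to_segno(XLogRecPtr lsn, uint64_t segsize)
{
    return lsn / segsize;
}

/*
 * Pick the timeline whose file should be opened for next_segno.
 * If the requested segment contains the switch point, use the next
 * timeline's file; otherwise stay on the current timeline.
 */
static TimeLineID
choose_tli(const SendTimeline *s, XLogSegNo next_segno, uint64_t segsize)
{
    TimeLineID  tli = s->tli;

    if (s->is_historic)
    {
        XLogSegNo   end_segno = lsn_to_segno(s->valid_upto, segsize);

        /* Compare the segment being opened, not a previously open one. */
        if (next_segno == end_segno)
            tli = s->next_tli;
    }
    return tli;
}
```

For example, with a switch point inside segment 3, `choose_tli` returns `next_tli` when asked for segment 3 and `tli` when asked for segment 2. The old condition presumably compared a segment number that did not yet reflect the segment being opened, so the switch-point segment could be requested on the old timeline.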

src/test/recovery/t/004_timeline_switch.pl

+39-3
@@ -1,15 +1,16 @@
 # Test for timeline switch
-# Ensure that a cascading standby is able to follow a newly-promoted standby
-# on a new timeline.
 use strict;
 use warnings;
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 3;

 $ENV{PGDATABASE} = 'postgres';

+# Ensure that a cascading standby is able to follow a newly-promoted standby
+# on a new timeline.
+
 # Initialize primary node
 my $node_primary = get_new_node('primary');
 $node_primary->init(allows_streaming => 1);
@@ -66,3 +67,38 @@
 my $result =
   $node_standby_2->safe_psql('postgres', "SELECT count(*) FROM tab_int");
 is($result, qq(2000), 'check content of standby 2');
+
+
+# Ensure that a standby is able to follow a primary on a newer timeline
+# when WAL archiving is enabled.
+
+# Initialize primary node
+my $node_primary_2 = get_new_node('primary_2');
+$node_primary_2->init(allows_streaming => 1, has_archiving => 1);
+$node_primary_2->start;
+
+# Take backup
+$node_primary_2->backup($backup_name);
+
+# Create standby node
+my $node_standby_3 = get_new_node('standby_3');
+$node_standby_3->init_from_backup($node_primary_2, $backup_name,
+    has_streaming => 1);
+
+# Restart primary node in standby mode and promote it, switching it
+# to a new timeline.
+$node_primary_2->set_standby_mode;
+$node_primary_2->restart;
+$node_primary_2->promote;
+
+# Start standby node, create some content on primary and check its presence
+# in standby, to ensure that the timeline switch has been done.
+$node_standby_3->start;
+$node_primary_2->safe_psql('postgres',
+    "CREATE TABLE tab_int AS SELECT 1 AS a");
+$node_primary_2->wait_for_catchup($node_standby_3, 'replay',
+    $node_primary_2->lsn('write'));
+
+my $result_2 =
+  $node_standby_3->safe_psql('postgres', "SELECT count(*) FROM tab_int");
+is($result_2, qq(1), 'check content of standby 3');
