
Commit ed48e35

Clean up TAP tests' usage of wait_for_catchup().
By default, wait_for_catchup() waits for the replication connection to reach the primary's write LSN. That's fine, but in an apparent attempt to save one query round-trip, it was coded so that we executed pg_current_wal_lsn() again during each probe query. Thus, we presented the standby with a moving target to be reached. (While the test script itself couldn't be causing the write LSN to advance while it's blocked in wait_for_catchup(), it's plenty plausible that background activity such as autovacuum is emitting more WAL.) That could make the test take longer than necessary, and potentially it could mask bugs by allowing the standby to process more WAL than a strict interpretation of the test scenario allows. So, change wait_for_catchup() to do it "by the book", explicitly collecting the write LSN to wait for at the outset.

Also, various call sites were instructing wait_for_catchup() to wait for the standby to reach the primary's insert LSN rather than its write LSN. This also seems like a bad idea. While in most test scenarios those are the same, if they are different then the inserted-but-not-yet-written WAL is not presently available to the standby. The test isn't doing anything to make it become so, so again we have the potential for unwanted test delay, perhaps even a test timeout. (Again, background activity would be needed to make this more than a hypothetical problem.) Hence, change the callers where necessary so that the wait target is always the primary's write LSN.

While at it, simplify callers by making use of wait_for_catchup's default arguments wherever possible (the preceding change makes this possible in more places than it was before). And rewrite wait_for_catchup's documentation a bit.

Patch by me; thanks to Julien Rouhaud for review.

Discussion: https://postgr.es/m/2368336.1641843098@sss.pgh.pa.us
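To illustrate the moving-target problem, here is a sketch of the probe query wait_for_catchup() issues against pg_stat_replication, before and after this change. The application_name 'standby' and the LSN value '0/3000060' are hypothetical placeholders:

```sql
-- Before: the primary's current write LSN was re-evaluated inside every
-- poll round, so background WAL (e.g. from autovacuum) moved the target.
SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming'
FROM pg_catalog.pg_stat_replication
WHERE application_name = 'standby';

-- After: the write LSN is collected once at the outset and interpolated
-- as a constant, so the target cannot advance during the wait.
SELECT '0/3000060' <= replay_lsn AND state = 'streaming'
FROM pg_catalog.pg_stat_replication
WHERE application_name = 'standby';
```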
Parent: 269b532

13 files changed (+59, -84 lines)

src/bin/pg_basebackup/t/020_pg_receivewal.pl

Lines changed: 1 addition & 1 deletion
@@ -290,7 +290,7 @@
 	"CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
 	replication => 1);
 # Wait for standby catchup
-$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
+$primary->wait_for_catchup($standby);
 # Get a walfilename from before the promotion to make sure it is archived
 # after promotion
 my $standby_slot = $standby->slot($archive_slot);

src/bin/pg_rewind/t/007_standby_source.pl

Lines changed: 3 additions & 5 deletions
@@ -74,7 +74,7 @@
 	"INSERT INTO tbl1 values ('in A, before promotion')");
 $node_a->safe_psql('postgres', 'CHECKPOINT');
 
-my $lsn = $node_a->lsn('insert');
+my $lsn = $node_a->lsn('write');
 $node_a->wait_for_catchup('node_b', 'write', $lsn);
 $node_b->wait_for_catchup('node_c', 'write', $lsn);
 
@@ -93,8 +93,7 @@
 	"INSERT INTO tbl1 VALUES ('in A, after C was promoted')");
 
 # make sure it's replicated to B before we continue
-$lsn = $node_a->lsn('insert');
-$node_a->wait_for_catchup('node_b', 'replay', $lsn);
+$node_a->wait_for_catchup('node_b');
 
 # Also insert a new row in the standby, which won't be present in the
 # old primary.
@@ -161,8 +160,7 @@
 $node_a->safe_psql('postgres',
 	"INSERT INTO tbl1 values ('in A, after rewind')");
 
-$lsn = $node_a->lsn('insert');
-$node_b->wait_for_catchup('node_c', 'replay', $lsn);
+$node_b->wait_for_catchup('node_c', 'replay', $node_a->lsn('write'));
 
 check_query(
 	'SELECT * FROM tbl1',

src/bin/pg_rewind/t/008_min_recovery_point.pl

Lines changed: 2 additions & 4 deletions
@@ -69,8 +69,7 @@
 $node_3->start;
 
 # Wait until node 3 has connected and caught up
-my $lsn = $node_1->lsn('insert');
-$node_1->wait_for_catchup('node_3', 'replay', $lsn);
+$node_1->wait_for_catchup('node_3');
 
 #
 # Swap the roles of node_1 and node_3, so that node_1 follows node_3.
@@ -106,8 +105,7 @@
 #
 
 # make sure node_1 is full caught up with node_3 first
-$lsn = $node_3->lsn('insert');
-$node_3->wait_for_catchup('node_1', 'replay', $lsn);
+$node_3->wait_for_catchup('node_1');
 
 $node_1->promote;
 # Force a checkpoint after promotion, like earlier.

src/test/perl/PostgreSQL/Test/Cluster.pm

Lines changed: 19 additions & 19 deletions
@@ -2496,22 +2496,27 @@ sub lsn
 
 =item $node->wait_for_catchup(standby_name, mode, target_lsn)
 
-Wait for the node with application_name standby_name (usually from node->name,
-also works for logical subscriptions)
-until its replication location in pg_stat_replication equals or passes the
-upstream's WAL insert point at the time this function is called. By default
-the replay_lsn is waited for, but 'mode' may be specified to wait for any of
-sent|write|flush|replay. The connection catching up must be in a streaming
-state.
+Wait for the replication connection with application_name standby_name until
+its 'mode' replication column in pg_stat_replication equals or passes the
+specified or default target_lsn. By default the replay_lsn is waited for,
+but 'mode' may be specified to wait for any of sent|write|flush|replay.
+The replication connection must be in a streaming state.
+
+When doing physical replication, the standby is usually identified by
+passing its PostgreSQL::Test::Cluster instance. When doing logical
+replication, standby_name identifies a subscription.
+
+The default value of target_lsn is $node->lsn('write'), which ensures
+that the standby has caught up to what has been committed on the primary.
+If you pass an explicit value of target_lsn, it should almost always be
+the primary's write LSN; so this parameter is seldom needed except when
+querying some intermediate replication node rather than the primary.
 
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
 
 Requires that the 'postgres' db exists and is accessible.
 
-target_lsn may be any arbitrary lsn, but is typically $primary_node->lsn('insert').
-If omitted, pg_current_wal_lsn() is used.
-
 This is not a test. It die()s on failure.
 
 =cut
@@ -2531,23 +2536,18 @@ sub wait_for_catchup
 	{
 		$standby_name = $standby_name->name;
 	}
-	my $lsn_expr;
-	if (defined($target_lsn))
-	{
-		$lsn_expr = "'$target_lsn'";
-	}
-	else
+	if (!defined($target_lsn))
 	{
-		$lsn_expr = 'pg_current_wal_lsn()';
+		$target_lsn = $self->lsn('write');
 	}
 	print "Waiting for replication conn "
 	  . $standby_name . "'s "
 	  . $mode
 	  . "_lsn to pass "
-	  . $lsn_expr . " on "
+	  . $target_lsn . " on "
 	  . $self->name . "\n";
 	my $query =
-	  qq[SELECT $lsn_expr <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	  qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
 	$self->poll_query_until('postgres', $query)
 	  or croak "timed out waiting for catchup";
 	print "done\n";
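As a hedged sketch of the simplified calling convention this change enables, a hypothetical TAP-test fragment might look as follows (node names, backup name, and table are made up; the PostgreSQL::Test::Cluster methods shown are those used throughout these tests):

```perl
use PostgreSQL::Test::Cluster;

my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;
$primary->backup('my_backup');

my $standby = PostgreSQL::Test::Cluster->new('standby');
$standby->init_from_backup($primary, 'my_backup', has_streaming => 1);
$standby->start;

$primary->safe_psql('postgres', 'CREATE TABLE t AS SELECT 1 AS a');

# With this commit, equivalent to
# wait_for_catchup($standby, 'replay', $primary->lsn('write')):
# the write LSN is collected once, then polled against replay_lsn.
$primary->wait_for_catchup($standby);
```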

src/test/recovery/t/001_stream_rep.pl

Lines changed: 11 additions & 14 deletions
@@ -47,10 +47,9 @@
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
 
 # Wait for standbys to catch up
-$node_primary->wait_for_catchup($node_standby_1, 'replay',
-	$node_primary->lsn('insert'));
-$node_standby_1->wait_for_catchup($node_standby_2, 'replay',
-	$node_standby_1->lsn('replay'));
+my $primary_lsn = $node_primary->lsn('write');
+$node_primary->wait_for_catchup($node_standby_1, 'replay', $primary_lsn);
+$node_standby_1->wait_for_catchup($node_standby_2, 'replay', $primary_lsn);
 
 my $result =
 	$node_standby_1->safe_psql('postgres', "SELECT count(*) FROM tab_int");
@@ -67,10 +66,9 @@
 	"CREATE SEQUENCE seq1; SELECT nextval('seq1')");
 
 # Wait for standbys to catch up
-$node_primary->wait_for_catchup($node_standby_1, 'replay',
-	$node_primary->lsn('insert'));
-$node_standby_1->wait_for_catchup($node_standby_2, 'replay',
-	$node_standby_1->lsn('replay'));
+$primary_lsn = $node_primary->lsn('write');
+$node_primary->wait_for_catchup($node_standby_1, 'replay', $primary_lsn);
+$node_standby_1->wait_for_catchup($node_standby_2, 'replay', $primary_lsn);
 
 $result = $node_standby_1->safe_psql('postgres', "SELECT * FROM seq1");
 print "standby 1: $result\n";
@@ -374,10 +372,10 @@ sub replay_check
 	my $newval = $node_primary->safe_psql('postgres',
 		'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val'
 	);
-	$node_primary->wait_for_catchup($node_standby_1, 'replay',
-		$node_primary->lsn('insert'));
-	$node_standby_1->wait_for_catchup($node_standby_2, 'replay',
-		$node_standby_1->lsn('replay'));
+	my $primary_lsn = $node_primary->lsn('write');
+	$node_primary->wait_for_catchup($node_standby_1, 'replay', $primary_lsn);
+	$node_standby_1->wait_for_catchup($node_standby_2, 'replay', $primary_lsn);
+
 	$node_standby_1->safe_psql('postgres',
 		qq[SELECT 1 FROM replayed WHERE val = $newval])
 	  or die "standby_1 didn't replay primary value $newval";
@@ -481,8 +479,7 @@ sub replay_check
 	my $newval = $node_primary->safe_psql('postgres',
 		'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val'
 	);
-	$node_primary->wait_for_catchup($node_standby_2, 'replay',
-		$node_primary->lsn('insert'));
+	$node_primary->wait_for_catchup($node_standby_2);
 	my $is_replayed = $node_standby_2->safe_psql('postgres',
 		qq[SELECT 1 FROM replayed WHERE val = $newval]);
 	is($is_replayed, qq(1), "standby_2 didn't replay primary value $newval");

src/test/recovery/t/004_timeline_switch.pl

Lines changed: 3 additions & 6 deletions
@@ -38,8 +38,7 @@
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
 
 # Wait until standby has replayed enough data on standby 1
-$node_primary->wait_for_catchup($node_standby_1, 'replay',
-	$node_primary->lsn('write'));
+$node_primary->wait_for_catchup($node_standby_1);
 
 # Stop and remove primary
 $node_primary->teardown_node;
@@ -64,8 +63,7 @@
 # to ensure that the timeline switch has been done.
 $node_standby_1->safe_psql('postgres',
 	"INSERT INTO tab_int VALUES (generate_series(1001,2000))");
-$node_standby_1->wait_for_catchup($node_standby_2, 'replay',
-	$node_standby_1->lsn('write'));
+$node_standby_1->wait_for_catchup($node_standby_2);
 
 my $result =
 	$node_standby_2->safe_psql('postgres', "SELECT count(*) FROM tab_int");
@@ -103,8 +101,7 @@
 $node_standby_3->start;
 $node_primary_2->safe_psql('postgres',
 	"CREATE TABLE tab_int AS SELECT 1 AS a");
-$node_primary_2->wait_for_catchup($node_standby_3, 'replay',
-	$node_primary_2->lsn('write'));
+$node_primary_2->wait_for_catchup($node_standby_3);
 
 my $result_2 =
 	$node_standby_3->safe_psql('postgres', "SELECT count(*) FROM tab_int");

src/test/recovery/t/010_logical_decoding_timelines.pl

Lines changed: 1 addition & 2 deletions
@@ -88,8 +88,7 @@
 # db and associated slot.
 is($node_primary->psql('postgres', 'DROP DATABASE dropme'),
 	0, 'dropped DB with logical slot OK on primary');
-$node_primary->wait_for_catchup($node_replica, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_replica);
 is( $node_replica->safe_psql(
 		'postgres', q[SELECT 1 FROM pg_database WHERE datname = 'dropme']),
 	'',

src/test/recovery/t/012_subtransactions.pl

Lines changed: 3 additions & 6 deletions
@@ -103,8 +103,7 @@
 	BEGIN;
 	SELECT hs_subxids(127);
 	COMMIT;");
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 $node_standby->psql(
 	'postgres',
 	"SELECT coalesce(sum(id),-1) FROM t_012_tbl",
@@ -150,8 +149,7 @@
 	BEGIN;
 	SELECT hs_subxids(127);
 	PREPARE TRANSACTION 'xact_012_1';");
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 $node_standby->psql(
 	'postgres',
 	"SELECT coalesce(sum(id),-1) FROM t_012_tbl",
@@ -187,8 +185,7 @@
 	BEGIN;
 	SELECT hs_subxids(201);
 	PREPARE TRANSACTION 'xact_012_1';");
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 $node_standby->psql(
 	'postgres',
 	"SELECT coalesce(sum(id),-1) FROM t_012_tbl",

src/test/recovery/t/015_promotion_pages.pl

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@
 # problematic WAL records.
 $alpha->safe_psql('postgres', 'vacuum verbose test1');
 # Wait for last record to have been replayed on the standby.
-$alpha->wait_for_catchup($bravo, 'replay', $alpha->lsn('insert'));
+$alpha->wait_for_catchup($bravo);
 
 # Now force a checkpoint on the standby. This seems unnecessary but for "some"
 # reason, the previous checkpoint on the primary does not reflect on the standby
@@ -60,7 +60,7 @@
 $alpha->safe_psql('postgres', 'truncate test2');
 
 # Wait again for all records to be replayed.
-$alpha->wait_for_catchup($bravo, 'replay', $alpha->lsn('insert'));
+$alpha->wait_for_catchup($bravo);
 
 # Do the promotion, which reinitializes minRecoveryPoint in the control
 # file so as WAL is replayed up to the end.

src/test/recovery/t/016_min_consistency.pl

Lines changed: 2 additions & 2 deletions
@@ -78,7 +78,7 @@ sub find_largest_lsn
 $primary->safe_psql('postgres', 'UPDATE test1 SET a = a + 1;');
 
 # Wait for last record to have been replayed on the standby.
-$primary->wait_for_catchup($standby, 'replay', $primary->lsn('insert'));
+$primary->wait_for_catchup($standby);
 
 # Fill in the standby's shared buffers with the data filled in
 # previously.
@@ -99,7 +99,7 @@ sub find_largest_lsn
 	"SELECT pg_relation_filepath('test1'::regclass);");
 
 # Wait for last record to have been replayed on the standby.
-$primary->wait_for_catchup($standby, 'replay', $primary->lsn('insert'));
+$primary->wait_for_catchup($standby);
 
 # Issue a restart point on the standby now, which makes the checkpointer
 # update minRecoveryPoint.

src/test/recovery/t/019_replslot_limit.pl

Lines changed: 6 additions & 11 deletions
@@ -49,8 +49,7 @@
 $node_standby->start;
 
 # Wait until standby has replayed enough data
-my $start_lsn = $node_primary->lsn('write');
-$node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn);
+$node_primary->wait_for_catchup($node_standby);
 
 # Stop standby
 $node_standby->stop;
@@ -84,8 +83,7 @@
 # The standby can reconnect to primary
 $node_standby->start;
 
-$start_lsn = $node_primary->lsn('write');
-$node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn);
+$node_primary->wait_for_catchup($node_standby);
 
 $node_standby->stop;
 
@@ -115,8 +113,7 @@
 
 # The standby can reconnect to primary
 $node_standby->start;
-$start_lsn = $node_primary->lsn('write');
-$node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn);
+$node_primary->wait_for_catchup($node_standby);
 $node_standby->stop;
 
 # wal_keep_size overrides max_slot_wal_keep_size
@@ -135,8 +132,7 @@
 
 # The standby can reconnect to primary
 $node_standby->start;
-$start_lsn = $node_primary->lsn('write');
-$node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn);
+$node_primary->wait_for_catchup($node_standby);
 $node_standby->stop;
 
 # Advance WAL again without checkpoint, reducing remain by 6 MB.
@@ -163,8 +159,7 @@
 # The standby still can connect to primary before a checkpoint
 $node_standby->start;
 
-$start_lsn = $node_primary->lsn('write');
-$node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn);
+$node_primary->wait_for_catchup($node_standby);
 
 $node_standby->stop;
 
@@ -334,7 +329,7 @@
 	has_streaming => 1);
 $node_standby3->append_conf('postgresql.conf', "primary_slot_name = 'rep3'");
 $node_standby3->start;
-$node_primary3->wait_for_catchup($node_standby3->name, 'replay');
+$node_primary3->wait_for_catchup($node_standby3);
 my $senderpid = $node_primary3->safe_psql('postgres',
 	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walsender'");
 like($senderpid, qr/^[0-9]+$/, "have walsender pid $senderpid");

src/test/recovery/t/021_row_visibility.pl

Lines changed: 5 additions & 10 deletions
@@ -76,8 +76,7 @@
 #
 $node_primary->psql('postgres',
 	"INSERT INTO test_visibility VALUES ('first insert')");
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 
 ok( send_query_and_wait(
 		\%psql_standby,
@@ -98,8 +97,7 @@
 	'UPDATE');
 
 $node_primary->psql('postgres', "SELECT txid_current();"); # ensure WAL flush
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 
 ok( send_query_and_wait(
 		\%psql_standby,
@@ -112,8 +110,7 @@
 #
 ok(send_query_and_wait(\%psql_primary, q[COMMIT;], qr/^COMMIT$/m), 'COMMIT');
 
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 
 ok( send_query_and_wait(
 		\%psql_standby,
@@ -142,8 +139,7 @@
 		qr/^PREPARE TRANSACTION$/m),
 	'prepared will_abort');
 
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 
 ok( send_query_and_wait(
 		\%psql_standby,
@@ -154,8 +150,7 @@
 # For some variation, finish prepared xacts via separate connections
 $node_primary->safe_psql('postgres', "COMMIT PREPARED 'will_commit';");
 $node_primary->safe_psql('postgres', "ROLLBACK PREPARED 'will_abort';");
-$node_primary->wait_for_catchup($node_standby, 'replay',
-	$node_primary->lsn('insert'));
+$node_primary->wait_for_catchup($node_standby);
 
 ok( send_query_and_wait(
 		\%psql_standby,
