
Commit dde70cc

Emit cascaded standby message on shutdown only when appropriate.
Adds an additional test for active walsenders and closes a race condition that occurs when we fail over while a new walsender is connecting. Reported and fixed by Fujii Masao. Review by Heikki Linnakangas.
1 parent 39039e6 commit dde70cc

2 files changed, +32 -2 lines changed

src/backend/postmaster/postmaster.c

+3 -2
@@ -2328,10 +2328,11 @@ reaper(SIGNAL_ARGS)
          * XXX should avoid the need for disconnection. When we do,
          * am_cascading_walsender should be replaced with RecoveryInProgress()
          */
-        if (max_wal_senders > 0)
+        if (max_wal_senders > 0 && CountChildren(BACKEND_TYPE_WALSND) > 0)
         {
             ereport(LOG,
-                    (errmsg("terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect")));
+                    (errmsg("terminating all walsender processes to force cascaded "
+                            "standby(s) to update timeline and reconnect")));
             SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
         }
 
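
The postmaster.c hunk above only announces the termination when at least one walsender child actually exists. A minimal sketch of that guard follows. The `Child` table, `count_children()` and `should_announce_termination()` are hypothetical stand-ins written for illustration; they are not the postmaster's real data structures or the real `CountChildren()` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the postmaster's child bookkeeping. */
typedef enum { CHILD_NORMAL, CHILD_WALSND } ChildType;

typedef struct {
    int       pid;
    ChildType type;
} Child;

/* Count children of the given type (plays the role of CountChildren()). */
static int count_children(const Child *children, size_t n, ChildType type)
{
    int count = 0;

    assert(children != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (children[i].type == type)
            count++;
    return count;
}

/*
 * Decide whether the "terminating all walsender processes" message is
 * worth emitting: walsenders must be enabled and at least one must exist.
 */
static bool should_announce_termination(int max_wal_senders,
                                        const Child *children, size_t n)
{
    return max_wal_senders > 0 &&
           count_children(children, n, CHILD_WALSND) > 0;
}
```

With this guard, a promoted standby that has `max_wal_senders > 0` but no connected cascaded standbys stays silent instead of logging a termination that never happens.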

src/backend/replication/walsender.c

+29
@@ -368,6 +368,35 @@ StartReplication(StartReplicationCmd *cmd)
     MarkPostmasterChildWalSender();
     SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
 
+    /*
+     * When promoting a cascading standby, postmaster sends SIGUSR2 to
+     * any cascading walsenders to kill them. But there is a corner-case where
+     * such walsender fails to receive SIGUSR2 and survives a standby promotion
+     * unexpectedly. This happens when postmaster sends SIGUSR2 before
+     * the walsender marks itself as a WAL sender, because postmaster sends
+     * SIGUSR2 to only the processes marked as a WAL sender.
+     *
+     * To avoid this corner-case, if recovery is NOT in progress even though
+     * the walsender is cascading one, we do the same thing as SIGUSR2 signal
+     * handler does, i.e., set walsender_ready_to_stop to true. Which causes
+     * the walsender to end later.
+     *
+     * When terminating cascading walsenders, usually postmaster writes
+     * the log message announcing the terminations. But there is a race condition
+     * here. If there is no walsender except this process before reaching here,
+     * postmaster thinks that there is no walsender and suppresses that
+     * log message. To handle this case, we always emit that log message here.
+     * This might cause duplicate log messages, but which is less likely to happen,
+     * so it's not worth writing some code to suppress them.
+     */
+    if (am_cascading_walsender && !RecoveryInProgress())
+    {
+        ereport(LOG,
+                (errmsg("terminating walsender process to force cascaded standby "
+                        "to update timeline and reconnect")));
+        walsender_ready_to_stop = true;
+    }
+
     /*
      * We assume here that we're logging enough information in the WAL for
      * log-shipping, since this is checked in PostmasterMain().
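
The comment in the walsender.c hunk describes a "mark, then re-check" pattern: the walsender first marks itself, then checks whether promotion already happened, so a signal sent before the mark is not lost. Below is a single-process sketch of that ordering argument. The `WalSender` struct, `promote()`, `start_replication()` and the plain `recovery_in_progress` flag are hypothetical simplifications; the real code relies on shared memory, `RecoveryInProgress()` and an actual SIGUSR2 handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one walsender as the postmaster sees it. */
typedef struct {
    bool marked_as_walsender;  /* set by the walsender itself */
    bool cascading;            /* started while the standby was in recovery */
    bool ready_to_stop;        /* what the SIGUSR2 handler would set */
} WalSender;

/* Shared state: true until the standby is promoted. */
static bool recovery_in_progress = true;

/* Postmaster side: promotion signals only processes already marked. */
static void promote(WalSender *senders, size_t n)
{
    recovery_in_progress = false;
    for (size_t i = 0; i < n; i++)
        if (senders[i].marked_as_walsender)
            senders[i].ready_to_stop = true;
}

/* Walsender side: mark first, then re-check whether promotion was missed. */
static void start_replication(WalSender *ws)
{
    assert(ws != NULL);
    ws->marked_as_walsender = true;
    if (ws->cascading && !recovery_in_progress)
        ws->ready_to_stop = true;   /* same effect as the missed signal */
}
```

In this model both interleavings end with the walsender asked to stop. If `promote()` runs before the mark, the re-check in `start_replication()` catches it. If it runs after the mark, the simulated signal is delivered as usual.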
