Commit 15251c0

Pause recovery for insufficient parameter settings
When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error.

This patch changes this behavior for hot standbys to pause recovery at that point instead. That allows read traffic on the standby to continue while database administrators figure out next steps. When recovery is unpaused, the server shuts down (as before). The idea is to fix the parameters while recovery is paused and then restart when there is a maintenance window.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
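
The behavior change described in the commit message can be summarized as a three-way decision. The following is a minimal standalone sketch of that decision, not the PostgreSQL implementation: the enum `ParamCheckOutcome` and the function `check_standby_parameter` are hypothetical names introduced here for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not PostgreSQL identifiers. */
typedef enum
{
    PARAM_OK,               /* standby value is sufficient, recovery continues */
    PARAM_PAUSE_THEN_FATAL, /* hot standby: pause recovery, shut down once unpaused */
    PARAM_FATAL             /* not a hot standby: shut down immediately */
} ParamCheckOutcome;

/*
 * Model of the check made when a parameter change from the primary is
 * replayed: the standby's value must be at least the primary's value.
 */
static ParamCheckOutcome
check_standby_parameter(int standby_value, int primary_value,
                        bool hot_standby_active)
{
    assert(standby_value >= 0 && primary_value >= 0);

    if (standby_value >= primary_value)
        return PARAM_OK;

    /* Before this patch, both insufficient cases ended recovery at once. */
    return hot_standby_active ? PARAM_PAUSE_THEN_FATAL : PARAM_FATAL;
}
```

With the values from the log example in the documentation change (80 on the standby, 100 on the primary), a hot standby would map to `PARAM_PAUSE_THEN_FATAL` in this model, while a standby that is not accepting read-only connections would map to `PARAM_FATAL`.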
1 parent 708d165 commit 15251c0

2 files changed, +93 -17 lines changed

doc/src/sgml/high-availability.sgml

+39-12
@@ -2129,18 +2129,14 @@ LOG: database system is ready to accept read only connections
    </para>
 
    <para>
-    The setting of some parameters on the standby will need reconfiguration
-    if they have been changed on the primary. For these parameters,
-    the value on the standby must
-    be equal to or greater than the value on the primary.
-    Therefore, if you want to increase these values, you should do so on all
-    standby servers first, before applying the changes to the primary server.
-    Conversely, if you want to decrease these values, you should do so on the
-    primary server first, before applying the changes to all standby servers.
-    If these parameters
-    are not set high enough then the standby will refuse to start.
-    Higher values can then be supplied and the server
-    restarted to begin recovery again. These parameters are:
+    The settings of some parameters determine the size of shared memory for
+    tracking transaction IDs, locks, and prepared transactions. These shared
+    memory structures must be no smaller on a standby than on the primary in
+    order to ensure that the standby does not run out of shared memory during
+    recovery. For example, if the primary had used a prepared transaction but
+    the standby had not allocated any shared memory for tracking prepared
+    transactions, then recovery could not continue until the standby's
+    configuration is changed. The parameters affected are:
 
     <itemizedlist>
      <listitem>
@@ -2169,6 +2165,37 @@ LOG: database system is ready to accept read only connections
       </para>
      </listitem>
     </itemizedlist>
+
+    The easiest way to ensure this does not become a problem is to have these
+    parameters set on the standbys to values equal to or greater than on the
+    primary. Therefore, if you want to increase these values, you should do
+    so on all standby servers first, before applying the changes to the
+    primary server. Conversely, if you want to decrease these values, you
+    should do so on the primary server first, before applying the changes to
+    all standby servers. Keep in mind that when a standby is promoted, it
+    becomes the new reference for the required parameter settings for the
+    standbys that follow it. Therefore, to avoid this becoming a problem
+    during a switchover or failover, it is recommended to keep these settings
+    the same on all standby servers.
+   </para>
+
+   <para>
+    The WAL tracks changes to these parameters on the
+    primary. If a hot standby processes WAL that indicates that the current
+    value on the primary is higher than its own value, it will log a warning
+    and pause recovery, for example:
+<screen>
+WARNING: hot standby is not possible because of insufficient parameter settings
+DETAIL: max_connections = 80 is a lower setting than on the primary server, where its value was 100.
+LOG: recovery has paused
+DETAIL: If recovery is unpaused, the server will shut down.
+HINT: You can then restart the server after making the necessary configuration changes.
+</screen>
+    At that point, the settings on the standby need to be updated and the
+    instance restarted before recovery can continue. If the standby is not a
+    hot standby, then when it encounters the incompatible parameter change, it
+    will shut down immediately without pausing, since there is then no value
+    in keeping it up.
    </para>
 
    <para>
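
The ordering advice in the documentation change above (raise a value on all standbys first, lower it on the primary first) can be illustrated with a small standalone C sketch. It is only a model of the rule as the text states it; `RolloutOrder`, `rollout_order`, and `first_insufficient_standby` are hypothetical names and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for which side to reconfigure first. */
typedef enum
{
    CHANGE_STANDBYS_FIRST, /* increasing: standbys must never fall below the primary */
    CHANGE_PRIMARY_FIRST,  /* decreasing: the primary must drop before the standbys */
    NO_CHANGE_NEEDED
} RolloutOrder;

/* Decide the rollout order for moving a parameter from old_value to new_value. */
static RolloutOrder
rollout_order(int old_value, int new_value)
{
    if (new_value > old_value)
        return CHANGE_STANDBYS_FIRST;
    if (new_value < old_value)
        return CHANGE_PRIMARY_FIRST;
    return NO_CHANGE_NEEDED;
}

/*
 * Return the index of the first standby whose value is lower than the
 * primary's, or -1 if every standby value is equal to or greater than it.
 */
static int
first_insufficient_standby(int primary_value, const int *standby_values, int n)
{
    assert(standby_values != NULL && n >= 0);

    for (int i = 0; i < n; i++)
    {
        if (standby_values[i] < primary_value)
            return i;
    }
    return -1;
}
```

Because a promoted standby becomes the new reference, running a check like `first_insufficient_standby` against each standby's own value in turn is one way to see why the text recommends keeping the settings identical on all standbys.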

src/backend/access/transam/xlog.c

+54-5
@@ -6261,12 +6261,61 @@ static void
 RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)
 {
     if (currValue < minValue)
-        ereport(ERROR,
+    {
+        if (LocalHotStandbyActive)
+        {
+            bool warned_for_promote = false;
+
+            ereport(WARNING,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("hot standby is not possible because of insufficient parameter settings"),
+                     errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
+                               param_name,
+                               currValue,
+                               minValue)));
+
+            SetRecoveryPause(true);
+
+            ereport(LOG,
+                    (errmsg("recovery has paused"),
+                     errdetail("If recovery is unpaused, the server will shut down."),
+                     errhint("You can then restart the server after making the necessary configuration changes.")));
+
+            while (RecoveryIsPaused())
+            {
+                HandleStartupProcInterrupts();
+
+                if (CheckForStandbyTrigger())
+                {
+                    if (!warned_for_promote)
+                        ereport(WARNING,
+                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                 errmsg("promotion is not possible because of insufficient parameter settings"),
+                                 /* Repeat the detail from above so it's easy to find in the log. */
+                                 errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
+                                           param_name,
+                                           currValue,
+                                           minValue),
+                                 errhint("Restart the server after making the necessary configuration changes.")));
+                    warned_for_promote = true;
+                }
+
+                pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
+                pg_usleep(1000000L); /* 1000 ms */
+                pgstat_report_wait_end();
+            }
+        }
+
+        ereport(FATAL,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("hot standby is not possible because %s = %d is a lower setting than on the primary server (its value was %d)",
-                        param_name,
-                        currValue,
-                        minValue)));
+                 errmsg("recovery aborted because of insufficient parameter settings"),
+                 /* Repeat the detail from above so it's easy to find in the log. */
+                 errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
+                           param_name,
+                           currValue,
+                           minValue),
+                 errhint("You can restart the server after making the necessary configuration changes.")));
+    }
 }
 
 /*
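
The wait loop added in the xlog.c hunk above polls once per second while recovery is paused and emits the promotion warning only on the first promotion request. The following standalone C sketch models that control flow with simulated inputs. It assumes nothing about PostgreSQL internals: `Tick` and `simulate_pause_loop` are hypothetical names, and each array element stands in for one pass of the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One simulated pass of the loop: what the startup process would observe. */
typedef struct
{
    bool promote_requested; /* stands in for CheckForStandbyTrigger() returning true */
    bool unpause_requested; /* stands in for RecoveryIsPaused() returning false */
} Tick;

/*
 * Walk the simulated ticks until recovery is unpaused or the input ends.
 * Returns how many promotion warnings would be logged and stores the number
 * of one-second waits in *ticks_waited.
 */
static int
simulate_pause_loop(const Tick *ticks, int nticks, int *ticks_waited)
{
    bool warned_for_promote = false;
    int warnings = 0;
    int i;

    assert(ticks != NULL && ticks_waited != NULL && nticks >= 0);

    for (i = 0; i < nticks; i++)
    {
        /* Loop condition: leave as soon as recovery is no longer paused. */
        if (ticks[i].unpause_requested)
            break;

        /* Warn about an impossible promotion only once. */
        if (ticks[i].promote_requested && !warned_for_promote)
        {
            warnings++;
            warned_for_promote = true;
        }
        /* The real loop sleeps for one second here before checking again. */
    }

    *ticks_waited = i;
    return warnings;
}
```

In the patch itself, leaving the loop falls through to the `ereport(FATAL, ...)` call, so unpausing recovery ends in the same shutdown as before the change.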
