Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 3887e94

Browse files
committed
Check for too many postmaster children before spawning a bgworker.
The postmaster's code path for spawning a bgworker neglected to check whether we already have the max number of live child processes. That's a bit hard to hit, since it would necessarily be a transient condition; but if we do, AssignPostmasterChildSlot() fails causing a postmaster crash, as seen in a report from Bhargav Kamineni. To fix, invoke canAcceptConnections() in the bgworker code path, as we do in the other code paths that spawn children. Since we don't want the same pmState tests in this case, add a child-process-type parameter to canAcceptConnections() so that it can know what to do. Back-patch to 9.5. In principle the same hazard exists in 9.4, but the code is enough different that this patch wouldn't quite fix it there. Given the tiny usage of bgworkers in that branch it doesn't seem worth creating a variant patch for it. Discussion: https://postgr.es/m/18733.1570382257@sss.pgh.pa.us
1 parent 400d5ff commit 3887e94

File tree

1 file changed

+33
-13
lines changed

1 file changed

+33
-13
lines changed

src/backend/postmaster/postmaster.c

+33-13
Original file line numberDiff line numberDiff line change
@@ -409,7 +409,7 @@ static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
409409
static void processCancelRequest(Port *port, void *pkt);
410410
static int initMasks(fd_set *rmask);
411411
static void report_fork_failure_to_client(Port *port, int errnum);
412-
static CAC_state canAcceptConnections(void);
412+
static CAC_state canAcceptConnections(int backend_type);
413413
static bool RandomCancelKey(int32 *cancel_key);
414414
static void signal_child(pid_t pid, int signal);
415415
static bool SignalSomeChildren(int signal, int targets);
@@ -2398,24 +2398,30 @@ processCancelRequest(Port *port, void *pkt)
23982398
}
23992399

24002400
/*
2401-
* canAcceptConnections --- check to see if database state allows connections.
2401+
* canAcceptConnections --- check to see if database state allows connections
2402+
* of the specified type. backend_type can be BACKEND_TYPE_NORMAL,
2403+
* BACKEND_TYPE_AUTOVAC, or BACKEND_TYPE_BGWORKER. (Note that we don't yet
2404+
* know whether a NORMAL connection might turn into a walsender.)
24022405
*/
24032406
static CAC_state
2404-
canAcceptConnections(void)
2407+
canAcceptConnections(int backend_type)
24052408
{
24062409
CAC_state result = CAC_OK;
24072410

24082411
/*
24092412
* Can't start backends when in startup/shutdown/inconsistent recovery
2410-
* state.
2413+
* state. We treat autovac workers the same as user backends for this
2414+
* purpose. However, bgworkers are excluded from this test; we expect
2415+
* bgworker_should_start_now() decided whether the DB state allows them.
24112416
*
24122417
* In state PM_WAIT_BACKUP only superusers can connect (this must be
24132418
* allowed so that a superuser can end online backup mode); we return
24142419
* CAC_WAITBACKUP code to indicate that this must be checked later. Note
24152420
* that neither CAC_OK nor CAC_WAITBACKUP can safely be returned until we
24162421
* have checked for too many children.
24172422
*/
2418-
if (pmState != PM_RUN)
2423+
if (pmState != PM_RUN &&
2424+
backend_type != BACKEND_TYPE_BGWORKER)
24192425
{
24202426
if (pmState == PM_WAIT_BACKUP)
24212427
result = CAC_WAITBACKUP; /* allow superusers only */
@@ -2435,9 +2441,9 @@ canAcceptConnections(void)
24352441
/*
24362442
* Don't start too many children.
24372443
*
2438-
* We allow more connections than we can have backends here because some
2444+
* We allow more connections here than we can have backends because some
24392445
* might still be authenticating; they might fail auth, or some existing
2440-
* backend might exit before the auth cycle is completed. The exact
2446+
* backend might exit before the auth cycle is completed. The exact
24412447
* MaxBackends limit is enforced when a new backend tries to join the
24422448
* shared-inval backend array.
24432449
*
@@ -4114,7 +4120,7 @@ BackendStartup(Port *port)
41144120
bn->cancel_key = MyCancelKey;
41154121

41164122
/* Pass down canAcceptConnections state */
4117-
port->canAcceptConnections = canAcceptConnections();
4123+
port->canAcceptConnections = canAcceptConnections(BACKEND_TYPE_NORMAL);
41184124
bn->dead_end = (port->canAcceptConnections != CAC_OK &&
41194125
port->canAcceptConnections != CAC_WAITBACKUP);
41204126

@@ -5486,7 +5492,7 @@ StartAutovacuumWorker(void)
54865492
* we have to check to avoid race-condition problems during DB state
54875493
* changes.
54885494
*/
5489-
if (canAcceptConnections() == CAC_OK)
5495+
if (canAcceptConnections(BACKEND_TYPE_AUTOVAC) == CAC_OK)
54905496
{
54915497
/*
54925498
* Compute the cancel key that will be assigned to this session. We
@@ -5731,12 +5737,13 @@ do_start_bgworker(RegisteredBgWorker *rw)
57315737

57325738
/*
57335739
* Allocate and assign the Backend element. Note we must do this before
5734-
* forking, so that we can handle out of memory properly.
5740+
* forking, so that we can handle failures (out of memory or child-process
5741+
* slots) cleanly.
57355742
*
57365743
* Treat failure as though the worker had crashed. That way, the
5737-
* postmaster will wait a bit before attempting to start it again; if it
5738-
* tried again right away, most likely it'd find itself repeating the
5739-
* out-of-memory or fork failure condition.
5744+
* postmaster will wait a bit before attempting to start it again; if we
5745+
* tried again right away, most likely we'd find ourselves hitting the
5746+
* same resource-exhaustion condition.
57405747
*/
57415748
if (!assign_backendlist_entry(rw))
57425749
{
@@ -5862,6 +5869,19 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
58625869
{
58635870
Backend *bn;
58645871

5872+
/*
5873+
* Check that database state allows another connection. Currently the
5874+
* only possible failure is CAC_TOOMANY, so we just log an error message
5875+
* based on that rather than checking the error code precisely.
5876+
*/
5877+
if (canAcceptConnections(BACKEND_TYPE_BGWORKER) != CAC_OK)
5878+
{
5879+
ereport(LOG,
5880+
(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
5881+
errmsg("no slot available for new worker process")));
5882+
return false;
5883+
}
5884+
58655885
/*
58665886
* Compute the cancel key that will be assigned to this session. We
58675887
* probably don't need cancel keys for background workers, but we'd better

0 commit comments

Comments
 (0)