
Commit 463a2eb

postmaster: Commonalize FatalError paths
This includes some behavioral changes:

- Previously PM_WAIT_XLOG_ARCHIVAL wasn't handled in HandleFatalError(), that doesn't seem quite right.

- Previously a fatal error in PM_WAIT_XLOG_SHUTDOWN led to jumping back to PM_WAIT_BACKENDS, now we go to PM_WAIT_DEAD_END. Jumping backwards doesn't seem quite right and we didn't do so when checkpointer failed to fork during a shutdown.

- Previously a checkpointer fork failure didn't call SetQuitSignalReason(), which would lead to quickdie() reporting "terminating connection because of unexpected SIGQUIT signal" which seems even worse than the PMQUIT_FOR_CRASH message. If I saw that in the log I'd suspect somebody outside of postgres sent SIGQUITs.

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
1 parent 8edd8c7 commit 463a2eb

1 file changed: +58 -16 lines changed

src/backend/postmaster/postmaster.c

Lines changed: 58 additions & 16 deletions
@@ -2706,13 +2706,50 @@ HandleFatalError(QuitSignalReason reason, bool consider_sigabrt)
 
 	FatalError = true;
 
-	/* We now transit into a state of waiting for children to die */
-	if (pmState == PM_RECOVERY ||
-		pmState == PM_HOT_STANDBY ||
-		pmState == PM_RUN ||
-		pmState == PM_STOP_BACKENDS ||
-		pmState == PM_WAIT_XLOG_SHUTDOWN)
-		UpdatePMState(PM_WAIT_BACKENDS);
+	/*
+	 * Choose the appropriate new state to react to the fatal error. Unless we
+	 * were already in the process of shutting down, we go through
+	 * PM_WAIT_BACKEND. For errors during the shutdown sequence, we directly
+	 * switch to PM_WAIT_DEAD_END.
+	 */
+	switch (pmState)
+	{
+		case PM_INIT:
+			/* shouldn't have any children */
+			Assert(false);
+			break;
+		case PM_STARTUP:
+			/* should have been handled in process_pm_child_exit */
+			Assert(false);
+			break;
+
+			/* wait for children to die */
+		case PM_RECOVERY:
+		case PM_HOT_STANDBY:
+		case PM_RUN:
+		case PM_STOP_BACKENDS:
+			UpdatePMState(PM_WAIT_BACKENDS);
+			break;
+
+		case PM_WAIT_BACKENDS:
+			/* there might be more backends to wait for */
+			break;
+
+		case PM_WAIT_XLOG_SHUTDOWN:
+		case PM_WAIT_XLOG_ARCHIVAL:
+
+			/*
+			 * NB: Similar code exists in PostmasterStateMachine()'s handling
+			 * of FatalError in PM_STOP_BACKENDS/PM_WAIT_BACKENDS states.
+			 */
+			ConfigurePostmasterWaitSet(false);
+			UpdatePMState(PM_WAIT_DEAD_END);
+			break;
+
+		case PM_WAIT_DEAD_END:
+		case PM_NO_CHILDREN:
+			break;
+	}
 
 	/*
 	 * .. and if this doesn't happen quickly enough, now the clock is ticking
@@ -2942,15 +2979,18 @@ PostmasterStateMachine(void)
 			{
 				/*
 				 * Stop any dead-end children and stop creating new ones.
+				 *
+				 * NB: Similar code exists in HandleFatalErrors(), when the
+				 * error happens in pmState > PM_WAIT_BACKENDS.
 				 */
 				UpdatePMState(PM_WAIT_DEAD_END);
 				ConfigurePostmasterWaitSet(false);
 				SignalChildren(SIGQUIT, btmask(B_DEAD_END_BACKEND));
 
 				/*
-				 * We already SIGQUIT'd walsenders and the archiver, if any,
-				 * when we started immediate shutdown or entered FatalError
-				 * state.
+				 * We already SIGQUIT'd auxiliary processes (other than
+				 * logger), if any, when we started immediate shutdown or
+				 * entered FatalError state.
 				 */
 			}
 			else
@@ -2981,13 +3021,15 @@ PostmasterStateMachine(void)
 					 * We don't consult send_abort_for_crash here, as it's
 					 * unlikely that dumping cores would illuminate the reason
 					 * for checkpointer fork failure.
+					 *
+					 * XXX: It may be worth to introduce a different PMQUIT
+					 * value that signals that the cluster is in a bad state,
+					 * without a process having crashed. But right now this
+					 * path is very unlikely to be reached, so it isn't
+					 * obviously worthwhile adding a distinct error message in
+					 * quickdie().
 					 */
-					FatalError = true;
-					UpdatePMState(PM_WAIT_DEAD_END);
-					ConfigurePostmasterWaitSet(false);
-
-					/* Kill the walsenders and archiver too */
-					SignalChildren(SIGQUIT, btmask_all_except(B_LOGGER));
+					HandleFatalError(PMQUIT_FOR_CRASH, false);
 				}
 			}
 		}
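
The state selection that the new switch in HandleFatalError() performs can be modeled in isolation. The following is only a sketch, not postmaster code: the `Model`-prefixed enum, the ordering of its members, and both function names are hypothetical, and side effects such as ConfigurePostmasterWaitSet(false) and signalling children are left out. For PM_INIT and PM_STARTUP, where the real code has Assert(false), the model simply leaves the state unchanged.

```c
#include <assert.h>

/* Hypothetical stand-in for the postmaster states named in the diff. */
typedef enum
{
	MODEL_PM_INIT,
	MODEL_PM_STARTUP,
	MODEL_PM_RECOVERY,
	MODEL_PM_HOT_STANDBY,
	MODEL_PM_RUN,
	MODEL_PM_STOP_BACKENDS,
	MODEL_PM_WAIT_BACKENDS,
	MODEL_PM_WAIT_XLOG_SHUTDOWN,
	MODEL_PM_WAIT_XLOG_ARCHIVAL,
	MODEL_PM_WAIT_DEAD_END,
	MODEL_PM_NO_CHILDREN
} ModelPMState;

/*
 * Return the state the model moves to when a fatal error is handled in
 * the given state. Mirrors only the case structure shown in the diff.
 */
static ModelPMState
model_state_after_fatal_error(ModelPMState state)
{
	switch (state)
	{
		case MODEL_PM_INIT:
		case MODEL_PM_STARTUP:
			/* Assert(false) in the diff; the model keeps the state. */
			return state;

		case MODEL_PM_RECOVERY:
		case MODEL_PM_HOT_STANDBY:
		case MODEL_PM_RUN:
		case MODEL_PM_STOP_BACKENDS:
			/* wait for children to die */
			return MODEL_PM_WAIT_BACKENDS;

		case MODEL_PM_WAIT_BACKENDS:
			/* already waiting; there might be more backends */
			return state;

		case MODEL_PM_WAIT_XLOG_SHUTDOWN:
		case MODEL_PM_WAIT_XLOG_ARCHIVAL:
			/* error during shutdown: go forward, never backward */
			return MODEL_PM_WAIT_DEAD_END;

		case MODEL_PM_WAIT_DEAD_END:
		case MODEL_PM_NO_CHILDREN:
			return state;
	}
	return state;				/* only reached for an invalid enum value */
}

/* Checks the two behavioral changes described in the commit message. */
static void
model_self_check(void)
{
	assert(model_state_after_fatal_error(MODEL_PM_WAIT_XLOG_SHUTDOWN) ==
		   MODEL_PM_WAIT_DEAD_END);
	assert(model_state_after_fatal_error(MODEL_PM_WAIT_XLOG_ARCHIVAL) ==
		   MODEL_PM_WAIT_DEAD_END);
}
```

Calling `model_self_check()` from a test program exercises the two shutdown-sequence cases. Before this commit the first of them would have moved back to PM_WAIT_BACKENDS, and the second was not handled at all.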
