Accept SIGQUIT during error recovery in auxiliary processes.

The bgwriter, checkpointer, walwriter, and walreceiver processes claimed to allow SIGQUIT "at all times". In reality SIGQUIT would get re-blocked during error recovery, because we didn't update the actual signal mask immediately, so sigsetjmp() would save and reinstate a mask that includes SIGQUIT.

This appears to be simply a coding oversight. There's never a good reason to hold off SIGQUIT in these processes, because it's going to just call _exit(2) which should be safe enough, especially since the postmaster is going to tear down shared memory afterwards. Hence, stick in PG_SETMASK() calls to install the modified BlockSig mask immediately.

Also try to improve the comments around sigsetjmp blocks. Most of them were just referencing postgres.c, which is misleading because actually postgres.c manages the signals differently.

No back-patch, since there's no evidence that this is causing any problems in the field.

Discussion: https://postgr.es/m/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA+-2cJcLpw-cePZ=GgDVfA@mail.gmail.com
author: Tom Lane 2020-09-11 20:01:28 +0000
committer: Tom Lane 2020-09-11 20:01:36 +0000
commit: 7634bd4f6d38bdef1fe442df5c2e0da73f1f90f4 (patch)
tree: 474ffdcd4bd467b31cbf930a4d92d174985b715d /src/backend/postmaster/checkpointer.c
parent: 3c99230b4f0d10c9eac5f4efdd2394eccb2af3a0 (diff)
1 files changed, 16 insertions, 2 deletions
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 624a3238b80..45f5deca72e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -209,8 +209,9 @@ CheckpointerMain(void)
 	 */
 	pqsignal(SIGCHLD, SIG_DFL);
 
-	/* We allow SIGQUIT (quickdie) at all times */
+	/* We allow SIGQUIT (SignalHandlerForCrashExit) at all times */
 	sigdelset(&BlockSig, SIGQUIT);
+	PG_SETMASK(&BlockSig);
 
 	/*
 	 * Initialize so that first time-driven event happens at the correct time.
@@ -231,7 +232,20 @@ CheckpointerMain(void)
 	/*
 	 * If an exception is encountered, processing resumes here.
 	 *
-	 * See notes in postgres.c about the design of this coding.
+	 * You might wonder why this isn't coded as an infinite loop around a
+	 * PG_TRY construct.  The reason is that this is the bottom of the
+	 * exception stack, and so with PG_TRY there would be no exception handler
+	 * in force at all during the CATCH part.  By leaving the outermost setjmp
+	 * always active, we have at least some chance of recovering from an error
+	 * during error recovery.  (If we get into an infinite loop thereby, it
+	 * will soon be stopped by overflow of elog.c's internal state stack.)
+	 *
+	 * Note that we use sigsetjmp(..., 1), so that the prevailing signal mask
+	 * (to wit, BlockSig) will be restored when longjmp'ing to here.  Thus,
+	 * signals other than SIGQUIT will be blocked until we complete error
+	 * recovery.  It might seem that this policy makes the HOLD_INTERRUPTS()
+	 * call redundant, but it is not since InterruptPending might be set
+	 * already.
 	 */
 	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
 	{