Perform an immediate shutdown if the postmaster.pid file is removed.

The postmaster now checks every minute or so (worst case, at most two minutes) that postmaster.pid is still there and still contains its own PID. If not, it performs an immediate shutdown, as though it had received SIGQUIT. The original goal behind this change was to ensure that failed buildfarm runs would get fully cleaned up, even if the test scripts had left a postmaster running, which is not an infrequent occurrence. When the buildfarm script removes a test postmaster's $PGDATA directory, its next check on postmaster.pid will fail and cause it to exit. Previously, manual intervention was often needed to get rid of such orphaned postmasters, since they'd block new test postmasters from obtaining the expected socket address. However, by checking postmaster.pid and not something else, we can provide additional robustness: manual removal of postmaster.pid is a frequent DBA mistake, and now we can at least limit the damage that will ensue if a new postmaster is started while the old one is still alive. Back-patch to all supported branches, since we won't get the desired improvement in buildfarm reliability otherwise.
author: Tom Lane 2015-10-06 21:15:27 +0000
committer: Tom Lane 2015-10-06 21:15:52 +0000
commit: 7e2a18a9161fee7e67642863f72b51d77d3e996f (patch)
tree: e831f031db610653ab1a6512ff8dc743ef7feb79 /src/backend/postmaster
parent: 8f6bb851bdb75d741b3a7543bbf681e3cd7e65dc (diff)
1 files changed, 44 insertions, 14 deletions
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index baa43b203f1..aab2f4ca702 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1602,9 +1602,10 @@ ServerLoop(void)
 	fd_set		readmask;
 	int			nSockets;
 	time_t		now,
+				last_lockfile_recheck_time,
 				last_touch_time;
 
-	last_touch_time = time(NULL);
+	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
 	nSockets = initMasks(&readmask);
 
@@ -1754,19 +1755,6 @@ ServerLoop(void)
 		if (StartWorkerNeeded || HaveCrashedWorker)
 			maybe_start_bgworker();
 
-		/*
-		 * Touch Unix socket and lock files every 58 minutes, to ensure that
-		 * they are not removed by overzealous /tmp-cleaning tasks.  We assume
-		 * no one runs cleaners with cutoff times of less than an hour ...
-		 */
-		now = time(NULL);
-		if (now - last_touch_time >= 58 * SECS_PER_MINUTE)
-		{
-			TouchSocketFiles();
-			TouchSocketLockFiles();
-			last_touch_time = now;
-		}
-
 #ifdef HAVE_PTHREAD_IS_THREADED_NP
 
 		/*
@@ -1793,6 +1781,48 @@ ServerLoop(void)
 			/* reset flag so we don't SIGKILL again */
 			AbortStartTime = 0;
 		}
+
+		/*
+		 * Lastly, check to see if it's time to do some things that we don't
+		 * want to do every single time through the loop, because they're a
+		 * bit expensive.  Note that there's up to a minute of slop in when
+		 * these tasks will be performed, since DetermineSleepTime() will let
+		 * us sleep at most that long.
+		 */
+		now = time(NULL);
+
+		/*
+		 * Once a minute, verify that postmaster.pid hasn't been removed or
+		 * overwritten.  If it has, we force a shutdown.  This avoids having
+		 * postmasters and child processes hanging around after their database
+		 * is gone, and maybe causing problems if a new database cluster is
+		 * created in the same place.  It also provides some protection
+		 * against a DBA foolishly removing postmaster.pid and manually
+		 * starting a new postmaster.  Data corruption is likely to ensue from
+		 * that anyway, but we can minimize the damage by aborting ASAP.
+		 */
+		if (now - last_lockfile_recheck_time >= 1 * SECS_PER_MINUTE)
+		{
+			if (!RecheckDataDirLockFile())
+			{
+				ereport(LOG,
+						(errmsg("performing immediate shutdown because data directory lock file is invalid")));
+				kill(MyProcPid, SIGQUIT);
+			}
+			last_lockfile_recheck_time = now;
+		}
+
+		/*
+		 * Touch Unix socket and lock files every 58 minutes, to ensure that
+		 * they are not removed by overzealous /tmp-cleaning tasks.  We assume
+		 * no one runs cleaners with cutoff times of less than an hour ...
+		 */
+		if (now - last_touch_time >= 58 * SECS_PER_MINUTE)
+		{
+			TouchSocketFiles();
+			TouchSocketLockFiles();
+			last_touch_time = now;
+		}
 	}
 }
author	Tom Lane	2015-10-06 21:15:27 +0000
committer	Tom Lane	2015-10-06 21:15:52 +0000
commit	7e2a18a9161fee7e67642863f72b51d77d3e996f (patch)
tree	e831f031db610653ab1a6512ff8dc743ef7feb79 /src/backend/postmaster
parent	8f6bb851bdb75d741b3a7543bbf681e3cd7e65dc (diff)