Support clean switchover.

MasaoFujii · MasaoFujii · commit 985bd7d49726 · 2013-06-26T02:14:37.000+09:00
In replication, when we shut down the master, walsender tries to send
all the outstanding WAL records to the standby, and then exits. This
basically means that all the WAL records are fully synced between the
two servers after a clean shutdown of the master. So, after
promoting the standby to the new master, we can restart the stopped
master as a new standby without the need for a fresh backup from the
new master.

But there has been one problem so far: though walsender tries to send all
the outstanding WAL records, it doesn't wait for them to be replicated
to the standby. So walreceiver can detect the closure of the connection
and exit before receiving all the WAL records, and we cannot guarantee
that there is no missing WAL in the standby after a clean shutdown of
the master. In that case, a backup from the new master is required
when restarting the stopped master as a new standby.
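To make the race concrete, here is a minimal C sketch of the exit test
walsender used before this patch. WalPos, WalSndState,
old_ready_to_exit() and unconfirmed_bytes() are hypothetical names for
illustration only; sentPtr, XLogSend() and pq_is_send_pending() are the
real identifiers from the diff below. WalPos models XLogRecPtr as a plain
64-bit byte position, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr: a plain 64-bit WAL byte position. */
typedef uint64_t WalPos;

/* Hypothetical snapshot of the walsender during its last cycle. */
typedef struct
{
    WalPos sent_ptr;      /* WAL handed to the socket (sentPtr) */
    WalPos standby_flush; /* last flush position the standby reported */
    bool   caught_up;     /* XLogSend() found no more WAL to read */
    bool   send_pending;  /* output still buffered, like pq_is_send_pending() */
} WalSndState;

/* Exit test before this patch: it only looks at the sending side. */
static bool
old_ready_to_exit(const WalSndState *s)
{
    return s->caught_up && !s->send_pending;
}

/* Bytes of WAL the standby has not yet confirmed as flushed. */
static WalPos
unconfirmed_bytes(const WalSndState *s)
{
    /* The standby can never flush past what was sent to it. */
    assert(s->standby_flush <= s->sent_ptr);
    return s->sent_ptr - s->standby_flush;
}
```

For example, with sent_ptr = 0x3000060 and standby_flush = 0x3000000,
old_ready_to_exit() already returns true while 0x60 (96) bytes are still
unconfirmed, so those bytes may be lost if walreceiver sees the connection
close first.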

This patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the
standby before closing the replication connection.
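A minimal C sketch of the new exit test, assuming the same hypothetical
WalPos/WalSndState model: walsender now also requires sentPtr to equal
MyWalSnd->flush, the flush position reported by the standby. The loop in
wait_for_standby_flush() is hypothetical; the real walsender waits for
standby reply messages and re-checks the condition, whereas this sketch
simply advances the flush position by a fixed 'step' per cycle and ignores
the shared-memory details of MyWalSnd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WalPos; /* hypothetical stand-in for XLogRecPtr */

typedef struct
{
    WalPos sent_ptr;      /* sentPtr */
    WalPos standby_flush; /* MyWalSnd->flush, updated from standby replies */
    bool   caught_up;     /* XLogSend() found no more WAL to read */
    bool   send_pending;  /* output still buffered */
} WalSndState;

/* Exit test after this patch: the standby must also have flushed it all. */
static bool
new_ready_to_exit(const WalSndState *s)
{
    return s->caught_up && s->sent_ptr == s->standby_flush &&
        !s->send_pending;
}

/*
 * Hypothetical last-cycle loop: each cycle the standby reports that it has
 * flushed up to 'step' more bytes. Returns how many cycles were spent
 * waiting before walsender may send "COPY 0" and exit.
 */
static int
wait_for_standby_flush(WalSndState *s, WalPos step)
{
    int cycles = 0;

    /* This sketch models only the flush wait, not sending or buffering. */
    assert(s->caught_up && !s->send_pending);
    assert(step > 0);

    while (!new_ready_to_exit(s))
    {
        WalPos remaining = s->sent_ptr - s->standby_flush;

        s->standby_flush += (remaining < step) ? remaining : step;
        cycles++;
    }
    return cycles;
}
```

With 0x60 bytes unconfirmed and the standby acknowledging 0x20 bytes per
reply, the walsender waits three cycles before it closes the connection;
if the standby has already flushed everything, it exits immediately.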

Per discussion, this is a fix that needs to be backpatched rather than
a new feature. So, back-patch to 9.1, where enough infrastructure for
this exists.

Patch by me, reviewed by Andres Freund.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
@@ -27,7 +27,8 @@
  * If the server is shut down, postmaster sends us SIGUSR2 after all
  * regular backends have exited and the shutdown checkpoint has been written.
  * This instruct walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, and then exit.
+ * shutdown checkpoint record, wait for it to be replicated to the standby,
+ * and then exit.
  *
  *
  * Portions Copyright (c) 2010-2013, PostgreSQL Global Development Group
@@ -1045,15 +1046,17 @@ WalSndLoop(void)
 
 			/*
 			 * When SIGUSR2 arrives, we send any outstanding logs up to the
-			 * shutdown checkpoint record (i.e., the latest record) and exit.
+			 * shutdown checkpoint record (i.e., the latest record), wait
+			 * for them to be replicated to the standby, and exit.
 			 * This may be a normal termination at shutdown, or a promotion,
 			 * the walsender is not sure which.
 			 */
 			if (walsender_ready_to_stop)
 			{
 				/* ... let's just be real sure we're caught up ... */
 				XLogSend(&caughtup);
-				if (caughtup && !pq_is_send_pending())
+				if (caughtup && sentPtr == MyWalSnd->flush &&
+					!pq_is_send_pending())
 				{
 					/* Inform the standby that XLOG streaming is done */
 					EndCommand("COPY 0", DestRemote);
@@ -1728,7 +1731,8 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
 	/*
 	 * If replication has not yet started, die like with SIGTERM. If
 	 * replication is active, only set a flag and wake up the main loop. It
-	 * will send any outstanding WAL, and then exit gracefully.
+	 * will send any outstanding WAL, wait for it to be replicated to
+	 * the standby, and then exit gracefully.
 	 */
 	if (!replication_active)
 		kill(MyProcPid, SIGTERM);