Since 5a991ef8 we explicitly ask the receiving side for feedback when
shutting down a walsender, if there is data that has not yet been
replicated.
Unfortunately we didn't remember (i.e. set waiting_for_ping_response to
true) having asked for feedback, leading to scenarios in which replies
were requested at a high frequency.
I can't reproduce this problem on my laptop; I think that's because the
problem requires a significant TCP window to manifest, due to the
!pq_is_send_pending() condition. But since this clearly is a bug, let's
fix it. There's quite possibly more wrong than just this, though.
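As a toy illustration of why the missing flag matters, here is a minimal
sketch. It is not PostgreSQL code: SenderState, request_reply(),
shutdown_step() and count_requests() are hypothetical stand-ins, and the
model assumes that no reply arrives while the loop runs. Without
remembering the request, every pass through the shutdown check asks for
feedback again; with the flag set, only the first pass does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant walsender state. */
typedef struct
{
	bool		waiting_for_ping_response;
	int			keepalives_sent;
} SenderState;

/* Stand-in for WalSndKeepalive(true): only counts the requests. */
static void
request_reply(SenderState *s)
{
	s->keepalives_sent++;
}

/* One pass through the shutdown check while the receiver lags behind. */
static void
shutdown_step(SenderState *s, bool remember_request)
{
	if (!s->waiting_for_ping_response)
	{
		request_reply(s);
		/* The fix: remember that a reply has been requested. */
		if (remember_request)
			s->waiting_for_ping_response = true;
	}
}

/* Run the check repeatedly with no reply arriving; return requests sent. */
static int
count_requests(bool remember_request, int iterations)
{
	SenderState s = {false, 0};

	assert(iterations >= 0);
	for (int i = 0; i < iterations; i++)
		shutdown_step(&s, remember_request);
	return s.keepalives_sent;
}
```

In this model, count_requests(false, n) grows linearly with n, while
count_requests(true, n) stays at one for any n > 0.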
While fiddling with WalSndDone(), I rewrote a hard-to-understand comment
about looking at the flush vs. the write position.
Reported-By: Nick Cleaton, Magnus Hagander
Author: Nick Cleaton
Discussion: CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com
CABUevExsjROqDcD0A2rnJ6HK6FuKGyewJr3PL12pw85BHFGS2Q@mail.gmail.com
Backpatch: 9.4, where 5a991ef8 introduced the use of feedback messages
during shutdown.
send_data();
/*
- * Check a write location to see whether all the WAL have successfully
- * been replicated if this walsender is connecting to a standby such as
- * pg_receivexlog which always returns an invalid flush location.
- * Otherwise, check a flush location.
+ * To figure out whether all WAL has successfully been replicated, check
+ * flush location if valid, write otherwise. Tools like pg_receivexlog
+ * will usually (unless in synchronous mode) return an invalid flush
+ * location.
*/
replicatedPtr = XLogRecPtrIsInvalid(MyWalSnd->flush) ?
MyWalSnd->write : MyWalSnd->flush;
+
if (WalSndCaughtUp && sentPtr == replicatedPtr &&
!pq_is_send_pending())
{
proc_exit(0);
}
if (!waiting_for_ping_response)
+ {
WalSndKeepalive(true);
+ waiting_for_ping_response = true;
+ }
}
/*