Reword overly-optimistic comment about backup checksum verification.

robertmhaas · robertmhaas · commit e76cbb6cd64d · 2023-03-06T10:35:15.000-05:00
The comment implies that a single retry is sufficient to avoid spurious checksum failures, but in fact no number of retries is sufficient for that purpose. Update the comment accordingly. Patch by me, reviewed by Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoZ_fFAoU6mrHt9QBs+dcYhN6yXenGTTMRebZNhtwPwHyg@mail.gmail.com
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
@@ -1602,11 +1602,21 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
 						 * Retry the block on the first failure.  It's
 						 * possible that we read the first 4K page of the
 						 * block just before postgres updated the entire block
-						 * so it ends up looking torn to us.  We only need to
-						 * retry once because the LSN should be updated to
-						 * something we can ignore on the next pass.  If the
-						 * error happens again then it is a true validation
-						 * failure.
+						 * so it ends up looking torn to us. If, before we
+						 * retry the read, the concurrent write of the block
+						 * finishes, the page LSN will be updated and we'll
+						 * realize that we should ignore this block.
+						 *
+						 * There's no guarantee that this will actually
+						 * happen, though: the torn write could take an
+						 * arbitrarily long time to complete. Retrying multiple
+						 * times wouldn't fix this problem, either, though
+						 * it would reduce the chances of it happening in
+						 * practice. The only real fix here seems to be to
+						 * have some kind of interlock that allows us to wait
+						 * until we can be certain that no write to the block
+						 * is in progress. Since we don't have any such thing
+						 * right now, we just do this and hope for the best.
 						 */
 						if (block_retry == false)
 						{