@@ -1602,11 +1602,21 @@ sendFile(bbsink *sink, const char *readfilename, const char *tarfilename,
  * Retry the block on the first failure. It's
  * possible that we read the first 4K page of the
  * block just before postgres updated the entire block
- * so it ends up looking torn to us. We only need to
- * retry once because the LSN should be updated to
- * something we can ignore on the next pass. If the
- * error happens again then it is a true validation
- * failure.
+ * so it ends up looking torn to us. If, before we
+ * retry the read, the concurrent write of the block
+ * finishes, the page LSN will be updated and we'll
+ * realize that we should ignore this block.
+ *
+ * There's no guarantee that this will actually
+ * happen, though: the torn write could take an
+ * arbitrarily long time to complete. Retrying multiple
+ * times wouldn't fix this problem, either, though
+ * it would reduce the chances of it happening in
+ * practice. The only real fix here seems to be to
+ * have some kind of interlock that allows us to wait
+ * until we can be certain that no write to the block
+ * is in progress. Since we don't have any such thing
+ * right now, we just do this and hope for the best.
  */
 if (block_retry == false)
 {
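
The retry-once behaviour described in the comment can be sketched in isolation. What follows is a minimal, self-contained C illustration, not the actual sendFile() code: DemoPage, DemoDisk, demo_checksum, demo_read and demo_verify_block are all hypothetical names, the checksum is a toy, and comparing the page LSN against a backup start LSN is an assumption about how an "ignorable" block is recognised.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a page; real PostgreSQL pages are laid out differently. */
typedef struct DemoPage
{
    uint64_t lsn;       /* LSN of the last change to the page */
    uint32_t checksum;  /* checksum stored with the page */
    uint32_t payload;   /* stands in for the page contents */
} DemoPage;

/* Hypothetical disk: the first torn_reads reads see a half-written page. */
typedef struct DemoDisk
{
    int      reads;       /* number of reads performed so far */
    int      torn_reads;  /* how many initial reads return the torn image */
    DemoPage torn;        /* image seen while the write is in progress */
    DemoPage complete;    /* image seen once the write has finished */
} DemoDisk;

typedef enum { DEMO_OK, DEMO_SKIPPED, DEMO_FAILED } DemoResult;

/* Toy checksum, assumed for illustration only. */
static uint32_t
demo_checksum(const DemoPage *p)
{
    return p->payload * 2654435761u;
}

/* Read the block once, returning whichever image is currently visible. */
static void
demo_read(void *ctx, DemoPage *out)
{
    DemoDisk *disk = ctx;

    *out = (disk->reads < disk->torn_reads) ? disk->torn : disk->complete;
    disk->reads++;
}

/* Verify one block, re-reading it exactly once after a checksum mismatch. */
static DemoResult
demo_verify_block(DemoDisk *disk, uint64_t backup_start_lsn)
{
    DemoPage page;
    bool     block_retry = false;

    for (;;)
    {
        demo_read(disk, &page);

        /* Assumption: a page changed after backup start can be ignored here. */
        if (page.lsn >= backup_start_lsn)
            return DEMO_SKIPPED;

        if (page.checksum == demo_checksum(&page))
            return DEMO_OK;

        /* Second mismatch in a row: report it, even if the write is merely slow. */
        if (block_retry)
            return DEMO_FAILED;
        block_retry = true;
    }
}
```

In this sketch, a write that finishes between the two reads makes the second read return a page with a newer LSN, so the block is skipped. A write that is still in progress at the second read is reported as a failure although nothing is corrupt, which is the case the new comment warns about.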