
Commit ee937f0
Fix data loss when restarting the bulk_write facility
If a user started a bulk write operation on a fork with existing data to append data in bulk, the bulk_write machinery would zero out all previously written pages up to the last page written by the new bulk_write operation.

This is not an issue for PostgreSQL itself, because we never use the bulk_write facility on a non-empty fork. But there are use cases where it makes sense: the TimescaleDB extension is known to do that to merge partitions, for example.

Backpatch to v17, where the bulk_write machinery was introduced.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-by: Erik Nordström <erik@timescale.com>
Reviewed-by: Erik Nordström <erik@timescale.com>
Discussion: https://www.postgresql.org/message-id/CACAa4VJ%2BQY4pY7M0ECq29uGkrOygikYtao1UG9yCDFosxaps9g@mail.gmail.com
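For illustration, a minimal sketch of the append pattern that used to lose data, assuming a fork that already contains nblocks pages (the bulk_write API calls are real; the surrounding setup and values are hypothetical):

	/* Append one page to a non-empty fork via the bulk_write facility. */
	BlockNumber		nblocks = smgrnblocks(smgr, MAIN_FORKNUM);	/* e.g. 10 */
	BulkWriteState *bulkstate = smgr_bulk_start_smgr(smgr, MAIN_FORKNUM, true);
	BulkWriteBuffer buf = smgr_bulk_get_buf(bulkstate);

	/* ... fill the buffer with the page contents to append ... */

	smgr_bulk_write(bulkstate, nblocks, buf, true); /* write past existing data */
	smgr_bulk_finish(bulkstate);

Before this fix, the flush path assumed the relation was empty, so writing block nblocks first zero-extended blocks 0 .. nblocks-1, clobbering the pre-existing pages.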
1 parent: aac831c

1 file changed (+11, -8 lines)


src/backend/storage/smgr/bulk_write.c

@@ -4,8 +4,10 @@
  * Efficiently and reliably populate a new relation
  *
  * The assumption is that no other backends access the relation while we are
- * loading it, so we can take some shortcuts. Do not mix operations through
- * the regular buffer manager and the bulk loading interface!
+ * loading it, so we can take some shortcuts. Pages already present in the
+ * indicated fork when the bulk write operation is started are not modified
+ * unless explicitly written to. Do not mix operations through the regular
+ * buffer manager and the bulk loading interface!
  *
  * We bypass the buffer manager to avoid the locking overhead, and call
  * smgrextend() directly. A downside is that the pages will need to be
@@ -68,7 +70,7 @@ struct BulkWriteState
 	PendingWrite pending_writes[MAX_PENDING_WRITES];
 
 	/* Current size of the relation */
-	BlockNumber pages_written;
+	BlockNumber relsize;
 
 	/* The RedoRecPtr at the time that the bulk operation started */
 	XLogRecPtr	start_RedoRecPtr;
@@ -105,7 +107,7 @@ smgr_bulk_start_smgr(SMgrRelation smgr, ForkNumber forknum, bool use_wal)
 	state->use_wal = use_wal;
 
 	state->npending = 0;
-	state->pages_written = 0;
+	state->relsize = smgrnblocks(smgr, forknum);
 
 	state->start_RedoRecPtr = GetRedoRecPtr();
 
@@ -279,7 +281,7 @@ smgr_bulk_flush(BulkWriteState *bulkstate)
 
 		PageSetChecksumInplace(page, blkno);
 
-		if (blkno >= bulkstate->pages_written)
+		if (blkno >= bulkstate->relsize)
 		{
 			/*
 			 * If we have to write pages nonsequentially, fill in the space
@@ -288,17 +290,18 @@ smgr_bulk_flush(BulkWriteState *bulkstate)
 			 * space will read as zeroes anyway), but it should help to avoid
 			 * fragmentation. The dummy pages aren't WAL-logged though.
 			 */
-			while (blkno > bulkstate->pages_written)
+			while (blkno > bulkstate->relsize)
 			{
 				/* don't set checksum for all-zero page */
 				smgrextend(bulkstate->smgr, bulkstate->forknum,
-						   bulkstate->pages_written++,
+						   bulkstate->relsize,
 						   &zero_buffer,
 						   true);
+				bulkstate->relsize++;
 			}
 
 			smgrextend(bulkstate->smgr, bulkstate->forknum, blkno, page, true);
-			bulkstate->pages_written = pending_writes[i].blkno + 1;
+			bulkstate->relsize++;
 		}
 		else
 			smgrwrite(bulkstate->smgr, bulkstate->forknum, blkno, page, true);
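To make the failure mode concrete, a worked trace under assumed values (the fork already holds 10 blocks and the caller bulk-writes block 10):

	/*
	 * Illustrative trace, not taken from the patch.
	 *
	 * Before the fix:
	 *   pages_written = 0;                     // existing 10 blocks ignored
	 *   blkno (10) >= pages_written (0)        // takes the extend path
	 *   while (10 > pages_written)
	 *       smgrextend(..., pages_written++, &zero_buffer, true);
	 *                                          // rewrites blocks 0..9 as zeroes
	 *
	 * After the fix:
	 *   relsize = smgrnblocks(smgr, forknum);  // = 10
	 *   while (10 > 10) ...                    // loop body never runs
	 *   smgrextend(..., 10, page, true);       // clean append; relsize -> 11
	 */

With the fix, bulkstate->relsize tracks the fork's actual block count throughout the flush, so pages that existed before the bulk write operation started are only touched when the caller explicitly writes them.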
