Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit b2e1eaa

Browse files
committed
Fix transient clobbering of shared buffers during WAL replay.
RestoreBkpBlocks was in the habit of zeroing and refilling the target buffer; which was perfectly safe when the code was written, but is unsafe during Hot Standby operation. The reason is that we have coding rules that allow backends to continue accessing a tuple in a heap relation while holding only a pin on its buffer. Such a backend could see transiently zeroed data, if WAL replay had occasion to change other data on the page. This has been shown to be the cause of bug #6425 from Duncan Rance (who deserves kudos for developing a sufficiently-reproducible test case) as well as Bridget Frey's re-report of bug #6200. It most likely explains the original report as well, though we don't yet have confirmation of that. To fix, change the code so that only bytes that are supposed to change will change, even transiently. This actually saves cycles in RestoreBkpBlocks, since it's not writing the same bytes twice. Also fix seq_redo, which has the same disease, though it has to work a bit harder to meet the requirement. So far as I can tell, no other WAL replay routines have this type of bug. In particular, the index-related replay routines, which would certainly be broken if they had to meet the same standard, are not at risk because we do not have coding rules that allow access to an index page when not holding a buffer lock on it. Back-patch to 9.0 where Hot Standby was added.
1 parent 8572cc4 commit b2e1eaa

File tree

2 files changed

+24
-9
lines changed

2 files changed

+24
-9
lines changed

src/backend/access/transam/xlog.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3608,9 +3608,9 @@ RestoreBkpBlocks(XLogRecPtr lsn, XLogRecord *record, bool cleanup)
36083608
}
36093609
else
36103610
{
3611-
/* must zero-fill the hole */
3612-
MemSet((char *) page, 0, BLCKSZ);
36133611
memcpy((char *) page, blk, bkpb.hole_offset);
3612+
/* must zero-fill the hole */
3613+
MemSet((char *) page + bkpb.hole_offset, 0, bkpb.hole_length);
36143614
memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
36153615
blk + bkpb.hole_offset,
36163616
BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));

src/backend/commands/sequence.c

Lines changed: 22 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1507,6 +1507,7 @@ seq_redo(XLogRecPtr lsn, XLogRecord *record)
15071507
uint8 info = record->xl_info & ~XLR_INFO_MASK;
15081508
Buffer buffer;
15091509
Page page;
1510+
Page localpage;
15101511
char *item;
15111512
Size itemsz;
15121513
xl_seq_rec *xlrec = (xl_seq_rec *) XLogRecGetData(record);
@@ -1522,23 +1523,37 @@ seq_redo(XLogRecPtr lsn, XLogRecord *record)
15221523
Assert(BufferIsValid(buffer));
15231524
page = (Page) BufferGetPage(buffer);
15241525

1525-
/* Always reinit the page and reinstall the magic number */
1526-
/* See comments in DefineSequence */
1527-
PageInit((Page) page, BufferGetPageSize(buffer), sizeof(sequence_magic));
1528-
sm = (sequence_magic *) PageGetSpecialPointer(page);
1526+
/*
1527+
* We must always reinit the page and reinstall the magic number (see
1528+
* comments in fill_seq_with_data). However, since this WAL record type
1529+
* is also used for updating sequences, it's possible that a hot-standby
1530+
* backend is examining the page concurrently; so we mustn't transiently
1531+
* trash the buffer. The solution is to build the correct new page
1532+
* contents in local workspace and then memcpy into the buffer. Then
1533+
* only bytes that are supposed to change will change, even transiently.
1534+
* We must palloc the local page for alignment reasons.
1535+
*/
1536+
localpage = (Page) palloc(BufferGetPageSize(buffer));
1537+
1538+
PageInit(localpage, BufferGetPageSize(buffer), sizeof(sequence_magic));
1539+
sm = (sequence_magic *) PageGetSpecialPointer(localpage);
15291540
sm->magic = SEQ_MAGIC;
15301541

15311542
item = (char *) xlrec + sizeof(xl_seq_rec);
15321543
itemsz = record->xl_len - sizeof(xl_seq_rec);
15331544
itemsz = MAXALIGN(itemsz);
1534-
if (PageAddItem(page, (Item) item, itemsz,
1545+
if (PageAddItem(localpage, (Item) item, itemsz,
15351546
FirstOffsetNumber, false, false) == InvalidOffsetNumber)
15361547
elog(PANIC, "seq_redo: failed to add item to page");
15371548

1538-
PageSetLSN(page, lsn);
1539-
PageSetTLI(page, ThisTimeLineID);
1549+
PageSetLSN(localpage, lsn);
1550+
PageSetTLI(localpage, ThisTimeLineID);
1551+
1552+
memcpy(page, localpage, BufferGetPageSize(buffer));
15401553
MarkBufferDirty(buffer);
15411554
UnlockReleaseBuffer(buffer);
1555+
1556+
pfree(localpage);
15421557
}
15431558

15441559
void

0 commit comments

Comments
 (0)