
Commit 917dc7d

Fix WAL-logging of FSM and VM truncation.
When a relation is truncated, it is important that the FSM is truncated as
well. Otherwise, after recovery, the FSM can return a page that has been
truncated away, leading to errors like:

ERROR: could not read block 28991 in file "base/16390/572026": read only 0
of 8192 bytes

We were using MarkBufferDirtyHint() to dirty the buffer holding the last
remaining page of the FSM, but during recovery, that might in fact not
dirty the page, and the FSM update might be lost.

To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty()
requires us to do WAL-logging ourselves, to protect from a torn page, if
checksumming is enabled.

Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log
when checksumming is enabled.

Analysis by Pavan Deolasee.

Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>
1 parent b801e12 commit 917dc7d
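The difference between the two dirtying calls described in the commit message can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyBuffer`, `ToyState`, and the `toy_*` functions are hypothetical names, not PostgreSQL APIs, and the rule that the hint-style call skips dirtying whenever the server is in recovery and hint WAL-logging is required is a simplification of what the message calls "might in fact not dirty the page". The last function mirrors the guard condition this commit adds before `log_newpage_buffer()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; not PostgreSQL's real data structures. */
typedef struct ToyBuffer
{
	bool		dirty;				/* would this page be written out later? */
} ToyBuffer;

typedef struct ToyState
{
	bool		in_recovery;		/* models InRecovery */
	bool		rel_needs_wal;		/* models RelationNeedsWAL(rel) */
	bool		hint_wal_needed;	/* models XLogHintBitIsNeeded() */
} ToyState;

/*
 * Hint-style dirtying (assumed simplification): when hint changes would
 * need a WAL record but we are in recovery and cannot write one, the
 * buffer is left clean, so the change can be lost.
 */
void
toy_mark_dirty_hint(ToyBuffer *buf, const ToyState *st)
{
	assert(buf != NULL && st != NULL);
	if (st->hint_wal_needed && st->in_recovery)
		return;
	buf->dirty = true;
}

/* Unconditional dirtying: the caller takes over responsibility for WAL. */
void
toy_mark_dirty(ToyBuffer *buf)
{
	assert(buf != NULL);
	buf->dirty = true;
}

/* Mirrors the guard the commit places in front of log_newpage_buffer(). */
bool
toy_needs_full_page_record(const ToyState *st)
{
	assert(st != NULL);
	return !st->in_recovery && st->rel_needs_wal && st->hint_wal_needed;
}
```

In this model, a replaying server with hint WAL-logging required leaves the buffer clean after `toy_mark_dirty_hint()` but dirty after `toy_mark_dirty()`, and `toy_needs_full_page_record()` is false during replay because the higher-level truncation record is already being applied.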

3 files changed, +128 -1 lines changed

src/backend/access/heap/visibilitymap.c

+16
@@ -508,6 +508,9 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 
 		LockBuffer(mapBuffer, BUFFER_LOCK_EXCLUSIVE);
 
+		/* NO EREPORT(ERROR) from here till changes are logged */
+		START_CRIT_SECTION();
+
 		/* Clear out the unwanted bytes. */
 		MemSet(&map[truncByte + 1], 0, MAPSIZE - (truncByte + 1));
 
@@ -523,7 +526,20 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		 */
 		map[truncByte] &= (1 << truncOffset) - 1;
 
+		/*
+		 * Truncation of a relation is WAL-logged at a higher-level, and we
+		 * will be called at WAL replay. But if checksums are enabled, we need
+		 * to still write a WAL record to protect against a torn page, if the
+		 * page is flushed to disk before the truncation WAL record. We cannot
+		 * use MarkBufferDirtyHint here, because that will not dirty the page
+		 * during recovery.
+		 */
 		MarkBufferDirty(mapBuffer);
+		if (!InRecovery && RelationNeedsWAL(rel) && XLogHintBitIsNeeded())
+			log_newpage_buffer(mapBuffer, false);
+
+		END_CRIT_SECTION();
+
 		UnlockReleaseBuffer(mapBuffer);
 	}
 	else
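
The two context statements in the hunks above, the `MemSet` over the trailing bytes and the `&=` mask on `map[truncByte]`, together clear every bit at or beyond the truncation point. A minimal standalone sketch of that arithmetic follows; `toy_truncate_bitmap` and its parameters are hypothetical names chosen for illustration, and the sketch assumes one bit per tracked unit with `trunc_offset` counting bits within a byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero every byte after trunc_byte, then keep only the low trunc_offset
 * bits of map[trunc_byte]. Bits below the truncation point are untouched.
 */
void
toy_truncate_bitmap(unsigned char *map, size_t map_size,
					size_t trunc_byte, unsigned trunc_offset)
{
	assert(map != NULL);
	assert(trunc_byte < map_size);
	assert(trunc_offset < 8);

	/* Clear out the unwanted whole bytes. */
	memset(&map[trunc_byte + 1], 0, map_size - (trunc_byte + 1));

	/* (1 << n) - 1 has exactly the n lowest bits set. */
	map[trunc_byte] &= (unsigned char) ((1u << trunc_offset) - 1);
}
```

For a four-byte map filled with 0xFF, truncating at byte 1 with bit offset 3 leaves byte 0 as 0xFF, byte 1 as 0x07, and bytes 2 and 3 as zero.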

src/backend/storage/freespace/freespace.c

+19 -1
@@ -327,8 +327,26 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		if (!BufferIsValid(buf))
 			return; /* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* NO EREPORT(ERROR) from here till changes are logged */
+		START_CRIT_SECTION();
+
 		fsm_truncate_avail(BufferGetPage(buf), first_removed_slot);
-		MarkBufferDirtyHint(buf, false);
+
+		/*
+		 * Truncation of a relation is WAL-logged at a higher-level, and we
+		 * will be called at WAL replay. But if checksums are enabled, we need
+		 * to still write a WAL record to protect against a torn page, if the
+		 * page is flushed to disk before the truncation WAL record. We cannot
+		 * use MarkBufferDirtyHint here, because that will not dirty the page
+		 * during recovery.
+		 */
+		MarkBufferDirty(buf);
+		if (!InRecovery && RelationNeedsWAL(rel) && XLogHintBitIsNeeded())
+			log_newpage_buffer(buf, false);
+
+		END_CRIT_SECTION();
+
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
+93
@@ -0,0 +1,93 @@
+# Test WAL replay of FSM changes.
+#
+# FSM changes don't normally need to be WAL-logged, except for truncation.
+# The FSM mustn't return a page that doesn't exist (anymore).
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+
+$node_master->append_conf('postgresql.conf', qq{
+fsync = on
+wal_level = replica
+wal_log_hints = on
+max_prepared_transactions = 5
+autovacuum = off
+});
+
+# Create a master node and its standby, initializing both with some data
+# at the same time.
+$node_master->start;
+
+$node_master->backup('master_backup');
+my $node_standby = get_new_node('standby');
+$node_standby->init_from_backup($node_master, 'master_backup',
+	has_streaming => 1);
+$node_standby->start;
+
+$node_master->psql('postgres', qq{
+create table testtab (a int, b char(100));
+insert into testtab select generate_series(1,1000), 'foo';
+insert into testtab select generate_series(1,1000), 'foo';
+delete from testtab where ctid > '(8,0)';
+});
+
+# Take a lock on the table to prevent following vacuum from truncating it
+$node_master->psql('postgres', qq{
+begin;
+lock table testtab in row share mode;
+prepare transaction 'p1';
+});
+
+# Vacuum, update FSM without truncation
+$node_master->psql('postgres', 'vacuum verbose testtab');
+
+# Force a checkpoint
+$node_master->psql('postgres', 'checkpoint');
+
+# Now do some more insert/deletes, another vacuum to ensure full-page writes
+# are done
+$node_master->psql('postgres', qq{
+insert into testtab select generate_series(1,1000), 'foo';
+delete from testtab where ctid > '(8,0)';
+vacuum verbose testtab;
+});
+
+# Ensure all buffers are now clean on the standby
+$node_standby->psql('postgres', 'checkpoint');
+
+# Release the lock, vacuum again which should lead to truncation
+$node_master->psql('postgres', qq{
+rollback prepared 'p1';
+vacuum verbose testtab;
+});
+
+$node_master->psql('postgres', 'checkpoint');
+my $until_lsn =
+	$node_master->safe_psql('postgres', "SELECT pg_current_xlog_location();");
+
+# Wait long enough for standby to receive and apply all WAL
+my $caughtup_query =
+	"SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+	or die "Timed out while waiting for standby to catch up";
+
+# Promote the standby
+$node_standby->promote;
+$node_standby->poll_query_until('postgres',
+	"SELECT NOT pg_is_in_recovery()")
+	or die "Timed out while waiting for promotion of standby";
+$node_standby->psql('postgres', 'checkpoint');
+
+# Restart to discard in-memory copy of FSM
+$node_standby->restart;
+
+# Insert should work on standby
+is($node_standby->psql('postgres',
+	qq{insert into testtab select generate_series(1,1000), 'foo';}),
+	0, 'INSERT succeeds with truncated relation FSM');
