Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 1a3de15

Browse files
committed
Dept. of further reflection: I looked around to see if any other callers
of XLogInsert had the same sort of checkpoint interlock problem as RecordTransactionCommit, and indeed I found some. Btree index build and ALTER TABLE SET TABLESPACE write data outside the friendly confines of the buffer manager, and therefore they have to take their own responsibility for checkpoint interlock. The easiest solution seems to be to force smgrimmedsync at the end of the index build or table copy, even when the operation is being WAL-logged. This is sufficient since the new index or table will be of interest to no one if we don't get as far as committing the current transaction.
1 parent 057ea34 commit 1a3de15

File tree

3 files changed

+46
-24
lines changed

3 files changed

+46
-24
lines changed

src/backend/access/nbtree/nbtsort.c

+20-10
Original file line numberDiff line numberDiff line change
@@ -56,7 +56,7 @@
5656
* Portions Copyright (c) 1994, Regents of the University of California
5757
*
5858
* IDENTIFICATION
59-
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtsort.c,v 1.85 2004/07/21 22:31:20 tgl Exp $
59+
* $PostgreSQL: pgsql/src/backend/access/nbtree/nbtsort.c,v 1.86 2004/08/15 23:44:46 tgl Exp $
6060
*
6161
*-------------------------------------------------------------------------
6262
*/
@@ -322,16 +322,15 @@ _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
322322
wstate->btws_zeropage = (Page) palloc0(BLCKSZ);
323323
smgrwrite(wstate->index->rd_smgr, wstate->btws_pages_written++,
324324
(char *) wstate->btws_zeropage,
325-
!wstate->btws_use_wal);
325+
true);
326326
}
327327

328328
/*
329-
* Now write the page. If not using WAL, say isTemp = true, to suppress
330-
* duplicate fsync. If we are using WAL, it surely isn't a temp index,
331-
* so !use_wal is a sufficient condition.
329+
* Now write the page. We say isTemp = true even if it's not a
330+
* temp index, because there's no need for smgr to schedule an fsync
331+
* for this write; we'll do it ourselves before ending the build.
332332
*/
333-
smgrwrite(wstate->index->rd_smgr, blkno, (char *) page,
334-
!wstate->btws_use_wal);
333+
smgrwrite(wstate->index->rd_smgr, blkno, (char *) page, true);
335334

336335
if (blkno == wstate->btws_pages_written)
337336
wstate->btws_pages_written++;
@@ -802,9 +801,20 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
802801
_bt_uppershutdown(wstate, state);
803802

804803
/*
805-
* If we weren't using WAL, and the index isn't temp, we must fsync it
806-
* down to disk before it's safe to commit the transaction.
804+
* If the index isn't temp, we must fsync it down to disk before it's
805+
* safe to commit the transaction. (For a temp index we don't care
806+
* since the index will be uninteresting after a crash anyway.)
807+
*
808+
* It's obvious that we must do this when not WAL-logging the build.
809+
* It's less obvious that we have to do it even if we did WAL-log the
810+
* index pages. The reason is that since we're building outside
811+
* shared buffers, a CHECKPOINT occurring during the build has no way
812+
* to flush the previously written data to disk (indeed it won't know
813+
* the index even exists). A crash later on would replay WAL from the
814+
* checkpoint, therefore it wouldn't replay our earlier WAL entries.
815+
* If we do not fsync those pages here, they might still not be on disk
816+
* when the crash occurs.
807817
*/
808-
if (!wstate->btws_use_wal && !wstate->index->rd_istemp)
818+
if (!wstate->index->rd_istemp)
809819
smgrimmedsync(wstate->index->rd_smgr);
810820
}

src/backend/commands/tablecmds.c

+19-8
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/backend/commands/tablecmds.c,v 1.125 2004/08/13 04:50:28 tgl Exp $
11+
* $PostgreSQL: pgsql/src/backend/commands/tablecmds.c,v 1.126 2004/08/15 23:44:46 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -5479,18 +5479,29 @@ copy_relation_data(Relation rel, SMgrRelation dst)
54795479
}
54805480

54815481
/*
5482-
* Now write the page. If not using WAL, say isTemp = true, to
5483-
* suppress duplicate fsync. If we are using WAL, it surely isn't a
5484-
* temp rel, so !use_wal is a sufficient condition.
5482+
* Now write the page. We say isTemp = true even if it's not a
5483+
* temp rel, because there's no need for smgr to schedule an fsync
5484+
* for this write; we'll do it ourselves below.
54855485
*/
5486-
smgrwrite(dst, blkno, buf, !use_wal);
5486+
smgrwrite(dst, blkno, buf, true);
54875487
}
54885488

54895489
/*
5490-
* If we weren't using WAL, and the rel isn't temp, we must fsync it
5491-
* down to disk before it's safe to commit the transaction.
5490+
* If the rel isn't temp, we must fsync it down to disk before it's
5491+
* safe to commit the transaction. (For a temp rel we don't care
5492+
* since the rel will be uninteresting after a crash anyway.)
5493+
*
5494+
* It's obvious that we must do this when not WAL-logging the copy.
5495+
* It's less obvious that we have to do it even if we did WAL-log the
5496+
* copied pages. The reason is that since we're copying outside
5497+
* shared buffers, a CHECKPOINT occurring during the copy has no way
5498+
* to flush the previously written data to disk (indeed it won't know
5499+
* the new rel even exists). A crash later on would replay WAL from the
5500+
* checkpoint, therefore it wouldn't replay our earlier WAL entries.
5501+
* If we do not fsync those pages here, they might still not be on disk
5502+
* when the crash occurs.
54925503
*/
5493-
if (!use_wal && !rel->rd_istemp)
5504+
if (!rel->rd_istemp)
54945505
smgrimmedsync(dst);
54955506
}
54965507

src/backend/storage/smgr/smgr.c

+7-6
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
*
1212
*
1313
* IDENTIFICATION
14-
* $PostgreSQL: pgsql/src/backend/storage/smgr/smgr.c,v 1.77 2004/07/17 03:28:55 tgl Exp $
14+
* $PostgreSQL: pgsql/src/backend/storage/smgr/smgr.c,v 1.78 2004/08/15 23:44:46 tgl Exp $
1515
*
1616
*-------------------------------------------------------------------------
1717
*/
@@ -621,14 +621,15 @@ smgrtruncate(SMgrRelation reln, BlockNumber nblocks, bool isTemp)
621621
*
622622
* Synchronously force all of the specified relation down to disk.
623623
*
624-
* This is really only useful for non-WAL-logged index building:
625-
* instead of incrementally WAL-logging the index build steps,
626-
* we can just write completed index pages to disk with smgrwrite
624+
* This is useful for building completely new relations (eg, new
625+
* indexes). Instead of incrementally WAL-logging the index build
626+
* steps, we can just write completed index pages to disk with smgrwrite
627627
* or smgrextend, and then fsync the completed index file before
628628
* committing the transaction. (This is sufficient for purposes of
629629
* crash recovery, since it effectively duplicates forcing a checkpoint
630-
* for the completed index. But it is *not* workable if one wishes
631-
* to use the WAL log for PITR or replication purposes.)
630+
* for the completed index. But it is *not* sufficient if one wishes
631+
* to use the WAL log for PITR or replication purposes: in that case
632+
* we have to make WAL entries as well.)
632633
*
633634
* The preceding writes should specify isTemp = true to avoid
634635
* duplicative fsyncs.

0 commit comments

Comments
 (0)