Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit fba8113

Browse files
committed
Teach CLUSTER to skip writing WAL if not needed (ie, not using archiving)
--- Simon. Also, code review and cleanup for the previous COPY-no-WAL patches --- Tom.
1 parent 4591fb1 commit fba8113

File tree

12 files changed

+282
-183
lines changed

12 files changed

+282
-183
lines changed

doc/src/sgml/perform.sgml

Lines changed: 85 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.63 2007/02/01 19:10:24 momjian Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.64 2007/03/29 00:15:36 tgl Exp $ -->
22

33
<chapter id="performance-tips">
44
<title>Performance Tips</title>
@@ -801,7 +801,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
801801
<command>EXECUTE</command> as many times as required. This avoids
802802
some of the overhead of repeatedly parsing and planning
803803
<command>INSERT</command>. Different interfaces provide this facility
804-
in different ways; look for Prepared Statements in the interface
804+
in different ways; look for <quote>prepared statements</> in the interface
805805
documentation.
806806
</para>
807807

@@ -815,14 +815,12 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
815815
<para>
816816
<command>COPY</command> is fastest when used within the same
817817
transaction as an earlier <command>CREATE TABLE</command> or
818-
<command>TRUNCATE</command> command. In those cases, no WAL
819-
needs to be written because in case of an error, the files
820-
containing the newly loaded data will be removed automatically.
821-
<command>CREATE TABLE AS SELECT</command> is also optimized
822-
to avoid writing WAL. <command>COPY</command> and
823-
<command>CREATE TABLE AS SELECT</command> will write WAL
824-
when <xref linkend="guc-archive-command"> is set and will not
825-
therefore be optimized in that case.
818+
<command>TRUNCATE</command> command. In such cases no WAL
819+
needs to be written, because in case of an error, the files
820+
containing the newly loaded data will be removed anyway.
821+
However, this consideration does not apply when
822+
<xref linkend="guc-archive-command"> is set, as all commands
823+
must write WAL in that case.
826824
</para>
827825

828826
</sect2>
@@ -897,23 +895,51 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
897895
<title>Turn off <varname>archive_command</varname></title>
898896

899897
<para>
900-
When loading large amounts of data you might want to unset the
901-
<xref linkend="guc-archive-command"> before loading. It might be
902-
faster to take a new base backup once the load has completed
903-
than to allow a large archive to accumulate.
898+
When loading large amounts of data into an installation that uses
899+
WAL archiving, you might want to disable archiving (unset the
900+
<xref linkend="guc-archive-command"> configuration variable)
901+
while loading. It might be
902+
faster to take a new base backup after the load has completed
903+
than to process a large amount of incremental WAL data.
904904
</para>
905905

906906
<para>
907-
This is particularly important advice because certain commands
908-
will perform more slowly when <varname>archive_command</varname>
909-
is set, as a result of their needing to write large amounts of WAL.
907+
Aside from avoiding the time for the archiver to process the WAL data,
908+
doing this will actually make certain commands faster, because they
909+
are designed not to write WAL at all if <varname>archive_command</varname>
910+
is unset. (They can guarantee crash safety more cheaply by doing an
911+
<function>fsync</> at the end than by writing WAL.)
910912
This applies to the following commands:
911-
<command>CREATE TABLE AS SELECT</command>,
912-
<command>CREATE INDEX</command> and also <command>COPY</command>, when
913-
it is executed in the same transaction as a prior
914-
<command>CREATE TABLE</command> or <command>TRUNCATE</command> command.
913+
<itemizedlist>
914+
<listitem>
915+
<para>
916+
<command>CREATE TABLE AS SELECT</command>
917+
</para>
918+
</listitem>
919+
<listitem>
920+
<para>
921+
<command>CREATE INDEX</command> (and variants such as
922+
<command>ALTER TABLE ADD PRIMARY KEY</command>)
923+
</para>
924+
</listitem>
925+
<listitem>
926+
<para>
927+
<command>ALTER TABLE SET TABLESPACE</command>
928+
</para>
929+
</listitem>
930+
<listitem>
931+
<para>
932+
<command>CLUSTER</command>
933+
</para>
934+
</listitem>
935+
<listitem>
936+
<para>
937+
<command>COPY FROM</command>, when the target table has been
938+
created or truncated earlier in the same transaction
939+
</para>
940+
</listitem>
941+
</itemizedlist>
915942
</para>
916-
917943
</sect2>
918944

919945
<sect2 id="populate-analyze">
@@ -950,15 +976,43 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
950976
By default, <application>pg_dump</> uses <command>COPY</>, and when
951977
it is generating a complete schema-and-data dump, it is careful to
952978
load data before creating indexes and foreign keys. So in this case
953-
the first several guidelines are handled automatically. What is left
954-
for you to do is to set appropriate (i.e., larger than normal) values
955-
for <varname>maintenance_work_mem</varname> and
956-
<varname>checkpoint_segments</varname>, as well as unsetting
957-
<varname>archive_command</varname> before loading the dump script,
958-
and then to run <command>ANALYZE</> afterwards and resetting
959-
<varname>archive_command</varname> if required. All of the
960-
parameters can be reset once the load has completed without needing
961-
to restart the server, as described in <xref linkend="config-setting">.
979+
several guidelines are handled automatically. What is left
980+
for you to do is to:
981+
<itemizedlist>
982+
<listitem>
983+
<para>
984+
Set appropriate (i.e., larger than normal) values for
985+
<varname>maintenance_work_mem</varname> and
986+
<varname>checkpoint_segments</varname>.
987+
</para>
988+
</listitem>
989+
<listitem>
990+
<para>
991+
If using WAL archiving, consider disabling it during the restore.
992+
To do that, unset <varname>archive_command</varname> before loading the
993+
dump script, and afterwards restore <varname>archive_command</varname>
994+
and take a fresh base backup.
995+
</para>
996+
</listitem>
997+
<listitem>
998+
<para>
999+
Consider whether the whole dump should be restored as a single
1000+
transaction. To do that, pass the <option>-1</> or
1001+
<option>--single-transaction</> command-line option to
1002+
<application>psql</> or <application>pg_restore</>. When using this
1003+
mode, even the smallest of errors will rollback the entire restore,
1004+
possibly discarding many hours of processing. Depending on how
1005+
interrelated the data is, that might seem preferable to manual cleanup,
1006+
or not. <command>COPY</> commands will run fastest if you use a single
1007+
transaction and have WAL archiving turned off.
1008+
</para>
1009+
</listitem>
1010+
<listitem>
1011+
<para>
1012+
Run <command>ANALYZE</> afterwards.
1013+
</para>
1014+
</listitem>
1015+
</itemizedlist>
9621016
</para>
9631017

9641018
<para>

src/backend/access/heap/heapam.c

Lines changed: 44 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/backend/access/heap/heapam.c,v 1.229 2007/03/25 19:45:13 tgl Exp $
11+
* $PostgreSQL: pgsql/src/backend/access/heap/heapam.c,v 1.230 2007/03/29 00:15:37 tgl Exp $
1212
*
1313
*
1414
* INTERFACE ROUTINES
@@ -1360,11 +1360,14 @@ heap_get_latest_tid(Relation relation,
13601360
* that all new tuples go into new pages not containing any tuples from other
13611361
* transactions, that the relation gets fsync'd before commit, and that the
13621362
* transaction emits at least one WAL record to ensure RecordTransactionCommit
1363-
* will decide to WAL-log the commit. (see heap_sync() comments also)
1363+
* will decide to WAL-log the commit. (See also heap_sync() comments)
13641364
*
13651365
* use_fsm is passed directly to RelationGetBufferForTuple, which see for
13661366
* more info.
13671367
*
1368+
* Note that use_wal and use_fsm will be applied when inserting into the
1369+
* heap's TOAST table, too, if the tuple requires any out-of-line data.
1370+
*
13681371
* The return value is the OID assigned to the tuple (either here or by the
13691372
* caller), or InvalidOid if no OID. The header fields of *tup are updated
13701373
* to match the stored tuple; in particular tup->t_self receives the actual
@@ -1418,7 +1421,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
14181421
* into the relation; tup is the caller's original untoasted data.
14191422
*/
14201423
if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
1421-
heaptup = toast_insert_or_update(relation, tup, NULL, use_wal);
1424+
heaptup = toast_insert_or_update(relation, tup, NULL,
1425+
use_wal, use_fsm);
14221426
else
14231427
heaptup = tup;
14241428

@@ -1526,27 +1530,17 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
15261530
* simple_heap_insert - insert a tuple
15271531
*
15281532
* Currently, this routine differs from heap_insert only in supplying
1529-
* a default command ID. But it should be used rather than using
1530-
* heap_insert directly in most places where we are modifying system catalogs.
1533+
* a default command ID and not allowing access to the speedup options.
1534+
*
1535+
* This should be used rather than using heap_insert directly in most places
1536+
* where we are modifying system catalogs.
15311537
*/
15321538
Oid
15331539
simple_heap_insert(Relation relation, HeapTuple tup)
15341540
{
15351541
return heap_insert(relation, tup, GetCurrentCommandId(), true, true);
15361542
}
15371543

1538-
/*
1539-
* fast_heap_insert - insert a tuple with options to improve speed
1540-
*
1541-
* Currently, this routine allows specifying additional options for speed
1542-
* in certain cases, such as WAL-avoiding COPY command
1543-
*/
1544-
Oid
1545-
fast_heap_insert(Relation relation, HeapTuple tup, bool use_wal)
1546-
{
1547-
return heap_insert(relation, tup, GetCurrentCommandId(), use_wal, use_wal);
1548-
}
1549-
15501544
/*
15511545
* heap_delete - delete a tuple
15521546
*
@@ -2112,7 +2106,9 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
21122106
*/
21132107
if (need_toast)
21142108
{
2115-
heaptup = toast_insert_or_update(relation, newtup, &oldtup, true);
2109+
/* Note we always use WAL and FSM during updates */
2110+
heaptup = toast_insert_or_update(relation, newtup, &oldtup,
2111+
true, true);
21162112
newtupsize = MAXALIGN(heaptup->t_len);
21172113
}
21182114
else
@@ -3988,23 +3984,40 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
39883984
appendStringInfo(buf, "UNKNOWN");
39893985
}
39903986

3991-
/* ----------------
3992-
* heap_sync - sync a heap, for use when no WAL has been written
3993-
*
3994-
* ----------------
3987+
/*
3988+
* heap_sync - sync a heap, for use when no WAL has been written
3989+
*
3990+
* This forces the heap contents (including TOAST heap if any) down to disk.
3991+
* If we skipped using WAL, and it's not a temp relation, we must force the
3992+
* relation down to disk before it's safe to commit the transaction. This
3993+
* requires writing out any dirty buffers and then doing a forced fsync.
3994+
*
3995+
* Indexes are not touched. (Currently, index operations associated with
3996+
* the commands that use this are WAL-logged and so do not need fsync.
3997+
* That behavior might change someday, but in any case it's likely that
3998+
* any fsync decisions required would be per-index and hence not appropriate
3999+
* to be done here.)
39954000
*/
39964001
void
39974002
heap_sync(Relation rel)
39984003
{
3999-
if (!rel->rd_istemp)
4004+
/* temp tables never need fsync */
4005+
if (rel->rd_istemp)
4006+
return;
4007+
4008+
/* main heap */
4009+
FlushRelationBuffers(rel);
4010+
/* FlushRelationBuffers will have opened rd_smgr */
4011+
smgrimmedsync(rel->rd_smgr);
4012+
4013+
/* toast heap, if any */
4014+
if (OidIsValid(rel->rd_rel->reltoastrelid))
40004015
{
4001-
/*
4002-
* If we skipped using WAL, and it's not a temp relation,
4003-
* we must force the relation down to disk before it's
4004-
* safe to commit the transaction. This requires forcing
4005-
* out any dirty buffers and then doing a forced fsync.
4006-
*/
4007-
FlushRelationBuffers(rel);
4008-
smgrimmedsync(rel->rd_smgr);
4016+
Relation toastrel;
4017+
4018+
toastrel = heap_open(rel->rd_rel->reltoastrelid, AccessShareLock);
4019+
FlushRelationBuffers(toastrel);
4020+
smgrimmedsync(toastrel->rd_smgr);
4021+
heap_close(toastrel, AccessShareLock);
40094022
}
40104023
}

src/backend/access/heap/tuptoaster.c

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/backend/access/heap/tuptoaster.c,v 1.71 2007/02/27 23:48:07 tgl Exp $
11+
* $PostgreSQL: pgsql/src/backend/access/heap/tuptoaster.c,v 1.72 2007/03/29 00:15:37 tgl Exp $
1212
*
1313
*
1414
* INTERFACE ROUTINES
@@ -33,6 +33,7 @@
3333
#include "access/genam.h"
3434
#include "access/heapam.h"
3535
#include "access/tuptoaster.h"
36+
#include "access/xact.h"
3637
#include "catalog/catalog.h"
3738
#include "utils/fmgroids.h"
3839
#include "utils/pg_lzcompress.h"
@@ -42,7 +43,8 @@
4243
#undef TOAST_DEBUG
4344

4445
static void toast_delete_datum(Relation rel, Datum value);
45-
static Datum toast_save_datum(Relation rel, Datum value, bool use_wal);
46+
static Datum toast_save_datum(Relation rel, Datum value,
47+
bool use_wal, bool use_fsm);
4648
static varattrib *toast_fetch_datum(varattrib *attr);
4749
static varattrib *toast_fetch_datum_slice(varattrib *attr,
4850
int32 sliceoffset, int32 length);
@@ -333,6 +335,7 @@ toast_delete(Relation rel, HeapTuple oldtup)
333335
* Inputs:
334336
* newtup: the candidate new tuple to be inserted
335337
* oldtup: the old row version for UPDATE, or NULL for INSERT
338+
* use_wal, use_fsm: flags to be passed to heap_insert() for toast rows
336339
* Result:
337340
* either newtup if no toasting is needed, or a palloc'd modified tuple
338341
* that is what should actually get stored
@@ -342,7 +345,8 @@ toast_delete(Relation rel, HeapTuple oldtup)
342345
* ----------
343346
*/
344347
HeapTuple
345-
toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup, bool use_wal)
348+
toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
349+
bool use_wal, bool use_fsm)
346350
{
347351
HeapTuple result_tuple;
348352
TupleDesc tupleDesc;
@@ -618,7 +622,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup, bool us
618622
i = biggest_attno;
619623
old_value = toast_values[i];
620624
toast_action[i] = 'p';
621-
toast_values[i] = toast_save_datum(rel, toast_values[i], use_wal);
625+
toast_values[i] = toast_save_datum(rel, toast_values[i],
626+
use_wal, use_fsm);
622627
if (toast_free[i])
623628
pfree(DatumGetPointer(old_value));
624629

@@ -729,7 +734,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup, bool us
729734
i = biggest_attno;
730735
old_value = toast_values[i];
731736
toast_action[i] = 'p';
732-
toast_values[i] = toast_save_datum(rel, toast_values[i], use_wal);
737+
toast_values[i] = toast_save_datum(rel, toast_values[i],
738+
use_wal, use_fsm);
733739
if (toast_free[i])
734740
pfree(DatumGetPointer(old_value));
735741

@@ -977,14 +983,16 @@ toast_compress_datum(Datum value)
977983
* ----------
978984
*/
979985
static Datum
980-
toast_save_datum(Relation rel, Datum value, bool use_wal)
986+
toast_save_datum(Relation rel, Datum value,
987+
bool use_wal, bool use_fsm)
981988
{
982989
Relation toastrel;
983990
Relation toastidx;
984991
HeapTuple toasttup;
985992
TupleDesc toasttupDesc;
986993
Datum t_values[3];
987994
bool t_isnull[3];
995+
CommandId mycid = GetCurrentCommandId();
988996
varattrib *result;
989997
struct
990998
{
@@ -1063,7 +1071,7 @@ toast_save_datum(Relation rel, Datum value, bool use_wal)
10631071
if (!HeapTupleIsValid(toasttup))
10641072
elog(ERROR, "failed to build TOAST tuple");
10651073

1066-
fast_heap_insert(toastrel, toasttup, use_wal);
1074+
heap_insert(toastrel, toasttup, mycid, use_wal, use_fsm);
10671075

10681076
/*
10691077
* Create the index entry. We cheat a little here by not using

0 commit comments

Comments
 (0)