
Commit 7d6e281

Mention that EXPLAIN results on small tables shouldn't be extrapolated
to large tables. Recommend ANALYZE or VACUUM ANALYZE after populating a table.
1 parent 31fe394 commit 7d6e281

File tree

1 file changed (+45, -12)


doc/src/sgml/perform.sgml

Lines changed: 45 additions & 12 deletions
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.6 2001/06/11 00:52:09 tgl Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.7 2001/06/22 18:53:36 tgl Exp $
 -->
 
 <chapter id="performance-tips">
@@ -110,7 +110,7 @@ select * from pg_class where relname = 'tenk1';
 </programlisting>
 
 you'll find out that tenk1 has 233 disk
-pages and 10000 tuples. So the cost is estimated at 233 block
+pages and 10000 tuples. So the cost is estimated at 233 page
 reads, defined as 1.0 apiece, plus 10000 * cpu_tuple_cost which is
 currently 0.01 (try <command>show cpu_tuple_cost</command>).
 </para>
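The cost arithmetic this hunk describes can be sanity-checked with a quick sketch (a toy reproduction of the formula the text states, not the planner's actual code):

```python
# Sequential-scan cost estimate as described in the hunk above:
# (disk pages read, at 1.0 apiece) + (tuples scanned * cpu_tuple_cost).
disk_pages = 233       # tenk1's relpages, per the pg_class query
tuples = 10000         # tenk1's reltuples
cpu_tuple_cost = 0.01  # default, per `show cpu_tuple_cost`

total_cost = disk_pages * 1.0 + tuples * cpu_tuple_cost
print(total_cost)  # 333.0
```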
@@ -248,6 +248,19 @@ Hash Join (cost=173.44..557.03 rows=47 width=296)
 10000 times. Note, however, that we are NOT charging 10000 times 173.32;
 the hash table setup is only done once in this plan type.
 </para>
+
+<para>
+It is worth noting that EXPLAIN results should not be extrapolated
+to situations other than the one you are actually testing; for example,
+results on a toy-sized table can't be assumed to apply to large tables.
+The planner's cost estimates are not linear and so it may well choose
+a different plan for a larger or smaller table. An extreme example
+is that on a table that only occupies one disk page, you'll nearly
+always get a sequential scan plan whether indexes are available or not.
+The planner realizes that it's going to take one disk page read to
+process the table in any case, so there's no value in expending additional
+page reads to look at an index.
+</para>
 </sect1>
 
 <sect1 id="explicit-joins">
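The one-page-table argument in the added paragraph can be sketched numerically. This is a deliberately crude model, not the real planner's cost function: it counts only page reads, and the one-page index size is an assumption for illustration:

```python
# Crude page-read model for the one-page-table example above.
# A sequential scan reads every table page exactly once; an index
# scan must read index page(s) and then visit the table page anyway.
def seqscan_pages(table_pages):
    return table_pages

def indexscan_pages(table_pages, index_pages=1):
    # Lower bound: at least one index page plus the visited table pages.
    return index_pages + table_pages

# For a table occupying a single disk page, the index can never win
# on page reads, which is why the planner picks the sequential scan.
print(seqscan_pages(1), indexscan_pages(1))  # 1 2
```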
@@ -375,10 +388,13 @@ SELECT * FROM d LEFT JOIN
 
 <para>
 Turn off auto-commit and just do one commit at
-the end. Otherwise <productname>Postgres</productname> is doing a
-lot of work for each record
-added. In general when you are doing bulk inserts, you want
-to turn off some of the database features to gain speed.
+the end. (In plain SQL, this means issuing <command>BEGIN</command>
+at the start and <command>COMMIT</command> at the end. Some client
+libraries may do this behind your back, in which case you need to
+make sure the library does it when you want it done.)
+If you allow each insertion to be committed separately,
+<productname>Postgres</productname> is doing a lot of work for each
+record added.
 </para>
 </sect2>
 
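The advice in this hunk is about Postgres, but the pattern of wrapping bulk inserts in a single BEGIN/COMMIT is easy to demonstrate with any SQL engine. This sketch uses Python's bundled sqlite3 purely as a self-contained stand-in; the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (n INTEGER)")

# Equivalent of issuing BEGIN once, running many INSERTs, then a
# single COMMIT, instead of letting every INSERT commit on its own.
with conn:  # opens a transaction, commits on successful exit
    for n in range(10000):
        conn.execute("INSERT INTO items (n) VALUES (?)", (n,))

count = conn.execute("SELECT count(*) FROM items").fetchone()[0]
print(count)  # 10000
```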
@@ -387,10 +403,11 @@ SELECT * FROM d LEFT JOIN
 
 <para>
 Use <command>COPY FROM STDIN</command> to load all the records in one
-command, instead
-of a series of INSERT commands. This reduces parsing, planning, etc
+command, instead of using
+a series of <command>INSERT</command> commands. This reduces parsing,
+planning, etc
 overhead a great deal. If you do this then it's not necessary to fool
-around with autocommit, since it's only one command anyway.
+around with auto-commit, since it's only one command anyway.
 </para>
 </sect2>
 
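COPY FROM STDIN is Postgres-specific, but the "one command instead of many" idea this hunk describes can be sketched with sqlite3's executemany, which likewise pays the statement overhead once rather than per row. Again a stand-in with invented names, not the Postgres mechanism itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (t INTEGER, v REAL)")

rows = [(t, t * 0.5) for t in range(1000)]

# One prepared statement run over all rows in a single call, instead
# of parsing and planning a fresh INSERT statement 1000 times.
with conn:
    conn.executemany("INSERT INTO measurements (t, v) VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM measurements").fetchone()[0]
print(count)  # 1000
```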
@@ -399,16 +416,32 @@ SELECT * FROM d LEFT JOIN
 
 <para>
 If you are loading a freshly created table, the fastest way is to
-create the table, bulk-load with COPY, then create any indexes needed
+create the table, bulk-load with <command>COPY</command>, then create any
+indexes needed
 for the table. Creating an index on pre-existing data is quicker than
 updating it incrementally as each record is loaded.
 </para>
 
 <para>
 If you are augmenting an existing table, you can <command>DROP
-INDEX</command>, load the table, then recreate the index. Of
+INDEX</command>, load the table, then recreate the index. Of
 course, the database performance for other users may be adversely
-affected during the time that the index is missing.
+affected during the time that the index is missing. One should also
+think twice before dropping UNIQUE indexes, since the error checking
+afforded by the UNIQUE constraint will be lost while the index is missing.
+</para>
+</sect2>
+
+<sect2 id="populate-analyze">
+<title>ANALYZE Afterwards</title>
+
+<para>
+It's a good idea to run <command>ANALYZE</command> or <command>VACUUM
+ANALYZE</command> anytime you've added or updated a lot of data,
+including just after initially populating a table. This ensures that
+the planner has up-to-date statistics about the table. With no statistics
+or obsolete statistics, the planner may make poor choices of query plans,
+leading to bad performance on queries that use your table.
 </para>
 </sect2>
 </sect1>
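The new "ANALYZE Afterwards" section has analogues in most SQL engines. As a self-contained stand-in for the Postgres command, sqlite3's ANALYZE records its gathered statistics in the sqlite_stat1 table, so the effect can be observed directly (table and index names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.execute("CREATE INDEX events_kind_idx ON events (kind)")

# Bulk-load some data first, as in the section above.
with conn:
    conn.executemany(
        "INSERT INTO events (id, kind) VALUES (?, ?)",
        [(i, "click" if i % 2 else "view") for i in range(1000)],
    )

# After populating the table, refresh the optimizer's statistics.
conn.execute("ANALYZE")

# The gathered statistics are now visible to the planner.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```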
