
Commit 88ba401 (1 parent: 01a819a)

Update EXPLAIN discussion and examples to match current sources.

1 file changed: doc/src/sgml/perform.sgml (+40, -41 lines)
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.5 2001/05/17 21:50:16 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.6 2001/06/11 00:52:09 tgl Exp $
 -->

 <chapter id="performance-tips">
@@ -15,26 +15,19 @@ $Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.5 2001/05/17 21:50:16 petere Exp $
 <sect1 id="using-explain">
 <title>Using <command>EXPLAIN</command></title>

-<note>
-<title>Author</title>
-<para>
-Written by Tom Lane, from e-mail dated 2000-03-27.
-</para>
-</note>
-
 <para>
 <productname>Postgres</productname> devises a <firstterm>query
 plan</firstterm> for each query it is given. Choosing the right
 plan to match the query structure and the properties of the data
 is absolutely critical for good performance. You can use the
 <command>EXPLAIN</command> command to see what query plan the system
-creates for any query. Unfortunately,
-plan-reading is an art that deserves a tutorial, and I haven't
-had time to write one. Here is some quick &amp; dirty explanation.
+creates for any query.
+Plan-reading is an art that deserves an extensive tutorial, which
+this is not; but here is some basic information.
 </para>

 <para>
-The numbers that are currently quoted by EXPLAIN are:
+The numbers that are currently quoted by <command>EXPLAIN</command> are:

 <itemizedlist>
 <listitem>
@@ -94,12 +87,12 @@ $Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.5 2001/05/17 21:50:16 petere Exp $
 estimated selectivity of any WHERE-clause constraints that are being
 applied at this node. Ideally the top-level rows estimate will
 approximate the number of rows actually returned, updated, or deleted
-by the query (again, without considering the effects of LIMIT).
+by the query.
 </para>

 <para>
 Here are some examples (using the regress test database after a
-vacuum analyze, and almost-7.0 sources):
+vacuum analyze, and 7.2 development sources):

 <programlisting>
 regression=# explain select * from tenk1;
@@ -129,45 +122,51 @@ select * from pg_class where relname = 'tenk1';
 regression=# explain select * from tenk1 where unique1 &lt; 1000;
 NOTICE:  QUERY PLAN:

-Seq Scan on tenk1  (cost=0.00..358.00 rows=1000 width=148)
+Seq Scan on tenk1  (cost=0.00..358.00 rows=1003 width=148)
 </programlisting>

 The estimate of output rows has gone down because of the WHERE clause.
-(This estimate is uncannily accurate because tenk1 is a particularly
-simple case --- the unique1 column has 10000 distinct values ranging
-from 0 to 9999, so the estimator's linear interpolation between min and
-max column values is dead-on.)  However, the scan will still have to
-visit all 10000 rows, so the cost hasn't decreased; in fact it has gone
-up a bit to reflect the extra CPU time spent checking the WHERE
-condition.
+However, the scan will still have to visit all 10000 rows, so the cost
+hasn't decreased; in fact it has gone up a bit to reflect the extra CPU
+time spent checking the WHERE condition.
+</para>
+
+<para>
+The actual number of rows this query would select is 1000, but the
+estimate is only approximate.  If you try to duplicate this experiment,
+you will probably get a slightly different estimate; moreover, it will
+change after each <command>ANALYZE</command> command, because the
+statistics produced by <command>ANALYZE</command> are taken from a
+randomized sample of the table.
 </para>

 <para>
 Modify the query to restrict the qualification even more:

 <programlisting>
-regression=# explain select * from tenk1 where unique1 &lt; 100;
+regression=# explain select * from tenk1 where unique1 &lt; 50;
 NOTICE:  QUERY PLAN:

-Index Scan using tenk1_unique1 on tenk1  (cost=0.00..89.35 rows=100 width=148)
+Index Scan using tenk1_unique1 on tenk1  (cost=0.00..173.32 rows=47 width=148)
 </programlisting>

 and you will see that if we make the WHERE condition selective
 enough, the planner will
 eventually decide that an indexscan is cheaper than a sequential scan.
-This plan will only have to visit 100 tuples because of the index,
-so it wins despite the fact that each individual fetch is expensive.
+This plan will only have to visit 50 tuples because of the index,
+so it wins despite the fact that each individual fetch is more expensive
+than reading a whole disk page sequentially.
 </para>

 <para>
 Add another condition to the qualification:

 <programlisting>
-regression=# explain select * from tenk1 where unique1 &lt; 100 and
+regression=# explain select * from tenk1 where unique1 &lt; 50 and
 regression-# stringu1 = 'xxx';
 NOTICE:  QUERY PLAN:

-Index Scan using tenk1_unique1 on tenk1  (cost=0.00..89.60 rows=1 width=148)
+Index Scan using tenk1_unique1 on tenk1  (cost=0.00..173.44 rows=1 width=148)
 </programlisting>

 The added clause "stringu1 = 'xxx'" reduces the output-rows estimate,
@@ -178,22 +177,22 @@ Index Scan using tenk1_unique1 on tenk1  (cost=0.00..89.60 rows=1 width=148)
 Let's try joining two tables, using the fields we have been discussing:

 <programlisting>
-regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 &lt; 100
+regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 &lt; 50
 regression-# and t1.unique2 = t2.unique2;
 NOTICE:  QUERY PLAN:

-Nested Loop  (cost=0.00..144.07 rows=100 width=296)
+Nested Loop  (cost=0.00..269.11 rows=47 width=296)
   -&gt; Index Scan using tenk1_unique1 on tenk1 t1
-       (cost=0.00..89.35 rows=100 width=148)
+       (cost=0.00..173.32 rows=47 width=148)
   -&gt; Index Scan using tenk2_unique2 on tenk2 t2
-       (cost=0.00..0.53 rows=1 width=148)
+       (cost=0.00..2.01 rows=1 width=148)
 </programlisting>
 </para>

 <para>
 In this nested-loop join, the outer scan is the same indexscan we had
 in the example before last, and so its cost and row count are the same
-because we are applying the "unique1 &lt; 100" WHERE clause at that node.
+because we are applying the "unique1 &lt; 50" WHERE clause at that node.
 The "t1.unique2 = t2.unique2" clause isn't relevant yet, so it doesn't
 affect the outer scan's row count.  For the inner scan, the
 current
@@ -203,7 +202,7 @@ Nested Loop  (cost=0.00..144.07 rows=100 width=296)
 same inner-scan plan and costs that we'd get from, say, "explain select
 * from tenk2 where unique2 = 42".  The loop node's costs are then set
 on the basis of the outer scan's cost, plus one repetition of the
-inner scan for each outer tuple (100 * 0.53, here), plus a little CPU
+inner scan for each outer tuple (47 * 2.01, here), plus a little CPU
 time for join processing.
 </para>

@@ -226,27 +225,27 @@ Nested Loop  (cost=0.00..144.07 rows=100 width=296)
 <programlisting>
 regression=# set enable_nestloop = off;
 SET VARIABLE
-regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100
+regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 &lt; 50
 regression-# and t1.unique2 = t2.unique2;
 NOTICE:  QUERY PLAN:

-Hash Join  (cost=89.60..574.10 rows=100 width=296)
+Hash Join  (cost=173.44..557.03 rows=47 width=296)
   -&gt; Seq Scan on tenk2 t2
        (cost=0.00..333.00 rows=10000 width=148)
-  -&gt; Hash  (cost=89.35..89.35 rows=100 width=148)
+  -&gt; Hash  (cost=173.32..173.32 rows=47 width=148)
        -&gt; Index Scan using tenk1_unique1 on tenk1 t1
-            (cost=0.00..89.35 rows=100 width=148)
+            (cost=0.00..173.32 rows=47 width=148)
 </programlisting>

-This plan proposes to extract the 100 interesting rows of tenk1
+This plan proposes to extract the 50 interesting rows of tenk1
 using ye same olde indexscan, stash them into an in-memory hash table,
 and then do a sequential scan of tenk2, probing into the hash table
 for possible matches of "t1.unique2 = t2.unique2" at each tenk2 tuple.
 The cost to read tenk1 and set up the hash table is entirely start-up
 cost for the hash join, since we won't get any tuples out until we can
 start reading tenk2.  The total time estimate for the join also
-includes a pretty hefty charge for CPU time to probe the hash table
-10000 times.  Note, however, that we are NOT charging 10000 times 89.35;
+includes a hefty charge for CPU time to probe the hash table
+10000 times.  Note, however, that we are NOT charging 10000 times 173.32;
 the hash table setup is only done once in this plan type.
 </para>
 </sect1>
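The cost bookkeeping that the revised text walks through (outer scan plus one inner-scan repetition per outer row; hash table built once as start-up cost) can be sketched as follows. This is an illustration with invented helper names, not PostgreSQL's actual costing code, which is considerably more detailed:

```python
# Sketch of the EXPLAIN join-cost arithmetic described in the text.
# Function names are hypothetical; PostgreSQL's real cost model lives
# in the planner and includes many more terms.

def nestloop_cost(outer_cost: float, outer_rows: int, inner_cost: float) -> float:
    """Outer scan once, plus one repetition of the inner scan per outer row.

    The planner additionally adds a small CPU charge per joined row,
    which is omitted here.
    """
    return outer_cost + outer_rows * inner_cost

def hashjoin_startup_cost(build_cost: float) -> float:
    """Building the hash table is pure start-up cost: it is paid exactly
    once, regardless of how many probe-side rows are scanned later."""
    return build_cost

# Plug in the example plan's numbers: outer indexscan costs 173.32 and
# yields 47 rows; each inner indexscan repetition costs 2.01.
base = nestloop_cost(173.32, 47, 2.01)
print(round(base, 2))  # 267.79 -- the plan's 269.11 adds the CPU charge
```

The same bookkeeping shows why the hash join is NOT charged 10000 times 173.32: `hashjoin_startup_cost(173.32)` is incurred once, and each of the 10000 tenk2 rows then pays only a cheap per-probe CPU charge.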
