 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/analyze.sgml,v 1.14 2003/09/09 18:28:52 tgl Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/analyze.sgml,v 1.15 2003/09/11 17:31:45 momjian Exp $
 PostgreSQL documentation
 -->

@@ -28,10 +28,10 @@ ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ (<rep
  <title>Description</title>

  <para>
- <command>ANALYZE</command> collects statistics about the contents of
- tables in the database, and stores the results in
- the system table <literal>pg_statistic</literal>. Subsequently,
- the query planner uses the statistics to help determine the most efficient
+ <command>ANALYZE</command> collects statistics about the contents
+ of tables in the database, and stores the results in the system
+ table <literal>pg_statistic</literal>. Subsequently, the query
+ planner uses these statistics to help determine the most efficient
  execution plans for queries.
  </para>

@@ -90,49 +90,56 @@ ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ (<rep
  </para>

  <para>
- Unlike <command>VACUUM FULL</command>,
- <command>ANALYZE</command> requires
- only a read lock on the target table, so it can run in parallel with
- other activity on the table.
+ Unlike <command>VACUUM FULL</command>, <command>ANALYZE</command>
+ requires only a read lock on the target table, so it can run in
+ parallel with other activity on the table.
  </para>

  <para>
- For large tables, <command>ANALYZE</command> takes a random sample of the
- table contents, rather than examining every row. This allows even very
- large tables to be analyzed in a small amount of time. Note, however,
- that the statistics are only approximate, and will change slightly each
- time <command>ANALYZE</command> is run, even if the actual table contents
- did not change. This may result in small changes in the planner's
- estimated costs shown by <command>EXPLAIN</command>.
+ The statistics collected by <command>ANALYZE</command> usually
+ include a list of some of the most common values in each column and
+ a histogram showing the approximate data distribution in each
+ column. One or both of these may be omitted if
+ <command>ANALYZE</command> deems them uninteresting (for example,
+ in a unique-key column, there are no common values) or if the
+ column data type does not support the appropriate operators. There
+ is more information about the statistics in <xref
+ linkend="maintenance">.
  </para>

  <para>
- The collected statistics usually include a list of some of the most common
- values in each column and a histogram showing the approximate data
- distribution in each column. One or both of these may be omitted if
- <command>ANALYZE</command> deems them uninteresting (for example, in
- a unique-key column, there are no common values) or if the column
- data type does not support the appropriate operators. There is more
- information about the statistics in <xref linkend="maintenance">.
+ For large tables, <command>ANALYZE</command> takes a random sample
+ of the table contents, rather than examining every row. This
+ allows even very large tables to be analyzed in a small amount of
+ time. Note, however, that the statistics are only approximate, and
+ will change slightly each time <command>ANALYZE</command> is run,
+ even if the actual table contents did not change. This may result
+ in small changes in the planner's estimated costs shown by
+ <command>EXPLAIN</command>. In rare situations, this
+ non-determinism will cause the query optimizer to choose a
+ different query plan between runs of <command>ANALYZE</command>. To
+ avoid this, raise the amount of statistics collected by
+ <command>ANALYZE</command>, as described below.
  </para>

  <para>
  The extent of analysis can be controlled by adjusting the
- <literal>default_statistics_target</> parameter variable, or on a
- column-by-column basis by setting the per-column
- statistics target with <command>ALTER TABLE ... ALTER COLUMN ... SET
- STATISTICS</command> (see
- <xref linkend="sql-altertable" endterm="sql-altertable-title">). The
- target value sets the maximum number of entries in the most-common-value
- list and the maximum number of bins in the histogram. The default
- target value is 10, but this can be adjusted up or down to trade off
- accuracy of planner estimates against the time taken for
- <command>ANALYZE</command> and the amount of space occupied
- in <literal>pg_statistic</literal>.
- In particular, setting the statistics target to zero disables collection of
- statistics for that column. It may be useful to do that for columns that
- are never used as part of the <literal>WHERE</>, <literal>GROUP BY</>, or <literal>ORDER BY</> clauses of
- queries, since the planner will have no use for statistics on such columns.
+ <varname>DEFAULT_STATISTICS_TARGET</varname> parameter variable, or
+ on a column-by-column basis by setting the per-column statistics
+ target with <command>ALTER TABLE ... ALTER COLUMN ... SET
+ STATISTICS</command> (see <xref linkend="sql-altertable"
+ endterm="sql-altertable-title">). The target value sets the
+ maximum number of entries in the most-common-value list and the
+ maximum number of bins in the histogram. The default target value
+ is 10, but this can be adjusted up or down to trade off accuracy of
+ planner estimates against the time taken for
+ <command>ANALYZE</command> and the amount of space occupied in
+ <literal>pg_statistic</literal>. In particular, setting the
+ statistics target to zero disables collection of statistics for
+ that column. It may be useful to do that for columns that are
+ never used as part of the <literal>WHERE</>, <literal>GROUP BY</>,
+ or <literal>ORDER BY</> clauses of queries, since the planner will
+ have no use for statistics on such columns.
  </para>

  <para>
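Note on the new "non-determinism" sentence in the second hunk: the claim that sampled statistics vary from run to run, and vary less when more data is sampled, can be illustrated with a small toy model. The sketch below is not PostgreSQL's sampling algorithm. The table contents, the sample sizes (300 and 3,000 rows), and the function names `estimate_fraction` and `spread_of_estimates` are all hypothetical, chosen only to show the general effect.

```python
import random
import statistics


def estimate_fraction(table, value, sample_size, rng):
    """Estimate the fraction of rows equal to `value` from a random sample."""
    sample = rng.sample(table, sample_size)
    return sample.count(value) / sample_size


def spread_of_estimates(table, value, sample_size, runs, seed):
    """Standard deviation of the estimate across repeated sampling runs."""
    rng = random.Random(seed)
    estimates = [
        estimate_fraction(table, value, sample_size, rng) for _ in range(runs)
    ]
    return statistics.pstdev(estimates)


# Hypothetical table: 100,000 rows, 20% of which hold the value "a".
table = ["a"] * 20_000 + ["b"] * 80_000

# Each "run" stands in for one sampling pass over an unchanged table.
small = spread_of_estimates(table, "a", sample_size=300, runs=200, seed=1)
large = spread_of_estimates(table, "a", sample_size=3_000, runs=200, seed=1)

print(f"spread with 300-row samples:   {small:.4f}")
print(f"spread with 3,000-row samples: {large:.4f}")
```

In this toy model the larger sample gives a visibly tighter spread of estimates, which is the intuition behind the documentation's advice to collect more statistics when plan choices flip between runs. How the statistics target maps to the actual number of sampled rows is not stated in this diff and is left out here on purpose.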