@@ -1,4 +1,4 @@
- <!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.41 2003/05/15 15:50:18 petere Exp $ -->
+ <!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.42 2003/05/28 16:03:55 tgl Exp $ -->
<chapter id="indexes">
<title id="indexes-title">Indexes</title>
@@ -20,8 +20,7 @@
<title>Introduction</title>
<para>
- The classical example for the need of an index is if there is a
- table similar to this:
+ Suppose we have a table similar to this:
<programlisting>
CREATE TABLE test1 (
id integer,
@@ -32,24 +31,24 @@ CREATE TABLE test1 (
<programlisting>
SELECT content FROM test1 WHERE id = <replaceable>constant</replaceable>;
</programlisting>
- Ordinarily, the system would have to scan the entire
- <structname>test1</structname> table row by row to find all
+ With no advance preparation, the system would have to scan the entire
+ <structname>test1</structname> table, row by row, to find all
matching entries. If there are a lot of rows in
- <structname>test1</structname> and only a few rows (possibly zero
- or one) returned by the query, then this is clearly an inefficient
- method. If the system were instructed to maintain an index on the
- <structfield>id</structfield> column, then it could use a more
+ <structname>test1</structname> and only a few rows (perhaps only zero
+ or one) that would be returned by such a query, then this is clearly an
+ inefficient method. But if the system has been instructed to maintain an
+ index on the <structfield>id</structfield> column, then it can use a more
efficient method for locating matching rows. For instance, it
might only have to walk a few levels deep into a search tree.
</para>
<para>
- A similar approach is used in most books of non-fiction: Terms and
+ A similar approach is used in most books of non-fiction: terms and
concepts that are frequently looked up by readers are collected in
an alphabetic index at the end of the book. The interested reader
can scan the index relatively quickly and flip to the appropriate
- page, and would not have to read the entire book to find the
- interesting location. As it is the task of the author to
+ page(s), rather than having to read the entire book to find the
+ material of interest. Just as it is the task of the author to
anticipate the items that the readers are most likely to look up,
it is the task of the database programmer to foresee which indexes
would be of advantage.
@@ -73,13 +72,14 @@ CREATE INDEX test1_id_index ON test1 (id);
<para>
Once the index is created, no further intervention is required: the
- system will use the index when it thinks it would be more efficient
+ system will update the index when the table is modified, and it will
+ use the index in queries when it thinks this would be more efficient
than a sequential table scan. But you may have to run the
<command>ANALYZE</command> command regularly to update
statistics to allow the query planner to make educated decisions.
Also read <xref linkend="performance-tips"> for information about
how to find out whether an index is used and when and why the
- planner may choose to <emphasis>not</emphasis> use an index.
+ planner may choose <emphasis>not</emphasis> to use an index.
</para>
<para>
@@ -198,7 +198,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
than B-tree indexes, and the index size and build time for hash
indexes is much worse. Hash indexes also suffer poor performance
under high concurrency. For these reasons, hash index use is
- discouraged.
+ presently discouraged.
</para>
</note>
</para>
@@ -250,14 +250,13 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor);
Currently, only the B-tree and GiST implementations support multicolumn
indexes. Up to 32 columns may be specified. (This limit can be
altered when building <productname>PostgreSQL</productname>; see the
- file <filename>pg_config.h</filename>.)
+ file <filename>pg_config_manual.h</filename>.)
</para>
<para>
The query planner can use a multicolumn index for queries that
- involve the leftmost column in the index definition and any number
- of columns listed to the right of it without a gap (when
- used with appropriate operators). For example,
+ involve the leftmost column in the index definition plus any number
+ of columns listed to the right of it, without a gap. For example,
an index on <literal>(a, b, c)</literal> can be used in queries
involving all of <literal>a</literal>, <literal>b</literal>, and
<literal>c</literal>, or in queries involving both
@@ -266,7 +265,9 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor);
(In a query involving <literal>a</literal> and <literal>c</literal>
the planner might choose to use the index for
<literal>a</literal> only and treat <literal>c</literal> like an
- ordinary unindexed column.)
+ ordinary unindexed column.) Of course, each column must be used with
+ operators appropriate to the index type; clauses that involve other
+ operators will not be considered.
</para>
<para>
@@ -283,8 +284,8 @@ SELECT name FROM test2 WHERE major = <replaceable>constant</replaceable> OR mino
<para>
Multicolumn indexes should be used sparingly. Most of the time,
an index on a single column is sufficient and saves space and time.
- Indexes with more than three columns are almost certainly
- inappropriate.
+ Indexes with more than three columns are unlikely to be helpful
+ unless the usage of the table is extremely stylized.
</para>
</sect1>
@@ -332,19 +333,19 @@ CREATE UNIQUE INDEX <replaceable>name</replaceable> ON <replaceable>table</repla
</sect1>
- <sect1 id="indexes-functional">
- <title>Functional Indexes</title>
+ <sect1 id="indexes-expressional">
+ <title>Indexes on Expressions</title>
- <indexterm zone="indexes-functional">
+ <indexterm zone="indexes-expressional">
<primary>indexes</primary>
- <secondary>on functions</secondary>
+ <secondary>on expressions</secondary>
</indexterm>
<para>
- For a <firstterm>functional index</firstterm>, an index is defined
- on the result of a function applied to one or more columns of a
- single table. Functional indexes can be used to obtain fast access
- to data based on the result of function calls.
+ An index column need not be just a column of the underlying table,
+ but can be a function or scalar expression computed from one or
+ more columns of the table. This feature is useful to obtain fast
+ access to tables based on the results of computations.
</para>
<para>
@@ -362,20 +363,29 @@ CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));
</para>
<para>
- The function in the index definition can take more than one
- argument, but they must be table columns, not constants.
- Functional indexes are always single-column (namely, the function
- result) even if the function uses more than one input column; there
- cannot be multicolumn indexes that contain function calls.
+ As another example, if one often does queries like this:
+ <programlisting>
+ SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';
+ </programlisting>
+ then it might be worth creating an index like this:
+ <programlisting>
+ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
+ </programlisting>
</para>
- <tip>
- <para>
- The restrictions mentioned in the previous paragraph can easily be
- worked around by defining a custom function to use in the index
- definition that computes any desired result internally.
- </para>
- </tip>
+ <para>
+ The syntax of the <command>CREATE INDEX</> command normally requires
+ writing parentheses around index expressions, as shown in the second
+ example. The parentheses may be omitted when the expression is just
+ a function call, as in the first example.
+ </para>
+
+ <para>
+ Index expressions are relatively expensive to maintain, since the
+ derived expression(s) must be computed for each row upon insertion
+ or whenever it is updated. Therefore they should be used only when
+ queries that can use the index are very frequent.
+ </para>
</sect1>
@@ -391,8 +401,8 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
The operator class identifies the operators to be used by the index
for that column. For example, a B-tree index on the type <type>int4</type>
would use the <literal>int4_ops</literal> class; this operator
- class includes comparison functions for values of type <type>int4</type>. In
- practice the default operator class for the column's data type is
+ class includes comparison functions for values of type <type>int4</type>.
+ In practice the default operator class for the column's data type is
usually sufficient. The main point of having operator classes is
that for some data types, there could be more than one meaningful
ordering. For example, we might want to sort a complex-number data
@@ -427,24 +437,25 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
<literal>name_pattern_ops</literal> support B-tree indexes on
the types <type>text</type>, <type>varchar</type>,
<type>char</type>, and <type>name</type>, respectively. The
- difference to the ordinary operator classes is that the values
+ difference from the ordinary operator classes is that the values
are compared strictly character by character rather than
according to the locale-specific collation rules. This makes
these operator classes suitable for use by queries involving
pattern matching expressions (<literal>LIKE</literal> or POSIX
regular expressions) if the server does not use the standard
- <quote>C</quote> locale. As an example, to index a
+ <quote>C</quote> locale. As an example, you might index a
<type>varchar</type> column like this:
<programlisting>
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
</programlisting>
- If you do use the C locale, you should instead create an index
- with the default operator class. Also note that you should
+ If you do use the C locale, you may instead create an index
+ with the default operator class, and it will still be useful
+ for pattern-matching queries. Also note that you should
create an index with the default operator class if you want
queries involving ordinary comparisons to use an index. Such
queries cannot use the
<literal><replaceable>xxx</replaceable>_pattern_ops</literal>
- operator classes. It is possible, however, to create multiple
+ operator classes. It is allowed to create multiple
indexes on the same column with different operator classes.
</para>
</listitem>