Commit 6d8b2aa

docs: update guidelines on when to use GIN and GiST indexes
Report by Tomas Vondra

Backpatch through 9.5
1 parent f8a5e57 commit 6d8b2aa

1 file changed, +19 -61 lines changed

doc/src/sgml/textsearch.sgml

@@ -3192,7 +3192,7 @@ SELECT plainto_tsquery('supernovae stars');
 </sect1>
 
 <sect1 id="textsearch-indexes">
-<title>GiST and GIN Index Types</title>
+<title>GIN and GiST Index Types</title>
 
 <indexterm zone="textsearch-indexes">
 <primary>text search</primary>
@@ -3213,18 +3213,17 @@ SELECT plainto_tsquery('supernovae stars');
 <term>
 <indexterm zone="textsearch-indexes">
 <primary>index</primary>
-<secondary>GiST</secondary>
+<secondary>GIN</secondary>
 <tertiary>text search</tertiary>
 </indexterm>
 
-<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIST (<replaceable>column</replaceable>);</literal>
+<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIN (<replaceable>column</replaceable>);</literal>
 </term>
 
 <listitem>
 <para>
-Creates a GiST (Generalized Search Tree)-based index.
-The <replaceable>column</replaceable> can be of <type>tsvector</> or
-<type>tsquery</> type.
+Creates a GIN (Generalized Inverted Index)-based index.
+The <replaceable>column</replaceable> must be of <type>tsvector</> type.
 </para>
 </listitem>
 </varlistentry>
@@ -3234,17 +3233,18 @@ SELECT plainto_tsquery('supernovae stars');
 <term>
 <indexterm zone="textsearch-indexes">
 <primary>index</primary>
-<secondary>GIN</secondary>
+<secondary>GiST</secondary>
 <tertiary>text search</tertiary>
 </indexterm>
 
-<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIN (<replaceable>column</replaceable>);</literal>
+<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIST (<replaceable>column</replaceable>);</literal>
 </term>
 
 <listitem>
 <para>
-Creates a GIN (Generalized Inverted Index)-based index.
-The <replaceable>column</replaceable> must be of <type>tsvector</> type.
+Creates a GiST (Generalized Search Tree)-based index.
+The <replaceable>column</replaceable> can be of <type>tsvector</> or
+<type>tsquery</> type.
 </para>
 </listitem>
 </varlistentry>
@@ -3253,13 +3253,18 @@ SELECT plainto_tsquery('supernovae stars');
 </para>
 
 <para>
-There are substantial performance differences between the two index types,
-so it is important to understand their characteristics.
+GIN indexes are the preferred text search index type. As inverted
+indexes, they contain an index entry for each word (lexeme), with a
+compressed list of matching locations. Multi-word searches can find
+the first match, then use the index to remove rows that are lacking
+additional words. GIN indexes store only the words (lexemes) of
+<type>tsvector</> values, and not their weight labels. Thus a table
+row recheck is needed when using a query that involves weights.
 </para>
 
 <para>
 A GiST index is <firstterm>lossy</firstterm>, meaning that the index
-may produce false matches, and it is necessary
+might produce false matches, and it is necessary
 to check the actual table row to eliminate such false matches.
 (<productname>PostgreSQL</productname> does this automatically when needed.)
 GiST indexes are lossy because each document is represented in the
@@ -3280,53 +3285,6 @@ SELECT plainto_tsquery('supernovae stars');
 recommended.
 </para>
 
-<para>
-GIN indexes are not lossy for standard queries, but their performance
-depends logarithmically on the number of unique words.
-(However, GIN indexes store only the words (lexemes) of <type>tsvector</>
-values, and not their weight labels. Thus a table row recheck is needed
-when using a query that involves weights.)
-</para>
-
-<para>
-In choosing which index type to use, GiST or GIN, consider these
-performance differences:
-
-<itemizedlist spacing="compact" mark="bullet">
-<listitem>
-<para>
-GIN index lookups are about three times faster than GiST
-</para>
-</listitem>
-<listitem>
-<para>
-GIN indexes take about three times longer to build than GiST
-</para>
-</listitem>
-<listitem>
-<para>
-GIN indexes are moderately slower to update than GiST indexes, but
-about 10 times slower if fast-update support was disabled
-(see <xref linkend="gin-fast-update"> for details)
-</para>
-</listitem>
-<listitem>
-<para>
-GIN indexes are two-to-three times larger than GiST indexes
-</para>
-</listitem>
-</itemizedlist>
-</para>
-
-<para>
-As a rule of thumb, <acronym>GIN</acronym> indexes are best for static data
-because lookups are faster. For dynamic data, GiST indexes are
-faster to update. Specifically, <acronym>GiST</acronym> indexes are very
-good for dynamic data and fast if the number of unique words (lexemes) is
-under 100,000, while <acronym>GIN</acronym> indexes will handle 100,000+
-lexemes better but are slower to update.
-</para>
-
 <para>
 Note that <acronym>GIN</acronym> index build time can often be improved
 by increasing <xref linkend="guc-maintenance-work-mem">, while
@@ -3335,7 +3293,7 @@ SELECT plainto_tsquery('supernovae stars');
 </para>
 
 <para>
-Partitioning of big collections and the proper use of GiST and GIN indexes
+Partitioning of big collections and the proper use of GIN and GiST indexes
 allows the implementation of very fast searches with online update.
 Partitioning can be done at the database level using table inheritance,
 or by distributing documents over

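Note on the new GIN paragraph: the inverted-index behavior it describes can be illustrated with a small Python sketch. This is a toy in-memory model, not PostgreSQL's actual GIN storage format or search algorithm. The sample rows and the names `build_index`, `search_all`, and `search_weighted` are hypothetical, chosen only for illustration. The sketch keeps one index entry per lexeme with a set of matching row ids, narrows a multi-word search by intersecting those sets, and rechecks the row itself when the query involves a weight, because the index entries carry no weight labels.

```python
from collections import defaultdict

# Toy "table": row id -> list of (lexeme, weight) pairs, loosely mimicking a
# tsvector. The rows and weights are made-up illustration data.
ROWS = {
    1: [("supernova", "A"), ("star", "D")],
    2: [("star", "A"), ("planet", "D")],
    3: [("supernova", "D"), ("star", "D"), ("galaxi", "D")],
}


def build_index(rows):
    """Map each lexeme to the set of row ids containing it (weights are dropped)."""
    index = defaultdict(set)
    for row_id, entries in rows.items():
        for lexeme, _weight in entries:
            index[lexeme].add(row_id)
    return index


def search_all(index, lexemes):
    """Return sorted row ids that contain every lexeme."""
    if not lexemes:
        return []
    # Start from the smallest posting set, then drop rows lacking the other words.
    postings = sorted((index.get(lex, set()) for lex in lexemes), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return sorted(result)


def search_weighted(index, rows, lexeme, weight):
    """Use the index for candidates, then recheck each row for the weight."""
    candidates = index.get(lexeme, set())
    return sorted(r for r in candidates if (lexeme, weight) in rows[r])


index = build_index(ROWS)
print(search_all(index, ["supernova", "star"]))        # prints [1, 3]
print(search_weighted(index, ROWS, "supernova", "A"))  # prints [1]
```

In the second call the index alone yields rows 1 and 3 as candidates. Only the recheck against the row data removes row 3, which mirrors why the paragraph says a table row recheck is needed for queries that involve weights.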