Commit b6e42bd

Update GIN limitations documentation to match current reality.
1 parent 06e2757 commit b6e42bd

1 file changed: +31 -21 lines changed

doc/src/sgml/gin.sgml

@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.18 2009/03/25 22:19:01 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.19 2009/04/09 19:07:44 tgl Exp $ -->
 
 <chapter id="GIN">
 <title>GIN Indexes</title>
@@ -103,8 +103,10 @@
 If the query contains no keys then <function>extractQuery</>
 should store 0 or -1 into <literal>*nkeys</>, depending on the
 semantics of the operator. 0 means that every
-value matches the <literal>query</> and a sequential scan should be
-performed. -1 means nothing can match the <literal>query</>.
+value matches the <literal>query</> and a full-index scan should be
+performed (but see <xref linkend="gin-limit">).
+-1 means that nothing can match the <literal>query</>, and
+so the index scan can be skipped entirely.
 <literal>pmatch</> is an output argument for use when partial match
 is supported. To use it, <function>extractQuery</> must allocate
 an array of <literal>*nkeys</> booleans and store its address at
@@ -354,26 +356,20 @@
 <title>Limitations</title>
 
 <para>
-<acronym>GIN</acronym> doesn't support full index scans: because there are
-often many keys per value, each heap pointer would be returned many times,
-and there is no easy way to prevent this.
+<acronym>GIN</acronym> doesn't support full index scans. The reason for
+this is that <function>extractValue</> is allowed to return zero keys,
+as for example might happen with an empty string or empty array. In such
+a case the indexed value will be unrepresented in the index. It is
+therefore impossible for <acronym>GIN</acronym> to guarantee that a
+scan of the index can find every row in the table.
 </para>
 
 <para>
-When <function>extractQuery</function> returns zero keys,
-<acronym>GIN</acronym> will emit an error. Depending on the operator,
-a void query might match all, some, or none of the indexed values (for
-example, every array contains the empty array, but does not overlap the
-empty array), and <acronym>GIN</acronym> cannot determine the correct
-answer, nor produce a full-index-scan result if it could determine that
-that was correct.
-</para>
-
-<para>
-It is not an error for <function>extractValue</> to return zero keys,
-but in this case the indexed value will be unrepresented in the index.
-This is another reason why full index scan is not useful &mdash; it would
-miss such rows.
+Because of this limitation, when <function>extractQuery</function> returns
+<literal>nkeys = 0</> to indicate that all values match the query,
+<acronym>GIN</acronym> will emit an error. (If there are multiple ANDed
+indexable operators in the query, this happens only if they all return zero
+for <literal>nkeys</>.)
 </para>
 
 <para>
@@ -383,7 +379,21 @@
 <function>extractQuery</function> must convert an unrestricted search into
 a partial-match query that will scan the whole index. This is inefficient
 but might be necessary to avoid corner-case failures with operators such
-as <literal>LIKE</>.
+as <literal>LIKE</> or subset inclusion.
+</para>
+
+<para>
+<acronym>GIN</acronym> assumes that indexable operators are strict.
+This means that <function>extractValue</> will not be called at all on
+a NULL value (so the value will go unindexed), and
+<function>extractQuery</function> will not be called on a NULL comparison
+value either (instead, the query is presumed to be unmatchable).
+</para>
+
+<para>
+A possibly more serious limitation is that <acronym>GIN</acronym> cannot
+handle NULL keys &mdash; for example, an array containing a NULL cannot
+be handled except by ignoring the NULL.
 </para>
 </sect1>
 
0 commit comments
