|
1 |
| -<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.18 2009/03/25 22:19:01 tgl Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.19 2009/04/09 19:07:44 tgl Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="GIN">
|
4 | 4 | <title>GIN Indexes</title>
|
|
103 | 103 | If the query contains no keys then <function>extractQuery</>
|
104 | 104 | should store 0 or -1 into <literal>*nkeys</>, depending on the
|
105 | 105 | semantics of the operator. 0 means that every
|
106 |
| - value matches the <literal>query</> and a sequential scan should be |
107 |
| - performed. -1 means nothing can match the <literal>query</>. |
| 106 | + value matches the <literal>query</> and a full-index scan should be |
| 107 | + performed (but see <xref linkend="gin-limit">). |
| 108 | + -1 means that nothing can match the <literal>query</>, and |
| 109 | + so the index scan can be skipped entirely. |
108 | 110 | <literal>pmatch</> is an output argument for use when partial match
|
109 | 111 | is supported. To use it, <function>extractQuery</> must allocate
|
110 | 112 | an array of <literal>*nkeys</> booleans and store its address at
|
|
354 | 356 | <title>Limitations</title>
|
355 | 357 |
|
356 | 358 | <para>
|
357 |
| - <acronym>GIN</acronym> doesn't support full index scans: because there are |
358 |
| - often many keys per value, each heap pointer would be returned many times, |
359 |
| - and there is no easy way to prevent this. |
| 359 | + <acronym>GIN</acronym> doesn't support full index scans. The reason for |
| 360 | + this is that <function>extractValue</> is allowed to return zero keys, |
| 361 | + as for example might happen with an empty string or empty array. In such |
| 362 | + a case the indexed value will be unrepresented in the index. It is |
| 363 | + therefore impossible for <acronym>GIN</acronym> to guarantee that a |
| 364 | + scan of the index can find every row in the table. |
360 | 365 | </para>
|
361 | 366 |
|
362 | 367 | <para>
|
363 |
| - When <function>extractQuery</function> returns zero keys, |
364 |
| - <acronym>GIN</acronym> will emit an error. Depending on the operator, |
365 |
| - a void query might match all, some, or none of the indexed values (for |
366 |
| - example, every array contains the empty array, but does not overlap the |
367 |
| - empty array), and <acronym>GIN</acronym> cannot determine the correct |
368 |
| - answer, nor produce a full-index-scan result if it could determine that |
369 |
| - that was correct. |
370 |
| - </para> |
371 |
| - |
372 |
| - <para> |
373 |
| - It is not an error for <function>extractValue</> to return zero keys, |
374 |
| - but in this case the indexed value will be unrepresented in the index. |
375 |
| - This is another reason why full index scan is not useful — it would |
376 |
| - miss such rows. |
| 368 | + Because of this limitation, when <function>extractQuery</function> returns |
| 369 | + <literal>nkeys = 0</> to indicate that all values match the query, |
| 370 | + <acronym>GIN</acronym> will emit an error. (If there are multiple ANDed |
| 371 | + indexable operators in the query, this happens only if they all return zero |
| 372 | + for <literal>nkeys</>.) |
377 | 373 | </para>
|
378 | 374 |
|
379 | 375 | <para>
|
|
383 | 379 | <function>extractQuery</function> must convert an unrestricted search into
|
384 | 380 | a partial-match query that will scan the whole index. This is inefficient
|
385 | 381 | but might be necessary to avoid corner-case failures with operators such
|
386 |
| - as <literal>LIKE</>. |
| 382 | + as <literal>LIKE</> or subset inclusion. |
| 383 | + </para> |
| 384 | + |
| 385 | + <para> |
| 386 | + <acronym>GIN</acronym> assumes that indexable operators are strict. |
| 387 | + This means that <function>extractValue</> will not be called at all on |
| 388 | + a NULL value (so the value will go unindexed), and |
| 389 | + <function>extractQuery</function> will not be called on a NULL comparison |
| 390 | + value either (instead, the query is presumed to be unmatchable). |
| 391 | + </para> |
| 392 | + |
| 393 | + <para> |
| 394 | + A possibly more serious limitation is that <acronym>GIN</acronym> cannot |
| 395 | + handle NULL keys — for example, an array containing a NULL cannot |
| 396 | + be handled except by ignoring the NULL. |
387 | 397 | </para>
|
388 | 398 | </sect1>
|
389 | 399 |
|
|
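A minimal sketch of the `nkeys` convention the revised text describes, written as a standalone model rather than actual PostgreSQL source. The names `ScanPlan` and `plan_gin_scan` are hypothetical. Treating a single -1 as making the whole ANDed query unmatchable is an assumption that follows from AND semantics and is not stated in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of planning a GIN scan; not a PostgreSQL type. */
typedef enum
{
    SCAN_SKIP,              /* some operator returned -1: nothing can match */
    SCAN_KEYS,              /* at least one operator supplied real keys */
    SCAN_UNSUPPORTED_FULL   /* every operator returned 0: GIN emits an error */
} ScanPlan;

/*
 * nkeys[i] models the value that extractQuery stored into *nkeys for the
 * i-th ANDed indexable operator:
 *   > 0  the operator produced that many keys
 *     0  every value matches (would need a full-index scan)
 *    -1  nothing can match
 */
static ScanPlan
plan_gin_scan(const int *nkeys, int nquals)
{
    int have_keys = 0;
    int i;

    assert(nkeys != NULL && nquals >= 0);

    for (i = 0; i < nquals; i++)
    {
        if (nkeys[i] < 0)
            return SCAN_SKIP;   /* assumption: one unmatchable qual empties the AND */
        if (nkeys[i] > 0)
            have_keys = 1;
    }

    /* Only when all operators returned zero keys is the error case reached. */
    return have_keys ? SCAN_KEYS : SCAN_UNSUPPORTED_FULL;
}
```

Under this model, `{0, 3}` selects `SCAN_KEYS` because the second operator restricts the scan, while `{0, 0}` reaches the error case described in the Limitations section.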