
Commit b576757

Add external documentation for KNNGIST.
1 parent 04910a3 commit b576757

5 files changed

+193
-65
lines changed

doc/src/sgml/gist.sgml

Lines changed: 74 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -78,7 +78,7 @@
 
 <para>
 All it takes to get a <acronym>GiST</acronym> access method up and running
-is to implement seven user-defined methods, which define the behavior of
+is to implement several user-defined methods, which define the behavior of
 keys in the tree. Of course these methods have to be pretty fancy to
 support fancy queries, but for all the standard queries (B-trees,
 R-trees, etc.) they're relatively straightforward. In short,
@@ -93,19 +93,23 @@
 
 <para>
 There are seven methods that an index operator class for
-<acronym>GiST</acronym> must provide. Correctness of the index is ensured
+<acronym>GiST</acronym> must provide, and an eighth that is optional.
+Correctness of the index is ensured
 by proper implementation of the <function>same</>, <function>consistent</>
 and <function>union</> methods, while efficiency (size and speed) of the
 index will depend on the <function>penalty</> and <function>picksplit</>
 methods.
-The remaining two methods are <function>compress</> and
+The remaining two basic methods are <function>compress</> and
 <function>decompress</>, which allow an index to have internal tree data of
 a different type than the data it indexes. The leaves are to be of the
 indexed data type, while the other tree nodes can be of any C struct (but
 you still have to follow <productname>PostgreSQL</> data type rules here,
 see about <literal>varlena</> for variable sized data). If the tree's
 internal data type exists at the SQL level, the <literal>STORAGE</> option
 of the <command>CREATE OPERATOR CLASS</> command can be used.
+The optional eighth method is <function>distance</>, which is needed
+if the operator class wishes to support ordered scans (nearest-neighbor
+searches).
 </para>
 
 <variablelist>
@@ -567,6 +571,73 @@ my_same(PG_FUNCTION_ARGS)
 </listitem>
 </varlistentry>
 
+<varlistentry>
+<term><function>distance</></term>
+<listitem>
+<para>
+Given an index entry <literal>p</> and a query value <literal>q</>,
+this function determines the index entry's
+<quote>distance</> from the query value. This function must be
+supplied if the operator class contains any ordering operators.
+A query using the ordering operator will be implemented by returning
+index entries with the smallest <quote>distance</> values first,
+so the results must be consistent with the operator's semantics.
+For a leaf index entry the result just represents the distance to
+the index entry; for an internal tree node, the result must be the
+smallest distance that any child entry could have.
+</para>
+
+<para>
+The <acronym>SQL</> declaration of the function must look like this:
+
+<programlisting>
+CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid)
+RETURNS float8
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+</programlisting>
+
+And the matching code in the C module could then follow this skeleton:
+
+<programlisting>
+Datum my_distance(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(my_distance);
+
+Datum
+my_distance(PG_FUNCTION_ARGS)
+{
+    GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+    data_type *query = PG_GETARG_DATA_TYPE_P(1);
+    StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
+    /* Oid subtype = PG_GETARG_OID(3); */
+    data_type *key = DatumGetDataType(entry-&gt;key);
+    double retval;
+
+    /*
+     * determine return value as a function of strategy, key and query.
+     */
+
+    PG_RETURN_FLOAT8(retval);
+}
+</programlisting>
+
+The arguments to the <function>distance</> function are identical to
+the arguments of the <function>consistent</> function, except that no
+recheck flag is used. The distance to a leaf index entry must always
+be determined exactly, since there is no way to re-order the tuples
+once they are returned. Some approximation is allowed when determining
+the distance to an internal tree node, so long as the result is never
+greater than any child's actual distance. Thus, for example, distance
+to a bounding box is usually sufficient in geometric applications. The
+result value can be any finite <type>float8</> value. (Infinity and
+minus infinity are used internally to handle cases such as nulls, so it
+is not recommended that <function>distance</> functions return these
+values.)
+</para>
+
+</listitem>
+</varlistentry>
+
 </variablelist>
 
 </sect1>
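
The rule added to gist.sgml above (exact distance for a leaf entry, a value never greater than any child's distance for an internal node) can be illustrated with a one-dimensional sketch. This is plain C with no PostgreSQL headers, and it is not part of the commit: the type DemoRange and the functions demo_leaf_distance and demo_internal_distance are hypothetical names, and the sketch assumes that leaf keys are plain numbers, that internal keys are the closed range covering all children, and that the ordering operator means absolute difference.

```c
#include <assert.h>

/*
 * Hypothetical internal-node key: a closed range [lo, hi] that covers
 * every value stored below the node.  Not a PostgreSQL type.
 */
typedef struct DemoRange
{
    double lo;
    double hi;
} DemoRange;

/* Leaf entry: the distance must be exact, here |key - query|. */
double
demo_leaf_distance(double key, double query)
{
    double d = key - query;

    return (d < 0.0) ? -d : d;
}

/*
 * Internal entry: return the smallest distance any child could have.
 * If the query lies inside the range, some child might sit exactly on
 * it, so the lower bound is zero; otherwise it is the gap to the
 * nearer end of the range.
 */
double
demo_internal_distance(DemoRange key, double query)
{
    assert(key.lo <= key.hi);

    if (query < key.lo)
        return key.lo - query;
    if (query > key.hi)
        return query - key.hi;
    return 0.0;
}
```

For a node covering [10, 20] and a query of 4, the internal distance is 6, and a child holding 12 has the exact distance 8, so the internal value does not exceed the child's. The bounding-box case mentioned in the text is the two-dimensional analogue of the same idea.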

doc/src/sgml/indexam.sgml

Lines changed: 28 additions & 7 deletions
@@ -505,11 +505,31 @@ amrestrpos (IndexScanDesc scan);
 
 <para>
 Some access methods return index entries in a well-defined order, others
-do not. If entries are returned in sorted order, the access method should
-set <structname>pg_am</>.<structfield>amcanorder</> true to indicate that
-it supports ordered scans.
-All such access methods must use btree-compatible strategy numbers for
-their equality and ordering operators.
+do not. There are actually two different ways that an access method can
+support sorted output:
+
+<itemizedlist>
+<listitem>
+<para>
+Access methods that always return entries in the natural ordering
+of their data (such as btree) should set
+<structname>pg_am</>.<structfield>amcanorder</> to true.
+Currently, such access methods must use btree-compatible strategy
+numbers for their equality and ordering operators.
+</para>
+</listitem>
+<listitem>
+<para>
+Access methods that support ordering operators should set
+<structname>pg_am</>.<structfield>amcanorderbyop</> to true.
+This indicates that the index is capable of returning entries in
+an order satisfying <literal>ORDER BY</> <replaceable>index_key</>
+<replaceable>operator</> <replaceable>constant</>. Scan modifiers
+of that form can be passed to <function>amrescan</> as described
+previously.
+</para>
+</listitem>
+</itemizedlist>
 </para>
 
 <para>
@@ -521,7 +541,7 @@ amrestrpos (IndexScanDesc scan);
 the normal front-to-back direction, so <function>amgettuple</> must return
 the last matching tuple in the index, rather than the first one as it
 normally would. (This will only occur for access
-methods that advertise they support ordered scans.) After the
+methods that set <structfield>amcanorder</> to true.) After the
 first call, <function>amgettuple</> must be prepared to advance the scan in
 either direction from the most recently returned entry. (But if
 <structname>pg_am</>.<structfield>amcanbackward</> is false, all subsequent
@@ -563,7 +583,8 @@ amrestrpos (IndexScanDesc scan);
 tuples at once and marking or restoring scan positions isn't
 supported. Secondly, the tuples are returned in a bitmap which doesn't
 have any specific ordering, which is why <function>amgetbitmap</> doesn't
-take a <literal>direction</> argument. Finally, <function>amgetbitmap</>
+take a <literal>direction</> argument. (Ordering operators will never be
+supplied for such a scan, either.) Finally, <function>amgetbitmap</>
 does not guarantee any locking of the returned tuples, with implications
 spelled out in <xref linkend="index-locking">.
 </para>
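
The indexam.sgml text above says that an access method setting amcanorderbyop returns entries in an order satisfying ORDER BY index_key operator constant, but it does not say how. One common technique for producing such an order from lower-bound distances is a best-first traversal, sketched below under stated assumptions. It is not taken from this commit and makes no claim about PostgreSQL's actual scan code: DemoNode, demo_distance and demo_ordered_scan are hypothetical names, keys are one-dimensional ranges, and the pending queue is a small fixed array searched linearly rather than a real priority queue.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 64       /* arbitrary limit for this sketch */

/* Hypothetical tree node: a leaf has lo == hi and no children. */
typedef struct DemoNode
{
    double lo;
    double hi;
    const struct DemoNode *kids[2];
} DemoNode;

/* Exact for leaves, a lower bound over all children for internal nodes. */
double
demo_distance(const DemoNode *n, double q)
{
    if (q < n->lo)
        return n->lo - q;
    if (q > n->hi)
        return q - n->hi;
    return 0.0;
}

/*
 * Write up to max leaf values into out, nearest to q first, and return
 * how many were written.  The pending entry with the smallest distance
 * is always handled next; a leaf is emitted, an internal node is
 * replaced by its children.
 */
size_t
demo_ordered_scan(const DemoNode *root, double q, double *out, size_t max)
{
    const DemoNode *queue[DEMO_QUEUE_MAX];
    size_t qlen = 0;
    size_t found = 0;

    queue[qlen++] = root;
    while (qlen > 0 && found < max)
    {
        const DemoNode *n;
        size_t best = 0;
        size_t i;
        int k;

        /* Linear search for the closest pending entry. */
        for (i = 1; i < qlen; i++)
            if (demo_distance(queue[i], q) < demo_distance(queue[best], q))
                best = i;

        /* Remove it by moving the last entry into its slot. */
        n = queue[best];
        queue[best] = queue[--qlen];

        if (n->kids[0] == NULL && n->kids[1] == NULL)
            out[found++] = n->lo;
        else
            for (k = 0; k < 2; k++)
                if (n->kids[k] != NULL)
                {
                    assert(qlen < DEMO_QUEUE_MAX);
                    queue[qlen++] = n->kids[k];
                }
    }
    return found;
}
```

With leaves 1, 4, 9 and 15 grouped under two internal nodes covering [1, 4] and [9, 15], a scan for the three values nearest to 7 yields 9, 4, 1. The sketch also shows why the lower-bound rule matters: if an internal node reported a distance larger than one of its children's, a farther leaf elsewhere in the tree could be emitted first, and the output would no longer be sorted.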

doc/src/sgml/indices.sgml

Lines changed: 17 additions & 0 deletions
@@ -167,6 +167,11 @@ CREATE INDEX test1_id_index ON test1 (id);
 upper/lower case conversion.
 </para>
 
+<para>
+B-tree indexes can also be used to retrieve data in sorted order.
+This is not always faster than a simple scan and sort, but it is
+often helpful.
+</para>
 
 <para>
 <indexterm>
@@ -236,6 +241,18 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
 classes are available in the <literal>contrib</> collection or as separate
 projects. For more information see <xref linkend="GiST">.
 </para>
+
+<para>
+GiST indexes are also capable of optimizing <quote>nearest-neighbor</>
+searches, such as
+<programlisting><![CDATA[
+SELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10;
+]]>
+</programlisting>
+which finds the ten places closest to a given target point. The ability
+to do this is again dependent on the particular operator class being used.
+</para>
+
 <para>
 <indexterm>
 <primary>index</primary>

doc/src/sgml/xindex.sgml

Lines changed: 25 additions & 10 deletions
@@ -361,59 +361,74 @@
 </table>
 
 <para>
-GiST indexes require seven support functions,
+GiST indexes require seven support functions, with an optional eighth, as
 shown in <xref linkend="xindex-gist-support-table">.
 </para>
 
 <table tocentry="1" id="xindex-gist-support-table">
 <title>GiST Support Functions</title>
-<tgroup cols="2">
+<tgroup cols="3">
 <thead>
 <row>
 <entry>Function</entry>
+<entry>Description</entry>
 <entry>Support Number</entry>
 </row>
 </thead>
 <tbody>
 <row>
-<entry>consistent - determine whether key satisfies the
+<entry><function>consistent</></entry>
+<entry>determine whether key satisfies the
 query qualifier</entry>
 <entry>1</entry>
 </row>
 <row>
-<entry>union - compute union of a set of keys</entry>
+<entry><function>union</></entry>
+<entry>compute union of a set of keys</entry>
 <entry>2</entry>
 </row>
 <row>
-<entry>compress - compute a compressed representation of a key or value
+<entry><function>compress</></entry>
+<entry>compute a compressed representation of a key or value
 to be indexed</entry>
 <entry>3</entry>
 </row>
 <row>
-<entry>decompress - compute a decompressed representation of a
+<entry><function>decompress</></entry>
+<entry>compute a decompressed representation of a
 compressed key</entry>
 <entry>4</entry>
 </row>
 <row>
-<entry>penalty - compute penalty for inserting new key into subtree
+<entry><function>penalty</></entry>
+<entry>compute penalty for inserting new key into subtree
 with given subtree's key</entry>
 <entry>5</entry>
 </row>
 <row>
-<entry>picksplit - determine which entries of a page are to be moved
+<entry><function>picksplit</></entry>
+<entry>determine which entries of a page are to be moved
 to the new page and compute the union keys for resulting pages</entry>
 <entry>6</entry>
 </row>
 <row>
-<entry>equal - compare two keys and return true if they are equal</entry>
+<entry><function>equal</></entry>
+<entry>compare two keys and return true if they are equal</entry>
 <entry>7</entry>
 </row>
+<row>
+<entry><function>distance</></entry>
+<entry>
+(optional method) determine distance from key to query value
+</entry>
+<entry>8</entry>
+</row>
 </tbody>
 </tgroup>
 </table>
 
 <para>
-GIN indexes require four support functions,
+GIN indexes require four support functions, with an optional fifth, as
 shown in <xref linkend="xindex-gin-support-table">.
 </para>
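
The GiST support-function table in the xindex.sgml hunk above can be mirrored in C as a small sketch of what "seven required, an eighth optional" means for an operator class. The numbers come from the table; everything else is an assumption for illustration only: the DEMO_GIST_* names and the two helper functions are hypothetical and are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Support numbers as listed in the table; the names are hypothetical. */
typedef enum DemoGistSupport
{
    DEMO_GIST_CONSISTENT = 1,
    DEMO_GIST_UNION = 2,
    DEMO_GIST_COMPRESS = 3,
    DEMO_GIST_DECOMPRESS = 4,
    DEMO_GIST_PENALTY = 5,
    DEMO_GIST_PICKSPLIT = 6,
    DEMO_GIST_EQUAL = 7,
    DEMO_GIST_DISTANCE = 8      /* optional */
} DemoGistSupport;

#define DEMO_GIST_NPROCS 8

/*
 * have[n] is true when the operator class supplies support function n.
 * Index 0 is unused so that the array can be indexed by support number.
 * Returns true when all seven required functions are present.
 */
bool
demo_gist_opclass_complete(const bool *have)
{
    int n;

    assert(have != NULL);
    for (n = DEMO_GIST_CONSISTENT; n <= DEMO_GIST_EQUAL; n++)
        if (!have[n])
            return false;
    return true;
}

/* Ordered (nearest-neighbor) scans additionally need the distance function. */
bool
demo_gist_can_order(const bool *have)
{
    return demo_gist_opclass_complete(have) && have[DEMO_GIST_DISTANCE];
}
```

An operator class that supplies functions 1 through 7 is complete but cannot serve ordered scans; adding function 8 enables them, and dropping any of the first seven makes the class incomplete regardless of whether distance is present.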