Commit bb8f629

Move full text search operators, functions, and data type sections into
the main documentation, out of its own text search chapter.
1 parent 8bc225e commit bb8f629

3 files changed

+1282
-1271
lines changed

doc/src/sgml/datatype.sgml

Lines changed: 144 additions & 1 deletion
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.207 2007/08/21 01:11:11 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.208 2007/08/29 20:37:14 momjian Exp $ -->
22

33
<chapter id="datatype">
44
<title id="datatype-title">Data Types</title>
@@ -234,6 +234,18 @@
234234
<entry>date and time, including time zone</entry>
235235
</row>
236236

237+
<row>
238+
<entry><type>tsquery</type></entry>
239+
<entry></entry>
240+
<entry>full text search query</entry>
241+
</row>
242+
243+
<row>
244+
<entry><type>tsvector</type></entry>
245+
<entry></entry>
246+
<entry>full text search document</entry>
247+
</row>
248+
237249
<row>
238250
<entry><type>uuid</type></entry>
239251
<entry></entry>
@@ -3264,6 +3276,137 @@ a0eebc999c0b4ef8bb6d6bb9bd380a11
32643276
</para>
32653277
</sect1>
32663278

3279+
<sect1 id="datatype-textsearch">
3280+
<title>Full Text Search</title>
3281+
3282+
<variablelist>
3283+
3284+
<indexterm zone="datatype-textsearch">
3285+
<primary>tsvector</primary>
3286+
</indexterm>
3287+
3288+
<varlistentry>
3289+
<term><firstterm>tsvector</firstterm></term>
3290+
<listitem>
3291+
3292+
<para>
3293+
<type>tsvector</type> is a data type that represents a document and is
3294+
optimized for full text searching. In the simplest case,
3295+
<type>tsvector</type> is a sorted list of lexemes, so even without indexes
3296+
full text searches perform better than standard <literal>~</literal> and
3297+
<literal>LIKE</literal> operations:
3298+
3299+
<programlisting>
3300+
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
3301+
tsvector
3302+
----------------------------------------------------
3303+
'a' 'on' 'and' 'ate' 'cat' 'fat' 'mat' 'rat' 'sat'
3304+
</programlisting>
3305+
3306+
Note that a quoted <literal>space</literal> is also a lexeme:
3307+
3308+
<programlisting>
3309+
SELECT 'space '' '' is a lexeme'::tsvector;
3310+
tsvector
3311+
----------------------------------
3312+
'a' 'is' ' ' 'space' 'lexeme'
3313+
</programlisting>
3314+
3315+
Each lexeme can optionally have positional information, which is used for
3316+
<varname>proximity ranking</varname>:
3317+
3318+
<programlisting>
3319+
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
3320+
tsvector
3321+
-------------------------------------------------------------------------------
3322+
'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
3323+
</programlisting>
3324+
3325+
Each lexeme position can also be labeled <literal>A</literal>,
3326+
<literal>B</literal>, <literal>C</literal>, <literal>D</literal>,
3327+
where <literal>D</literal> is the default. These labels can be used to group
3328+
lexemes into different <emphasis>importance</emphasis> or
3329+
<emphasis>rankings</emphasis>, for example to reflect document structure.
3330+
Actual weights can be assigned to these labels at search time and used during the calculation
3331+
of the document rank. This is very useful for controlling search results.
3332+
</para>
3333+
3334+
<para>
3335+
The concatenation operator, e.g. <literal>tsvector || tsvector</literal>,
3336+
can "construct" a document from several parts, which may come from
3337+
different tables. The order is important if the <type>tsvector</type> contains
3338+
positional information, because right-hand positions are offset by the largest left-hand position:
3339+
3340+
<programlisting>
3341+
SELECT 'fat:1 cat:2'::tsvector || 'fat:1 rat:2'::tsvector;
3342+
?column?
3343+
---------------------------
3344+
'cat':2 'fat':1,3 'rat':4
3345+
3346+
SELECT 'fat:1 rat:2'::tsvector || 'fat:1 cat:2'::tsvector;
3347+
?column?
3348+
---------------------------
3349+
'cat':4 'fat':1,3 'rat':2
3350+
</programlisting>
3351+
3352+
</para>
3353+
3354+
</listitem>
3355+
3356+
</varlistentry>
3357+
3358+
<indexterm zone="datatype-textsearch">
3359+
<primary>tsquery</primary>
3360+
</indexterm>
3361+
3362+
<varlistentry>
3363+
<term><firstterm>tsquery</firstterm></term>
3364+
<listitem>
3365+
3366+
<para>
3367+
<type>tsquery</type> is a data type for textual queries which supports
3368+
the boolean operators <literal>&amp;</literal> (AND), <literal>|</literal> (OR),
3369+
and parentheses. A <type>tsquery</type> consists of lexemes
3370+
(optionally labeled by letters) with boolean operators in between:
3371+
3372+
<programlisting>
3373+
SELECT 'fat &amp; cat'::tsquery;
3374+
tsquery
3375+
---------------
3376+
'fat' &amp; 'cat'
3377+
SELECT 'fat:ab &amp; cat'::tsquery;
3378+
tsquery
3379+
------------------
3380+
'fat':AB &amp; 'cat'
3381+
</programlisting>
3382+
3383+
Labels can be used to restrict the search region, which allows the
3384+
development of different search engines using the same full text index.
3385+
</para>
3386+
3387+
<para>
3388+
<type>tsquery</type> values can be combined using the <literal>&amp;&amp;</literal> (AND)
3389+
and <literal>||</literal> (OR) operators:
3390+
3391+
<programlisting>
3392+
SELECT 'a &amp; b'::tsquery &amp;&amp; 'c | d'::tsquery;
3393+
?column?
3394+
---------------------------
3395+
'a' &amp; 'b' &amp; ( 'c' | 'd' )
3396+
3397+
SELECT 'a &amp; b'::tsquery || 'c|d'::tsquery;
3398+
?column?
3399+
---------------------------
3400+
'a' &amp; 'b' | ( 'c' | 'd' )
3401+
</programlisting>
3402+
3403+
</para>
3404+
</listitem>
3405+
</varlistentry>
3406+
</variablelist>
3407+
3408+
</sect1>
3409+
32673410
<sect1 id="datatype-xml">
32683411
<title><acronym>XML</> Type</title>
32693412
