Commit a4d4f59

Doc: improve documentation about ts_headline() function.
Now that I've had my nose in that code, I thought the docs about it left something to be desired.
1 parent c9b0c67 commit a4d4f59

1 file changed, 57 additions and 47 deletions
doc/src/sgml/textsearch.sgml

Lines changed: 57 additions & 47 deletions
@@ -1295,64 +1295,75 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
 <itemizedlist spacing="compact" mark="bullet">
 <listitem>
 <para>
-<literal>StartSel</literal>, <literal>StopSel</literal>: the strings with
-which to delimit query words appearing in the document, to distinguish
-them from other excerpted words. You must double-quote these strings
-if they contain spaces or commas.
+<literal>MaxWords</literal>, <literal>MinWords</literal> (integers):
+these numbers determine the longest and shortest headlines to output.
+The default values are 35 and 15.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>MaxWords</literal>, <literal>MinWords</literal>: these numbers
-determine the longest and shortest headlines to output.
+<literal>ShortWord</literal> (integer): words of this length or less
+will be dropped at the start and end of a headline, unless they are
+query terms. The default value of three eliminates common English
+articles.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>ShortWord</literal>: words of this length or less will be
-dropped at the start and end of a headline. The default
-value of three eliminates common English articles.
+<literal>HighlightAll</literal> (boolean): if
+<literal>true</literal> the whole document will be used as the
+headline, ignoring the preceding three parameters. The default
+is <literal>false</literal>.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>HighlightAll</literal>: Boolean flag; if
-<literal>true</literal> the whole document will be used as the
-headline, ignoring the preceding three parameters.
+<literal>MaxFragments</literal> (integer): maximum number of text
+fragments to display. The default value of zero selects a
+non-fragment-based headline generation method. A value greater
+than zero selects fragment-based headline generation (see below).
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>MaxFragments</literal>: maximum number of text excerpts
-or fragments to display. The default value of zero selects a
-non-fragment-oriented headline generation method. A value greater than
-zero selects fragment-based headline generation. This method
-finds text fragments with as many query words as possible and
-stretches those fragments around the query words. As a result
-query words are close to the middle of each fragment and have words on
-each side. Each fragment will be of at most <literal>MaxWords</literal> and
-words of length <literal>ShortWord</literal> or less are dropped at the start
-and end of each fragment. If not all query words are found in the
-document, then a single fragment of the first <literal>MinWords</literal>
-in the document will be displayed.
+<literal>StartSel</literal>, <literal>StopSel</literal> (strings):
+the strings with which to delimit query words appearing in the
+document, to distinguish them from other excerpted words. The
+default values are <quote><literal>&lt;b&gt;</literal></quote> and
+<quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
+for HTML output.
 </para>
 </listitem>
 <listitem>
 <para>
-<literal>FragmentDelimiter</literal>: When more than one fragment is
-displayed, the fragments will be separated by this string.
+<literal>FragmentDelimiter</literal> (string): When more than one
+fragment is displayed, the fragments will be separated by this string.
+The default is <quote><literal> ... </literal></quote>.
 </para>
 </listitem>
 </itemizedlist>
 
 These option names are recognized case-insensitively.
-Any unspecified options receive these defaults:
+You must double-quote string values if they contain spaces or commas.
+</para>
 
-<programlisting>
-StartSel=&lt;b&gt;, StopSel=&lt;/b&gt;,
-MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE,
-MaxFragments=0, FragmentDelimiter=" ... "
-</programlisting>
+<para>
+In non-fragment-based headline
+generation, <function>ts_headline</function> locates matches for the
+given <replaceable class="parameter">query</replaceable> and chooses a
+single one to display, preferring matches that have more query words
+within the allowed headline length.
+In fragment-based headline generation, <function>ts_headline</function>
+locates the query matches and splits each match
+into <quote>fragments</quote> of no more than <literal>MaxWords</literal>
+words each, preferring fragments with more query words, and when
+possible <quote>stretching</quote> fragments to include surrounding
+words. The fragment-based mode is thus more useful when the query
+matches span large sections of the document, or when it's desirable to
+display multiple matches.
+In either mode, if no query matches can be identified, then a single
+fragment of the first <literal>MinWords</literal> words in the document
+will be displayed.
 </para>
 
 <para>
@@ -1364,25 +1375,24 @@ SELECT ts_headline('english',
 is to find all documents containing given query terms
 and return them in order of their similarity to the
 query.',
-to_tsquery('query &amp; similarity'));
-ts_headline
+to_tsquery('english', 'query &amp; similarity'));
+ts_headline
 ------------------------------------------------------------
-containing given &lt;b&gt;query&lt;/b&gt; terms
-and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the
+containing given &lt;b&gt;query&lt;/b&gt; terms +
+and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the+
 &lt;b&gt;query&lt;/b&gt;.
 
 SELECT ts_headline('english',
-'The most common type of search
-is to find all documents containing given query terms
-and return them in order of their similarity to the
-query.',
-to_tsquery('query &amp; similarity'),
-'StartSel = &lt;, StopSel = &gt;');
-ts_headline
--------------------------------------------------------
-containing given &lt;query&gt; terms
-and return them in order of their &lt;similarity&gt; to the
-&lt;query&gt;.
+'Search terms may occur
+many times in a document,
+requiring ranking of the search matches to decide which
+occurrences to display in the result.',
+to_tsquery('english', 'search &amp; term'),
+'MaxFragments=10, MaxWords=7, MinWords=3, StartSel=&lt;&lt;, StopSel=&gt;&gt;');
+ts_headline
+------------------------------------------------------------
+&lt;&lt;Search&gt;&gt; &lt;&lt;terms&gt;&gt; may occur +
+many times ... ranking of the &lt;&lt;search&gt;&gt; matches to decide
 </screen>
 </para>
 
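For readers who want to try the options described in this change, the following is a minimal sketch of a fragment-based call that also exercises the double-quoting rule for string values. The sample text is reused from the diff; the particular option values (two fragments, a `" | "` delimiter) are assumptions chosen for illustration and are not part of the commit.

```sql
-- Illustrative sketch only: option values are assumptions, not taken from the commit.
-- MaxFragments greater than zero selects fragment-based headline generation.
-- FragmentDelimiter contains spaces, so its value is double-quoted.
-- StartSel and StopSel are left at their documented defaults (<b> and </b>).
SELECT ts_headline('english',
  'Search terms may occur
many times in a document,
requiring ranking of the search matches to decide which
occurrences to display in the result.',
  to_tsquery('english', 'search & term'),
  'MaxFragments=2, MaxWords=7, MinWords=3, FragmentDelimiter=" | "');
```

Omitting the `MaxFragments` option, or setting it to zero, should fall back to the non-fragment-based mode, in which a single match is chosen for display.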