
Commit de08582

Update discussion of tsearch2 migration. I'm not entirely sure about
the division of material between here and the tsearch2 contrib page, but at least it's not obviously unfinished any more.
1 parent 42e3ab3 commit de08582

1 file changed

+50
-72
lines changed


doc/src/sgml/textsearch.sgml

@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.31 2007/11/10 15:39:34 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.32 2007/11/14 03:26:24 tgl Exp $ -->
 
 <chapter id="textsearch">
 <title id="textsearch-title">Full Text Search</title>
@@ -3489,99 +3489,77 @@ Parser: "pg_catalog.default"
 <title>Migration from Pre-8.3 Text Search</title>
 
 <para>
-This area needs lots of work. Here is a quick list of known issues:
+Applications that used the <filename>contrib/tsearch2</> add-on module
+for text searching will need some adjustments to work with the
+built-in features:
 </para>
 
-<itemizedlist mark="bullet">
+<itemizedlist>
 <listitem>
 <para>
-The old contrib/tsearch2 objects <emphasis>must</> be removed from
-the pg_dump output from a pre-8.3 database. While many of them won't
-load for lack of a tsearch2.so library, some do and cause problems.
-We have a working perl script for doing this with a custom- or tar-format
-backup, but there is a proposal to incorporate the functionality directly
-into pg_restore. Neither approach will help for pg_dumpall output.
+Some functions have been renamed or had small adjustments in their
+argument lists, and all of them are now in the <literal>pg_catalog</>
+schema, whereas in a previous installation they would have been in
+<literal>public</> or another non-system schema. There is a new
+version of <filename>contrib/tsearch2</> (see <xref linkend="tsearch2">)
+that provides a compatibility layer to solve most problems in this
+area.
 </para>
 </listitem>
 
 <listitem>
 <para>
-The old dump may include schema-qualified references to the old
-contrib/tsearch2 objects; for example <literal>public.tsvector</>
-columns in table definitions. These will fail since the objects
-are now in the pg_catalog schema. Given current pg_dump behavior
-this will happen only for tables that are in a different schema
-from the tsearch2 objects; which makes it more likely to bite
-people who carefully put their tsearch2 objects in a
-non-<literal>public</> schema.
-</para>
-
-<para>
-Question: will restore-time failures of this type happen for
-any objects other than the tsvector and tsquery datatypes?
-</para>
-
-<para>
-The basic alternatives for fixing this seem to involve creating
-a dummy linkage, such as a public.tsvector domain linking to the
-base pg_catalog.tsvector type (which only helps for the datatypes);
-or stripping the schema references out of the dump. We could
-just recommend that users do this manually, or try to provide
-some tools to help.
-</para>
-</listitem>
-
-<listitem>
-<para>
-We have renamed the built-in tsvector update triggers, and changed
-their arguments too. This will result in CREATE TRIGGER commands
-failing during load, which can be ignored, but users will need to
-re-issue them with suitable argument adjustment. We probably
-can't automate that for them. Also, the old tsearch2 trigger
-function offered an option to invoke functions, which was removed
-as being a security hole. Users who were relying on that will need to
-write custom trigger functions as a substitute. I think all we
-can do here is document what to do to fix it.
+The old <filename>contrib/tsearch2</> functions and other objects
+<emphasis>must</> be suppressed when loading <application>pg_dump</>
+output from a pre-8.3 database. While many of them won't load anyway,
+a few will and then cause problems. One simple way to deal with this
+is to load the new <filename>contrib/tsearch2</> module before restoring
+the dump; then it will block the old objects from being loaded.
 </para>
 </listitem>
 
 <listitem>
 <para>
-We have renamed a number of other functions besides the triggers,
-compared to the tsearch2 versions. This seems unlikely to cause
-any problems during dump/reload but it will require adjustments in
-the bodies of stored procedures and in client application code.
-Again, not much to do except document it.
+Text search configuration setup is completely different now.
+Instead of manually inserting rows into configuration tables,
+search is configured through the specialized SQL commands shown
+earlier in this chapter. There is not currently any automated
+support for converting an existing custom configuration for 8.3;
+you're on your own here.
 </para>
 </listitem>
 
 <listitem>
 <para>
-Configuration setup is completely different now. Can we provide
-any automated assistance for translating an old custom setup?
-It probably can't be 100% automatic in any case, so maybe documentation
-is the best we can do here too. Aside from the inside-the-database
-differences, outside-the-database configuration files now have
-prescribed location and extensions, which was not true before.
-</para>
-</listitem>
+Most types of dictionaries rely on some outside-the-database
+configuration files. These are largely compatible with pre-8.3
+usage, but note the following differences:
 
-<listitem>
-<para>
-Relocation of configuration from add-on tables into core system catalogs
-will break client queries that looked at the add-on tables.
-</para>
-</listitem>
+<itemizedlist spacing="compact" mark="bullet">
+<listitem>
+<para>
+Configuration files now must be placed in a single specified
+directory (<filename>$SHAREDIR/tsearch_data</>), and must have
+a specific extension depending on the type of file, as noted
+previously in the descriptions of the various dictionary types.
+This restriction was added to forestall security problems.
+</para>
+</listitem>
 
-<listitem>
-<para>
-Thesaurus files now use <literal>?</> for stop words.
-</para>
-</listitem>
+<listitem>
+<para>
+Configuration files must be encoded in UTF-8 encoding,
+regardless of what database encoding is used.
+</para>
+</listitem>
 
-<listitem>
-<para>
-What else?
+<listitem>
+<para>
+In thesaurus configuration files, stop words must be marked with
+<literal>?</>.
+</para>
+</listitem>
+</itemizedlist>
 </para>
 </listitem>

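The new text requires dictionary configuration files to live in `$SHAREDIR/tsearch_data`, to carry a type-specific extension, and to be UTF-8 encoded. A minimal Python sketch of a pre-migration check for those rules might look like the following. The function name `check_tsearch_files` and the extension set are assumptions made for illustration and are not part of this commit; the extensions are the ones the text search chapter describes for stop word, synonym, thesaurus, and Ispell files, so adjust the set to your installation.

```python
from pathlib import Path

# Assumed extension set (not taken from this commit): stop words, synonym,
# thesaurus, and Ispell dictionary/affix files.
KNOWN_EXTENSIONS = {".stop", ".syn", ".ths", ".dict", ".affix"}


def check_tsearch_files(data_dir):
    """Return (file name, problem) pairs for the files directly in data_dir.

    Hypothetical helper: flags files whose extension is not in
    KNOWN_EXTENSIONS and files whose bytes do not decode as UTF-8.
    """
    problems = []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        if path.suffix not in KNOWN_EXTENSIONS:
            problems.append((path.name, "unexpected extension"))
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append((path.name, "not valid UTF-8"))
    return problems
```

To use it, pass the `tsearch_data` directory under the path printed by `pg_config --sharedir`. The check only reads files and does not verify thesaurus contents, so stop words still have to be marked with `?` by hand.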