|
1 |
| -<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.31 2007/11/10 15:39:34 momjian Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.32 2007/11/14 03:26:24 tgl Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="textsearch">
|
4 | 4 | <title id="textsearch-title">Full Text Search</title>
|
@@ -3489,99 +3489,77 @@ Parser: "pg_catalog.default"
|
3489 | 3489 | <title>Migration from Pre-8.3 Text Search</title>
|
3490 | 3490 |
|
3491 | 3491 | <para>
|
3492 |
| - This area needs lots of work. Here is a quick list of known issues: |
| 3492 | + Applications that used the <filename>contrib/tsearch2</> add-on module |
| 3493 | + for text searching will need some adjustments to work with the |
| 3494 | + built-in features: |
3493 | 3495 | </para>
|
3494 | 3496 |
|
3495 |
| - <itemizedlist mark="bullet"> |
| 3497 | + <itemizedlist> |
3496 | 3498 | <listitem>
|
3497 | 3499 | <para>
|
3498 |
| - The old contrib/tsearch2 objects <emphasis>must</> be removed from |
3499 |
| - the pg_dump output from a pre-8.3 database. While many of them won't |
3500 |
| - load for lack of a tsearch2.so library, some do and cause problems. |
3501 |
| - We have a working perl script for doing this with a custom- or tar-format |
3502 |
| - backup, but there is a proposal to incorporate the functionality directly |
3503 |
| - into pg_restore. Neither approach will help for pg_dumpall output. |
| 3500 | + Some functions have been renamed or had small adjustments in their |
| 3501 | + argument lists, and all of them are now in the <literal>pg_catalog</> |
| 3502 | + schema, whereas in a previous installation they would have been in |
| 3503 | + <literal>public</> or another non-system schema. There is a new |
| 3504 | + version of <filename>contrib/tsearch2</> (see <xref linkend="tsearch2">) |
| 3505 | + that provides a compatibility layer to solve most problems in this |
| 3506 | + area. |
3504 | 3507 | </para>
|
3505 | 3508 | </listitem>
|
3506 | 3509 |
|
3507 | 3510 | <listitem>
|
3508 | 3511 | <para>
|
3509 |
| - The old dump may include schema-qualified references to the old |
3510 |
| - contrib/tsearch2 objects; for example <literal>public.tsvector</> |
3511 |
| - columns in table definitions. These will fail since the objects |
3512 |
| - are now in the pg_catalog schema. Given current pg_dump behavior |
3513 |
| - this will happen only for tables that are in a different schema |
3514 |
| - from the tsearch2 objects; which makes it more likely to bite |
3515 |
| - people who carefully put their tsearch2 objects in a |
3516 |
| - non-<literal>public</> schema. |
3517 |
| - </para> |
3518 |
| - |
3519 |
| - <para> |
3520 |
| - Question: will restore-time failures of this type happen for |
3521 |
| - any objects other than the tsvector and tsquery datatypes? |
3522 |
| - </para> |
3523 |
| - |
3524 |
| - <para> |
3525 |
| - The basic alternatives for fixing this seem to involve creating |
3526 |
| - a dummy linkage, such as a public.tsvector domain linking to the |
3527 |
| - base pg_catalog.tsvector type (which only helps for the datatypes); |
3528 |
| - or stripping the schema references out of the dump. We could |
3529 |
| - just recommend that users do this manually, or try to provide |
3530 |
| - some tools to help. |
3531 |
| - </para> |
3532 |
| - </listitem> |
3533 |
| - |
3534 |
| - <listitem> |
3535 |
| - <para> |
3536 |
| - We have renamed the built-in tsvector update triggers, and changed |
3537 |
| - their arguments too. This will result in CREATE TRIGGER commands |
3538 |
| - failing during load, which can be ignored, but users will need to |
3539 |
| - re-issue them with suitable argument adjustment. We probably |
3540 |
| - can't automate that for them. Also, the old tsearch2 trigger |
3541 |
| - function offered an option to invoke functions, which was removed |
3542 |
| - as being a security hole. Users who were relying on that will need to |
3543 |
| - write custom trigger functions as a substitute. I think all we |
3544 |
| - can do here is document what to do to fix it. |
| 3512 | + The old <filename>contrib/tsearch2</> functions and other objects |
| 3513 | + <emphasis>must</> be suppressed when loading <application>pg_dump</> |
| 3514 | + output from a pre-8.3 database. While many of them won't load anyway, |
| 3515 | + a few will and then cause problems. One simple way to deal with this |
| 3516 | + is to load the new <filename>contrib/tsearch2</> module before restoring |
| 3517 | + the dump; then it will block the old objects from being loaded. |
3545 | 3518 | </para>
|
3546 | 3519 | </listitem>
|
3547 | 3520 |
|
3548 | 3521 | <listitem>
|
3549 | 3522 | <para>
|
3550 |
| - We have renamed a number of other functions besides the triggers, |
3551 |
| - compared to the tsearch2 versions. This seems unlikely to cause |
3552 |
| - any problems during dump/reload but it will require adjustments in |
3553 |
| - the bodies of stored procedures and in client application code. |
3554 |
| - Again, not much to do except document it. |
| 3523 | + Text search configuration setup is completely different now. |
| 3524 | + Instead of manually inserting rows into configuration tables, |
| 3525 | + search is configured through the specialized SQL commands shown |
| 3526 | + earlier in this chapter. There is currently no automated |
| 3527 | + support for converting an existing custom configuration to 8.3; |
| 3528 | + it must be re-created by hand with those commands. |
3555 | 3529 | </para>
|
3556 | 3530 | </listitem>
|
3557 | 3531 |
|
3558 | 3532 | <listitem>
|
3559 | 3533 | <para>
|
3560 |
| - Configuration setup is completely different now. Can we provide |
3561 |
| - any automated assistance for translating an old custom setup? |
3562 |
| - It probably can't be 100% automatic in any case, so maybe documentation |
3563 |
| - is the best we can do here too. Aside from the inside-the-database |
3564 |
| - differences, outside-the-database configuration files now have |
3565 |
| - prescribed location and extensions, which was not true before. |
3566 |
| - </para> |
3567 |
| - </listitem> |
| 3534 | + Most types of dictionaries rely on some outside-the-database |
| 3535 | + configuration files. These are largely compatible with pre-8.3 |
| 3536 | + usage, but note the following differences: |
3568 | 3537 |
|
3569 |
| - <listitem> |
3570 |
| - <para> |
3571 |
| - Relocation of configuration from add-on tables into core system catalogs |
3572 |
| - will break client queries that looked at the add-on tables. |
3573 |
| - </para> |
3574 |
| - </listitem> |
| 3538 | + <itemizedlist spacing="compact" mark="bullet"> |
| 3539 | + <listitem> |
| 3540 | + <para> |
| 3541 | + Configuration files now must be placed in a single specified |
| 3542 | + directory (<filename>$SHAREDIR/tsearch_data</>), and must have |
| 3543 | + a specific extension depending on the type of file, as noted |
| 3544 | + previously in the descriptions of the various dictionary types. |
| 3545 | + This restriction was added to forestall security problems. |
| 3546 | + </para> |
| 3547 | + </listitem> |
3575 | 3548 |
|
3576 |
| - <listitem> |
3577 |
| - <para> |
3578 |
| - Thesaurus files now use <literal>?</> for stop words. |
3579 |
| - </para> |
3580 |
| - </listitem> |
| 3549 | + <listitem> |
| 3550 | + <para> |
| 3551 | + Configuration files must be encoded in UTF-8, |
| 3552 | + regardless of what database encoding is used. |
| 3553 | + </para> |
| 3554 | + </listitem> |
3581 | 3555 |
|
3582 |
| - <listitem> |
3583 |
| - <para> |
3584 |
| - What else? |
| 3556 | + <listitem> |
| 3557 | + <para> |
| 3558 | + In thesaurus configuration files, stop words must be marked with |
| 3559 | + <literal>?</>. |
| 3560 | + </para> |
| 3561 | + </listitem> |
| 3562 | + </itemizedlist> |
3585 | 3563 | </para>
|
3586 | 3564 | </listitem>
|
3587 | 3565 |
|
|