
Commit 2bfd1b1

Don't install ICU collation keyword variants
Users can still create them themselves. Instead, document Unicode TR 35 collation options for ICU, so users can create all this themselves.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
1 parent 51e225d commit 2bfd1b1

2 files changed, +84 -85 lines changed

doc/src/sgml/charset.sgml

+84 -14
@@ -664,13 +664,6 @@ SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
 </listitem>
 </varlistentry>
 
-<varlistentry>
-<term><literal>de-u-co-phonebk-x-icu</literal></term>
-<listitem>
-<para>German collation, phone book variant</para>
-</listitem>
-</varlistentry>
-
 <varlistentry>
 <term><literal>de-AT-x-icu</literal></term>
 <listitem>
@@ -683,13 +676,6 @@ SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
 </listitem>
 </varlistentry>
 
-<varlistentry>
-<term><literal>de-AT-u-co-phonebk-x-icu</literal></term>
-<listitem>
-<para>German collation for Austria, phone book variant</para>
-</listitem>
-</varlistentry>
-
 <varlistentry>
 <term><literal>und-x-icu</literal> (for <quote>undefined</quote>)</term>
 <listitem>
@@ -709,6 +695,90 @@ SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
 will draw an error along the lines of <quote>collation "de-x-icu" for
 encoding "WIN874" does not exist</>.
 </para>
+
+<para>
+ICU allows collations to be customized beyond the basic language+country
+set that is preloaded by <command>initdb</command>. Users are encouraged
+to define their own collation objects that make use of these facilities to
+suit the sorting behavior to their requirements. Here are some examples:
+
+<variablelist>
+<varlistentry>
+<term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk')</literal></term>
+<listitem>
+<para>German collation with phone book collation type</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji')</literal></term>
+<listitem>
+<para>
+Root collation with Emoji collation type, per Unicode Technical Standard #51
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><literal>CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit')</literal></term>
+<listitem>
+<para>
+Sort digits after Latin letters. (The default is digits before letters.)
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper')</literal></term>
+<listitem>
+<para>
+Sort upper-case letters before lower-case letters. (The default is
+lower-case letters first.)
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit')</literal></term>
+<listitem>
+<para>
+Combines both of the above options.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true')</literal></term>
+<listitem>
+<para>
+Numeric ordering, sorts sequences of digits by their numeric value,
+for example: <literal>A-21</literal> &lt; <literal>A-123</literal>
+(also known as natural sort).
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+
+See <ulink url="http://unicode.org/reports/tr35/tr35-collation.html">Unicode
+Technical Standard #35</ulink>
+and <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink> for
+details. The list of possible collation types (<literal>co</literal>
+subtag) can be found in
+the <ulink url="http://www.unicode.org/repos/cldr/trunk/common/bcp47/collation.xml">CLDR
+repository</ulink>.
+The <ulink url="https://ssl.icu-project.org/icu-bin/locexp">ICU Locale
+Explorer</ulink> can be used to check the details of a particular locale
+definition.
+</para>
+
+<para>
+Note that while this system allows creating collations that <quote>ignore
+case</quote> or <quote>ignore accents</quote> or similar (using
+the <literal>ks</literal> key), PostgreSQL does not at the moment allow
+such collations to act in a truly case- or accent-insensitive manner. Any
+strings that compare equal according to the collation but are not
+byte-wise equal will be sorted according to their byte values.
+</para>
 </sect4>
 </sect3>
 
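The new documentation describes the numeric example (locale en-u-kn-true) only by its effect, A-21 < A-123. The following C sketch illustrates that idea for plain ASCII strings: runs of digits are compared by numeric value and all other bytes are compared as-is. The function natural_cmp is a hypothetical helper written for this illustration; it is not ICU's implementation and ignores all the locale-specific rules that an ICU collation also applies.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * natural_cmp (hypothetical helper, not ICU code): a simplified
 * "natural sort" comparison in the spirit of the ICU kn-true option.
 * Runs of ASCII digits are compared by numeric value (leading zeros
 * ignored); all other bytes are compared as unsigned char.  Returns
 * -1, 0 or 1, so it can be used like strcmp().  It knows nothing about
 * locales, case, accents or non-ASCII digits.
 */
static int
natural_cmp(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (isdigit((unsigned char) *a) && isdigit((unsigned char) *b))
        {
            size_t      la = 0;
            size_t      lb = 0;
            int         c;

            /* Skip leading zeros so that "007" and "7" have the same value. */
            while (*a == '0')
                a++;
            while (*b == '0')
                b++;

            /* Measure the remaining digit runs. */
            while (isdigit((unsigned char) a[la]))
                la++;
            while (isdigit((unsigned char) b[lb]))
                lb++;

            /* Without leading zeros, the longer run is the larger number. */
            if (la != lb)
                return la < lb ? -1 : 1;

            /* Equal length: compare the digits left to right. */
            c = strncmp(a, b, la);
            if (c != 0)
                return c < 0 ? -1 : 1;

            a += la;
            b += lb;
        }
        else
        {
            if (*a != *b)
                return (unsigned char) *a < (unsigned char) *b ? -1 : 1;
            a++;
            b++;
        }
    }

    /* Whichever string still has characters left sorts later. */
    if (*a == *b)
        return 0;
    return *a ? 1 : -1;
}
```

With this comparator natural_cmp("A-21", "A-123") is negative, whereas strcmp("A-21", "A-123") is positive, because strcmp compares the digit 2 with the digit 1 as ordinary characters.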

src/backend/commands/collationcmds.c

-71
@@ -687,30 +687,11 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 */
 for (i = -1; i < uloc_countAvailable(); i++)
 {
-/*
-* In ICU 4.2, ucol_getKeywordValuesForLocale() sometimes returns
-* values that will not be accepted by uloc_toLanguageTag(). Skip
-* loading keyword variants in that version. (Both
-* ucol_getKeywordValuesForLocale() and uloc_toLanguageTag() are
-* new in ICU 4.2, so older versions are not supported at all.)
-*
-* XXX We have no information about ICU 4.3 through 4.7, but we
-* know the code below works with 4.8.
-*/
-#if U_ICU_VERSION_MAJOR_NUM > 4 || (U_ICU_VERSION_MAJOR_NUM == 4 && U_ICU_VERSION_MINOR_NUM > 2)
-#define LOAD_ICU_KEYWORD_VARIANTS
-#endif
-
 const char *name;
 char *langtag;
 char *icucomment;
 const char *collcollate;
 Oid collid;
-#ifdef LOAD_ICU_KEYWORD_VARIANTS
-UEnumeration *en;
-UErrorCode status;
-const char *val;
-#endif
 
 if (i == -1)
 name = ""; /* ICU root locale */
@@ -744,58 +725,6 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 CreateComments(collid, CollationRelationId, 0,
 icucomment);
 }
-
-/*
-* Add keyword variants, if enabled.
-*/
-#ifdef LOAD_ICU_KEYWORD_VARIANTS
-status = U_ZERO_ERROR;
-en = ucol_getKeywordValuesForLocale("collation", name, TRUE, &status);
-if (U_FAILURE(status))
-ereport(ERROR,
-(errmsg("could not get keyword values for locale \"%s\": %s",
-name, u_errorName(status))));
-
-status = U_ZERO_ERROR;
-uenum_reset(en, &status);
-while ((val = uenum_next(en, NULL, &status)))
-{
-char *localeid = psprintf("%s@collation=%s", name, val);
-
-langtag = get_icu_language_tag(localeid);
-collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : localeid;
-
-/*
-* Be paranoid about not allowing any non-ASCII strings into
-* pg_collation
-*/
-if (!is_all_ascii(langtag) || !is_all_ascii(collcollate))
-continue;
-
-collid = CollationCreate(psprintf("%s-x-icu", langtag),
-nspid, GetUserId(),
-COLLPROVIDER_ICU, -1,
-collcollate, collcollate,
-get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
-true, true);
-if (OidIsValid(collid))
-{
-ncreated++;
-
-CommandCounterIncrement();
-
-icucomment = get_icu_locale_comment(localeid);
-if (icucomment)
-CreateComments(collid, CollationRelationId, 0,
-icucomment);
-}
-}
-if (U_FAILURE(status))
-ereport(ERROR,
-(errmsg("could not get keyword values for locale \"%s\": %s",
-name, u_errorName(status))));
-uenum_close(en);
-#endif /* LOAD_ICU_KEYWORD_VARIANTS */
 }
 }
 #endif /* USE_ICU */
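After this change, pg_import_system_collations() no longer creates keyword variants such as de-u-co-phonebk-x-icu; users are expected to create them with CREATE COLLATION, as in the new charset.sgml examples. The C sketch below assembles such a statement from a base language tag and a list of Unicode extension key/value pairs, producing locale strings of the form de-u-co-phonebk or en-u-kf-upper-kr-latn-digit. The helper format_icu_collation and the struct icu_keyword type are hypothetical and not part of PostgreSQL; the "-x-icu" name suffix only mirrors the naming used by the removed code, and the inputs are neither validated nor quoted.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One Unicode locale extension keyword, e.g. key "co", value "phonebk". */
struct icu_keyword
{
    const char *key;
    const char *value;
};

/*
 * format_icu_collation (hypothetical helper, not a PostgreSQL API):
 * writes
 *
 *   CREATE COLLATION "<locale>-x-icu" (provider = icu, locale = '<locale>');
 *
 * into buf, where <locale> is lang, then "-u", then "-key-value" for each
 * pair in the order given (no "-u" part when nkw is 0).  Returns the
 * length of the statement, or -1 if it does not fit.  Inputs are assumed
 * to be trusted ASCII tags; nothing is validated, quoted or escaped.
 */
static int
format_icu_collation(char *buf, size_t bufsize, const char *lang,
                     const struct icu_keyword *kw, size_t nkw)
{
    char        locale[256];
    size_t      used;
    size_t      i;
    int         n;

    n = snprintf(locale, sizeof(locale), "%s", lang);
    if (n < 0 || (size_t) n >= sizeof(locale))
        return -1;
    used = (size_t) n;

    for (i = 0; i < nkw; i++)
    {
        /* The "-u" singleton is emitted once, before the first key. */
        n = snprintf(locale + used, sizeof(locale) - used, "%s-%s-%s",
                     i == 0 ? "-u" : "", kw[i].key, kw[i].value);
        if (n < 0 || (size_t) n >= sizeof(locale) - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf, bufsize,
                 "CREATE COLLATION \"%s-x-icu\" (provider = icu, locale = '%s');",
                 locale, locale);
    if (n < 0 || (size_t) n >= bufsize)
        return -1;
    return n;
}
```

For example, the base tag de with the single pair co/phonebk yields the first example statement from the new documentation, with a terminating semicolon. The sketch keeps the pairs in the order given; the documentation examples list the keys in alphabetical order (kf before kr).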
