
Commit 2d819a0

Introduce "builtin" collation provider.
New provider for collations, like "libc" or "icu", but without any
external dependency.

Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.

The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.

By offering a new builtin provider, it clarifies that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
1 parent 6ab2e83 commit 2d819a0
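
The commit message says the "C" locale is handled by an internal implementation rather than by libc. As a rough illustration of what such a libc-independent comparison can look like, here is a minimal sketch that assumes "C" collation means ordering by raw byte values. The function name `c_locale_compare` is hypothetical and does not appear in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the commit): compare two byte strings
 * by raw byte value, which is the ordering assumed here for the "C" locale.
 * When one string is a prefix of the other, the shorter one sorts first.
 * Returns a negative, zero, or positive value like memcmp().
 */
static int
c_locale_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t minlen = (alen < blen) ? alen : blen;
    int    result;

    assert(a != NULL && b != NULL);

    /* Compare the common prefix byte by byte. */
    result = memcmp(a, b, minlen);

    /* Break ties on length so that "ab" sorts before "abc". */
    if (result == 0 && alen != blen)
        result = (alen < blen) ? -1 : 1;

    return result;
}
```

Under this ordering, `"B"` sorts before `"a"` because uppercase ASCII letters have lower byte values than lowercase ones. A linguistic locale such as "en_US" would typically order them differently.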

25 files changed: +671 -158 lines changed

doc/src/sgml/charset.sgml

+73 -17
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
    <title>Locale Providers</title>

    <para>
-    <productname>PostgreSQL</productname> supports multiple <firstterm>locale
-    providers</firstterm>. This specifies which library supplies the locale
-    data. One standard provider name is <literal>libc</literal>, which uses
-    the locales provided by the operating system C library. These are the
-    locales used by most tools provided by the operating system. Another
-    provider is <literal>icu</literal>, which uses the external
-    ICU<indexterm><primary>ICU</primary></indexterm> library. ICU locales can
-    only be used if support for ICU was configured when PostgreSQL was built.
+    A locale provider specifies which library defines the locale behavior for
+    collations and character classifications.
    </para>

    <para>
     The commands and tools that select the locale settings, as described
-    above, each have an option to select the locale provider. The examples
-    shown earlier all use the <literal>libc</literal> provider, which is the
-    default. Here is an example to initialize a database cluster using the
-    ICU provider:
+    above, each have an option to select the locale provider. Here is an
+    example to initialize a database cluster using the ICU provider:
 <programlisting>
 initdb --locale-provider=icu --icu-locale=en
 </programlisting>
@@ -370,12 +362,76 @@ initdb --locale-provider=icu --icu-locale=en
    </para>

    <para>
-    Which locale provider to use depends on individual requirements. For most
-    basic uses, either provider will give adequate results. For the libc
-    provider, it depends on what the operating system offers; some operating
-    systems are better than others. For advanced uses, ICU offers more locale
-    variants and customization options.
+    Regardless of the locale provider, the operating system is still used to
+    provide some locale-aware behavior, such as messages (see <xref
+    linkend="guc-lc-messages"/>).
    </para>
+
+   <para>
+    The available locale providers are listed below:
+   </para>
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>builtin</literal></term>
+     <listitem>
+      <para>
+       The <literal>builtin</literal> provider uses built-in operations. Only
+       the <literal>C</literal> locale is supported for this provider.
+      </para>
+      <para>
+       The <literal>C</literal> locale behavior is identical to the
+       <literal>C</literal> locale in the libc provider. When using this
+       locale, the behavior may depend on the database encoding.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>icu</literal></term>
+     <listitem>
+      <para>
+       The <literal>icu</literal> provider uses the external
+       ICU<indexterm><primary>ICU</primary></indexterm>
+       library. <productname>PostgreSQL</productname> must have been
+       configured with support.
+      </para>
+      <para>
+       ICU provides collation and character classification behavior that is
+       independent of the operating system and database encoding, which is
+       preferable if you expect to transition to other platforms without any
+       change in results. <literal>LC_COLLATE</literal> and
+       <literal>LC_CTYPE</literal> can be set independently of the ICU
+       locale.
+      </para>
+      <note>
+       <para>
+        For the ICU provider, results may depend on the version of the ICU
+        library used, as it is updated to reflect changes in natural language
+        over time.
+       </para>
+      </note>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>libc</literal></term>
+     <listitem>
+      <para>
+       The <literal>libc</literal> provider uses the operating system's C
+       library. The collation and character classification behavior is
+       controlled by the settings <literal>LC_COLLATE</literal> and
+       <literal>LC_CTYPE</literal>, so they cannot be set independently.
+      </para>
+      <note>
+       <para>
+        The same locale name may have different behavior on different
+        platforms when using the libc provider.
+       </para>
+      </note>
+     </listitem>
+    </varlistentry>
+   </variablelist>
   </sect2>

   <sect2 id="icu-locales">

doc/src/sgml/ref/create_collation.sgml

+8 -3
@@ -96,6 +96,11 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        <replaceable>locale</replaceable>, you cannot specify either of those
        parameters.
       </para>
+      <para>
+       If <replaceable>provider</replaceable> is <literal>builtin</literal>,
+       then <replaceable>locale</replaceable> must be specified and set to
+       <literal>C</literal>.
+      </para>
      </listitem>
     </varlistentry>

@@ -129,9 +134,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      <listitem>
       <para>
        Specifies the provider to use for locale services associated with this
-       collation. Possible values are
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
+       collation. Possible values are <literal>builtin</literal>,
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>.
        <literal>libc</literal> is the default. See <xref
        linkend="locale-providers"/> for details.
       </para>

doc/src/sgml/ref/create_database.sgml

+6 -1
@@ -162,6 +162,11 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
         linkend="create-database-lc-ctype"/>, or <xref
         linkend="create-database-icu-locale"/> individually.
        </para>
+       <para>
+        If <xref linkend="create-database-locale-provider"/> is
+        <literal>builtin</literal>, then <replaceable>locale</replaceable>
+        must be specified and set to <literal>C</literal>.
+       </para>
        <tip>
         <para>
          The other locale settings <xref linkend="guc-lc-messages"/>, <xref
@@ -243,7 +248,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <listitem>
        <para>
         Specifies the provider to use for the default collation in this
-        database. Possible values are
+        database. Possible values are <literal>builtin</literal>,
         <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
         (if the server was built with ICU support) or <literal>libc</literal>.
         By default, the provider is the same as that of the <xref

doc/src/sgml/ref/createdb.sgml

+1 -1
@@ -171,7 +171,7 @@ PostgreSQL documentation
      </varlistentry>

      <varlistentry>
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         Specifies the locale provider for the database's default collation.

doc/src/sgml/ref/initdb.sgml

+16 -1
@@ -286,6 +286,11 @@ PostgreSQL documentation
         environment that <command>initdb</command> runs in. Locale
         support is described in <xref linkend="locale"/>.
        </para>
+       <para>
+        If <option>--locale-provider</option> is <literal>builtin</literal>,
+        <option>--locale</option> must be specified and set to
+        <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>

@@ -314,8 +319,18 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>

+     <varlistentry id="app-initdb-builtin-locale">
+      <term><option>--builtin-locale=<replaceable>locale</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the locale name when the builtin provider is used. Locale support
+        is described in <xref linkend="locale"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="app-initdb-option-locale-provider">
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         This option sets the locale provider for databases created in the new

src/backend/catalog/pg_collation.c

+4 -1
@@ -64,7 +64,10 @@ CollationCreate(const char *collname, Oid collnamespace,
     Assert(collname);
     Assert(collnamespace);
     Assert(collowner);
-    Assert((collcollate && collctype) || colllocale);
+    Assert((collprovider == COLLPROVIDER_LIBC &&
+            collcollate && collctype && !colllocale) ||
+           (collprovider != COLLPROVIDER_LIBC &&
+            !collcollate && !collctype && colllocale));

     /*
      * Make sure there is no existing collation of same name & encoding.

src/backend/commands/collationcmds.c

+58 -16
@@ -66,7 +66,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
     DefElem *versionEl = NULL;
     char *collcollate;
     char *collctype;
-    char *colllocale;
+    const char *colllocale;
     char *collicurules;
     bool collisdeterministic;
     int collencoding;
@@ -213,7 +213,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e

         if (collproviderstr)
         {
-            if (pg_strcasecmp(collproviderstr, "icu") == 0)
+            if (pg_strcasecmp(collproviderstr, "builtin") == 0)
+                collprovider = COLLPROVIDER_BUILTIN;
+            else if (pg_strcasecmp(collproviderstr, "icu") == 0)
                 collprovider = COLLPROVIDER_ICU;
             else if (pg_strcasecmp(collproviderstr, "libc") == 0)
                 collprovider = COLLPROVIDER_LIBC;
@@ -243,7 +245,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
         if (lcctypeEl)
             collctype = defGetString(lcctypeEl);

-        if (collprovider == COLLPROVIDER_LIBC)
+        if (collprovider == COLLPROVIDER_BUILTIN)
+        {
+            if (!colllocale)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                         errmsg("parameter \"%s\" must be specified",
+                                "locale")));
+
+            colllocale = builtin_validate_locale(GetDatabaseEncoding(),
+                                                 colllocale);
+        }
+        else if (collprovider == COLLPROVIDER_LIBC)
         {
             if (!collcollate)
                 ereport(ERROR,
@@ -303,7 +316,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
                     (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                      errmsg("ICU rules cannot be specified unless locale provider is ICU")));

-        if (collprovider == COLLPROVIDER_ICU)
+        if (collprovider == COLLPROVIDER_BUILTIN)
+        {
+            collencoding = GetDatabaseEncoding();
+        }
+        else if (collprovider == COLLPROVIDER_ICU)
         {
 #ifdef USE_ICU
             /*
@@ -332,7 +349,16 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
     }

     if (!collversion)
-        collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colllocale : collcollate);
+    {
+        const char *locale;
+
+        if (collprovider == COLLPROVIDER_LIBC)
+            locale = collcollate;
+        else
+            locale = colllocale;
+
+        collversion = get_collation_actual_version(collprovider, locale);
+    }

     newoid = CollationCreate(collName,
                              collNamespace,
@@ -433,8 +459,13 @@ AlterCollation(AlterCollationStmt *stmt)
     datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull);
     oldversion = isnull ? NULL : TextDatumGetCString(datum);

-    datum = SysCacheGetAttrNotNull(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colllocale : Anum_pg_collation_collcollate);
-    newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
+    if (collForm->collprovider == COLLPROVIDER_LIBC)
+        datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_collcollate);
+    else
+        datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_colllocale);
+
+    newversion = get_collation_actual_version(collForm->collprovider,
+                                              TextDatumGetCString(datum));

     /* cannot change from NULL to non-NULL or vice versa */
     if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -498,11 +529,16 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)

         provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;

-        datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup,
-                                       provider == COLLPROVIDER_ICU ?
-                                       Anum_pg_database_datlocale : Anum_pg_database_datcollate);
-
-        locale = TextDatumGetCString(datum);
+        if (provider == COLLPROVIDER_LIBC)
+        {
+            datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate);
+            locale = TextDatumGetCString(datum);
+        }
+        else
+        {
+            datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datlocale);
+            locale = TextDatumGetCString(datum);
+        }

         ReleaseSysCache(dbtup);
     }
@@ -519,11 +555,17 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)

         provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
         Assert(provider != COLLPROVIDER_DEFAULT);
-        datum = SysCacheGetAttrNotNull(COLLOID, colltp,
-                                       provider == COLLPROVIDER_ICU ?
-                                       Anum_pg_collation_colllocale : Anum_pg_collation_collcollate);

-        locale = TextDatumGetCString(datum);
+        if (provider == COLLPROVIDER_LIBC)
+        {
+            datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate);
+            locale = TextDatumGetCString(datum);
+        }
+        else
+        {
+            datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colllocale);
+            locale = TextDatumGetCString(datum);
+        }

         ReleaseSysCache(colltp);
     }
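
The hunks above repeatedly replace an "ICU versus everything else" test with a "libc versus everything else" test, so that both the builtin and ICU providers read the single locale field while libc keeps using the collate field. The following standalone sketch restates that selection rule together with the invariant asserted in `CollationCreate`. It is an illustration only: the helper name `locale_for_version_lookup` is hypothetical, and the provider codes are stand-ins defined locally instead of being taken from the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in provider codes for this sketch; assumed, not copied from the commit. */
#define COLLPROVIDER_BUILTIN 'b'
#define COLLPROVIDER_ICU     'i'
#define COLLPROVIDER_LIBC    'c'

/*
 * Hypothetical helper: pick the locale string that a version lookup
 * should use for the given provider.
 *
 * libc collations carry collcollate (and collctype) but no colllocale;
 * every other provider carries colllocale only.
 */
static const char *
locale_for_version_lookup(char provider,
                          const char *collcollate,
                          const char *colllocale)
{
    if (provider == COLLPROVIDER_LIBC)
    {
        /* libc: the collate setting is the relevant locale name. */
        assert(collcollate != NULL && colllocale == NULL);
        return collcollate;
    }

    /* builtin and ICU: a single locale field is used instead. */
    assert(provider == COLLPROVIDER_BUILTIN || provider == COLLPROVIDER_ICU);
    assert(colllocale != NULL && collcollate == NULL);
    return colllocale;
}
```

Testing for libc and treating everything else uniformly means a further non-libc provider would fall into the `colllocale` branch without another conditional, which is one plausible reason for inverting the test.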
