Commit f2553d4

Add option to use ICU as global locale provider
This adds the option to use ICU as the default locale provider for either the whole cluster or a database. New options for initdb, createdb, and CREATE DATABASE are used to select this.

Since some (legacy) code still uses the libc locale facilities directly, we still need to set the libc global locale settings even if ICU is otherwise selected. So pg_database now has three locale-related fields: the existing datcollate and datctype, which are always set, and a new daticulocale, which is only set if ICU is selected. A similar change is made in pg_collation for consistency, but in that case, only the libc-related fields or the ICU-related field is set, never both.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com
1 parent f6f0db4 commit f2553d4
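
Note: the catalog rules stated in the commit message can be sketched as two standalone checks. This is only an illustration under assumptions, not code from the commit: the enum, struct, and function names are hypothetical, and only the field names (datcollate, datctype, daticulocale, collcollate, collctype, colliculocale) come from the commit itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical provider tag; the real catalogs encode the provider differently. */
typedef enum { PROVIDER_LIBC, PROVIDER_ICU } LocaleProvider;

/* Hypothetical stand-in for the locale-related fields of a pg_database row. */
typedef struct
{
    LocaleProvider provider;
    const char *datcollate;    /* libc collation, always set */
    const char *datctype;      /* libc ctype, always set */
    const char *daticulocale;  /* ICU locale ID, set only when ICU is selected */
} DatabaseLocale;

/* pg_database rule: libc fields always present, ICU field only with ICU. */
static bool
database_locale_is_consistent(const DatabaseLocale *db)
{
    assert(db != NULL);
    if (db->datcollate == NULL || db->datctype == NULL)
        return false;
    if (db->provider == PROVIDER_ICU)
        return db->daticulocale != NULL;
    return db->daticulocale == NULL;
}

/* pg_collation rule: either the libc pair or the ICU field, never both. */
static bool
collation_locale_is_consistent(const char *collcollate,
                               const char *collctype,
                               const char *colliculocale)
{
    bool has_libc = (collcollate != NULL && collctype != NULL);
    bool has_partial_libc = (collcollate != NULL) != (collctype != NULL);
    bool has_icu = (colliculocale != NULL);

    return !has_partial_libc && (has_libc != has_icu);
}
```

Under these assumptions, a database row with provider ICU and all three fields set passes the first check, while a collation with both the libc pair and an ICU locale fails the second.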

35 files changed, +947 -167 lines changed

doc/src/sgml/catalogs.sgml

+9
@@ -2384,6 +2384,15 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 </para></entry>
 </row>

+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>colliculocale</structfield> <type>text</type>
+</para>
+<para>
+ICU locale ID for this collation object
+</para></entry>
+</row>
+
 <row>
 <entry role="catalog_table_entry"><para role="column_definition">
 <structfield>collversion</structfield> <type>text</type>

doc/src/sgml/charset.sgml

+102
@@ -276,6 +276,108 @@ initdb --locale=sv_SE
 </para>
 </sect2>

+<sect2>
+<title>Selecting Locales</title>
+
+<para>
+Locales can be selected in different scopes depending on requirements.
+The above overview showed how locales are specified using
+<command>initdb</command> to set the defaults for the entire cluster. The
+following list shows where locales can be selected. Each item provides
+the defaults for the subsequent items, and each lower item allows
+overriding the defaults on a finer granularity.
+</para>
+
+<orderedlist>
+<listitem>
+<para>
+As explained above, the environment of the operating system provides the
+defaults for the locales of a newly initialized database cluster. In
+many cases, this is enough: If the operating system is configured for
+the desired language/territory, then
+<productname>PostgreSQL</productname> will by default also behave
+according to that locale.
+</para>
+</listitem>
+
+<listitem>
+<para>
+As shown above, command-line options for <command>initdb</command>
+specify the locale settings for a newly initialized database cluster.
+Use this if the operating system does not have the locale configuration
+you want for your database system.
+</para>
+</listitem>
+
+<listitem>
+<para>
+A locale can be selected separately for each database. The SQL command
+<command>CREATE DATABASE</command> and its command-line equivalent
+<command>createdb</command> have options for that. Use this for example
+if a database cluster houses databases for multiple tenants with
+different requirements.
+</para>
+</listitem>
+
+<listitem>
+<para>
+Locale settings can be made for individual table columns. This uses an
+SQL object called <firstterm>collation</firstterm> and is explained in
+<xref linkend="collation"/>. Use this for example to sort data in
+different languages or customize the sort order of a particular table.
+</para>
+</listitem>
+
+<listitem>
+<para>
+Finally, locales can be selected for an individual query. Again, this
+uses SQL collation objects. This could be used to change the sort order
+based on run-time choices or for ad-hoc experimentation.
+</para>
+</listitem>
+</orderedlist>
+</sect2>
+
+<sect2>
+<title>Locale Providers</title>
+
+<para>
+<productname>PostgreSQL</productname> supports multiple <firstterm>locale
+providers</firstterm>. This specifies which library supplies the locale
+data. One standard provider name is <literal>libc</literal>, which uses
+the locales provided by the operating system C library. These are the
+locales that most tools provided by the operating system use. Another
+provider is <literal>icu</literal>, which uses the external
+ICU<indexterm><primary>ICU</primary></indexterm> library. ICU locales can
+only be used if support for ICU was configured when PostgreSQL was built.
+</para>
+
+<para>
+The commands and tools that select the locale settings, as described
+above, each have an option to select the locale provider. The examples
+shown earlier all use the <literal>libc</literal> provider, which is the
+default. Here is an example to initialize a database cluster using the
+ICU provider:
+<programlisting>
+initdb --locale-provider=icu --icu-locale=en
+</programlisting>
+See the description of the respective commands and programs for the
+respective details. Note that you can mix locale providers on different
+granularities, for example use <literal>libc</literal> by default for the
+cluster but have one database that uses the <literal>icu</literal>
+provider, and then have collation objects using either provider within
+those databases.
+</para>
+
+<para>
+Which locale provider to use depends on individual requirements. For most
+basic uses, either provider will give adequate results. For the libc
+provider, it depends on what the operating system offers; some operating
+systems are better than others. For advanced uses, ICU offers more locale
+variants and customization options.
+</para>
+</sect2>
+
 <sect2>
 <title>Problems</title>
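
Note: the scoping rule in the new "Selecting Locales" section (each level supplies the default for the next, and a finer level may override it) can be sketched as a lookup over optional settings. This is a simplified model and not how PostgreSQL resolves locales internally: the enum, the function name, and the idea of one string per scope are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scopes, ordered from coarsest to finest as in the list above. */
enum LocaleScope
{
    SCOPE_ENVIRONMENT,  /* operating system environment */
    SCOPE_INITDB,       /* initdb command-line options */
    SCOPE_DATABASE,     /* CREATE DATABASE / createdb */
    SCOPE_COLUMN,       /* collation attached to a table column */
    SCOPE_QUERY,        /* collation chosen in an individual query */
    SCOPE_COUNT
};

/*
 * Return the setting from the finest scope that specified one, or NULL if
 * no scope did. A NULL entry means "inherit from the coarser scope".
 */
static const char *
effective_locale(const char *const settings[])
{
    const char *result = NULL;

    for (int i = 0; i < SCOPE_COUNT; i++)
    {
        if (settings[i] != NULL)
            result = settings[i];
    }
    return result;
}
```

In this model, an environment default is used until a database-level setting is supplied, and a query-level collation takes precedence over both.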

doc/src/sgml/ref/create_database.sgml

+32
@@ -28,6 +28,8 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
 [ LOCALE [=] <replaceable class="parameter">locale</replaceable> ]
 [ LC_COLLATE [=] <replaceable class="parameter">lc_collate</replaceable> ]
 [ LC_CTYPE [=] <replaceable class="parameter">lc_ctype</replaceable> ]
+[ ICU_LOCALE [=] <replaceable class="parameter">icu_locale</replaceable> ]
+[ LOCALE_PROVIDER [=] <replaceable class="parameter">locale_provider</replaceable> ]
 [ COLLATION_VERSION = <replaceable>collation_version</replaceable> ]
 [ TABLESPACE [=] <replaceable class="parameter">tablespace_name</replaceable> ]
 [ ALLOW_CONNECTIONS [=] <replaceable class="parameter">allowconn</replaceable> ]
@@ -160,6 +162,29 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
 </listitem>
 </varlistentry>

+<varlistentry>
+<term><replaceable class="parameter">icu_locale</replaceable></term>
+<listitem>
+<para>
+Specifies the ICU locale ID if the ICU locale provider is used.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><replaceable>locale_provider</replaceable></term>
+
+<listitem>
+<para>
+Specifies the provider to use for the default collation in this
+database. Possible values are:
+<literal>icu</literal>,<indexterm><primary>ICU</primary></indexterm>
+<literal>libc</literal>. <literal>libc</literal> is the default. The
+available choices depend on the operating system and build options.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry>
 <term><replaceable>collation_version</replaceable></term>

@@ -314,6 +339,13 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
 indexes that would be affected.
 </para>

+<para>
+There is currently no option to use a database locale with nondeterministic
+comparisons (see <link linkend="sql-createcollation"><command>CREATE
+COLLATION</command></link> for an explanation). If this is needed, then
+per-column collations would need to be used.
+</para>
+
 <para>
 The <literal>CONNECTION LIMIT</literal> option is only enforced approximately;
 if two new sessions start at about the same time when just one

doc/src/sgml/ref/createdb.sgml

+19
@@ -147,6 +147,25 @@ PostgreSQL documentation
 </listitem>
 </varlistentry>

+<varlistentry>
+<term><option>--icu-locale=<replaceable class="parameter">locale</replaceable></option></term>
+<listitem>
+<para>
+Specifies the ICU locale ID to be used in this database, if the
+ICU locale provider is selected.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+<listitem>
+<para>
+Specifies the locale provider for the database's default collation.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry>
 <term><option>-O <replaceable class="parameter">owner</replaceable></option></term>
 <term><option>--owner=<replaceable class="parameter">owner</replaceable></option></term>

doc/src/sgml/ref/initdb.sgml

+54 -18
@@ -86,30 +86,45 @@ PostgreSQL documentation
 </para>

 <para>
-<command>initdb</command> initializes the database cluster's default
-locale and character set encoding. The character set encoding,
-collation order (<literal>LC_COLLATE</literal>) and character set classes
-(<literal>LC_CTYPE</literal>, e.g., upper, lower, digit) can be set separately
-for a database when it is created. <command>initdb</command> determines
-those settings for the template databases, which will
-serve as the default for all other databases.
+<command>initdb</command> initializes the database cluster's default locale
+and character set encoding. These can also be set separately for each
+database when it is created. <command>initdb</command> determines those
+settings for the template databases, which will serve as the default for
+all other databases. By default, <command>initdb</command> uses the
+locale provider <literal>libc</literal>, takes the locale settings from
+the environment, and determines the encoding from the locale settings.
+This is almost always sufficient, unless there are special requirements.
 </para>

 <para>
-To alter the default collation order or character set classes, use the
-<option>--lc-collate</option> and <option>--lc-ctype</option> options.
-Collation orders other than <literal>C</literal> or <literal>POSIX</literal> also have
-a performance penalty. For these reasons it is important to choose the
-right locale when running <command>initdb</command>.
+To choose a different locale for the cluster, use the option
+<option>--locale</option>. There are also individual options
+<option>--lc-*</option> (see below) to set values for the individual locale
+categories. Note that inconsistent settings for different locale
+categories can give nonsensical results, so this should be used with care.
 </para>

 <para>
-The remaining locale categories can be changed later when the server
-is started. You can also use <option>--locale</option> to set the
-default for all locale categories, including collation order and
-character set classes. All server locale values (<literal>lc_*</literal>) can
-be displayed via <command>SHOW ALL</command>.
-More details can be found in <xref linkend="locale"/>.
+Alternatively, the ICU library can be used to provide locale services.
+(Again, this only sets the default for subsequently created databases.) To
+select this option, specify <literal>--locale-provider=icu</literal>.
+To choose the specific ICU locale ID to apply, use the option
+<option>--icu-locale</option>. Note that
+for implementation reasons and to support legacy code,
+<command>initdb</command> will still select and initialize libc locale
+settings when the ICU locale provider is used.
+</para>
+
+<para>
+When <command>initdb</command> runs, it will print out the locale settings
+it has chosen. If you have complex requirements or specified multiple
+options, it is advisable to check that the result matches what was
+intended.
+</para>
+
+<para>
+More details about locale settings can be found in <xref
+linkend="locale"/>.
 </para>

 <para>
@@ -210,6 +225,15 @@ PostgreSQL documentation
 </listitem>
 </varlistentry>

+<varlistentry>
+<term><option>--icu-locale=<replaceable>locale</replaceable></option></term>
+<listitem>
+<para>
+Specifies the ICU locale ID, if the ICU locale provider is used.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry id="app-initdb-data-checksums" xreflabel="data checksums">
 <term><option>-k</option></term>
 <term><option>--data-checksums</option></term>
@@ -264,6 +288,18 @@ PostgreSQL documentation
 </listitem>
 </varlistentry>

+<varlistentry>
+<term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+<listitem>
+<para>
+This option sets the locale provider for databases created in the
+new cluster. It can be overridden in the <command>CREATE
+DATABASE</command> command when new databases are subsequently
+created. The default is <literal>libc</literal>.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry>
 <term><option>-N</option></term>
 <term><option>--no-sync</option></term>

src/backend/catalog/pg_collation.c

+14 -4
@@ -49,6 +49,7 @@ CollationCreate(const char *collname, Oid collnamespace,
                 bool collisdeterministic,
                 int32 collencoding,
                 const char *collcollate, const char *collctype,
+                const char *colliculocale,
                 const char *collversion,
                 bool if_not_exists,
                 bool quiet)
@@ -66,8 +67,7 @@ CollationCreate(const char *collname, Oid collnamespace,
     AssertArg(collname);
     AssertArg(collnamespace);
     AssertArg(collowner);
-    AssertArg(collcollate);
-    AssertArg(collctype);
+    AssertArg((collcollate && collctype) || colliculocale);

     /*
      * Make sure there is no existing collation of same name & encoding.
@@ -161,8 +161,18 @@ CollationCreate(const char *collname, Oid collnamespace,
     values[Anum_pg_collation_collprovider - 1] = CharGetDatum(collprovider);
     values[Anum_pg_collation_collisdeterministic - 1] = BoolGetDatum(collisdeterministic);
     values[Anum_pg_collation_collencoding - 1] = Int32GetDatum(collencoding);
-    values[Anum_pg_collation_collcollate - 1] = CStringGetTextDatum(collcollate);
-    values[Anum_pg_collation_collctype - 1] = CStringGetTextDatum(collctype);
+    if (collcollate)
+        values[Anum_pg_collation_collcollate - 1] = CStringGetTextDatum(collcollate);
+    else
+        nulls[Anum_pg_collation_collcollate - 1] = true;
+    if (collctype)
+        values[Anum_pg_collation_collctype - 1] = CStringGetTextDatum(collctype);
+    else
+        nulls[Anum_pg_collation_collctype - 1] = true;
+    if (colliculocale)
+        values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
+    else
+        nulls[Anum_pg_collation_colliculocale - 1] = true;
     if (collversion)
         values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
     else
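
Note: the hunk above makes the libc and ICU columns individually nullable by pairing each entry in values[] with a flag in nulls[]. The pattern can be sketched outside the backend as follows. This is a standalone model under assumptions: plain C strings stand in for Datum values, and the column enum and both function names are hypothetical. Like the relaxed AssertArg in the diff, the assertion here only requires that at least one group is supplied; it does not itself reject a call that passes both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical column indexes standing in for the Anum_pg_collation_* macros. */
enum
{
    COL_COLLCOLLATE,
    COL_COLLCTYPE,
    COL_COLLICULOCALE,
    COL_COUNT
};

/* Store one optional text column: a value when present, a null flag otherwise. */
static void
set_optional_text(const char *values[], bool nulls[], int col, const char *text)
{
    values[col] = text;
    nulls[col] = (text == NULL);
}

/* Fill the three locale columns the way the patched code does, one by one. */
static void
fill_locale_columns(const char *values[], bool nulls[],
                    const char *collcollate,
                    const char *collctype,
                    const char *colliculocale)
{
    /* Same shape as the new AssertArg: the libc pair or the ICU locale. */
    assert((collcollate && collctype) || colliculocale);

    set_optional_text(values, nulls, COL_COLLCOLLATE, collcollate);
    set_optional_text(values, nulls, COL_COLLCTYPE, collctype);
    set_optional_text(values, nulls, COL_COLLICULOCALE, colliculocale);
}
```

For an ICU collation, calling fill_locale_columns with only an ICU locale leaves the two libc columns flagged null and the ICU column populated; a libc collation produces the opposite pattern.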
