@@ -377,10 +377,13 @@ initdb --locale-provider=icu --icu-locale=en
377
377
variants and customization options.
378
378
</para>
379
379
</sect2>
380
+
380
381
<sect2 id="icu-locales">
381
382
<title>ICU Locales</title>
383
+
382
384
<sect3 id="icu-locale-names">
383
385
<title>ICU Locale Names</title>
386
+
384
387
<para>
385
388
The ICU format for the locale name is a <link
386
389
linkend="icu-language-tag">Language Tag</link>.
@@ -412,16 +415,19 @@ NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
412
415
linkend="icu-language-tag">language tag</link> instead of relying on the
413
416
transformation.
414
417
</para>
418
+
415
419
<para>
416
420
A locale with no language name, or the special language name
417
421
<literal>root</literal>, is transformed to have the language
418
422
<literal>und</literal> ("undefined").
419
423
</para>
424
+
420
425
<para>
421
426
ICU can transform most libc locale names, as well as some other formats,
422
427
into language tags for easier transition to ICU. If a libc locale name is
423
428
used in ICU, it may not have precisely the same behavior as in libc.
424
429
</para>
430
+
425
431
<para>
426
432
If there is a problem interpreting the locale name, or if the locale name
427
433
represents a language or region that ICU does not recognize, you will see
@@ -442,10 +448,12 @@ CREATE COLLATION
442
448
443
449
<sect3 id="icu-language-tag">
444
450
<title>Language Tag</title>
451
+
445
452
<para>
446
453
A language tag, defined in BCP 47, is a standardized identifier used to
447
454
identify languages, regions, and other information about a locale.
448
455
</para>
456
+
449
457
<para>
450
458
Basic language tags are simply
451
459
<replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>;
@@ -457,13 +465,15 @@ CREATE COLLATION
457
465
<literal>ja-JP</literal>, <literal>de</literal>, or
458
466
<literal>fr-CA</literal>.
459
467
</para>
468
+
460
469
<para>
461
470
Collation settings may be included in the language tag to customize
462
471
collation behavior. ICU allows extensive customization, such as
463
472
sensitivity (or insensitivity) to accents, case, and punctuation;
464
473
treatment of digits within text; and many other options to satisfy a
465
474
variety of uses.
466
475
</para>
476
+
467
477
<para>
468
478
To include this additional collation information in a language tag,
469
479
append <literal>-u</literal>, which indicates there are additional
@@ -477,6 +487,7 @@ CREATE COLLATION
477
487
<literal>-</literal><replaceable>value</replaceable>, which implies a
478
488
value of <literal>true</literal>.
479
489
</para>
490
+
480
491
<para>
481
492
For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
482
493
means the locale with the English language in the US region, with
@@ -500,13 +511,15 @@ SELECT 'N-45' < 'N-123' COLLATE mycollation5 as result;
500
511
(1 row)
501
512
</screen>
502
513
</para>
514
+
503
515
<para>
504
516
See <xref linkend="icu-custom-collations"/> for details and additional
505
517
examples of using language tags with custom collation information for the
506
518
locale.
507
519
</para>
508
520
</sect3>
509
521
</sect2>
522
+
510
523
<sect2 id="locale-problems">
511
524
<title>Problems</title>
512
525
@@ -1100,6 +1113,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
1100
1113
</tip>
1101
1114
</sect3>
1102
1115
</sect2>
1116
+
1103
1117
<sect2 id="icu-custom-collations">
1104
1118
<title>ICU Custom Collations</title>
1105
1119
@@ -1129,23 +1143,26 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
1129
1143
linkend="icu-collation-settings"/>, or see <xref
1130
1144
linkend="icu-external-references"/> for more details.
1131
1145
</para>
1146
+
1132
1147
<sect3 id="icu-collation-comparison-levels">
1133
1148
<title>ICU Comparison Levels</title>
1149
+
1134
1150
<para>
1135
1151
Comparison of two strings (collation) in ICU is determined by a
1136
1152
multi-level process, where textual features are grouped into
1137
1153
"levels". Treatment of each level is controlled by the <link
1138
1154
linkend="icu-collation-settings-table">collation settings</link>. Higher
1139
1155
levels correspond to finer textual features.
1140
1156
</para>
1157
+
1141
1158
<para>
1142
1159
<xref linkend="icu-collation-levels"/> shows which textual feature
1143
1160
differences are considered significant when determining equality at the
1144
1161
given level. The Unicode character <literal>U+2063</literal> is an
1145
1162
invisible separator, and as seen in the table, is ignored at all
1146
1163
levels of comparison less than <literal>identic</literal>.
1147
1164
</para>
1148
- <para>
1165
+
1149
1166
<table id="icu-collation-levels">
1150
1167
<title>ICU Collation Levels</title>
1151
1168
<tgroup cols="8">
@@ -1157,6 +1174,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
1157
1174
<colspec colname="col6" colwidth="1*"/>
1158
1175
<colspec colname="col7" colwidth="1*"/>
1159
1176
<colspec colname="col8" colwidth="1*"/>
1177
+
1160
1178
<thead>
1161
1179
<row>
1162
1180
<entry>Level</entry>
@@ -1169,6 +1187,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
1169
1187
<entry><literal>'y' = 'z'</literal></entry>
1170
1188
</row>
1171
1189
</thead>
1190
+
1172
1191
<tbody>
1173
1192
<row>
1174
1193
<entry>level1</entry>
@@ -1224,6 +1243,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
1224
1243
</tgroup>
1225
1244
</table>
1226
1245
1246
+ <para>
1227
1247
At every level, even with full normalization off, basic normalization is
1228
1248
performed. For example, <literal>'á'</literal> may be composed of the
1229
1249
code points <literal>U&'\0061\0301'</literal> or the single code
@@ -1233,9 +1253,9 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
1233
1253
created with <symbol>deterministic</symbol> set to
1234
1254
<literal>true</literal>.
1235
1255
</para>
1256
+
1236
1257
<sect4 id="icu-collation-level-examples">
1237
1258
<title>Collation Level Examples</title>
1238
- <para>
1239
1259
1240
1260
<programlisting>
1241
1261
CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
@@ -1251,25 +1271,26 @@ SELECT 'x-y' = 'x_y' COLLATE level3; -- true
1251
1271
SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1252
1272
</programlisting>
1253
1273
1254
- </para>
1255
1274
</sect4>
1256
1275
</sect3>
1257
1276
1258
1277
<sect3 id="icu-collation-settings">
1259
1278
<title>Collation Settings for an ICU Locale</title>
1279
+
1260
1280
<para>
1261
1281
<xref linkend="icu-collation-settings-table"/> shows the available
1262
1282
collation settings, which can be used as part of a language tag to
1263
1283
customize a collation.
1264
1284
</para>
1265
- <para>
1285
+
1266
1286
<table id="icu-collation-settings-table">
1267
1287
<title>ICU Collation Settings</title>
1268
1288
<tgroup cols="4">
1269
1289
<colspec colname="col1" colwidth="1*"/>
1270
1290
<colspec colname="col2" colwidth="2*"/>
1271
1291
<colspec colname="col3" colwidth="2*"/>
1272
1292
<colspec colname="col4" colwidth="5*"/>
1293
+
1273
1294
<thead>
1274
1295
<row>
1275
1296
<entry>Key</entry>
@@ -1278,6 +1299,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1278
1299
<entry>Description</entry>
1279
1300
</row>
1280
1301
</thead>
1302
+
1281
1303
<tbody>
1282
1304
<row>
1283
1305
<entry><literal>co</literal></entry>
@@ -1287,6 +1309,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1287
1309
Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
1288
1310
</entry>
1289
1311
</row>
1312
+
1290
1313
<row>
1291
1314
<entry><literal>ka</literal></entry>
1292
1315
<entry><literal>noignore</literal>, <literal>shifted</literal></entry>
@@ -1299,6 +1322,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1299
1322
character classes are ignored.
1300
1323
</entry>
1301
1324
</row>
1325
+
1302
1326
<row>
1303
1327
<entry><literal>kb</literal></entry>
1304
1328
<entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1309,6 +1333,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1309
1333
before <literal>'aé'</literal>.
1310
1334
</entry>
1311
1335
</row>
1336
+
1312
1337
<row>
1313
1338
<entry><literal>kc</literal></entry>
1314
1339
<entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1325,6 +1350,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1325
1350
</para>
1326
1351
</entry>
1327
1352
</row>
1353
+
1328
1354
<row>
1329
1355
<entry><literal>kf</literal></entry>
1330
1356
<entry>
@@ -1339,6 +1365,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1339
1365
the rules of the locale.
1340
1366
</entry>
1341
1367
</row>
1368
+
1342
1369
<row>
1343
1370
<entry><literal>kn</literal></entry>
1344
1371
<entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1350,6 +1377,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1350
1377
<literal>'id-123'</literal>.
1351
1378
</entry>
1352
1379
</row>
1380
+
1353
1381
<row>
1354
1382
<entry><literal>kk</literal></entry>
1355
1383
<entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1373,6 +1401,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1373
1401
</para>
1374
1402
</entry>
1375
1403
</row>
1404
+
1376
1405
<row>
1377
1406
<entry><literal>kr</literal></entry>
1378
1407
<entry>
@@ -1398,6 +1427,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1398
1427
</para>
1399
1428
</entry>
1400
1429
</row>
1430
+
1401
1431
<row>
1402
1432
<entry><literal>ks</literal></entry>
1403
1433
<entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
@@ -1409,6 +1439,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1409
1439
<xref linkend="icu-collation-levels"/> for details.
1410
1440
</entry>
1411
1441
</row>
1442
+
1412
1443
<row>
1413
1444
<entry><literal>kv</literal></entry>
1414
1445
<entry>
@@ -1429,10 +1460,13 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1429
1460
</tbody>
1430
1461
</tgroup>
1431
1462
</table>
1432
- Defaults may depend on locale. The above table is not meant to be
1433
- complete. See <xref linkend="icu-external-references"/> for additional
1434
- options and details.
1463
+
1464
+ <para>
1465
+ Defaults may depend on locale. The above table is not meant to be
1466
+ complete. See <xref linkend="icu-external-references"/> for additional
1467
+ options and details.
1435
1468
</para>
1469
+
1436
1470
<note>
1437
1471
<para>
1438
1472
For many collation settings, you must create the collation with
@@ -1448,7 +1482,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1448
1482
1449
1483
<sect3 id="icu-locale-examples">
1450
1484
<title>Examples</title>
1451
- <para>
1485
+
1452
1486
<variablelist>
1453
1487
<varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
1454
1488
<term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
@@ -1494,22 +1528,21 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1494
1528
</listitem>
1495
1529
</varlistentry>
1496
1530
</variablelist>
1497
- </para>
1498
1531
</sect3>
1499
1532
1500
1533
<sect3 id="icu-external-references">
1501
1534
<title>External References for ICU</title>
1535
+
1502
1536
<para>
1503
1537
This section (<xref linkend="icu-custom-collations"/>) is only a brief
1504
1538
overview of ICU behavior and language tags. Refer to the following
1505
1539
documents for technical details, additional options, and new behavior:
1506
1540
</para>
1541
+
1507
1542
<itemizedlist>
1508
1543
<listitem>
1509
1544
<para>
1510
- <ulink
1511
- url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
1512
- Technical Standard #35</ulink>
1545
+ <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
1513
1546
</para>
1514
1547
</listitem>
1515
1548
<listitem>
@@ -1519,8 +1552,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1519
1552
</listitem>
1520
1553
<listitem>
1521
1554
<para>
1522
- <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
1523
- repository</ulink>
1555
+ <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
1524
1556
</para>
1525
1557
</listitem>
1526
1558
<listitem>