Commit 61d9674

Make LC_COLLATE and LC_CTYPE database-level settings.

Collation and ctype are now more like encoding, stored in new datcollate and datctype columns in pg_database.

This is a stripped-down version of Radek Strnad's patch, with further changes by me.
1 parent c52aab5 commit 61d9674

30 files changed: +440 -248 lines changed

doc/src/sgml/catalogs.sgml (+15 -1)
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.175 2008/09/19 19:03:40 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.176 2008/09/23 09:20:33 heikki Exp $ -->
 <!--
 Documentation of the system catalogs, directed toward PostgreSQL developers
 -->
@@ -2149,6 +2149,20 @@
 this number to the encoding name)</entry>
 </row>

+<row>
+<entry><structfield>datcollate</structfield></entry>
+<entry><type>name</type></entry>
+<entry></entry>
+<entry>LC_COLLATE for this database</entry>
+</row>
+
+<row>
+<entry><structfield>datctype</structfield></entry>
+<entry><type>name</type></entry>
+<entry></entry>
+<entry>LC_CTYPE for this database</entry>
+</row>
+
 <row>
 <entry><structfield>datistemplate</structfield></entry>
 <entry><type>bool</type></entry>

doc/src/sgml/charset.sgml (+32 -41)
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.87 2008/07/15 17:45:03 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.88 2008/09/23 09:20:34 heikki Exp $ -->

 <chapter id="charset">
 <title>Localization</>
@@ -130,23 +130,23 @@ initdb --locale=sv_SE

 <para>
 The nature of some locale categories is that their value has to be
-fixed for the lifetime of a database cluster. That is, once
-<command>initdb</command> has run, you cannot change them anymore.
-<literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> are
-those categories. They affect the sort order of indexes, so they
-must be kept fixed, or indexes on text columns will become corrupt.
-<productname>PostgreSQL</productname> enforces this by recording
-the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</> that are
-seen by <command>initdb</>. The server automatically adopts
-those two values when it is started.
+fixed when the database is created. You can use different settings
+for different databases, but once a database is created, you cannot
+change them for that database anymore. <literal>LC_COLLATE</literal>
+and <literal>LC_CTYPE</literal> are those categories. They affect
+the sort order of indexes, so they must be kept fixed, or indexes on
+text columns will become corrupt. The default values for these
+categories are defined when <command>initdb</command> is run, and
+those values are used when new databases are created, unless
+specified otherwise in the <command>CREATE DATABASE</command> command.
 </para>

 <para>
 The other locale categories can be changed as desired whenever the
 server is running by setting the run-time configuration variables
 that have the same name as the locale categories (see <xref
-linkend="runtime-config-client-format"> for details). The defaults that are
-chosen by <command>initdb</command> are actually only written into
+linkend="runtime-config-client-format"> for details). The defaults
+that are chosen by <command>initdb</command> are actually only written into
 the configuration file <filename>postgresql.conf</filename> to
 serve as defaults when the server is started. If you delete these
 assignments from <filename>postgresql.conf</filename> then the
@@ -261,7 +261,7 @@ initdb --locale=sv_SE

 <para>
 Check that <productname>PostgreSQL</> is actually using the locale
-that you think it is. <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+that you think it is. The default <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
 settings are determined at <command>initdb</> time and cannot be
 changed without repeating <command>initdb</>. Other locale
 settings including <envar>LC_MESSAGES</> and <envar>LC_MONETARY</>
@@ -319,17 +319,11 @@ initdb --locale=sv_SE
 </para>

 <para>
-An important restriction, however, is that each database character set
-must be compatible with the server's <envar>LC_CTYPE</> setting.
+An important restriction, however, is that each database's character set
+must be compatible with the database's <envar>LC_CTYPE</> setting.
 When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any
 character set is allowed, but for other settings of <envar>LC_CTYPE</>
 there is only one character set that will work correctly.
-Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the
-apparent flexibility to use different encodings in different databases
-of a cluster is more theoretical than real, except when you select
-<literal>C</> or <literal>POSIX</> locale (thus disabling any real locale
-awareness). It is likely that these mechanisms will be revisited in future
-versions of <productname>PostgreSQL</productname>.
 </para>

 <sect2 id="multibyte-charset-supported">
@@ -734,19 +728,19 @@ initdb -E EUC_JP
 </para>

 <para>
-If you have selected <literal>C</> or <literal>POSIX</> locale,
-you can create a database with a different character set:
+You can specify a non-default encoding at database creation time,
+provided that the encoding is compatible with the selected locale:

 <screen>
-createdb -E EUC_KR korean
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
 </screen>

 This will create a database named <literal>korean</literal> that
-uses the character set <literal>EUC_KR</literal>. Another way to
-accomplish this is to use this SQL command:
+uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>.
+Another way to accomplish this is to use this SQL command:

 <programlisting>
-CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0;
 </programlisting>

 The encoding for a database is stored in the system catalog
@@ -756,20 +750,17 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';

 <screen>
 $ <userinput>psql -l</userinput>
-            List of databases
-   Database    |  Owner  |   Encoding
----------------+---------+---------------
- euc_cn        | t-ishii | EUC_CN
- euc_jp        | t-ishii | EUC_JP
- euc_kr        | t-ishii | EUC_KR
- euc_tw        | t-ishii | EUC_TW
- mule_internal | t-ishii | MULE_INTERNAL
- postgres      | t-ishii | EUC_JP
- regression    | t-ishii | SQL_ASCII
- template1     | t-ishii | EUC_JP
- test          | t-ishii | EUC_JP
- utf8          | t-ishii | UTF8
-(9 rows)
+                                         List of databases
+   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
+-----------+----------+-----------+-------------+-------------+-------------------------------------
+ clocaledb | hlinnaka | SQL_ASCII | C           | C           |
+ englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
+ japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
+ korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
+ postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
+ template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+ template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+(7 rows)
 </screen>
 </para>
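
The charset.sgml text above states that LC_COLLATE and LC_CTYPE must stay fixed because they determine the sort order of indexes. As a rough illustration only (this is not PostgreSQL code), the Python sketch below builds a sorted list under one ordering and then searches it under another. The names `c_key`, `folded_key`, and `lookup` are hypothetical, and `str.casefold` merely stands in for a linguistic collation; a real locale's rules are more involved.

```python
import bisect

words = ["apple", "Banana", "cherry", "Zebra"]


def c_key(s):
    # Approximates the C locale: compare raw bytes.
    return s.encode("utf-8")


def folded_key(s):
    # Assumption: case-folding as a stand-in for a linguistic collation.
    return s.casefold()


def lookup(sorted_words, needle, key):
    # Binary search that trusts sorted_words to be ordered by `key`,
    # the same trust a B-tree places in its collation.
    keys = [key(w) for w in sorted_words]
    i = bisect.bisect_left(keys, key(needle))
    return i < len(keys) and keys[i] == key(needle)


# The "index" is built under the folded ordering.
index = sorted(words, key=folded_key)

# Searching with the ordering the index was built with finds the entry.
found_same = lookup(index, "Banana", folded_key)

# Searching the same index with byte ordering misses an entry that exists.
found_other = lookup(index, "Banana", c_key)
```

The entry is still physically present in the second search; the lookup misses it because the search steps assume an ordering that the stored data does not follow.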

doc/src/sgml/indices.sgml (+3 -3)
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.74 2008/07/11 21:06:28 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.75 2008/09/23 09:20:34 heikki Exp $ -->

 <chapter id="indexes">
 <title id="indexes-title">Indexes</title>
@@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id);
 <emphasis>if</emphasis> the pattern is a constant and is anchored to
 the beginning of the string &mdash; for example, <literal>col LIKE
 'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
-<literal>col LIKE '%bar'</literal>. However, if your server does not
+<literal>col LIKE '%bar'</literal>. However, if your database does not
 use the C locale you will need to create the index with a special
 operator class to support indexing of pattern-matching queries. See
 <xref linkend="indexes-opclass"> below. It is also possible to use
@@ -922,7 +922,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
 according to the locale-specific collation rules. This makes
 these operator classes suitable for use by queries involving
 pattern matching expressions (<literal>LIKE</literal> or POSIX
-regular expressions) when the server does not use the standard
+regular expressions) when the database does not use the standard
 <quote>C</quote> locale. As an example, you might index a
 <type>varchar</type> column like this:
 <programlisting>

doc/src/sgml/ref/create_database.sgml (+39 -6)
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.48 2007/09/28 22:25:49 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.49 2008/09/23 09:20:34 heikki Exp $
 PostgreSQL documentation
 -->

@@ -24,6 +24,8 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
 [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ]
 [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ]
 [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ]
+[ COLLATE [=] <replaceable class="parameter">collate</replaceable> ]
+[ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ]
 [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ]
 [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ]
 </synopsis>
@@ -112,6 +114,29 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
 </para>
 </listitem>
 </varlistentry>
+<varlistentry>
+<term><replaceable class="parameter">collate</replaceable></term>
+<listitem>
+<para>
+Collation order (<literal>LC_COLLATE</>) to use in the new database.
+This affects the sort order applied to strings, e.g in queries with
+ORDER BY, as well as the order used in indexes on text columns.
+The default is to use the collation order of the template database.
+See below for additional restrictions.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><replaceable class="parameter">ctype</replaceable></term>
+<listitem>
+<para>
+Character classification (<literal>LC_CTYPE</>) to use in the new
+database. This affects the categorization of characters, e.g. lower,
+upper and digit. The default is to use the character classification of
+the template database. See below for additional restrictions.
+</para>
+</listitem>
+</varlistentry>
 <varlistentry>
 <term><replaceable class="parameter">tablespace</replaceable></term>
 <listitem>
@@ -180,20 +205,28 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
 </para>

 <para>
-Any character set encoding specified for the new database must be
-compatible with the server's <envar>LC_CTYPE</> locale setting.
+The character set encoding specified for the new database must be
+compatible with the chosen COLLATE and CTYPE settings.
 If <envar>LC_CTYPE</> is <literal>C</> (or equivalently
 <literal>POSIX</>), then all encodings are allowed, but for other
-locale settings there is only one encoding that will work properly,
-and so the apparent freedom to specify an encoding is illusory if
-you didn't initialize the database cluster in <literal>C</> locale.
+locale settings there is only one encoding that will work properly.
 <command>CREATE DATABASE</> will allow superusers to specify
 <literal>SQL_ASCII</> encoding regardless of the locale setting,
 but this choice is deprecated and may result in misbehavior of
 character-string functions if data that is not encoding-compatible
 with the locale is stored in the database.
 </para>

+<para>
+The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+those of the template database, except when template0 is used as
+template. This is because <literal>COLLATE</> and <literal>CTYPE</>
+affects the ordering in indexes, so that any indexes copied from the
+template database would be invalid in the new database with different
+settings. <literal>template0</literal>, however, is known to not
+contain any indexes that would be affected.
+</para>
+
 <para>
 The <literal>CONNECTION LIMIT</> option is only enforced approximately;
 if two new sessions start at about the same time when just one
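
The restrictions described in the create_database.sgml hunks above can be modeled in a few lines. The following Python sketch is an illustration under stated assumptions, not the server's implementation: `resolve_locale` and its parameters are hypothetical names, and it encodes only the two rules stated in the documentation text (defaults come from the template database; settings that differ from the template require template0).

```python
from typing import Optional, Tuple


def resolve_locale(template: str,
                   template_collate: str,
                   template_ctype: str,
                   collate: Optional[str] = None,
                   ctype: Optional[str] = None) -> Tuple[str, str]:
    """Return the (collate, ctype) pair a new database would receive.

    Hypothetical helper that mirrors the documented rules only.
    """
    # Unspecified settings default to those of the template database.
    new_collate = template_collate if collate is None else collate
    new_ctype = template_ctype if ctype is None else ctype

    # Per the documentation, only template0 is known to contain no
    # indexes that depend on the locale, so only it may be copied
    # with different settings.
    if template != "template0":
        if new_collate != template_collate:
            raise ValueError("COLLATE must match the template database")
        if new_ctype != template_ctype:
            raise ValueError("CTYPE must match the template database")

    return new_collate, new_ctype


# Mirrors the 'korean' example: template0 with explicit settings.
korean = resolve_locale("template0", "fi_FI.UTF8", "fi_FI.UTF8",
                        collate="ko_KR.euckr", ctype="ko_KR.euckr")

# No explicit settings: the template's values are inherited.
inherited = resolve_locale("template1", "fi_FI.UTF8", "fi_FI.UTF8")
```

The sketch deliberately leaves out the separate check that the encoding is compatible with the chosen settings, since the documentation does not enumerate which encodings match which locales.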

doc/src/sgml/ref/initdb.sgml (+25 -16)
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/initdb.sgml,v 1.43 2007/03/26 17:23:36 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/initdb.sgml,v 1.44 2008/09/23 09:20:34 heikki Exp $
 PostgreSQL documentation
 -->

@@ -76,25 +76,34 @@ PostgreSQL documentation

 <para>
 <command>initdb</command> initializes the database cluster's default
-locale and character set encoding. The collation order
-(<literal>LC_COLLATE</>) and character set classes
-(<literal>LC_CTYPE</>, e.g. upper, lower, digit) are fixed for all
-databases and cannot be changed. Collation orders other than
-<literal>C</> or <literal>POSIX</> also have a performance penalty.
-For these reasons it is important to choose the right locale when
-running <command>initdb</command>. The remaining locale categories
-can be changed later when the server is started. All server locale
-values (<literal>lc_*</>) can be displayed via <command>SHOW ALL</>.
+locale and character set encoding. The character set encoding,
+collation order (<literal>LC_COLLATE</>) and character set classes
+(<literal>LC_CTYPE</>, e.g. upper, lower, digit) can be set separately
+for a database when it is created. <command>initdb</command> determines
+those settings for the <literal>template1</literal> database, which will
+serve as the default for all other databases.
+</para>
+
+<para>
+To alter the default collation order or character set classes, use the
+<option>--lc-collate</option> and <option>--lc-ctype</option> options.
+Collation orders other than <literal>C</> or <literal>POSIX</> also have
+a performance penalty. For these reasons it is important to choose the
+right locale when running <command>initdb</command>.
+</para>
+
+<para>
+The remaining locale categories can be changed later when the server
+is started. You can also use <option>--locale</option> to set the
+default for all locale categories, including collation order and
+character set classes. All server locale values (<literal>lc_*</>) can
+be displayed via <command>SHOW ALL</>.
 More details can be found in <xref linkend="locale">.
 </para>

 <para>
-The character set encoding can be set separately for a database when
-it is created. <command>initdb</command> determines the encoding for
-the <literal>template1</literal> database, which will serve as the
-default for all other databases. To alter the default encoding use
-the <option>--encoding</option> option. More details can be found in
-<xref linkend="multibyte">.
+To alter the default encoding, use the <option>--encoding</option>.
+More details can be found in <xref linkend="multibyte">.
 </para>

 </refsect1>

doc/src/sgml/ref/pg_controldata.sgml (+2 -2)
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/pg_controldata.sgml,v 1.10 2007/02/20 18:10:58 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/pg_controldata.sgml,v 1.11 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->

@@ -30,7 +30,7 @@ PostgreSQL documentation
 <title>Description</title>
 <para>
 <command>pg_controldata</command> prints information initialized during
-<command>initdb</>, such as the catalog version and server locale.
+<command>initdb</>, such as the catalog version.
 It also shows information about write-ahead logging and checkpoint
 processing. This information is cluster-wide, and not specific to any one
 database.

doc/src/sgml/ref/pg_resetxlog.sgml (+5 -9)
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/pg_resetxlog.sgml,v 1.20 2007/01/31 23:26:04 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/pg_resetxlog.sgml,v 1.21 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->

@@ -62,14 +62,10 @@ PostgreSQL documentation
 by specifying the <literal>-f</> (force) switch. In this case plausible
 values will be substituted for the missing data. Most of the fields can be
 expected to match, but manual assistance might be needed for the next OID,
-next transaction ID and epoch, next multitransaction ID and offset,
-WAL starting address, and database locale fields.
-The first six of these can be set using the switches discussed below.
-<command>pg_resetxlog</command>'s own environment is the source for its
-guess at the locale fields; take care that <envar>LANG</> and so forth
-match the environment that <command>initdb</> was run in.
-If you are not able to determine correct values for all these fields,
-<literal>-f</> can still be used, but
+next transaction ID and epoch, next multitransaction ID and offset, and
+WAL starting address fields. These fields can be set using the switches
+discussed below. If you are not able to determine correct values for all
+these fields, <literal>-f</> can still be used, but
 the recovered database must be treated with even more suspicion than
 usual: an immediate dump and reload is imperative. <emphasis>Do not</>
 execute any data-modifying operations in the database before you dump,

doc/src/sgml/ref/select.sgml (+2 -3)
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.103 2008/02/15 22:17:06 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.104 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->

@@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code;

 <para>
 Character-string data is sorted according to the locale-specific
-collation order that was established when the database cluster
-was initialized.
+collation order that was established when the database was created.
 </para>
 </refsect2>
