- <!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.95 2009/05/18 08:59:28 petere Exp $ -->
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.96 2010/02/03 17:25:05 momjian Exp $ -->

<chapter id="charset">
<title>Localization</>

<para>
This chapter describes the available localization features from the
point of view of the administrator.
- <productname>PostgreSQL</productname> supports localization with
- two approaches:
+ <productname>PostgreSQL</productname> supports two localization
+ facilities:

<itemizedlist>
<listitem>
@@ -67,10 +67,10 @@ initdb --locale=sv_SE
(<literal>sv</>) as spoken
in Sweden (<literal>SE</>). Other possibilities might be
<literal>en_US</> (U.S. English) and <literal>fr_CA</> (French
- Canadian). If more than one character set can be useful for a
+ Canadian). If more than one character set can be used for a
locale then the specifications look like this:
- <literal>cs_CZ.ISO8859-2</>. What locales are available under what
- names on your system depends on what was provided by the operating
+ <literal>cs_CZ.ISO8859-2</>. What locales are available on your
+ system under what names depends on what was provided by the operating
system vendor and what was installed. On most Unix systems, the command
<literal>locale -a</> will provide a list of available locales.
Windows uses more verbose locale names, such as <literal>German_Germany</>
@@ -80,8 +80,8 @@ initdb --locale=sv_SE
<para>
Occasionally it is useful to mix rules from several locales, e.g.,
use English collation rules but Spanish messages. To support that, a
- set of locale subcategories exist that control only a certain
- aspect of the localization rules:
+ set of locale subcategories exist that control only certain
+ aspects of the localization rules:

<informaltable>
<tgroup cols="2">
@@ -127,13 +127,13 @@ initdb --locale=sv_SE
</para>

<para>
- The nature of some locale categories is that their value has to be
+ Some locale categories must have their values
fixed when the database is created. You can use different settings
for different databases, but once a database is created, you cannot
change them for that database anymore. <literal>LC_COLLATE</literal>
- and <literal>LC_CTYPE</literal> are these categories. They affect
+ and <literal>LC_CTYPE</literal> are these type of categories. They affect
the sort order of indexes, so they must be kept fixed, or indexes on
- text columns will become corrupt. The default values for these
+ text columns would become corrupt. The default values for these
categories are determined when <command>initdb</command> is run, and
those values are used when new databases are created, unless
specified otherwise in the <command>CREATE DATABASE</command> command.
@@ -146,7 +146,7 @@ initdb --locale=sv_SE
linkend="runtime-config-client-format"> for details). The values
that are chosen by <command>initdb</command> are actually only written
into the configuration file <filename>postgresql.conf</filename> to
- serve as defaults when the server is started. If you delete these
+ serve as defaults when the server is started. If you disable these
assignments from <filename>postgresql.conf</filename> then the
server will inherit the settings from its execution environment.
</para>
@@ -178,7 +178,7 @@ initdb --locale=sv_SE
settings for the purpose of setting the language of messages. If
in doubt, please refer to the documentation of your operating
system, in particular the documentation about
- <application>gettext</>, for more information.
+ <application>gettext</>.
</para>
</note>

@@ -320,8 +320,9 @@ initdb --locale=sv_SE

<para>
An important restriction, however, is that each database's character set
- must be compatible with the database's <envar>LC_CTYPE</> and
- <envar>LC_COLLATE</> locale settings. For <literal>C</> or
+ must be compatible with the database's <envar>LC_CTYPE</> (character
+ classification) and <envar>LC_COLLATE</> (string sort order) locale
+ settings. For <literal>C</> or
<literal>POSIX</> locale, any character set is allowed, but for other
locales there is only one character set that will work correctly.
(On Windows, however, UTF-8 encoding can be used with any locale.)
@@ -543,7 +544,7 @@ initdb --locale=sv_SE
<entry>LATIN1 with Euro and accents</entry>
<entry>Yes</entry>
<entry>1</entry>
- <entry>ISO885915</entry>
+ <entry><literal>ISO885915</></entry>
</row>
<row>
<entry><literal>LATIN10</literal></entry>
@@ -694,7 +695,7 @@ initdb --locale=sv_SE
</table>

<para>
- Not all <acronym>API</>s support all the listed character sets. For example, the
+ Not all client <acronym>API</>s support all the listed character sets. For example, the
<productname>PostgreSQL</>
JDBC driver does not support <literal>MULE_INTERNAL</>, <literal>LATIN6</>,
<literal>LATIN8</>, and <literal>LATIN10</>.
@@ -710,7 +711,7 @@ initdb --locale=sv_SE
much a declaration that a specific encoding is in use, as a declaration
of ignorance about the encoding. In most cases, if you are
working with any non-ASCII data, it is unwise to use the
- <literal>SQL_ASCII</> setting, because
+ <literal>SQL_ASCII</> setting because
<productname>PostgreSQL</productname> will be unable to help you by
converting or validating non-ASCII characters.
</para>
@@ -720,17 +721,17 @@ initdb --locale=sv_SE
<title>Setting the Character Set</title>

<para>
- <command>initdb</> defines the default character set
+ <command>initdb</> defines the default character set (encoding)
for a <productname>PostgreSQL</productname> cluster. For example,

<screen>
initdb -E EUC_JP
</screen>

- sets the default character set (encoding) to
+ sets the default character set to
<literal>EUC_JP</literal> (Extended Unix Code for Japanese). You
can use <option>--encoding</option> instead of
- <option>-E</option> if you prefer to type longer option strings.
+ <option>-E</option> if you prefer longer option strings.
If no <option>-E</> or <option>--encoding</option> option is
given, <command>initdb</> attempts to determine the appropriate
encoding to use based on the specified or default locale.
@@ -762,8 +763,8 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE=
<para>
The encoding for a database is stored in the system catalog
<literal>pg_database</literal>. You can see it by using the
- <option>-l</option> option or the <command>\l</command> command
- of <command>psql</command>.
+ <command>psql</command> <option>-l</option> option or the
+ <command>\l</command> command.

<screen>
$ <userinput>psql -l</userinput>
@@ -784,11 +785,11 @@ $ <userinput>psql -l</userinput>
<important>
<para>
On most modern operating systems, <productname>PostgreSQL</productname>
- can determine which character set is implied by an <envar>LC_CTYPE</>
+ can determine which character set is implied by the <envar>LC_CTYPE</>
setting, and it will enforce that only the matching database encoding is
used. On older systems it is your responsibility to ensure that you use
the encoding expected by the locale you have selected. A mistake in
- this area is likely to lead to strange misbehavior of locale-dependent
+ this area is likely to lead to strange behavior of locale-dependent
operations such as sorting.
</para>

@@ -1190,9 +1191,9 @@ RESET client_encoding;
<para>
If the conversion of a particular character is not possible
— suppose you chose <literal>EUC_JP</literal> for the
- server and <literal>LATIN1</literal> for the client, then some
- Japanese characters do not have a representation in
- <literal>LATIN1</literal> — then an error is reported.
+ server and <literal>LATIN1</literal> for the client, and some
+ Japanese characters are returned that do not have a representation in
+ <literal>LATIN1</literal> — an error is reported.
</para>

<para>
@@ -1249,7 +1250,8 @@ RESET client_encoding;

<listitem>
<para>
- <acronym>UTF</acronym>-8 is defined here.
+ <acronym>UTF</acronym>-8 (8-bit UCS/Unicode Transformation
+ Format) is defined here.
</para>
</listitem>
</varlistentry>
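
Note on the hunk at line 1190: the conversion failure it describes (an EUC_JP server returning characters that a LATIN1 client cannot represent) can be illustrated outside PostgreSQL. The sketch below is only an analogy. It uses Python's built-in codecs rather than PostgreSQL's own conversion routines, and the helper name `convert` and the sample strings are hypothetical, chosen purely for illustration.

```python
def convert(data: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Decode bytes from the server encoding and re-encode them for the client.

    Hypothetical helper: it mimics the idea of server-to-client conversion
    with Python codecs and is not PostgreSQL's implementation.
    """
    text = data.decode(server_encoding)
    # Raises UnicodeEncodeError when a character has no representation
    # in the client encoding.
    return text.encode(client_encoding)


# Japanese text as it might be stored by an EUC_JP server (sample data).
stored = "日本語".encode("euc_jp")

try:
    convert(stored, "euc_jp", "latin-1")
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True

# Plain ASCII text is representable in both encodings, so it converts cleanly.
ascii_ok = convert("abc".encode("euc_jp"), "euc_jp", "latin-1")

print(conversion_failed, ascii_ok)
```

In this analogy the error surfaces only when unrepresentable characters are actually returned, which matches the reworded sentence in the hunk.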