author | Marc G. Fournier | 1998-07-24 03:32:46 +0000 |
---|---|---|
committer | Marc G. Fournier | 1998-07-24 03:32:46 +0000 |
commit | bf00bbb0c4940b80b46b7e5b379cd64184f2262f (patch) | |
tree | bf32bf3bafe6f367ee97249c83afb4c9e9a637af /doc/README.mb | |
parent | 6e66468f3a160878111578a93be2852635eb4f4d (diff) |
I really hope that I haven't missed anything in this one...
From: t-ishii@sra.co.jp
Attached are patches to enhance the multi-byte support. (patches are
against 7/18 snapshot)
* determine encoding at initdb/createdb rather than compile time
Now initdb/createdb has an option to specify the encoding. Also, I
modified the syntax of CREATE DATABASE to accept encoding option. See
README.mb for more details.
For this purpose I have added a new column "encoding" to pg_database.
pg_attribute and pg_class are also changed to match the modification
to pg_database. Specifically, I have added pg_database_mb.h,
pg_attribute_mb.h and pg_class_mb.h. These are used only when MB is
enabled. The reason for having separate files is that I couldn't find
a way to use ifdef or the like in those files. I have to admit it
looks ugly, but I found no other way.
* support for PGCLIENTENCODING when issuing COPY command
commands/copy.c modified.
* support for SQL92 syntax "SET NAMES"
See gram.y.
* support for LATIN2-5
* add UNICODE regression test case
* new test suite for MB
New directory test/mb added.
* clean up source files
Basic idea is to have MB's own subdirectory for easier maintenance.
These are include/mb and backend/utils/mb.
Diffstat (limited to 'doc/README.mb')
-rw-r--r-- | doc/README.mb | 60 |
1 files changed, 53 insertions, 7 deletions
diff --git a/doc/README.mb b/doc/README.mb
index 775d05c48ba..d5436d16039 100644
--- a/doc/README.mb
+++ b/doc/README.mb
@@ -1,4 +1,4 @@
-postgresql 6.4 multi-byte (MB) support README Jun 5 1998
+postgresql 6.4 multi-byte (MB) support README Jul 22 1998

 Tatsuo Ishii
 t-ishii@sra.co.jp
@@ -10,7 +10,10 @@ The MB support is intended for allowing PostgreSQL to handle
 multi-byte character sets such as EUC(Extended Unix Code), Unicode and
 Mule internal code. With the MB enabled you can use multi-byte
 character sets in regexp ,LIKE and some functions. The encoding system
-chosen is determined at the compile time.
+chosen is determined when initializing your PostgreSQL installation
+using initdb(1). Note that this can be overrided when creating a
+database using createdb(1) or create database SQL command. So you
+could have multiple databases with different encoding system.

 MB also fixes some problems concerning with 8-bit single byte
 character sets including ISO8859. (I would not say all of problems
@@ -36,7 +39,11 @@ where encoding_system is one of:
     EUC_TW          Taiwan EUC
     UNICODE         Unicode(UTF-8)
     MULE_INTERNAL   Mule internal
-    LATIN1          ISO 8859-1 English and some European laguages
+    LATIN1          ISO 8859-1 English and some European languages
+    LATIN2          ISO 8859-2 English and some European languages
+    LATIN3          ISO 8859-3 English and some European languages
+    LATIN4          ISO 8859-4 English and some European languages
+    LATIN5          ISO 8859-5 English and some European languages

 Example:

@@ -50,7 +57,28 @@ Example:
 If MB is disabled, nothing is changed except better supporting for
 8-bit single byte character sets.

-2. PGCLIENTENCODING
+2. How to set encoding
+
+initdb command defines the default encoding for a PostgreSQL
+installation. For example:
+
+    % initdb -e EUC_JP
+
+sets the default encoding to EUC_JP(Extended Unix Code for Japanese).
+Note that you can use "-pgencoding" instead of "-e" if you like longer
+option string:-) If no -e or -pgencoding option is given, the encoding
+specified at the compile time is used.
+
+You can create a database with a different encoding.
+
+    % createdb -E EUC_KR korean
+
+will create a database named "korean" with EUC_KR encoding. The
+another way to accomplish this is to use a SQL command:
+
+    CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
+
+3. PGCLIENTENCODING

 If an environment variable PGCLIENTENCODING is defined on the
 frontend, automatic encoding translation is done by the backend. For
@@ -68,7 +96,11 @@ Supported encodings for PGCLIENTENCODING are:
     EUC_KR          Korean EUC
     EUC_TW          Taiwan EUC
     MULE_INTERNAL   Mule internal
-    LATIN1          ISO 8859-1 English and some European laguages
+    LATIN1          ISO 8859-1 English and some European languages
+    LATIN2          ISO 8859-2 English and some European languages
+    LATIN3          ISO 8859-3 English and some European languages
+    LATIN4          ISO 8859-4 English and some European languages
+    LATIN5          ISO 8859-5 English and some European languages

 Note that UNICODE is not supported(yet). Also note that the
 translation is not always possible. Suppose you choose EUC_JP for the
@@ -86,7 +118,12 @@ new command:
     SET CLIENT_ENCODING TO 'encoding';

 where encoding is one of the encodings those can be set to
-PGCLIENTENCODING. To query the current the frontend encoding:
+PGCLIENTENCODING. Also you can use SQL92 syntax "SET NAMES" for this
+purpose:
+
+    SET NAMES 'encoding';
+
+To query the current the frontend encoding:

     SHOW CLIENT_ENCODING;

@@ -114,7 +151,16 @@ Unicode: http://www.unicode.org/

 5. History

-Jun 5, 1988
+Jul 22, 1998
+    * determine encoding at initdb/createdb rather than compile time
+    * support for PGCLIENTENCODING when issuing COPY command
+    * support for SQL92 syntax "SET NAMES"
+    * support for LATIN2-5
+    * add UNICODE regression test case
+    * new test suite for MB
+    * clean up source files
+
+Jun 5, 1998
     * add support for the encoding translation between the backend
 and the frontend
     * new command SET CLIENT_ENCODING etc. added
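
The statements the patched README introduces (CREATE DATABASE ... WITH ENCODING and SET NAMES) can be illustrated with a small sketch. This is not part of the commit: the function names, the identifier check, and the encoding sets below are assumptions made for illustration. The sets contain only the encoding names visible in the hunks above, so the full README list may include others, and UNICODE is left out of the client set because the README says it is not supported there yet.

```python
# Encoding names that appear in this diff's hunks; the full README list may
# contain entries that the hunks do not show.
DATABASE_ENCODINGS = frozenset({
    "EUC_JP", "EUC_KR", "EUC_TW", "UNICODE", "MULE_INTERNAL",
    "LATIN1", "LATIN2", "LATIN3", "LATIN4", "LATIN5",
})

# The README notes that UNICODE is not (yet) supported as a client encoding.
CLIENT_ENCODINGS = DATABASE_ENCODINGS - {"UNICODE"}


def create_database_sql(name: str, encoding: str) -> str:
    """Return a CREATE DATABASE statement in the form the patched README shows."""
    if encoding not in DATABASE_ENCODINGS:
        raise ValueError(f"unknown database encoding: {encoding!r}")
    # Hypothetical safety check (not from the README): accept only plain
    # identifier-like names so the value can be interpolated unquoted.
    if not name.isidentifier():
        raise ValueError(f"unsafe database name: {name!r}")
    return f"CREATE DATABASE {name} WITH ENCODING = '{encoding}';"


def set_names_sql(encoding: str) -> str:
    """Return the SQL92-style SET NAMES statement for a client encoding."""
    if encoding not in CLIENT_ENCODINGS:
        raise ValueError(f"unsupported client encoding: {encoding!r}")
    return f"SET NAMES '{encoding}';"


if __name__ == "__main__":
    print(create_database_sql("korean", "EUC_KR"))
    print(set_names_sql("EUC_KR"))
```

Validating against a fixed set mirrors the README's "one of" wording; a real client would more likely let the backend reject an unknown encoding.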