doc/README.mb


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87

postgresql 6.3 multi-byte (MB) support README	  April 21 1998

						Tatsuo Ishii
						t-ishii@sra.co.jp
		  http://www.sra.co.jp/people/t-ishii/PostgreSQL/

Introduction

The MB support is intended for allowing PostgreSQL to handle
multi-byte character sets such as EUC(Extended Unix Code), Unicode and
Mule internal code. With the MB enabled you can use multi-byte
character sets in regexp ,LIKE and some functions. The encoding system
chosen is determined at the compile time.

MB also fixes some problems concerning with 8-bit single byte
character sets including ISO8859. (I would not say all of problems
have been fixed. I just confirmed that the regression test ran fine
and a few French characters could be used with the patch. Please let
me know if you find any problem while using 8-bit characters)

How to use

create src/Makefile.custom with a line including:

	MB=encoding_system

or run configure with the mb option:

	% configure --with-mb=encoding_system

where encoding_system is one of:

	EUC_JP			Japanese EUC
	EUC_CN			Chinese EUC
	EUC_KR			Korean EUC
	EUC_TW			Taiwan EUC
	UNICODE			Unicode(UTF-8)
	MULE_INTERNAL		Mule internal

Example:

	% cat Makefile.custom
	MB=EUC_JP

	or

	% configure --with-mb=EUC_JP

If MB is disabled, nothing is changed except better supporting for
8-bit single byte character sets.

References

These are good sources to start learning various kind of encoding
systems.

ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
	Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW
	appear in section 3.2.

Unicode: http://www.unicode.org/
	The homepage of UNICODE.

	RFC 2044
	UTF-8 is defined here.

History

April 21, 1998 some enhancements/fixes
	* character_length(), position(), substring() are now aware of 
	  multi-byte characters
	* add octet_length()
	* add --with-mb option to configure
	* new regression tests for EUC_KR
  	  (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
	* add some test cases to the EUC_JP regression test
	* fix problem in regress/regress.sh in case of System V
	* fix toupper(), tolower() to handle 8bit chars

Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1

Mar 10, 1998 PL2 released
	* add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
	* add an English document (this file)
	* fix problems concerning 8-bit single byte characters

Mar 1, 1998 PL1 released