
Commit f554af0

From: t-ishii@sra.co.jp
Hi, here are patches I promised (against 6.3.2):

* character_length(), position(), substring() are now aware of
  multi-byte characters
* add octet_length()
* add --with-mb option to configure
* new regression tests for EUC_KR
  (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* add some test cases to the EUC_JP regression test
* fix problem in regress/regress.sh in case of System V
* fix toupper(), tolower() to handle 8bit chars

note that:

o patches for both configure.in and configure are included.
  maybe the one for configure is not necessary.
o pg_proc.h was modified to add octet_length(). I used OIDs
  (1374-1379) for that. Please let me know if these numbers are not
  appropriate.
1 parent 2cbcf46 commit f554af0
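
The difference between octet_length() and the multi-byte aware character_length() can be illustrated with a small standalone sketch. This is not code from the commit: the function names eucjp_char_bytes, eucjp_char_length and octet_length_sketch are hypothetical, and the lead-byte rules are the usual EUC_JP ones (0x8e for SS2, 0x8f for SS3), assumed here for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the commit): byte length of the EUC_JP
 * character that starts at s, judged from its lead byte only. */
static int eucjp_char_bytes(const unsigned char *s)
{
    if (*s == 0x8e)
        return 2;               /* SS2: half-width katakana */
    if (*s == 0x8f)
        return 3;               /* SS3: three-byte code set */
    if (*s & 0x80)
        return 2;               /* two-byte kanji/kana */
    return 1;                   /* ASCII */
}

/* Number of bytes in a NUL-terminated string (what octet_length reports). */
size_t octet_length_sketch(const unsigned char *s)
{
    assert(s != NULL);
    return strlen((const char *) s);
}

/* Number of characters in a NUL-terminated EUC_JP string. */
int eucjp_char_length(const unsigned char *s)
{
    int count = 0;

    assert(s != NULL);
    while (*s)
    {
        int step = eucjp_char_bytes(s);
        int i = 1;

        /* Do not step past the terminator if the last character is truncated. */
        while (i < step && s[i] != '\0')
            i++;
        s += i;
        count++;
    }
    return count;
}
```

For the EUC_JP bytes C6 FC CB DC B8 EC followed by ASCII "SQL", octet_length_sketch() returns 9 while eucjp_char_length() returns 6, which is the distinction the two SQL functions are meant to expose.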

15 files changed: +749 -372 lines changed

doc/README.mb

Lines changed: 39 additions & 19 deletions
@@ -1,45 +1,52 @@
-postgresql 6.3 multi-byte(MB) patch PL2 README Mar 10 1998
+postgresql 6.3 multi-byte (MB) support README April 21 1998

 Tatsuo Ishii
 t-ishii@sra.co.jp
 http://www.sra.co.jp/people/t-ishii/PostgreSQL/

 Introduction

-MB patch is intended for allowing PostgreSQL to handle multi-byte
-charachter sets such as EUC(Extende Unix Code), Unicode and Mule
-internal code. With the MB patch you can use multi-byte character sets
-in regexp and LIKE. The encoding system chosen is determined at the
-compile time.
+The MB support is intended for allowing PostgreSQL to handle
+multi-byte character sets such as EUC(Extended Unix Code), Unicode and
+Mule internal code. With the MB enabled you can use multi-byte
+character sets in regexp ,LIKE and some functions. The encoding system
+chosen is determined at the compile time.

-The patch also fixes some problems concerning with 8-bit single byte
+MB also fixes some problems concerning with 8-bit single byte
 character sets including ISO8859. (I would not say all of problems
 have been fixed. I just confirmed that the regression test ran fine
 and a few French characters could be used with the patch. Please let
 me know if you find any problem while using 8-bit characters)

 How to use

-After applying the MB patch, create src/Makefile.custom with a line
-including:
+create src/Makefile.custom with a line including:

-MB=encoding_system
+MB=encoding_system
+
+or run configure with the mb option:
+
+% configure --with-mb=encoding_system

 where encoding_system is one of:

-EUC_JP Japanese EUC
-EUC_CN Chinese EUC
-EUC_KR Korean EUC
-EUC_TW Taiwan EUC
-UNICODE Unicode(UTF-8)
-MULE_INTERNAL Mule internal
+EUC_JP Japanese EUC
+EUC_CN Chinese EUC
+EUC_KR Korean EUC
+EUC_TW Taiwan EUC
+UNICODE Unicode(UTF-8)
+MULE_INTERNAL Mule internal

 Example:

-% cat Makefile.custom
-MB=EUC_JP
+% cat Makefile.custom
+MB=EUC_JP
+
+or

-If MB is not defined, nothing is changed except better supporting for
+% configure --with-mb=EUC_JP
+
+If MB is disabled, nothing is changed except better supporting for
 8-bit single byte character sets.

 References
@@ -59,6 +66,19 @@ Unicode: http://www.unicode.org/

 History

+April 21, 1998 some enhancements/fixes
+* character_length(), position(), substring() are now aware of
+multi-byte characters
+* add octet_length()
+* add --with-mb option to configure
+* new regression tests for EUC_KR
+(contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
+* add some test cases to the EUC_JP regression test
+* fix problem in regress/regress.sh in case of System V
+* fix toupper(), tolower() to handle 8bit chars
+
+Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
+
 Mar 10, 1998 PL2 released
 * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
 * add an English document (this file)

doc/README.mb.jp

Lines changed: 32 additions & 39 deletions
@@ -1,14 +1,12 @@
-postgresql 6.3 multi-byte (MB) patch PL2 README created 1998/3/10
+postgresql 6.3.2 multi-byte (MB) support README created 1998/4/21

 Tatsuo Ishii
 t-ishii@sra.co.jp
 http://www.sra.co.jp/people/t-ishii/PostgreSQL/

 Introduction:
-This patch enables PostgreSQL (http://www.postgresql.org/), a free RDBMS
-(Relational Database Management System), to handle multi-byte characters
-such as Japanese EUC in its latest version, 6.3. Applying this
-patch makes the following possible.
+
+Multi-byte support in PostgreSQL has the following features.

 1. As the multi-byte character set, the EUC of each country (Japanese, Chinese, etc.), Unicode,
 or mule internal code can be selected at compile time. In the database
@@ -19,45 +17,24 @@ postgresql 6.3 multi-byte (MB) patch PL2 README created 1998/3/10
 4. Multi-byte characters can also be used in the data itself
 5. Regular-expression searches on multi-byte characters are available
 6. LIKE searches on multi-byte characters are available
+7. Multi-byte support in character_length(), position() and
+substring()

-(However, 2, 3 and 4 are possible even without applying the patch.)
-
-How to obtain postgresql-6.3:
-postgresql-6.3.tar.gz can be obtained from the official Japanese mirror site of postgresql,
-ftp://ftp.jaist.ac.jp/pub/dbms/PostgreSQL/.
-If for some reason you cannot obtain it from there,
-ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/ is also available.
-Note that the original ftp site of postgresql is
-ftp://ftp.postgresql.org.
-
-How to obtain this patch:
-
-ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/patches/6.3mbPL2.patch.gz
-Please obtain the file above.
-
-How to apply the patch:
-Unpack the patch file you obtained.
-
-% gunzip 6.3mbPL2.patch.gz
-
-Unpack the postgresql-6.3 source.
-
-% gtar xfz postgresql-6.3.tar.gz
+Installation:
+By default, PostgreSQL does not support multi-byte characters.
+The following explains how to enable multi-byte support.

-This creates a directory named postgresql-6.3, so cd
-into it.
-
-% cd postgresql-6.3
-
-Apply the patch.
-
-% patch -p1 < 6.3mbPL2.patch
-
-Apply it as shown above. Next, create a file named src/Makefile.custom and add
+Create a file named src/Makefile.custom and add

 MB=EUC_JP

-as a single line. The following encodings, including EUC_JP, can be specified.
+as a single line. Alternatively, specify it as follows when running configure.
+
+% configure --with-mb=EUC_JP
+
+The following character encodings, including EUC_JP, can be specified.
+(In the current implementation, the character encoding is determined at compile time
+and cannot be changed dynamically at run time)

 EUC_JP Japanese EUC
 EUC_CN Chinese EUC based on GB. code set 2 is
@@ -93,6 +70,22 @@ How to obtain postgresql-6.3:

 Revision history:

+1998/4/21 enhancements/bug fixes
+* multi-byte support for character_length(), position() and
+substring()
+* add octet_length() -> initdb must be run again
+* add MB support to the configure options
+(ex. configure --with-mb=EUC_JP)
+* add regression test for EUC_KR
+(contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
+* add character_length(), position(),
+substring(), octet_length() to the EUC_JP regression test
+* fix incompatibility of regress.sh on SystemV
+* fix occasional crash when 8bit characters are passed to
+toupper(), tolower()
+
+1998/3/25 PostgreSQL 6.3.1 released, MB PL2 is incorporated
+
 1998/3/10 PL2 released
 * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
 (EUC_CN data contributed by he@sra.co.jp)

src/Makefile.global.in

Lines changed: 6 additions & 1 deletion
@@ -7,7 +7,7 @@
 #
 #
 # IDENTIFICATION
-# $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.40 1998/04/27 14:54:05 scrappy Exp $
+# $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.41 1998/04/27 17:07:22 scrappy Exp $
 #
 # NOTES
 # Essentially all Postgres make files include this file and use the
@@ -147,6 +147,11 @@ X_CFLAGS= @X_CFLAGS@
 X_LIBS= @X_LIBS@
 X11_LIBS= -lX11 @X_EXTRA_LIBS@

+#
+# enable multi-byte support
+# choose one of:
+# EUC_JP,EHC_CN,EUC_KR,EUC_TW,UNICODE,MULE_INTERNAL
+MB=@MB@

 ##############################################################################
 #

src/backend/regex/utils.c

Lines changed: 135 additions & 9 deletions
@@ -1,7 +1,7 @@
 /*
  * misc conversion functions between pg_wchar and other encodings.
  * Tatsuo Ishii
- * $Id: utils.c,v 1.1 1998/03/15 07:38:39 scrappy Exp $
+ * $Id: utils.c,v 1.2 1998/04/27 17:07:53 scrappy Exp $
  */
 #include <regex/pg_wchar.h>
 /*
@@ -324,25 +324,151 @@ static void pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int
   *to = 0;
 }

+static int pg_euc_mblen(const unsigned char *s)
+{
+  int len;
+
+  if (*s == SS2) {
+    len = 2;
+  } else if (*s == SS3) {
+    len = 3;
+  } else if (*s & 0x80) {
+    len = 2;
+  } else {
+    len = 1;
+  }
+  return(len);
+}
+
+static int pg_eucjp_mblen(const unsigned char *s)
+{
+  return(pg_euc_mblen(s));
+}
+
+static int pg_euckr_mblen(const unsigned char *s)
+{
+  return(pg_euc_mblen(s));
+}
+
+static int pg_eucch_mblen(const unsigned char *s)
+{
+  int len;
+
+  if (*s == SS2) {
+    len = 3;
+  } else if (*s == SS3) {
+    len = 3;
+  } else if (*s & 0x80) {
+    len = 2;
+  } else {
+    len = 1;
+  }
+  return(len);
+}
+
+static int pg_euccn_mblen(const unsigned char *s)
+{
+  int len;
+
+  if (*s == SS2) {
+    len = 4;
+  } else if (*s == SS3) {
+    len = 3;
+  } else if (*s & 0x80) {
+    len = 2;
+  } else {
+    len = 1;
+  }
+  return(len);
+}
+
+static int pg_utf_mblen(const unsigned char *s)
+{
+  int len = 1;
+
+  if ((*s & 0x80) == 0) {
+    len = 1;
+  } else if ((*s & 0xe0) == 0xc0) {
+    len = 2;
+  } else if ((*s & 0xe0) == 0xe0) {
+    len = 3;
+  }
+  return(len);
+}
+
+static int pg_mule_mblen(const unsigned char *s)
+{
+  int len;
+
+  if (IS_LC1(*s)) {
+    len = 2;
+  } else if (IS_LCPRV1(*s)) {
+    len = 3;
+  } else if (IS_LC2(*s)) {
+    len = 3;
+  } else if (IS_LCPRV2(*s)) {
+    len = 4;
+  } else { /* assume ASCII */
+    len = 1;
+  }
+  return(len);
+}
+
 typedef struct {
-  void (*mb2wchar)();
-  void (*mb2wchar_with_len)();
+  void (*mb2wchar)(); /* convert a multi-byte string to a wchar */
+  void (*mb2wchar_with_len)(); /* convert a multi-byte string to a wchar
+                                  with a limited length */
+  int (*mblen)(); /* returns the length of a multi-byte word */
 } pg_wchar_tbl;

 static pg_wchar_tbl pg_wchar_table[] = {
-  {pg_eucjp2wchar, pg_eucjp2wchar_with_len},
-  {pg_eucch2wchar, pg_eucch2wchar_with_len},
-  {pg_euckr2wchar, pg_euckr2wchar_with_len},
-  {pg_euccn2wchar, pg_euccn2wchar_with_len},
-  {pg_utf2wchar, pg_utf2wchar_with_len},
-  {pg_mule2wchar, pg_mule2wchar_with_len}};
+  {pg_eucjp2wchar, pg_eucjp2wchar_with_len, pg_eucjp_mblen},
+  {pg_eucch2wchar, pg_eucch2wchar_with_len, pg_eucch_mblen},
+  {pg_euckr2wchar, pg_euckr2wchar_with_len, pg_euckr_mblen},
+  {pg_euccn2wchar, pg_euccn2wchar_with_len, pg_euccn_mblen},
+  {pg_utf2wchar, pg_utf2wchar_with_len, pg_utf_mblen},
+  {pg_mule2wchar, pg_mule2wchar_with_len, pg_mule_mblen}};

+/* convert a multi-byte string to a wchar */
 void pg_mb2wchar(const unsigned char *from, pg_wchar *to)
 {
   (*pg_wchar_table[MB].mb2wchar)(from,to);
 }

+/* convert a multi-byte string to a wchar with a limited length */
 void pg_mb2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
 {
   (*pg_wchar_table[MB].mb2wchar_with_len)(from,to,len);
 }
+
+/* returns the byte length of a multi-byte word */
+int pg_mblen(const unsigned char *mbstr)
+{
+  return((*pg_wchar_table[MB].mblen)(mbstr));
+}
+
+/* returns the length (counted as a wchar) of a multi-byte string */
+int pg_mbstrlen(const unsigned char *mbstr)
+{
+  int len = 0;
+  while (*mbstr) {
+    mbstr += pg_mblen(mbstr);
+    len++;
+  }
+  return(len);
+}
+
+/* returns the length (counted as a wchar) of a multi-byte string
+   (not necessarily NULL terminated) */
+int pg_mbstrlen_with_len(const unsigned char *mbstr, int limit)
+{
+  int len = 0;
+  int l;
+  while (*mbstr && limit > 0) {
+    l = pg_mblen(mbstr);
+    limit -= l;
+    mbstr += l;
+    len++;
+  }
+  return(len);
+}
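
The table-driven dispatch used by pg_mblen() and the bounded counting loop of pg_mbstrlen_with_len() can be tried outside the backend with a small standalone sketch. Everything prefixed with sk_ or SK_ below is hypothetical and not part of the commit: the sketch selects the encoding through a run-time argument instead of the compile-time MB macro, hard-codes 0x8e and 0x8f for SS2 and SS3, and clamps the last step to the remaining limit. Its UTF-8 branch also recognises four-byte lead bytes, which differs from pg_utf_mblen() in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding ids; the commit indexes its table with MB instead. */
enum sk_encoding { SK_EUC_JP = 0, SK_UTF8 = 1 };

typedef int (*sk_mblen_fn)(const unsigned char *s);

/* Byte length of one EUC_JP character, judged from its lead byte. */
static int sk_euc_mblen(const unsigned char *s)
{
    if (*s == 0x8e)
        return 2;               /* SS2 (assumed value) */
    if (*s == 0x8f)
        return 3;               /* SS3 (assumed value) */
    if (*s & 0x80)
        return 2;
    return 1;                   /* ASCII */
}

/* Byte length of one UTF-8 sequence, judged from its lead byte. */
static int sk_utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0x00)
        return 1;
    if ((*s & 0xe0) == 0xc0)
        return 2;
    if ((*s & 0xf0) == 0xe0)
        return 3;
    if ((*s & 0xf8) == 0xf0)
        return 4;
    return 1;                   /* stray or invalid byte: advance one byte */
}

/* One entry per encoding, in the same order as enum sk_encoding. */
static const sk_mblen_fn sk_mblen_table[] = {
    sk_euc_mblen,
    sk_utf8_mblen
};

/* Count characters in at most limit bytes of s. */
int sk_mbstrlen_with_len(enum sk_encoding enc, const unsigned char *s, int limit)
{
    int len = 0;

    assert(s != NULL);
    while (limit > 0 && *s)
    {
        int step = sk_mblen_table[enc](s);

        if (step > limit)
            step = limit;       /* never read past the given byte limit */
        s += step;
        limit -= step;
        len++;
    }
    return len;
}
```

With this sketch, the six UTF-8 bytes 61 C3 A9 E6 97 A5 count as three characters, and limiting the same input to its first three bytes yields two.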
