
Commit 8f8a5df

Make initdb behave sanely when the selected locale has codeset "US-ASCII".

Per discussion, this should result in defaulting to SQL_ASCII encoding. The original coding could not support that because it conflated selection of SQL_ASCII encoding with not being able to determine the encoding. Adjust pg_get_encoding_from_locale()'s API to distinguish these cases, and fix callers appropriately. Only initdb actually changes behavior, since the other callers were perfectly content to consider these cases equivalent.

Per bug #5178 from Boh Yap. Not going to bother back-patching, since no one has complained before and there's an easy workaround (namely, specify the encoding you want).

1 parent 19d8027 commit 8f8a5df

4 files changed: 44 additions, 36 deletions

src/backend/commands/dbcommands.c

Lines changed: 12 additions & 8 deletions
@@ -13,7 +13,7 @@
  *
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.227 2009/10/07 22:14:18 alvherre Exp $
+ *    $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.228 2009/11/12 02:46:16 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -334,27 +334,30 @@ createdb(const CreatedbStmt *stmt)
      * Check whether chosen encoding matches chosen locale settings. This
      * restriction is necessary because libc's locale-specific code usually
      * fails when presented with data in an encoding it's not expecting. We
-     * allow mismatch in three cases:
+     * allow mismatch in four cases:
      *
-     * 1. locale encoding = SQL_ASCII, which means either that the locale is
-     * C/POSIX which works with any encoding, or that we couldn't determine
-     * the locale's encoding and have to trust the user to get it right.
+     * 1. locale encoding = SQL_ASCII, which means that the locale is
+     * C/POSIX which works with any encoding.
      *
-     * 2. selected encoding is SQL_ASCII, but only if you're a superuser. This
-     * is risky but we have historically allowed it --- notably, the
-     * regression tests require it.
+     * 2. locale encoding = -1, which means that we couldn't determine
+     * the locale's encoding and have to trust the user to get it right.
      *
      * 3. selected encoding is UTF8 and platform is win32. This is because
      * UTF8 is a pseudo codepage that is supported in all locales since it's
      * converted to UTF16 before being used.
      *
+     * 4. selected encoding is SQL_ASCII, but only if you're a superuser. This
+     * is risky but we have historically allowed it --- notably, the
+     * regression tests require it.
+     *
      * Note: if you change this policy, fix initdb to match.
      */
     ctype_encoding = pg_get_encoding_from_locale(dbctype);
     collate_encoding = pg_get_encoding_from_locale(dbcollate);
 
     if (!(ctype_encoding == encoding ||
           ctype_encoding == PG_SQL_ASCII ||
+          ctype_encoding == -1 ||
 #ifdef WIN32
           encoding == PG_UTF8 ||
 #endif
@@ -369,6 +372,7 @@ createdb(const CreatedbStmt *stmt)
 
     if (!(collate_encoding == encoding ||
           collate_encoding == PG_SQL_ASCII ||
+          collate_encoding == -1 ||
 #ifdef WIN32
           encoding == PG_UTF8 ||
 #endif
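
Note: the four-case policy described in the createdb() comment above can be read as a single predicate. The following is only a minimal sketch of that reading, not the code in dbcommands.c. The enum, its values, and the function name demo_encoding_acceptable() are hypothetical stand-ins, and the Windows case is passed in as a flag rather than selected with #ifdef WIN32.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PostgreSQL encoding IDs; values are illustrative. */
enum demo_encoding
{
    DEMO_ENC_UNKNOWN = -1,      /* locale's encoding could not be determined */
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8,
    DEMO_LATIN1
};

/*
 * Return true if a database using chosen_enc may be created under a locale
 * whose encoding is locale_enc, following the four exceptions listed in the
 * createdb() comment.
 */
bool
demo_encoding_acceptable(int locale_enc, int chosen_enc,
                         bool is_superuser, bool is_win32)
{
    if (locale_enc == chosen_enc)
        return true;            /* exact match, no exception needed */
    if (locale_enc == DEMO_SQL_ASCII)
        return true;            /* case 1: C/POSIX locale works with anything */
    if (locale_enc == DEMO_ENC_UNKNOWN)
        return true;            /* case 2: unknown, trust the user */
    if (is_win32 && chosen_enc == DEMO_UTF8)
        return true;            /* case 3: UTF8 on win32 */
    if (chosen_enc == DEMO_SQL_ASCII && is_superuser)
        return true;            /* case 4: SQL_ASCII, superuser only */
    return false;
}
```

Under this sketch a LATIN1 locale with a UTF8 database is rejected on non-Windows platforms, while an undetermined locale encoding (-1) is accepted with any database encoding.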

src/backend/utils/mb/mbutils.c

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
*
55
* Tatsuo Ishii
66
*
7-
* $PostgreSQL: pgsql/src/backend/utils/mb/mbutils.c,v 1.91 2009/10/17 05:14:52 mha Exp $
7+
* $PostgreSQL: pgsql/src/backend/utils/mb/mbutils.c,v 1.92 2009/11/12 02:46:16 tgl Exp $
88
*/
99
#include "postgres.h"
1010

@@ -984,7 +984,14 @@ int
984984
GetPlatformEncoding(void)
985985
{
986986
if (PlatformEncoding == NULL)
987-
PlatformEncoding = &pg_enc2name_tbl[pg_get_encoding_from_locale("")];
987+
{
988+
/* try to determine encoding of server's environment locale */
989+
int encoding = pg_get_encoding_from_locale("");
990+
991+
if (encoding < 0)
992+
encoding = PG_SQL_ASCII;
993+
PlatformEncoding = &pg_enc2name_tbl[encoding];
994+
}
988995
return PlatformEncoding->encoding;
989996
}
990997
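
Note: the GetPlatformEncoding() change exists because the lookup result is used as an array index, so the new -1 return value must be mapped to a valid entry first. A minimal sketch of that guard follows. The table demo_enc_names, the constant DEMO_FALLBACK_INDEX, and the function name are hypothetical, and the sketch assumes any non-negative input is a valid index.

```c
#include <stddef.h>

/* Hypothetical name table standing in for pg_enc2name_tbl. */
static const char *const demo_enc_names[] = {"SQL_ASCII", "UTF8", "LATIN1"};

/* Index of the fallback entry; assumed to play the role of PG_SQL_ASCII. */
#define DEMO_FALLBACK_INDEX 0

/*
 * Map a detected encoding ID to its name.  A negative ID means "could not
 * determine", and is replaced by the fallback before indexing the table.
 * The caller is assumed to pass only IDs that exist in the table.
 */
const char *
demo_platform_encoding_name(int detected)
{
    if (detected < 0)
        detected = DEMO_FALLBACK_INDEX;
    return demo_enc_names[detected];
}
```

Without the guard, indexing the table with -1 would read before the start of the array.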

src/bin/initdb/initdb.c

Lines changed: 7 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,7 @@
4242
* Portions Copyright (c) 1994, Regents of the University of California
4343
* Portions taken from FreeBSD.
4444
*
45-
* $PostgreSQL: pgsql/src/bin/initdb/initdb.c,v 1.175 2009/09/03 01:40:11 tgl Exp $
45+
* $PostgreSQL: pgsql/src/bin/initdb/initdb.c,v 1.176 2009/11/12 02:46:16 tgl Exp $
4646
*
4747
*-------------------------------------------------------------------------
4848
*/
@@ -2193,21 +2193,14 @@ check_locale_encoding(const char *locale, int user_enc)
21932193

21942194
locale_enc = pg_get_encoding_from_locale(locale);
21952195

2196-
/* We allow selection of SQL_ASCII --- see notes in createdb() */
2196+
/* See notes in createdb() to understand these tests */
21972197
if (!(locale_enc == user_enc ||
21982198
locale_enc == PG_SQL_ASCII ||
2199-
user_enc == PG_SQL_ASCII
2199+
locale_enc == -1 ||
22002200
#ifdef WIN32
2201-
2202-
/*
2203-
* On win32, if the encoding chosen is UTF8, all locales are OK (assuming
2204-
* the actual locale name passed the checks above). This is because UTF8
2205-
* is a pseudo-codepage, that we convert to UTF16 before doing any
2206-
* operations on, and UTF16 supports all locales.
2207-
*/
2208-
|| user_enc == PG_UTF8
2201+
user_enc == PG_UTF8 ||
22092202
#endif
2210-
))
2203+
user_enc == PG_SQL_ASCII))
22112204
{
22122205
fprintf(stderr, _("%s: encoding mismatch\n"), progname);
22132206
fprintf(stderr,
@@ -2851,11 +2844,9 @@ main(int argc, char *argv[])
28512844

28522845
ctype_enc = pg_get_encoding_from_locale(lc_ctype);
28532846

2854-
if (ctype_enc == PG_SQL_ASCII &&
2855-
!(pg_strcasecmp(lc_ctype, "C") == 0 ||
2856-
pg_strcasecmp(lc_ctype, "POSIX") == 0))
2847+
if (ctype_enc == -1)
28572848
{
2858-
/* Hmm, couldn't recognize the locale's codeset */
2849+
/* Couldn't recognize the locale's codeset */
28592850
fprintf(stderr, _("%s: could not find suitable encoding for locale %s\n"),
28602851
progname, lc_ctype);
28612852
fprintf(stderr, _("Rerun %s with the -E option.\n"), progname);
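
Note: the behavior change in initdb's main() is that only -1 is treated as "could not determine", so a locale whose codeset maps to SQL_ASCII is now accepted as the default. The sketch below illustrates that decision in isolation. The function name demo_pick_default_encoding() and its parameters are hypothetical; the two messages are copied from the hunk above, and the sketch returns -1 where the real program would act on the failure.

```c
#include <stdio.h>

/*
 * Decide the default encoding from the value the locale lookup returned.
 * Any non-negative ID, including the one for SQL_ASCII, is accepted as is.
 * A result of -1 means the codeset was not recognized: report it on "err"
 * and return -1 so the caller can stop and ask for an explicit encoding.
 */
int
demo_pick_default_encoding(int ctype_enc, const char *progname,
                           const char *lc_ctype, FILE *err)
{
    if (ctype_enc == -1)
    {
        fprintf(err, "%s: could not find suitable encoding for locale %s\n",
                progname, lc_ctype);
        fprintf(err, "Rerun %s with the -E option.\n", progname);
        return -1;
    }
    return ctype_enc;
}
```

Before this commit the equivalent test had to special-case the locale names "C" and "POSIX", because SQL_ASCII doubled as the failure value.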

src/port/chklocale.c

Lines changed: 16 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/port/chklocale.c,v 1.11 2009/02/10 19:29:39 petere Exp $
11+
* $PostgreSQL: pgsql/src/port/chklocale.c,v 1.12 2009/11/12 02:46:16 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -181,6 +181,8 @@ static const struct encoding_match encoding_match_list[] = {
181181

182182
{PG_SHIFT_JIS_2004, "SJIS_2004"},
183183

184+
{PG_SQL_ASCII, "US-ASCII"},
185+
184186
{PG_SQL_ASCII, NULL} /* end marker */
185187
};
186188

@@ -215,13 +217,13 @@ win32_langinfo(const char *ctype)
215217

216218
/*
217219
* Given a setting for LC_CTYPE, return the Postgres ID of the associated
218-
* encoding, if we can determine it.
220+
* encoding, if we can determine it. Return -1 if we can't determine it.
219221
*
220222
* Pass in NULL to get the encoding for the current locale setting.
223+
* Pass "" to get the encoding selected by the server's environment.
221224
*
222225
* If the result is PG_SQL_ASCII, callers should treat it as being compatible
223-
* with any desired encoding. We return this if the locale is C/POSIX or we
224-
* can't determine the encoding.
226+
* with any desired encoding.
225227
*/
226228
int
227229
pg_get_encoding_from_locale(const char *ctype)
@@ -237,17 +239,17 @@ pg_get_encoding_from_locale(const char *ctype)
237239

238240
save = setlocale(LC_CTYPE, NULL);
239241
if (!save)
240-
return PG_SQL_ASCII; /* setlocale() broken? */
242+
return -1; /* setlocale() broken? */
241243
/* must copy result, or it might change after setlocale */
242244
save = strdup(save);
243245
if (!save)
244-
return PG_SQL_ASCII; /* out of memory; unlikely */
246+
return -1; /* out of memory; unlikely */
245247

246248
name = setlocale(LC_CTYPE, ctype);
247249
if (!name)
248250
{
249251
free(save);
250-
return PG_SQL_ASCII; /* bogus ctype passed in? */
252+
return -1; /* bogus ctype passed in? */
251253
}
252254

253255
#ifndef WIN32
@@ -266,7 +268,7 @@ pg_get_encoding_from_locale(const char *ctype)
266268
/* much easier... */
267269
ctype = setlocale(LC_CTYPE, NULL);
268270
if (!ctype)
269-
return PG_SQL_ASCII; /* setlocale() broken? */
271+
return -1; /* setlocale() broken? */
270272
#ifndef WIN32
271273
sys = nl_langinfo(CODESET);
272274
if (sys)
@@ -277,7 +279,7 @@ pg_get_encoding_from_locale(const char *ctype)
277279
}
278280

279281
if (!sys)
280-
return PG_SQL_ASCII; /* out of memory; unlikely */
282+
return -1; /* out of memory; unlikely */
281283

282284
/* If locale is C or POSIX, we can allow all encodings */
283285
if (pg_strcasecmp(ctype, "C") == 0 || pg_strcasecmp(ctype, "POSIX") == 0)
@@ -328,12 +330,16 @@ pg_get_encoding_from_locale(const char *ctype)
328330
#endif
329331

330332
free(sys);
331-
return PG_SQL_ASCII;
333+
return -1;
332334
}
333335
#else /* (HAVE_LANGINFO_H && CODESET) || WIN32 */
334336

335337
/*
336338
* stub if no platform support
339+
*
340+
* Note: we could return -1 here, but that would have the effect of
341+
* forcing users to specify an encoding to initdb on such platforms.
342+
* It seems better to silently default to SQL_ASCII.
337343
*/
338344
int
339345
pg_get_encoding_from_locale(const char *ctype)
