
Commit 5827472

Be forgiving of variant spellings of locale names in pg_upgrade.
Even though the server tries to canonicalize stored locale names, the platform often doesn't cooperate, so it's entirely possible that one DB thinks its locale is, say, "en_US.UTF-8" while the other has "en_US.utf8". Rather than failing, we should try to allow this where it's clearly OK.

There is already pretty robust encoding lookup in encnames.c, so make use of that to compare the encoding parts of the names. The locale identifier parts are just compared case-insensitively, which we were already doing. The major problem known to exist in the field is variant encoding-name spellings, so hopefully this will be Good Enough. If not, we can try being even laxer.

Pavel Raiskup, reviewed by Rushabh Lathia
1 parent 41e364e commit 5827472
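
The comparison rule described in the commit message can be sketched outside the PostgreSQL tree as follows. This is a minimal standalone approximation, not the committed code: `sketch_equivalent_encoding`, `sketch_equivalent_locale` and `ci_equal_n` are hypothetical names, and the encoding check here only folds case and skips non-alphanumeric characters. The actual patch instead calls `pg_valid_server_encoding()` and compares the returned encoding IDs, so it additionally rejects names that are not valid server encodings.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the encnames.c lookup: two encoding names match
 * if they are equal after case folding and dropping non-alphanumerics.
 * Assumption: unlike pg_valid_server_encoding(), this does not verify that
 * the name denotes a real server encoding.
 */
static bool
sketch_equivalent_encoding(const char *a, const char *b)
{
	bool		seen = false;	/* at least one alphanumeric char matched */

	for (;;)
	{
		while (*a != '\0' && !isalnum((unsigned char) *a))
			a++;
		while (*b != '\0' && !isalnum((unsigned char) *b))
			b++;
		if (*a == '\0' || *b == '\0')
			return seen && *a == *b;	/* both must end together */
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		seen = true;
		a++;
		b++;
	}
}

/* Case-insensitive comparison of the first n bytes; callers guarantee
 * that both strings are at least n bytes long. */
static bool
ci_equal_n(const char *a, const char *b, size_t n)
{
	size_t		i;

	for (i = 0; i < n; i++)
	{
		if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
			return false;
	}
	return true;
}

/* Same control flow as the patch: split at the last '.', compare the
 * encoding parts leniently and the identifier parts case-insensitively. */
static bool
sketch_equivalent_locale(const char *loca, const char *locb)
{
	const char *dota = strrchr(loca, '.');
	const char *dotb = strrchr(locb, '.');
	size_t		len;

	if (dota == NULL || dotb == NULL)
	{
		/* No encoding part on at least one side: compare whole names. */
		len = strlen(loca);
		return len == strlen(locb) && ci_equal_n(loca, locb, len);
	}

	if (!sketch_equivalent_encoding(dota + 1, dotb + 1))
		return false;

	len = (size_t) (dota - loca);
	if (len != (size_t) (dotb - locb))
		return false;

	return ci_equal_n(loca, locb, len);
}
```

Under these assumptions, `sketch_equivalent_locale("en_US.UTF-8", "en_US.utf8")` returns true, while comparing `"en_US.utf8"` with `"en_GB.utf8"`, or `"en_US"` with `"en_US.utf8"`, returns false.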

1 file changed, +69 -13 lines changed

contrib/pg_upgrade/check.c

+69 -13
@@ -9,13 +9,16 @@
 
 #include "postgres_fe.h"
 
+#include "mb/pg_wchar.h"
 #include "pg_upgrade.h"
 
 
 static void set_locale_and_encoding(ClusterInfo *cluster);
 static void check_new_cluster_is_empty(void);
 static void check_locale_and_encoding(ControlData *oldctrl,
 									  ControlData *newctrl);
+static bool equivalent_locale(const char *loca, const char *locb);
+static bool equivalent_encoding(const char *chara, const char *charb);
 static void check_is_super_user(ClusterInfo *cluster);
 static void check_for_prepared_transactions(ClusterInfo *cluster);
 static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
@@ -397,27 +400,80 @@ set_locale_and_encoding(ClusterInfo *cluster)
 /*
  * check_locale_and_encoding()
  *
- * locale is not in pg_controldata in 8.4 and later so
- * we probably had to get via a database query.
+ * Check that old and new locale and encoding match. Even though the backend
+ * tries to canonicalize stored locale names, the platform often doesn't
+ * cooperate, so it's entirely possible that one DB thinks its locale is
+ * "en_US.UTF-8" while the other says "en_US.utf8". Try to be forgiving.
  */
 static void
 check_locale_and_encoding(ControlData *oldctrl,
 						  ControlData *newctrl)
 {
-	/*
-	 * These are often defined with inconsistent case, so use pg_strcasecmp().
-	 * They also often use inconsistent hyphenation, which we cannot fix, e.g.
-	 * UTF-8 vs. UTF8, so at least we display the mismatching values.
-	 */
-	if (pg_strcasecmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
+	if (!equivalent_locale(oldctrl->lc_collate, newctrl->lc_collate))
 		pg_fatal("lc_collate cluster values do not match: old \"%s\", new \"%s\"\n",
-			   oldctrl->lc_collate, newctrl->lc_collate);
-	if (pg_strcasecmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
+				 oldctrl->lc_collate, newctrl->lc_collate);
+	if (!equivalent_locale(oldctrl->lc_ctype, newctrl->lc_ctype))
 		pg_fatal("lc_ctype cluster values do not match: old \"%s\", new \"%s\"\n",
-			   oldctrl->lc_ctype, newctrl->lc_ctype);
-	if (pg_strcasecmp(oldctrl->encoding, newctrl->encoding) != 0)
+				 oldctrl->lc_ctype, newctrl->lc_ctype);
+	if (!equivalent_encoding(oldctrl->encoding, newctrl->encoding))
 		pg_fatal("encoding cluster values do not match: old \"%s\", new \"%s\"\n",
-			   oldctrl->encoding, newctrl->encoding);
+				 oldctrl->encoding, newctrl->encoding);
+}
+
+/*
+ * equivalent_locale()
+ *
+ * Best effort locale-name comparison. Return false if we are not 100% sure
+ * the locales are equivalent.
+ */
+static bool
+equivalent_locale(const char *loca, const char *locb)
+{
+	const char *chara = strrchr(loca, '.');
+	const char *charb = strrchr(locb, '.');
+	int			lencmp;
+
+	/* If they don't both contain an encoding part, just do strcasecmp(). */
+	if (!chara || !charb)
+		return (pg_strcasecmp(loca, locb) == 0);
+
+	/* Compare the encoding parts. */
+	if (!equivalent_encoding(chara + 1, charb + 1))
+		return false;
+
+	/*
+	 * OK, compare the locale identifiers (e.g. en_US part of en_US.utf8).
+	 *
+	 * It's tempting to ignore non-alphanumeric chars here, but for now it's
+	 * not clear that that's necessary; just do case-insensitive comparison.
+	 */
+	lencmp = chara - loca;
+	if (lencmp != charb - locb)
+		return false;
+
+	return (pg_strncasecmp(loca, locb, lencmp) == 0);
+}
+
+/*
+ * equivalent_encoding()
+ *
+ * Best effort encoding-name comparison. Return true only if the encodings
+ * are valid server-side encodings and known equivalent.
+ *
+ * Because the lookup in pg_valid_server_encoding() does case folding and
+ * ignores non-alphanumeric characters, this will recognize many popular
+ * variant spellings as equivalent, eg "utf8" and "UTF-8" will match.
+ */
+static bool
+equivalent_encoding(const char *chara, const char *charb)
+{
+	int			enca = pg_valid_server_encoding(chara);
+	int			encb = pg_valid_server_encoding(charb);
+
+	if (enca < 0 || encb < 0)
+		return false;
+
+	return (enca == encb);
 }

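
The comment in `equivalent_locale()` mentions the temptation to ignore non-alphanumeric characters in the locale identifier, and the commit message leaves room for "being even laxer" later. Purely as an illustration of what such a laxer rule might look like, and not as anything this commit does, here is a small sketch. `lax_identifier_equal` is a hypothetical helper that takes explicit lengths so it can be applied to the identifier part before the last `.`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical laxer comparison (NOT part of the commit): treat two locale
 * identifiers as equal if they match after case folding and after skipping
 * every non-alphanumeric character, e.g. "en_US" vs "en-us".
 */
static bool
lax_identifier_equal(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t		i = 0;
	size_t		j = 0;

	for (;;)
	{
		/* Skip separators such as '_' or '-' on both sides. */
		while (i < alen && !isalnum((unsigned char) a[i]))
			i++;
		while (j < blen && !isalnum((unsigned char) b[j]))
			j++;

		/* Equal only if both inputs are exhausted at the same time. */
		if (i == alen || j == blen)
			return i == alen && j == blen;

		if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[j]))
			return false;
		i++;
		j++;
	}
}
```

With this rule `en_US` and `en-us` compare equal, but so do `en_US` and `enUS`, which may be looser than desired; the commit keeps the stricter case-insensitive comparison for now.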
