Commit d535136

Don't downcase non-ascii identifier chars in multi-byte encodings.
Long-standing code has called tolower() on identifier character bytes with the high bit set. This is clearly an error and produces junk output when the encoding is multi-byte. This patch therefore restricts this activity to cases where there is a character with the high bit set AND the encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases.
1 parent 94e3311 commit d535136

1 file changed: +5 -3 lines changed

src/backend/parser/scansup.c

Lines changed: 5 additions & 3 deletions
@@ -132,25 +132,27 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
     char *result;
     int i;
+    bool enc_is_single_byte;
 
     result = palloc(len + 1);
+    enc_is_single_byte = pg_database_encoding_max_length() == 1;
 
     /*
      * SQL99 specifies Unicode-aware case normalization, which we don't yet
      * have the infrastructure for. Instead we use tolower() to provide a
      * locale-aware translation. However, there are some locales where this
      * is not right either (eg, Turkish may do strange things with 'i' and
      * 'I'). Our current compromise is to use tolower() for characters with
-     * the high bit set, and use an ASCII-only downcasing for 7-bit
-     * characters.
+     * the high bit set, as long as they aren't part of a multi-byte character,
+     * and use an ASCII-only downcasing for 7-bit characters.
      */
     for (i = 0; i < len; i++)
     {
         unsigned char ch = (unsigned char) ident[i];
 
         if (ch >= 'A' && ch <= 'Z')
             ch += 'a' - 'A';
-        else if (IS_HIGHBIT_SET(ch) && isupper(ch))
+        else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
             ch = tolower(ch);
         result[i] = (char) ch;
     }
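
To see why byte-wise downcasing corrupts multi-byte text, the loop can be sketched outside the backend. This is only an illustration, not the PostgreSQL code: `downcase_sketch`, `latin1_isupper`, and `latin1_tolower` are hypothetical names, the encoding check is passed in as a parameter instead of calling `pg_database_encoding_max_length()`, and the locale-dependent `isupper()`/`tolower()` calls are replaced by fixed ISO-8859-1 rules so the result does not depend on the installed locales.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for isupper() under an ISO-8859-1 locale:
 * uppercase letters occupy 0xC0..0xDE, except 0xD7 (multiplication sign).
 */
bool latin1_isupper(unsigned char ch)
{
    return ch >= 0xC0 && ch <= 0xDE && ch != 0xD7;
}

/* Hypothetical stand-in for tolower(): lowercase is 0x20 above uppercase. */
unsigned char latin1_tolower(unsigned char ch)
{
    return latin1_isupper(ch) ? (unsigned char) (ch + 0x20) : ch;
}

/*
 * Sketch of the patched loop. The caller supplies enc_is_single_byte
 * (an assumption replacing the backend's encoding lookup) and an output
 * buffer with room for len + 1 bytes.
 */
void downcase_sketch(const char *ident, size_t len,
                     bool enc_is_single_byte, char *out)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';            /* ASCII-only downcasing */
        else if (enc_is_single_byte && (ch & 0x80) && latin1_isupper(ch))
            ch = latin1_tolower(ch);    /* high-bit bytes, single-byte only */
        out[i] = (char) ch;
    }
    out[len] = '\0';
}
```

Under these assumptions, the UTF-8 encoding of "Ä" is the byte pair 0xC3 0x84. With `enc_is_single_byte` set to true (the pre-patch behavior for every encoding), the first byte is treated as a Latin-1 uppercase letter and becomes 0xE3 while the second stays 0x84, so the pair no longer encodes "Ä". With `enc_is_single_byte` set to false, both bytes pass through unchanged and only the ASCII letters are downcased.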
