Re: Mac OS: invalid byte sequence for encoding "UTF8"
From | Artur Zakirov |
---|---|
Subject | Re: Mac OS: invalid byte sequence for encoding "UTF8" |
Date | |
Msg-id | 56BC42FA.10509@postgrespro.ru |
In reply to | Re: Mac OS: invalid byte sequence for encoding "UTF8" (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 11.02.2016 01:19, Tom Lane wrote:
> I wrote:
>> Artur Zakirov <a.zakirov@postgrespro.ru> writes:
>>> I think this is not a bug. It is normal behavior. On Mac OS, sscanf()
>>> with the %s format reads the string one character at a time. The size of
>>> the letter 'х' is 2 bytes, and sscanf() separates it into two wrong characters.
>
>> That argument might be convincing if OSX behaved that way for all
>> multibyte characters, but it doesn't seem to be doing that. Why is
>> only 'х' affected?
>
> I looked into the OS X sources, and found that indeed you are right:
> *scanf processes the input a byte at a time, and applies isspace() to
> each byte separately, even when the locale is such that that's a clearly
> insane thing to do. Since this code was derived from FreeBSD, FreeBSD
> has or once had the same issue. (A look at the freebsd project on github
> says it still does, assuming that's the authoritative repo.) Not sure
> about other BSDen.
>
> I also verified that in UTF8-based locales, isspace() thinks that 0x85 and
> 0xA0, and no other high-bit-set values, are spaces. Not sure exactly why
> it thinks that, but that explains why 'х' fails when adjacent code points
> don't.
>
> So apparently the coding rule we have to adopt is "don't use *scanf()
> on data that might contain multibyte characters". (There might be corner
> cases where it'd work all right for conversion specifiers other than %s,
> but probably you might as well just use strtol and friends in such cases.)
> Ugh.
>
> regards, tom lane
>

Yes, I meant this. The second byte divides the word into two wrong pieces. Sorry for my unclear explanation; I should have explained it more clearly.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company