multi-byte grouping_mark doesn't work when source file is different encoding #1459
Comments
Similarly:
mark <- " "
charToRaw(mark)
#> [1] 20
txt <- paste0("123", mark, "456.78")
txt
#> [1] "123 456.78"
readr::parse_number(txt, locale = readr::locale(grouping_mark = mark))
#> [1] 123456.8
mark <- "\U00A0"
charToRaw(mark)
#> [1] c2 a0
txt <- paste0("123", mark, "456.78")
txt
#> [1] "123 456.78"
readr::parse_number(txt, locale = readr::locale(grouping_mark = mark))
#> [1] 123
mark <- "’"
charToRaw(mark)
#> [1] e2 80 99
txt <- paste0("123", mark, "456.78")
txt
#> [1] "123’456.78"
readr::parse_number(txt, locale = readr::locale(grouping_mark = mark))
#> [1] 123

A place to start is probably to figure out what's going on here:
library(readr)
parse_number("123--456", locale = locale(grouping_mark = "--"))
#> [1] 123456
parse_number("123\U00A0456", locale = locale(grouping_mark = "\U00A0"))
#> [1] 123
Created on 2023-08-01 with reprex v2.0.2

"--" seems to work because of some kind of recycling. Multi-byte grouping marks whose bytes differ from one another do not work:
readr::parse_number("123-456", locale = readr::locale(grouping_mark = "-"))
#> [1] 123456
readr::parse_number("123|456", locale = readr::locale(grouping_mark = "|"))
#> [1] 123456
readr::parse_number("123-456", locale = readr::locale(grouping_mark = "---"))
#> [1] 123456
readr::parse_number("123---456", locale = readr::locale(grouping_mark = "-"))
#> [1] 123456
readr::parse_number("123|456", locale = readr::locale(grouping_mark = "|||"))
#> [1] 123456
readr::parse_number("123|||456", locale = readr::locale(grouping_mark = "|"))
#> [1] 123456
readr::parse_number("123|-456", locale = readr::locale(grouping_mark = "|-"))
#> [1] 123
readr::parse_number("123-|456", locale = readr::locale(grouping_mark = "-|"))
#> [1] 123

Pretty sure this is iterating through bytes, not characters (lines 165 to 178 in e529cb2).
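To make the bytes-versus-characters point concrete, here is a small base R sketch that tabulates the grouping marks tried in this thread. It does not touch readr internals; it only shows that the marks that parsed correctly (" " and "--") consist of a single distinct byte value, while the ones that failed contain several. The vector names (`space`, `nbsp`, and so on) are labels made up for this illustration.

```r
# Grouping marks from the examples in this thread, written with \u escapes
# so the script does not depend on the encoding of the source file.
marks <- c(
  space      = " ",
  nbsp       = "\u00a0",  # no-break space, bytes c2 a0 in UTF-8
  apostrophe = "\u2019",  # right single quotation mark, bytes e2 80 99
  dashes     = "--",
  mixed      = "|-"
)

byte_view <- data.frame(
  chars          = nchar(marks, type = "chars"),
  bytes          = nchar(marks, type = "bytes"),
  # How many different byte values the mark is made of.
  distinct_bytes = vapply(
    marks,
    function(m) length(unique(charToRaw(m))),
    integer(1)
  )
)

print(byte_view)
```

Every mark with `distinct_bytes` greater than 1 corresponds to a case above where parse_number() stopped at 123, which fits the guess that the comparison walks the mark byte by byte.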
Using a multi-byte character as a grouping_mark doesn't work when the source file encoding is "windows-1252", while other uncommon non-multi-byte strings work as expected. I'm on macOS in a UTF-8 locale. Is there a way to specify the grouping_mark so that it matches when the source file is in "windows-1252"? Seems like #796 is related, which was closed by tidyverse/vroom@959b4b7.
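Until the matching itself is sorted out, one possible workaround is to strip the grouping mark from the text before parsing, so that the parser never has to compare a multi-byte mark. The sketch below uses base R only and assumes the column has already been read in as character and re-encoded to UTF-8; `strip_grouping_mark()` is a hypothetical helper written for this example, not a readr function.

```r
# Hypothetical helper: remove every literal occurrence of `mark` from `x`.
# fixed = TRUE treats the mark as a plain string rather than a regex.
strip_grouping_mark <- function(x, mark) {
  gsub(mark, "", x, fixed = TRUE)
}

# Same input as the no-break space example earlier in the thread.
mark <- "\u00a0"
txt <- paste0("123", mark, "456.78")

cleaned <- strip_grouping_mark(txt, mark)
value <- as.numeric(cleaned)

print(cleaned)
print(value)
```

Once the mark is gone, the cleaned string could just as well be handed to readr::parse_number() with the default locale instead of as.numeric(); as.numeric() is used here only to keep the sketch free of dependencies.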