[BUG] __repr__ returns bytes soup for non-ASCII but valid UTF-8 chars #3456

Closed
mzaks opened this issue Sep 8, 2024 · 0 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed), mojo-repo (Tag all issues with this label)

mzaks (Contributor) commented Sep 8, 2024

Bug description

I did not expect strings containing non-ASCII but valid UTF-8 characters to be returned as a soup of escaped bytes. If this is expected behaviour, please delete this issue.

Steps to reproduce

print("hello 🔥!".__repr__()) 
# prints: 'hello \xf0\x9f\x94\xa5!'
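The four escapes are exactly the UTF-8 encoding of 🔥 (U+1F525 FIRE), so the underlying bytes are correct; only their presentation is unexpected. A quick check in Python, used here only as a neutral reference for the encoding and not as Mojo's API:

```python
# U+1F525 FIRE is a single character that UTF-8 encodes as four bytes.
fire = "\U0001F525"
encoded = fire.encode("utf-8")

print(encoded)         # b'\xf0\x9f\x94\xa5'
print(len(encoded))    # 4
print(hex(ord(fire)))  # 0x1f525
```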

System information

macOS 14.5
mojo 2024.9.805 (d7ccdc12)
modular 0.8.0 (39a426b5)
mzaks added the bug and mojo-repo labels Sep 8, 2024
JoeLoser added the good first issue and help wanted labels Sep 13, 2024 (with Linear)
modularbot pushed a commit that referenced this issue Sep 19, 2024
[External] [stdlib] Modify `String.__repr__` to handle multi-byte UTF-8 characters (#47447)

This PR fixes #3456 by modifying `String.__repr__` to handle multi-byte UTF-8 characters. The original code iterated over the string's internal byte buffer and processed each byte individually, so every byte of a multi-byte character was emitted as a separate escape.

I used `String.__iter__`, since it already contains the logic to split the string into its UTF-8 characters. I also have an alternative implementation that still iterates over the byte buffer and extracts the characters directly, if that approach is preferred.
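The alternative mentioned above, extracting characters straight from the byte buffer, comes down to reading each sequence's length from its leading byte. A sketch in Python, assuming the buffer is valid UTF-8; `utf8_seq_len` and `split_utf8` are hypothetical names, not the code from the PR:

```python
def utf8_seq_len(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, read from its leading byte."""
    if lead < 0x80:            # 0xxxxxxx: ASCII
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    # Continuation bytes (10xxxxxx) and 0xF8..0xFF cannot start a character.
    raise ValueError(f"invalid leading byte 0x{lead:02x}")


def split_utf8(buf: bytes) -> list[bytes]:
    """Slice a valid UTF-8 buffer into per-character byte sequences."""
    chars = []
    i = 0
    while i < len(buf):
        n = utf8_seq_len(buf[i])
        chars.append(buf[i : i + n])
        i += n
    return chars


pieces = split_utf8("hello 🔥!".encode("utf-8"))
print(len(pieces))  # 8
print(pieces[6])    # b'\xf0\x9f\x94\xa5'
```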

The module function `string.ascii` previously just called `String.__repr__`, so I copied the old `String.__repr__` implementation into `string.ascii` to keep its behavior unchanged.
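For comparison, Python's built-ins draw a similar line: `repr()` keeps printable non-ASCII characters, while `ascii()` escapes them. Note that Python's `ascii()` escapes per code point (`\U0001f525`), not per UTF-8 byte, so its output differs from the byte escapes that `string.ascii` retains from the old `String.__repr__`.

```python
s = "hello 🔥!"

# repr() leaves the printable emoji as-is.
print(repr(s))   # 'hello 🔥!'

# ascii() escapes the whole code point, not its individual UTF-8 bytes.
print(ascii(s))  # 'hello \U0001f525!'
```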

Co-authored-by: Derek Smith <derekcs@pm.me>
Closes #3495
MODULAR_ORIG_COMMIT_REV_ID: fd61985dd2145a5b341a557fdbdcaa82fa1e4b9c