Incorrectly parsing UTF-8 words #223

Open
michalpodrouzek opened this issue Oct 22, 2024 · 1 comment

michalpodrouzek commented Oct 22, 2024

Hello,

We've had an issue with the parser: it worked correctly for most languages, but we noticed that it parses words in Czech incorrectly. For example, we had a term CI, and the parser matched it inside the word zákazníci, treating the trailing "ci" as the term.

[Screenshot: 2024-10-22 at 16 27 55]

For anyone who runs into this issue: our solution was to add the /u (UTF-8) modifier to the regex pattern in ParserService.
Here is a patch for this:

```diff
diff --git a/Classes/Service/ParserService.php b/Classes/Service/ParserService.php
--- a/Classes/Service/ParserService.php (revision 29da54f)
+++ b/Classes/Service/ParserService.php (date 1729607337701)
@@ -580,7 +580,7 @@
             '($|[\s<[:punct:]]|<br*>' . self::$additionalRegexWrapCharacters . ')' .
             '(?![^<]>|[^<>]</)' .
             '#' .
-            ($term->isCaseSensitive() ? '' : 'i');
+            ($term->isCaseSensitive() ? '' : 'i') . 'u';
 
         // replace callback
         $callback = function (array $match) use (
```
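
For context, here is a minimal standalone sketch of what the /u modifier changes. It is not code from ParserService, and the variable names are only illustrative. Without /u, PCRE treats the subject as a sequence of bytes, so a two-byte UTF-8 character such as "í" is seen as two separate units. With /u, it is seen as one character. This does not reproduce the exact mismatch above (which may also depend on the locale and on the configured wrap characters), only the underlying byte vs. character difference.

```php
<?php
// Standalone illustration; not part of ParserService.
// "zákazníci" written with escapes so the bytes are unambiguous:
// 9 characters, 11 bytes in UTF-8 ("á" and "í" take two bytes each).
$word = "z\u{00E1}kazn\u{00ED}ci";

// Without /u, "." consumes one byte; with /u, one code point.
$byteCount = preg_match_all('/./s', $word);  // 11
$charCount = preg_match_all('/./su', $word); // 9

// A single "." cannot cover the two-byte "í" unless /u is set.
$pattern = "z\u{00E1}kazn.ci";
$matchesWithoutU = preg_match('/^' . $pattern . '$/', $word);  // 0
$matchesWithU    = preg_match('/^' . $pattern . '$/u', $word); // 1

var_dump($byteCount, $charCount, $matchesWithoutU, $matchesWithU);
```

Any character class or boundary check in a pattern without /u is applied to those individual bytes, which is why word boundaries around non-ASCII letters can come out wrong.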

Thanks for this extension :)

featdd (Owner) commented Nov 17, 2024

Hi @michalpodrouzek,

I have to check whether adding this causes issues in other places; there were issues with the case of umlauts as well.

Greetings
Daniel
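
One interaction worth checking here, which may or may not be related to the umlaut issues mentioned above: the /i modifier only folds the case of non-ASCII letters when /u is also set. The sketch below is a standalone illustration, assuming PHP's default "C" locale. It is not taken from the extension.

```php
<?php
// Standalone illustration; assumes the default "C" locale.
$lower = "\u{00E4}"; // "ä"
$upper = "\u{00C4}"; // "Ä"

// Without /u, case folding applies to single bytes only,
// so the UTF-8 bytes of "ä" and "Ä" do not match each other.
$withoutU = preg_match('/' . $lower . '/i', $upper);  // 0

// With /u, PCRE folds case per Unicode code point.
$withU = preg_match('/' . $lower . '/iu', $upper);    // 1

var_dump($withoutU, $withU);
```

So adding /u can change which case-insensitive terms match, which is a reason to test it against existing glossaries before merging.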
