Commit fd90b5d

Fix ancient encoding error in hungarian.stop.
When we grabbed this file off the Snowball project's website, we mistakenly supposed that it was in LATIN1 encoding, but evidently it was actually in LATIN2. This resulted in ő (o-double-acute, U+0151, which is code 0xF5 in LATIN2) being misconverted into õ (o-tilde, U+00F5), as complained of in bug #10589 from Zoltán Sörös. We'd have messed up u-double-acute too, but there aren't any of those in the file. Other characters used in the file have the same codes in LATIN1 and LATIN2, which no doubt helped hide the problem for so long.

The error is not only ours: the Snowball project also was confused about which encoding is required for Hungarian. But dealing with that will require source-code changes that I'm not at all sure we'll wish to back-patch. Fixing the stopword file seems reasonably safe to back-patch however.
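The misconversion described above can be reproduced in a few lines. The following is a minimal sketch, not part of the commit: it assumes Python's standard `latin-1` and `iso8859-2` codecs as stand-ins for LATIN1 and LATIN2, and the variable names are illustrative only.

```python
# Illustration only: decode the same byte under both encodings.
# "latin-1" and "iso8859-2" are Python's codec names, assumed here to
# correspond to the LATIN1 and LATIN2 encodings the commit message names.

raw = b"\xf5"  # the byte the commit message is about

as_latin1 = raw.decode("latin-1")    # the encoding the file was assumed to use
as_latin2 = raw.decode("iso8859-2")  # the encoding the file actually used

print(f"LATIN1: {as_latin1} (U+{ord(as_latin1):04X})")  # o-tilde, U+00F5
print(f"LATIN2: {as_latin2} (U+{ord(as_latin2):04X})")  # o-double-acute, U+0151

# Other accented Hungarian vowels encode to the same byte in both
# encodings, which is why only the double-acute letters were affected.
shared = "áéíóöúü"
same_bytes = all(
    ch.encode("latin-1") == ch.encode("iso8859-2") for ch in shared
)
print("shared letters encode identically:", same_bytes)
```

Because every other accented letter round-trips unchanged, a file read with the wrong encoding looks almost entirely correct, which is consistent with the problem going unnoticed for so long.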
1 parent 3bd82dd commit fd90b5d

1 file changed, +7 -7 lines changed

1 file changed

+7
-7
lines changed

src/backend/snowball/stopwords/hungarian.stop

@@ -55,10 +55,10 @@ ekkor
 el
 elég
 ellen
-elõ
-elõször
-elõtt
-elsõ
+elő
+először
+előtt
+első
 én
 éppen
 ebben
@@ -149,9 +149,9 @@ nincs
 olyan
 ott
 össze
-õ
-õk
-õket
+ő
+ők
+őket
 pedig
 persze
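The seven changed lines are what one would get by reversing the wrong decoding. The sketch below is a hedged illustration, not the method used to produce this commit: `repair_latin2_misread` is a hypothetical helper, and it assumes the damaged text contains only characters that exist in LATIN1 (otherwise the `encode` step raises `UnicodeEncodeError`).

```python
def repair_latin2_misread(text: str) -> str:
    """Hypothetical helper: undo a LATIN2 file having been read as LATIN1.

    Re-encoding as latin-1 recovers the original bytes; decoding those
    bytes as iso8859-2 then yields the intended characters.
    """
    return text.encode("latin-1").decode("iso8859-2")


# The removed lines from the diff above.
old_lines = ["elõ", "elõször", "elõtt", "elsõ", "õ", "õk", "õket"]

new_lines = [repair_latin2_misread(word) for word in old_lines]

for before, after in zip(old_lines, new_lines):
    print(f"{before} -> {after}")
```

Words without the affected letter pass through unchanged, so running such a helper over unaffected lines of the file would be harmless.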
