Re: Fuzzy substring searching with the pg_trgm extension
From | Artur Zakirov |
---|---|
Subject | Re: Fuzzy substring searching with the pg_trgm extension |
Date | |
Msg-id | 56AB8C2F.2080609@postgrespro.ru |
In response to | Re: Fuzzy substring searching with the pg_trgm extension (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses | Re: Fuzzy substring searching with the pg_trgm extension |
List | pgsql-hackers |
On 29.01.2016 18:39, Alvaro Herrera wrote:
> Teodor Sigaev wrote:
>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' , 'hotdogpound') ;
>>>
>>>  substring_similarity
>>> ----------------------
>>>                  0.25
>>>
>> Substring search was designed to search for a similar word in a string:
>> contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
>>  substring_similarity
>> ----------------------
>>                  0.75
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
>>  substring_similarity
>> ----------------------
>>                     1
>
> Hmm, this behavior looks too much like magic to me. I mean, a substring
> is a substring -- why are we treating the space as a special character
> here?
>

I think I can rename this function to subword_similarity() and correct the documentation.

The current behavior is designed to find the most similar word in a text. For example, if we searched for just a substring (not a word), we would get the following result:

select substring_similarity('dog', 'dogmatist');
 substring_similarity
---------------------
                    1
(1 row)

But I think this is wrong: they are completely different words.

For searching for a similar substring (not a word) in a text, maybe another function should be added?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
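To make the role of the space less magical, here is a rough Python sketch of word-aware trigram matching. It is not the pg_trgm implementation: it only assumes the documented pg_trgm convention that each word is padded with two leading spaces and one trailing space before trigrams are extracted, and it scores a match as the share of the query's trigrams that occur anywhere in the text. The function names `word_trigrams` and `approx_subword_similarity` are hypothetical and exist only for this illustration.

```python
import re


def word_trigrams(text):
    """Collect trigrams word by word.

    Assumption: each word is padded with two leading spaces and one
    trailing space, as the pg_trgm documentation describes.
    """
    trigrams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            trigrams.add(padded[i:i + 3])
    return trigrams


def approx_subword_similarity(query, text):
    """Share of the query's trigrams that also occur in the text.

    A simplification for illustration only; the patch under discussion
    uses a more involved search over the text's trigrams.
    """
    query_trigrams = word_trigrams(query)
    if not query_trigrams:
        return 0.0
    shared = query_trigrams & word_trigrams(text)
    return len(shared) / len(query_trigrams)


for text in ("hotdogpound", "hot dogpound", "hot dog pound", "dogmatist"):
    print(text, approx_subword_similarity("dog", text))
```

Under these assumptions the first three inputs give 0.25, 0.75 and 1.0, matching the values quoted above: 'dog' yields the trigrams "  d", " do", "dog" and "og ", and a word boundary in the text is what makes the padded ones match. For 'dogmatist' the sketch gives 0.75, because the trailing trigram "og " is missing, so a word-aware measure does not treat the two words as identical.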