Re: Fuzzy substring searching with the pg_trgm extension
From | Artur Zakirov |
---|---|
Subject | Re: Fuzzy substring searching with the pg_trgm extension |
Date | |
Msg-id | 56AF91E3.3010908@postgrespro.ru |
In reply to | Re: Fuzzy substring searching with the pg_trgm extension (Artur Zakirov <a.zakirov@postgrespro.ru>) |
Responses | Re: Fuzzy substring searching with the pg_trgm extension |
List | pgsql-hackers |
On 29.01.2016 18:58, Artur Zakirov wrote:
> On 29.01.2016 18:39, Alvaro Herrera wrote:
>> Teodor Sigaev wrote:
>>>> The behavior of this function is surprising to me.
>>>>
>>>> select substring_similarity('dog' , 'hotdogpound') ;
>>>>
>>>>  substring_similarity
>>>> ----------------------
>>>>                  0.25
>>>>
>>> Substring search was designed to search similar word in string:
>>> contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
>>>  substring_similarity
>>> ----------------------
>>>                  0.75
>>>
>>> contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
>>>  substring_similarity
>>> ----------------------
>>>                     1
>>
>> Hmm, this behavior looks too much like magic to me.  I mean, a substring
>> is a substring -- why are we treating the space as a special character
>> here?
>>
>
> I think, I can rename this function to subword_similarity() and correct
> the documentation.
>
> The current behavior is developed to find most similar word in a text.
> For example, if we will search just substring (not word) then we will
> get the following result:
>
> select substring_similarity('dog', 'dogmatist');
>  substring_similarity
> ---------------------
>                     1
> (1 row)
>
> But this is wrong I think. They are completely different words.
>
> For searching a similar substring (not word) in a text maybe another
> function should be added?
>

I have changed the patch:

1 - trgm2.data was corrected; duplicates were deleted.
2 - I have added the operators <<-> and <->> with GiST index support. The
regression test will pass only with the patch from
http://www.postgresql.org/message-id/CAPpHfdt19FwQXarYjkzxb3oxmv-KAn3FLuZrooARE_U3H3CV9g@mail.gmail.com
3 - The function substring_similarity() was renamed to subword_similarity().

There is no substring_similarity_pos() function yet, though. Implementing it
is not trivial.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
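The scores quoted in this message (0.25, 0.75 and 1) can be reproduced with a small model of word-oriented trigram matching. What follows is a simplified Python sketch, not the pg_trgm C code: the function names are hypothetical, words are split on whitespace only, and each word is assumed to be padded the way pg_trgm documents it (two spaces before, one after). The score is the best ratio of shared trigrams to the union of trigrams, taken over every contiguous run of the text's trigrams.

```python
def word_trigrams(word):
    """Trigrams of one word, padded with two leading spaces and one trailing space."""
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def text_trigrams(text):
    """Trigrams of every whitespace-separated word, kept in text order."""
    trigrams = []
    for word in text.split():
        trigrams.extend(word_trigrams(word))
    return trigrams


def subword_similarity_sketch(query, text):
    """Best trigram similarity between the query and any contiguous run of text trigrams.

    Hypothetical brute-force model written for illustration only.
    """
    query_set = set(text_trigrams(query))
    sequence = text_trigrams(text)
    best = 0.0
    for start in range(len(sequence)):
        window = set()
        for end in range(start, len(sequence)):
            window.add(sequence[end])
            shared = len(query_set & window)
            # shared trigrams divided by the size of the union of both sets
            score = shared / (len(query_set) + len(window) - shared)
            best = max(best, score)
    return best


print(word_trigrams("dog"))                               # ['  d', ' do', 'dog', 'og ']
print(subword_similarity_sketch("dog", "hotdogpound"))    # 0.25
print(subword_similarity_sketch("dog", "hot dogpound"))   # 0.75
print(subword_similarity_sketch("dog", "hot dog pound"))  # 1.0
```

In this model the space matters because padding creates the boundary trigrams "  d", " do" and "og ". Inside 'hotdogpound' only the trigram "dog" is shared, which gives 1 / (4 + 1 - 1) = 0.25. Under the same model 'dogmatist' scores 0.75 rather than 1, since the trailing trigram "og " has no counterpart in the text.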