Re: Fuzzy substring searching with the pg_trgm extension

From	Artur Zakirov
Subject	Re: Fuzzy substring searching with the pg_trgm extension
Date	29 January 2016, 15:58:44
Msg-id	56AB8C2F.2080609@postgrespro.ru
In response to	Re: Fuzzy substring searching with the pg_trgm extension (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses	Re: Fuzzy substring searching with the pg_trgm extension
List	pgsql-hackers

On 29.01.2016 18:39, Alvaro Herrera wrote:
> Teodor Sigaev wrote:
>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' ,  'hotdogpound') ;
>>>
>>>   substring_similarity
>>> ----------------------
>>>                   0.25
>>>
>> Substring search was designed to search for a similar word in a string:
>> contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
>>   substring_similarity
>> ----------------------
>>                   0.75
>>
>> contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
>>   substring_similarity
>> ----------------------
>>                      1
>
> Hmm, this behavior looks too much like magic to me.  I mean, a substring
> is a substring -- why are we treating the space as a special character
> here?
>

I think I can rename this function to subword_similarity() and correct
the documentation.

The current behavior is designed to find the most similar word in a text.
For example, if we searched for just a substring (not a word), we would
get the following result:

select substring_similarity('dog', 'dogmatist');
 substring_similarity
----------------------
                    1
(1 row)

But I think this is wrong. They are completely different words.
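
To make the difference concrete, here is a rough Python sketch of the idea. It is only a simplified model, not the pg_trgm C code. It assumes that each word is padded with two leading blanks and one trailing blank before trigrams are extracted (as the pg_trgm documentation describes), and that the score is the best set similarity between the query's trigrams and any contiguous run of the text's trigrams. The names trigrams(), text_trigrams() and best_window_similarity(), and the pad switch, are made up for this illustration.

```python
import re


def trigrams(word, pad=True):
    """Return the ordered trigrams of a single word.

    With pad=True the word gets two leading blanks and one trailing blank,
    so trigrams at word boundaries differ from trigrams inside a word.
    """
    s = f"  {word} " if pad else word
    return [s[i:i + 3] for i in range(len(s) - 2)]


def text_trigrams(text, pad=True):
    """Ordered trigrams of every run of word characters in text, lowercased."""
    out = []
    for word in re.findall(r"\w+", text.lower()):
        out.extend(trigrams(word, pad))
    return out


def best_window_similarity(query, text, pad=True):
    """Best |Q & W| / |Q | W| over all contiguous trigram runs W of text.

    Brute force over every run; fine for an illustration, too slow for real use.
    """
    q = set(text_trigrams(query, pad))
    t = text_trigrams(text, pad)
    best = 0.0
    for i in range(len(t)):
        window = set()
        for j in range(i, len(t)):
            window.add(t[j])
            best = max(best, len(q & window) / len(q | window))
    return best


# Word-aware model (padding on): boundaries matter.
print(best_window_similarity("dog", "hotdogpound"))    # 0.25
print(best_window_similarity("dog", "hot dogpound"))   # 0.75
print(best_window_similarity("dog", "hot dog pound"))  # 1.0
print(best_window_similarity("dog", "dogmatist"))      # 0.75

# Hypothetical plain-substring model (padding off): boundaries are ignored.
print(best_window_similarity("dog", "dogmatist", pad=False))  # 1.0
```

With padding, 'dog' has four trigrams ("  d", " do", "dog", "og "), and only a standalone word 'dog' can match all of them, which is why the space changes the result. Without padding, 'dog' is the single trigram "dog", so any text containing it scores 1.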

To search for a similar substring (not a word) in a text, maybe another
function should be added?

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
