Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

new function for tsquery creartion

Lists: pgsql-hackers
From: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: new function for tsquery creartion
Date: 2017-07-19 16:43:51
Message-ID: fe931111ff7e9ad79196486ada79e268@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dear all,

Now Postgres has a few functions to create tsqueries for full text
search. The main one is the to_tsquery function that allows to make
query with any operation. But to make correct query all of the operators
should be specified explicitly. In order to make it easier postgres has
functions like plainto_tsquery and phraseto_tsquery which allow to make
tsqueries from strings. But they are not flexible enough.

Let me introduce new function for full text search query creation(which
is called 'queryto_tsquery'). It takes 'google like' query string and
translates it to tsquery.
The main features are the following:
All the text inside double quotes would be treated as a phrase("a b c"
-> 'a <-> b <-> c')
New operator AROUND(N). It matches if the distance between words(or
maybe phrases) is less than or equal to N.
Alias for !('-rat' is the same as '!rat')
Alias for |('dog OR cat' is the same as 'dog | cat')

As a plainto_tsquery and phraseto_tsquery it will fill operators by
itself, but already placed operations won't be ignored. It allows to
combine two approaches.

In the attachment you can find patch with the new features, tests and
documentation for it.
What do you think about it?

Thank you very much for the attention!

--
------
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
queryto_tsquery.patch text/x-diff 32.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2017-07-19 16:58:05
Message-ID: CA+TgmoYSBLFAq14ePMWC=JBQ4D4Crt08emkcDPa65SQ6sbVAoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru> wrote:
> Let me introduce new function for full text search query creation(which is
> called 'queryto_tsquery'). It takes 'google like' query string and
> translates it to tsquery.

I haven't looked at the code, but that sounds like a neat idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2017-09-09 03:03:02
Message-ID: CAEepm=2S-rXC-G3iGSJLKwThF2v9zWG0pJB4R6v5LVVphwQkRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 20, 2017 at 4:58 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru> wrote:
>> Let me introduce new function for full text search query creation(which is
>> called 'queryto_tsquery'). It takes 'google like' query string and
>> translates it to tsquery.
>
> I haven't looked at the code, but that sounds like a neat idea.

+1

This is a very cool feature making tsquery much more accessible. Many
people know that sort of defacto search engine query language that
many websites accept using quotes, AND, OR, - etc.

Calling this search syntax just "query" seems too general and
overloaded. "Simple search", "simple query", "web search", "web
syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
well I'm not sure, but I feel like it deserves a proper name.
websearch_to_tsquery()?

I see that your AROUND(n) is an undocumented Google search syntax.
That's a good trick to know.

Please send a rebased version of the patch for people to review and
test as that one has bit-rotted.

--
Thomas Munro
http://www.enterprisedb.com


From: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2017-09-13 08:57:20
Message-ID: 69234700d26674fc7c171e48b850824c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-09-09 06:03, Thomas Munro wrote:
> Please send a rebased version of the patch for people to review and
> test as that one has bit-rotted.
Hello,
Thank you for interest. In the attachment you can find rebased
version(based on 69835bc8988812c960f4ed5aeee86b62ac73602a commit)
--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
queryto_tsquery2.patch text/x-diff 32.5 KB

From: Alexey Chernyshov <a(dot)chernyshov(at)postgrespro(dot)ru>
To: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2017-10-13 13:37:32
Message-ID: 20171013163732.1bb85695@PostgresPro-A-Chernyshov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,
I am extending phrase operator <n> is such way that it will have <n,m>
syntax that means from n to m words, so I will use such syntax (<n,m>)
further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
and it can be replaced while parsing. So, what do you think of such
idea? In this patch I have noticed some unobvious behavior.

# select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat
AROUND(1) cat') as match;
match
-------
t

cat AROUND(1) cat is the same is "cat <1> cat || cat <0> cat" and:

# select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
?column?
-------
t

It seems to be a proper logic behavior but it is a possible pitfall,
maybe it should be documented?

But more important question is how AROUND() operator should handle stop
words? Now it works as:

# select queryto_tsquery('cat <2> (a AROUND(10) rat)');
queryto_tsquery
------------------
'cat' <12> 'rat'
(1 row)

# select queryto_tsquery('cat <2> a AROUND(10) rat');
queryto_tsquery
------------------------
'cat' AROUND(12) 'rat'
(1 row)

In my opinion it should be like:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12>
rat

cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12>
rat

Now <n,m> operator can be replaced with combination of phrase
operator <n>, AROUND(), and logical operators, but with <n,m> operator
it will be much painless. Correct me, please, if I am wrong.

--
Alexey Chernyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
To: Alexey Chernyshov <a(dot)chernyshov(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2017-10-13 13:55:12
Message-ID: 3e1b0851d3a6a2da42f78d31cc241d0b@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-10-13 16:37, Alexey Chernyshov wrote:
> Hi all,
> I am extending phrase operator <n> is such way that it will have <n,m>
> syntax that means from n to m words, so I will use such syntax (<n,m>)
> further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
> and it can be replaced while parsing. So, what do you think of such
> idea? In this patch I have noticed some unobvious behavior.

Thank you for the interest and review!

> # select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat
> AROUND(1) cat') as match;
> match
> -------
> t
>
> cat AROUND(1) cat is the same is "cat <1> cat || cat <0> cat" and:
>
> # select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
> ?column?
> -------
> t
>
> It seems to be a proper logic behavior but it is a possible pitfall,
> maybe it should be documented?

It is a tricky question. I think that this interpretation is confusing,
so
better to make it as <-N, -1> and <1, N>.

> But more important question is how AROUND() operator should handle stop
> words? Now it works as:
>
> # select queryto_tsquery('cat <2> (a AROUND(10) rat)');
> queryto_tsquery
> ------------------
> 'cat' <12> 'rat'
> (1 row)
>
> # select queryto_tsquery('cat <2> a AROUND(10) rat');
> queryto_tsquery
> ------------------------
> 'cat' AROUND(12) 'rat'
> (1 row)
>
> In my opinion it should be like:
> cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12>
> rat

I think that correct version is:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-2,12>
rat.

> cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12>
> rat

It is a problem indeed. I did not catch it during implementation. Thank
you
for pointing it out.

> Now <n,m> operator can be replaced with combination of phrase
> operator <n>, AROUND(), and logical operators, but with <n,m> operator
> it will be much painless. Correct me, please, if I am wrong.

I think that <n,m> operator is more general than around(n) so the last
one
should be based on yours. However, i think, that taking negative
parameters
is not the best idea because it is confusing. On top of that it is not
so
necessary and i think it won`t be popular among users.
It seems to me that AROUND operator can be easily implemented with
<n,m>,
also, it helps to avoid problems, that you showed above.

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] new function for tsquery creartion
Date: 2017-11-19 01:30:57
Message-ID: bff75704-a832-e240-d216-5ba6a8db5d13@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 09/13/2017 10:57 AM, Victor Drobny wrote:
> On 2017-09-09 06:03, Thomas Munro wrote:
>> Please send a rebased version of the patch for people to review and
>> test as that one has bit-rotted.
> Hello,
> Thank you for interest. In the attachment you can find rebased
> version(based on 69835bc8988812c960f4ed5aeee86b62ac73602a commit)
>

I did a quick review of the patch today. The patch unfortunately no
longer applies, so I had to use an older commit from September. Please
rebase to current master.

I've only looked on the diff at this point, will do more testing once
the rebase happens.

Some comments:

1) This seems to mix multiple improvements into one patch. There's the
support for alternative query syntax, and then there are the new
operators (AROUND and <m,n>). I propose to split the patch into two or
more parts, each addressing one of those bits.

I guess there will be two or three parts - first adding the syntax,
second adding <m,n> and third adding the AROUND(n). Seems reasonable?

2) I don't think we should mention Google in the docs explicitly. Not
that I'm somehow anti-google, but this syntax was certainly not invented
by Google - I vividly remember using something like that on Altavista
(yeah, I'm old). And it's used by pretty much every other web search
engine out there ...

3) In the SGML docs, please use <literal></literal> instead of just
quoting the values. So it should be <literal>|</literal> instead of '|'
etc. Just like in the parts describing plainto_tsquery, for example.

4) Also, I recommend adding a brief explanation what the examples do.
Right now there's just a bunch of queryto_tsquery, and the reader is
expected to understand the output. I suggest adding a sentence or two,
explaining what's happening (just like for plainto_tsquery examples).

5) I'm not sure about negative values in the <n,m> operator. I don't
find it particularly confusing - once you understand that (a <n,m> b)
means "there are 'k' words between 'a' and 'b' (n <= k <= m)", then
negative values seem like a fairly straightforward extension.

But I guess the main question is "Is there really a demand for the new
<n,m> operator, or have we just invented if because we can?"

6) There seem to be some new constants defined, with names not following
the naming convention. I mean this

#define WAITOPERAND 1
#define WAITOPERATOR 2
#define WAITFIRSTOPERAND 3
#define WAITSINGLEOPERAND 4
#define INSIDEQUOTES 5 <-- the new one

and

#define TSPO_L_ONLY 0x01
#define TSPO_R_ONLY 0x02
#define TSPO_BOTH 0x04
#define TS_NOT_EXAC 0x08 <-- the new one

Perhaps that's OK, but it seems a bit inconsistent.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] new function for tsquery creartion
Date: 2017-11-28 06:56:10
Message-ID: 15517a04afd2be779e2e6d69611c348b@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-11-19 04:30, Tomas Vondra wrote:
Hello,
> Hi,
>
> On 09/13/2017 10:57 AM, Victor Drobny wrote:
>> On 2017-09-09 06:03, Thomas Munro wrote:
>>> Please send a rebased version of the patch for people to review and
>>> test as that one has bit-rotted.
>> Hello,
>> Thank you for interest. In the attachment you can find rebased
>> version(based on 69835bc8988812c960f4ed5aeee86b62ac73602a commit)
>>
>
> I did a quick review of the patch today. The patch unfortunately no
> longer applies, so I had to use an older commit from September. Please
> rebase to current master.

Thank you for your time. In the attachment you can find rebased version.
(based on e842791b0 commit)

> I've only looked on the diff at this point, will do more testing once
> the rebase happens.
>
> Some comments:
>
> 1) This seems to mix multiple improvements into one patch. There's the
> support for alternative query syntax, and then there are the new
> operators (AROUND and <m,n>). I propose to split the patch into two or
> more parts, each addressing one of those bits.

I agree. I have split it in 3 parts: support for around operator,
queryto_tsquery function and documentation.

> I guess there will be two or three parts - first adding the syntax,
> second adding <m,n> and third adding the AROUND(n). Seems reasonable?
>
> 2) I don't think we should mention Google in the docs explicitly. Not
> that I'm somehow anti-google, but this syntax was certainly not
> invented
> by Google - I vividly remember using something like that on Altavista
> (yeah, I'm old). And it's used by pretty much every other web search
> engine out there ...

Yes, those syntax is not introduced by google, but, as for me, it was
the
easiest way to give a brief description of it. Of cause it can be
changed,
I just don't know how. Any suggestions are welcomed! ;)

> 3) In the SGML docs, please use <literal></literal> instead of just
> quoting the values. So it should be <literal>|</literal> instead of '|'
> etc. Just like in the parts describing plainto_tsquery, for example.

Fixed. I hope that i didn't miss anything.

> 4) Also, I recommend adding a brief explanation what the examples do.
> Right now there's just a bunch of queryto_tsquery, and the reader is
> expected to understand the output. I suggest adding a sentence or two,
> explaining what's happening (just like for plainto_tsquery examples).
>
> 5) I'm not sure about negative values in the <n,m> operator. I don't
> find it particularly confusing - once you understand that (a <n,m> b)
> means "there are 'k' words between 'a' and 'b' (n <= k <= m)", then
> negative values seem like a fairly straightforward extension.
>
> But I guess the main question is "Is there really a demand for the new
> <n,m> operator, or have we just invented if because we can?"

The operator <n,m> is not introduced yet. It's just a concept. It were
our
thoughts about implementation AROUND operator through <n,m> in future.

> 6) There seem to be some new constants defined, with names not
> following
> the naming convention. I mean this
>
> #define WAITOPERAND 1
> #define WAITOPERATOR 2
> #define WAITFIRSTOPERAND 3
> #define WAITSINGLEOPERAND 4
> #define INSIDEQUOTES 5 <-- the new one
>
> and
>
> #define TSPO_L_ONLY 0x01
> #define TSPO_R_ONLY 0x02
> #define TSPO_BOTH 0x04
> #define TS_NOT_EXAC 0x08 <-- the new one
>
> Perhaps that's OK, but it seems a bit inconsistent.

I agree. I have fixed it.

>
> regards

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
queryto_tsquery3_01_around.patch text/x-diff 12.0 KB
queryto_tsquery3_02_queryto_tsquery.patch text/x-diff 18.5 KB
queryto_tsquery3_03_documentation.patch text/x-diff 4.9 KB

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Subject: Re: new function for tsquery creartion
Date: 2017-11-28 14:39:16
Message-ID: 20171128143916.1377.5847.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi Victor,

I like the idea and I think it's a great patch. However in current shape it
requires some amount of reworking to meet PostgreSQL standards of code quality.

Particularly:

1. Many new procedures don't have a comment with a brief description. Ideally
every procedure should have not only a brief description but also a description
of every argument, return value and changes of global state if applicable.

2. I believe you could implement the has_prefix procedure just as a wrapper of
strstr().

3. I suggest to use snprintf instead of sprintf in a new code whenever
possible, especially if you are using %s - just to be on a safe side.

4. I noticed that your code affects the catalog. Did you check that your
changes will not cause problems during the migration from the older version of
PostgreSQL to the never one?

5. Tests for queryto_tsquery use only ASCII strings. I suggest to add a few
test that use non-ASCII characters as well, and a few corner cases like empty
string, string that contains only the stop-words, etc.

6. `make check-world` doesn't pass:

```
***************
*** 1672,1678 ****
(1 row)

set enable_seqscan = on;
-
--test queryto_tsquery function
select queryto_tsquery('My brand new smartphone');
queryto_tsquery
--- 1672,1677 ----
***************
*** 1784,1786 ****
--- 1783,1786 ----
---------------------------
'fat-rat' & 'fat' & 'rat'
(1 row)
+
```


From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Subject: Re: new function for tsquery creartion
Date: 2017-11-28 14:57:00
Message-ID: 20171128145659.GA32487@e733.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Victor,

> I like the idea and I think it's a great patch. However in current shape it
> requires some amount of reworking to meet PostgreSQL standards of code quality.

Also I would like to add that I agree with Thomas Munro:

> Calling this search syntax just "query" seems too general and
> overloaded. "Simple search", "simple query", "web search", "web
> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
> well I'm not sure, but I feel like it deserves a proper name.
> websearch_to_tsquery()?

websearch_to_tsquery() sounds much better than query_to_tsquery().

Also I agree Tomas Vondra in regard that:

> 2) I don't think we should mention Google in the docs explicitly. Not
> that I'm somehow anti-google, but this syntax was certainly not invented
> by Google - I vividly remember using something like that on Altavista
> (yeah, I'm old). And it's used by pretty much every other web search
> engine out there ...

I suggest to rephrase:

```
+ about its input. <function>queryto_tsquery</function> provides a
+ different, Google like syntax to create tsquery.
```

.. to something more like "provides a different syntax, similar to one
used in web search engines, to create tsqeury". And maybe give a few
examples right in the next sentence.

--
Best regards,
Aleksander Alekseev


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Subject: Re: new function for tsquery creartion
Date: 2017-11-29 04:41:52
Message-ID: CAB7nPqT9ixgrD-FXOwEcSAzeQVP5GPb+gkhxjMd1LuFfq+JYkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 28, 2017 at 11:57 PM, Aleksander Alekseev
<a(dot)alekseev(at)postgrespro(dot)ru> wrote:
>> I like the idea and I think it's a great patch. However in current shape it
>> requires some amount of reworking to meet PostgreSQL standards of code quality.
>
> Also I would like to add that I agree with Thomas Munro:
>
>> Calling this search syntax just "query" seems too general and
>> overloaded. "Simple search", "simple query", "web search", "web
>> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
>> well I'm not sure, but I feel like it deserves a proper name.
>> websearch_to_tsquery()?
>
> websearch_to_tsquery() sounds much better than query_to_tsquery().
>
> Also I agree Tomas Vondra in regard that:
>
>> 2) I don't think we should mention Google in the docs explicitly. Not
>> that I'm somehow anti-google, but this syntax was certainly not invented
>> by Google - I vividly remember using something like that on Altavista
>> (yeah, I'm old). And it's used by pretty much every other web search
>> engine out there ...
>
> I suggest to rephrase:
>
> ```
> + about its input. <function>queryto_tsquery</function> provides a
> + different, Google like syntax to create tsquery.
> ```
>
> .. to something more like "provides a different syntax, similar to one
> used in web search engines, to create tsqeury". And maybe give a few
> examples right in the next sentence.

The patch got a review less than 1 day ago, so I am moving it to next
CF with the same status, waiting on author.
--
Michael


From: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2017-11-29 14:56:30
Message-ID: 157ea1194ec9c416df45bfddab10cacd@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-11-28 17:57, Aleksander Alekseev wrote:
Hi Aleksander,

Thank you for review. I have tried to fix all of your comments.
However i want to mention that the absence of comments for functions
in to_tsany.c is justified by the absence of comments for other
similar functions.
> Hi Victor,
>
>> I like the idea and I think it's a great patch. However in current
>> shape it
>> requires some amount of reworking to meet PostgreSQL standards of code
>> quality.
>
> Also I would like to add that I agree with Thomas Munro:
>
>> Calling this search syntax just "query" seems too general and
>> overloaded. "Simple search", "simple query", "web search", "web
>> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
>> well I'm not sure, but I feel like it deserves a proper name.
>> websearch_to_tsquery()?
>
> websearch_to_tsquery() sounds much better than query_to_tsquery().
>
> Also I agree Tomas Vondra in regard that:
>
>> 2) I don't think we should mention Google in the docs explicitly. Not
>> that I'm somehow anti-google, but this syntax was certainly not
>> invented
>> by Google - I vividly remember using something like that on Altavista
>> (yeah, I'm old). And it's used by pretty much every other web search
>> engine out there ...
>
> I suggest to rephrase:
>
> ```
> + about its input. <function>queryto_tsquery</function> provides a
> + different, Google like syntax to create tsquery.
> ```
>
> .. to something more like "provides a different syntax, similar to one
> used in web search engines, to create tsqeury". And maybe give a few
> examples right in the next sentence.
Best,

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2017-11-29 14:57:33
Message-ID: b098b9373fdf256116e53ec5dc449f67@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-11-29 17:56, Victor Drobny wrote:

Sorry, forgot to attach new version of the patch.
> On 2017-11-28 17:57, Aleksander Alekseev wrote:
> Hi Aleksander,
>
> Thank you for review. I have tried to fix all of your comments.
> However i want to mention that the absence of comments for functions
> in to_tsany.c is justified by the absence of comments for other
> similar functions.
>> Hi Victor,
>>
>>> I like the idea and I think it's a great patch. However in current
>>> shape it
>>> requires some amount of reworking to meet PostgreSQL standards of
>>> code quality.
>>
>> Also I would like to add that I agree with Thomas Munro:
>>
>>> Calling this search syntax just "query" seems too general and
>>> overloaded. "Simple search", "simple query", "web search", "web
>>> syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
>>> well I'm not sure, but I feel like it deserves a proper name.
>>> websearch_to_tsquery()?
>>
>> websearch_to_tsquery() sounds much better than query_to_tsquery().
>>
>> Also I agree Tomas Vondra in regard that:
>>
>>> 2) I don't think we should mention Google in the docs explicitly. Not
>>> that I'm somehow anti-google, but this syntax was certainly not
>>> invented
>>> by Google - I vividly remember using something like that on Altavista
>>> (yeah, I'm old). And it's used by pretty much every other web search
>>> engine out there ...
>>
>> I suggest to rephrase:
>>
>> ```
>> + about its input. <function>queryto_tsquery</function> provides a
>> + different, Google like syntax to create tsquery.
>> ```
>>
>> .. to something more like "provides a different syntax, similar to one
>> used in web search engines, to create tsqeury". And maybe give a few
>> examples right in the next sentence.
> Best,

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
queryto_tsquery4_01_around.patch text/x-diff 12.1 KB
queryto_tsquery4_02_queryto_tsquery.patch text/x-diff 18.9 KB
queryto_tsquery4_03_documentation.patch text/x-diff 5.0 KB

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Subject: Re: new function for tsquery creartion
Date: 2018-01-15 13:00:55
Message-ID: 151602125554.25640.2584561162347369841.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Here are a few minor issues:

```
+/*
+ * Checks if 'str' starts with a 'prefix'
+ */
+static bool
+has_prefix(char * str, char * prefix)
+{
+ if (strlen(prefix) > strlen(str))
+ {
+ return false;
+ }
+ return strstr(str, prefix) == str;
+}
```

strlen() check is redundant.

```
+ case OP_AROUND:
+ snprintf(in->cur, 256, " AROUND(%d) %s", distance, nrm.buf);
+ break;
```

Instead of the constant 256 it's better to use sizeof().

Apart from these issues this patch looks not bad.

The new status of this patch is: Ready for Committer


From: Andres Freund <andres(at)anarazel(dot)de>
To: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2018-03-02 03:06:05
Message-ID: 20180302030605.5ur6ktkb2vyb46pb@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-11-29 17:56:30 +0300, Victor Drobny wrote:
> Thank you for review. I have tried to fix all of your comments.
> However i want to mention that the absence of comments for functions
> in to_tsany.c is justified by the absence of comments for other
> similar functions.

That's not justification. Tsquery related code is notorious for being
badly commented, we do not want to continue that.

Greetings,

Andres Freund


From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Subject: Re: new function for tsquery creartion
Date: 2018-03-05 12:52:19
Message-ID: 152025433949.1548.16466871234054596674.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It seems that this patch doesn't apply anymore, see http://commitfest.cputube.org/

The new status of this patch is: Waiting on Author


From: David Steele <david(at)pgmasters(dot)net>
To: Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Re: new function for tsquery creartion
Date: 2018-03-21 14:24:54
Message-ID: 34280367-0dab-7f73-8d7c-d21c7276814b@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Victor,

On 3/5/18 7:52 AM, Aleksander Alekseev wrote:
> It seems that this patch doesn't apply anymore, see http://commitfest.cputube.org/
>
> The new status of this patch is: Waiting on Author

This patch needs a rebase and should address the comments from
Aleksander and Andres. We are now three weeks into the CF with no new
patch.

Are you planning to provide a new patch? If not, I think it should be
marked as Returned with Feedback and submitted to the next CF once it
has been updated.

Regards,
--
-David
david(at)pgmasters(dot)net


From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2018-03-22 13:53:15
Message-ID: 8df9de83119eb295f3424c573d64ff84@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi David,

I'd like to take over from Victor. I'll post a revised version of the
patch in a couple of days.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Alexey Chernyshov <a(dot)chernyshov(at)postgrespro(dot)ru>, Victor Drobny <v(dot)drobny(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] new function for tsquery creartion
Date: 2018-03-22 14:38:04
Message-ID: e1839299-f926-7714-cac2-878f0334ac84@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I am extending phrase operator <n> is such way that it will have <n,m>
> syntax that means from n to m words, so I will use such syntax (<n,m>)
> further. I found that a AROUND(N) b is exactly the same as a <-N,N> b
> and it can be replaced while parsing. So, what do you think of such
> idea? In this patch I have noticed some unobvious behavior.

I think new operator should be a subject for separate patch. And I prefer idea
about range phrase operator.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2018-03-26 06:24:05
Message-ID: 20180326092405.71820cf2@asp437-24-g082ur
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 22 Mar 2018 16:53:15 +0300
Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru> wrote:

> Hi David,
>
> I'd like to take over from Victor. I'll post a revised version of the
> patch in a couple of days.
>

Hi Dmitry,

Recently I worked with the old version of the patch and found a bug.
So, I think it is better to notify you immediately, so you can fix it in
rebased/revised version.

I noticed, that operator AROUND(N) works only
in case of non-negative operands. If any of the operands is negative, it
behaves as phrase operator <N>. It is caused by lack of TS_NOT_EXAC
flag and AROUND(N) operator check in function TS_phrase_execute in
branches for negated operands.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: new function for tsquery creartion
Date: 2018-03-26 08:51:42
Message-ID: a1686708fc7d43d9a2de19c1697ac57e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Recently I worked with the old version of the patch and found a bug.
> So, I think it is better to notify you immediately, so you can fix it
> in
> rebased/revised version.
>
> I noticed, that operator AROUND(N) works only
> in case of non-negative operands. If any of the operands is negative,
> it
> behaves as phrase operator <N>. It is caused by lack of TS_NOT_EXAC
> flag and AROUND(N) operator check in function TS_phrase_execute in
> branches for negated operands.

Good to know, thanks! To be honest, I' sure that Theodor is right: it's
better to implement AROUND(N) operator using <N, M> when it's committed.
The following version of patch won't support AROUND(N). I have to fix a
few more questionable things, though.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-03-26 11:32:05
Message-ID: CAEepm=3dQmh7NCC_ap-L4yMiVRcZuPXtJUv5_mOnYdoqsnUKvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 26, 2018 at 9:51 PM, Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru> wrote:
>> Recently I worked with the old version of the patch and found a bug.
>> So, I think it is better to notify you immediately, so you can fix it in
>> rebased/revised version.
>>
>> I noticed, that operator AROUND(N) works only
>> in case of non-negative operands. If any of the operands is negative, it
>> behaves as phrase operator <N>. It is caused by lack of TS_NOT_EXAC
>> flag and AROUND(N) operator check in function TS_phrase_execute in
>> branches for negated operands.
>
> Good to know, thanks! To be honest, I' sure that Theodor is right: it's
> better to implement AROUND(N) operator using <N, M> when it's committed. The
> following version of patch won't support AROUND(N). I have to fix a few more
> questionable things, though.

Hi,

I took a quick look at the language in the last version of the patches.

Patch 01:

+ errmsg("Invalid AROUND(X) operator!")));

s/I/i/;s/!//

+ errmsg("Missing ')' in AROUND(X) operator")));

s/M/m/

Patch 03 (the documentation) needed some proof-reading. I've attached
a new version of that patch with some small suggested improvements.

Questions I had while reading the documentation without looking at the
code: Is there anything to_tsquery() can do that
websearch_to_tsquery() can't? The documentation doesn't mention
parentheses, but I can see that they are in fact supported from the
regression test. Would it be OK to use user-supplied websearch
strings? Ie can it produce a syntax error? Well clearly it can, see
above -- perhaps that should be explicitly documented? Is there any
way to write OR as a term (that's a valuable non-stopword in French)?

It seems like AROUND(x) should be documented also more generally for
tsquery, but I see there is some discussion about how that should
look.

By the way, not this patch's fault, but I noticed that commit
f5f1355dc4d did this:

- (errmsg("query contains only
stopword(s) or doesn't contain lexeme(s), ignored")));
+ (errmsg("text-search query contains
only stop words or doesn't contain lexemes, ignored")));

But the old test still appears in an example in doc/src/sgml/textsearch.sgml.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
docs.patch application/octet-stream 5.3 KB

From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-03-26 16:48:19
Message-ID: 6754a224654a4f0339195b83b133d551@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Patch 03 (the documentation) needed some proof-reading. I've attached
> a new version of that patch with some small suggested improvements.

Thanks, I'm definitely going to use this.

> Is there anything to_tsquery() can do that websearch_to_tsquery()
> can't?

Currently, no.

> Would it be OK to use user-supplied websearch strings?
> Ie can it produce a syntax error?

I believe that's the most important question. After a private discussion
with Theodor I came to a conclusion that the most beneficial outcome
would be to suppress all syntax errors and give user some result,
cutting all misused operators along the way. This requires some changes,
though.

> Is there any way to write OR as a term (that's a valuable non-stopword
> in French)?

You could quote it like this: websearch_to_tsquery('"or"');

Moreover, it's still possible to use & and |.

> It seems like AROUND(x) should be documented also more generally for
> tsquery, but I see there is some discussion about how that should
> look.

Personally, I like <N, M> operator better. It would instantly deprecate
AROUND(N), which is why I'm going to drop it.

> By the way, not this patch's fault, but I noticed that commit
> f5f1355dc4d did this:
>
> - (errmsg("query contains only
> stopword(s) or doesn't contain lexeme(s), ignored")));
> + (errmsg("text-search query contains
> only stop words or doesn't contain lexemes, ignored")));
>
> But the old test still appears in an example in
> doc/src/sgml/textsearch.sgml.

Will fix this.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-03-27 22:29:48
Message-ID: 9153fb21831d0ba7c832960f37df2688@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi everyone,

I'd like to share some intermediate results. Here's what has changed:

1. OR operator is now case-insensitive. Moreover, trailing whitespace is
no longer used to identify it:

select websearch_to_tsquery('simple', 'abc or');
websearch_to_tsquery
----------------------
'abc' & 'or'
(1 row)

select websearch_to_tsquery('simple', 'abc or(def)');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)

select websearch_to_tsquery('simple', 'abc or!def');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)

2. AROUND(N) has been dropped. I hope that <N, M> operator will allow us
to implement it with a few lines of code.

3. websearch_to_tsquery() now tolerates various syntax errors, for
instance:

Misused operators:

'abc &'
'| abc'
'<- def'

Missing parentheses:

'abc & (def <-> (cat or rat'

Other sorts of nonsense:

'abc &--|| def' => 'abc' & !!'def'
'abc:def' => 'abc':D & 'ef'

This, however, doesn't mean that the result will always be adequate (who
would have thought?). Overall, current implementation follows the GIGO
principle. In theory, this would allow us to use user-supplied websearch
strings (but see gotchas), even if they don't make much sense. Better
then nothing, right?

4. A small refactoring: I've replaced all WAIT* macros with a enum for
better debugging (names look much nicer in GDB). Hope this is
acceptable.

5. Finally, I've added a few more comments and tests. I haven't checked
the code coverage, though.

A few gotchas:

I haven't touched gettoken_tsvector() yet. As a result, the following
queries produce errors:

select websearch_to_tsquery('simple', '''');
ERROR: syntax error in tsquery: "'"

select websearch_to_tsquery('simple', '\');
ERROR: there is no escaped character: "\"

Maybe there's more. The question is: should we fix those, or it's fine
as it is? I don't have a strong opinion about this.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v1.diff text/x-diff 28.1 KB

From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-03-27 22:46:31
Message-ID: 9e6f3d4f5e93e72807b1b27c75b98c51@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> select websearch_to_tsquery('simple', 'abc or!def');
> websearch_to_tsquery
> ----------------------
> 'abc' | 'def'
> (1 row)

This is wrong ofc, I've attached the fixed version.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v2.diff text/x-diff 28.0 KB

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-03-28 09:21:15
Message-ID: 20180328092115.GA32581@e733.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Dmitry,

> A few gotchas:
>
> I haven't touched gettoken_tsvector() yet. As a result, the following
> queries produce errors:
>
> select websearch_to_tsquery('simple', '''');
> ERROR: syntax error in tsquery: "'"
>
> select websearch_to_tsquery('simple', '\');
> ERROR: there is no escaped character: "\"
>
> Maybe there's more. The question is: should we fix those, or it's fine as it
> is? I don't have a strong opinion about this.

It doesn't sound right to me to accept any input as a general rule but
sometimes return errors nevertheless. That API would be complicated for
the users. Thus I suggest to accept any garbage and try our best to
interpret it.

--
Best regards,
Aleksander Alekseev


From: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-01 12:51:03
Message-ID: 8829c490d0e17b6033b3fd622dfeed1e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello hackers,

On 2018-03-28 12:21, Aleksander Alekseev wrote:
> It doesn't sound right to me to accept any input as a general rule but
> sometimes return errors nevertheless. That API would be complicated for
> the users. Thus I suggest to accept any garbage and try our best to
> interpret it.

I agree with Aleksander about silencing all errors in
websearch_to_tsquery().

In the attachment is a revised patch with the attempt to introduce an
ability to ignore syntax errors in gettoken_tsvector().
I'm also read through the patch and all the code looks good to me except
one thing.
The name of enum ts_parsestate looks more like a name of the function
than a name of a type.
In my version, it renamed to QueryParserState, but you can fix it if I'm
wrong.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v3.diff text/x-diff 35.2 KB

From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-01 20:00:43
Message-ID: cbd6ea6e79b8247765a6c493943a5799@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Aleksandr,

> I agree with Aleksander about silencing all errors in
> websearch_to_tsquery().
>
> In the attachment is a revised patch with the attempt to introduce an
> ability to ignore syntax errors in gettoken_tsvector().

Thanks for the further improvements! Yes, you're both right, the API has
to be consistent. Unfortunately, I had to make some adjustments
according to Oleg Bartunov's review. Here's a change log:

1. &, | and (), <-> are no longer considered operators in web search
mode.
2. I've stumbled upon a bug: web search used to transform "pg_class"
into 'pg <-> class', which is no longer the case.
3. I changed the behavior of gettoken_tsvector() as soon as I had heard
from Aleksander Alekseev, so I decided to use my implementation in this
revision of the patch. This is a good subject for discussion, though.
Feel free to share your opinion.
4. As suggested by Theodor, I've replaced some bool args with bit flags.

> The name of enum ts_parsestate looks more like a name of the function
> than a name of a type.
> In my version, it renamed to QueryParserState, but you can fix it if
> I'm wrong.

True, but gettoken_query() returns ts_tokentype, so I decided to use
this naming scheme.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v4.diff text/x-diff 30.1 KB

From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-02 16:41:51
Message-ID: b8355fbb948314e0dd4dd2594debc1bf@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've fixed a bug and added some tests and documentation.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v5.diff text/x-diff 39.4 KB

From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-03 10:47:14
Message-ID: 1f33416d7173df5504e84e63680bd27e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi everyone,

The code in its current state looks messy and way too complicated;
there're lots of interleaving code branches. Thus, I decided to split
gettoken_query() into three independent tokenizers for phrase, web and
original (to_tsquery()) syntaxes. Documentation is included. Any
feedback is very welcome.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v6.diff text/x-diff 41.8 KB

From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-03 11:28:37
Message-ID: 94e41ae5e36b640d3172729c4c9b5b4c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> The code in its current state looks messy and way too complicated;
> there're lots of interleaving code branches. Thus, I decided to split
> gettoken_query() into three independent tokenizers for phrase, web and
> original (to_tsquery()) syntaxes. Documentation is included. Any
> feedback is very welcome.

I'm sorry, I totally forgot to fix a few more things, the patch is
attached below.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v7.diff text/x-diff 41.7 KB

From: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-03 14:13:20
Message-ID: 20180403171320.400cd24a@asp437-24-g082ur
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 03 Apr 2018 14:28:37 +0300
Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru> wrote:
> I'm sorry, I totally forgot to fix a few more things, the patch is
> attached below.

The patch looks good to me except two things.

I'm not sure about the different result for these queries:
SELECT websearch_to_tsquery('simple', 'cat or ');
websearch_to_tsquery
----------------------
'cat'
(1 row)
SELECT websearch_to_tsquery('simple', 'cat or');
websearch_to_tsquery
----------------------
'cat' & 'or'
(1 row)

But I don't have strong opinion about these queries, since input in
both of them looks broken in terms of operator usage.

I found an odd behavior of the query creation function in case:
SELECT websearch_to_tsquery('english', '"pg_class pg"');
websearch_to_tsquery
-----------------------------
( 'pg' & 'class' ) <-> 'pg'
(1 row)

This query means that lexemes 'pg' and 'class' should be at the same
distance from the last 'pg'. In other words, they should have the same
position. But default parser will interpret pg_class as two separate
words during text parsing/vector creation.

The bug wasn't introduced in the patch and can be found in current
master. During the discussion of the patch with Dmitry, he noticed that
to_tsquery() function shares same odd behavior:
select to_tsquery('english', ' pg_class <-> pg');
to_tsquery
-----------------------------
( 'pg' & 'class' ) <-> 'pg'
(1 row)

This oddity caused by they implementation of makepol. In makepol, each
token (parsed by query parser) is sent to FTS parser and in case of
further separation of the token, it uses operator selected in functions
to_tsquery and friends. So it doesn't change over the runtime.

I see two different ways to solve it:
1) Use the same operator inside the parenthesizes. This will mean to
parse it as few parts of one word.
2) Remove parenthesizes. This will mean to parse it as few separate
words.

I prefer the second way since in some languages words can be separated
by some special symbol or not separated by any symbols at all and
should be extracted by special FTS parser. It also allows us to parse
such words as one by using the special parser (as it done for hyphenated
word).

But in the example with websearch_to_tsquery, I think it should use
the same operator for quoted part of the query. For example, we can
update the operator in makepol before sending it to pushval
(pushval_morph) to do so.

It looks like there should be two separated patches, one for
websearch_to_tsquery and another one for fixing odd behavior of the
query construction. But since the first one may depend on the
bugfix, to solve case with quotes, I will mark it as Waiting on
Author.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


From: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-04 14:33:47
Message-ID: a269c5f0e426174dad29c474d3669e92@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I'm not sure about the different result for these queries:
> SELECT websearch_to_tsquery('simple', 'cat or ');
> websearch_to_tsquery
> ----------------------
> 'cat'
> (1 row)
> SELECT websearch_to_tsquery('simple', 'cat or');
> websearch_to_tsquery
> ----------------------
> 'cat' & 'or'
> (1 row)

I guess both queries should produce just 'cat'. I've changed the
definition of parse_or_operator().

> I found an odd behavior of the query creation function in case:
> SELECT websearch_to_tsquery('english', '"pg_class pg"');
> websearch_to_tsquery
> -----------------------------
> ( 'pg' & 'class' ) <-> 'pg'
> (1 row)
>
> This query means that lexemes 'pg' and 'class' should be at the same
> distance from the last 'pg'. In other words, they should have the same
> position. But default parser will interpret pg_class as two separate
> words during text parsing/vector creation.
>
> The bug wasn't introduced in the patch and can be found in current
> master. During the discussion of the patch with Dmitry, he noticed that
> to_tsquery() function shares same odd behavior:
> select to_tsquery('english', ' pg_class <-> pg');
> to_tsquery
> -----------------------------
> ( 'pg' & 'class' ) <-> 'pg'
> (1 row)

I've been thinking about this for a while, and it seems that this should
be fixed somewhere near parsetext(). Perhaps 'pg' and 'class' should
share the same position. After all, somebody could implement a parser
which would split some words using an arbitrary set of rules, for
instance "split all words containing digits". I propose merging this
patch provided that there are no objections.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
websearch_to_tsquery_v8.patch text/x-diff 43.3 KB

From: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-04 14:49:08
Message-ID: e011ca174d982f0d3ec604850ecdab75@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2018-04-04 17:33, Dmitry Ivanov wrote:
> I've been thinking about this for a while, and it seems that this
> should be fixed somewhere near parsetext(). Perhaps 'pg' and 'class'
> should share the same position. After all, somebody could implement a
> parser which would split some words using an arbitrary set of rules,
> for instance "split all words containing digits". I propose merging
> this patch provided that there are no objections.

I'm agree that this problem should be solved in separate patch and
that this feature can be merged without waiting for the fix.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>, Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2018-04-05 16:56:07
Message-ID: aed90baf-b36b-faba-68da-e61599a2db4c@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks to everyone, pushed with some editorization:

1) translate russian test to prevent potential problems with encoding
2) fix inconsistency 'or cat' and 'cat or', second example doesn't treat OR as
lexeme, but first one does.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Where is Where <whisere(at)gmail(dot)com>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: new function for tsquery creartion
Date: 2019-07-21 23:28:50
Message-ID: CAK3r-hMCsD5cHG++Fk9j7=EGEqyDEbfSeswrBnf2v4FHwGB8iA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello everyone, I am wondering if
AROUND(N) or <N, M> is still possible? I found this thread below and the
original post
https://www.postgresql.org/message-id/fe931111ff7e9ad79196486ada79e268%40postgrespro.ru
mentioned the proposed feature: 'New operator AROUND(N). It matches if the
distance between words(or maybe phrases) is less than or equal to N.'

currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance
integer) the distaince is searching a fixed distance, is there way to
search maximum distance so the search returns query1 followed by query2 up
to a certain distance? like the AROUND(N) or <N, M> mentioned in the thread?

Thank you!

On Mon, Jul 22, 2019 at 9:13 AM Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
wrote:

> Hi everyone,
>
> I'd like to share some intermediate results. Here's what has changed:
>
>
> 1. OR operator is now case-insensitive. Moreover, trailing whitespace is
> no longer used to identify it:
>
> select websearch_to_tsquery('simple', 'abc or');
> websearch_to_tsquery
> ----------------------
> 'abc' & 'or'
> (1 row)
>
> select websearch_to_tsquery('simple', 'abc or(def)');
> websearch_to_tsquery
> ----------------------
> 'abc' | 'def'
> (1 row)
>
> select websearch_to_tsquery('simple', 'abc or!def');
> websearch_to_tsquery
> ----------------------
> 'abc' | 'def'
> (1 row)
>
>
> 2. AROUND(N) has been dropped. I hope that <N, M> operator will allow us
> to implement it with a few lines of code.
>
> 3. websearch_to_tsquery() now tolerates various syntax errors, for
> instance:
>
> Misused operators:
>
> 'abc &'
> '| abc'
> '<- def'
>
> Missing parentheses:
>
> 'abc & (def <-> (cat or rat'
>
> Other sorts of nonsense:
>
> 'abc &--|| def' => 'abc' & !!'def'
> 'abc:def' => 'abc':D & 'ef'
>
> This, however, doesn't mean that the result will always be adequate (who
> would have thought?). Overall, current implementation follows the GIGO
> principle. In theory, this would allow us to use user-supplied websearch
> strings (but see gotchas), even if they don't make much sense. Better
> then nothing, right?
>
> 4. A small refactoring: I've replaced all WAIT* macros with a enum for
> better debugging (names look much nicer in GDB). Hope this is
> acceptable.
>
> 5. Finally, I've added a few more comments and tests. I haven't checked
> the code coverage, though.
>
>
> A few gotchas:
>
> I haven't touched gettoken_tsvector() yet. As a result, the following
> queries produce errors:
>
> select websearch_to_tsquery('simple', '''');
> ERROR: syntax error in tsquery: "'"
>
> select websearch_to_tsquery('simple', '\');
> ERROR: there is no escaped character: "\"
>
> Maybe there's more. The question is: should we fix those, or it's fine
> as it is? I don't have a strong opinion about this.
>
> --
> Dmitry Ivanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company


From: Wh isere <whisere(at)gmail(dot)com>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: query1 followed by query2 at maximum distance vs current fixed distance
Date: 2019-07-23 06:55:44
Message-ID: CAK3r-hOJbZqbQhu5ZQHkhY7j-23Mh9qz_6FDpkq_aWOJ=91DUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Is this possible with the current websearch_to_tsquery function?

Thanks.

Hello everyone, I am wondering if
> AROUND(N) or <N, M> is still possible? I found this thread below and the
> original post
> https://www.postgresql.org/message-id/fe931111ff7e9ad79196486ada79e268%40postgrespro.ru
> mentioned the proposed feature: 'New operator AROUND(N). It matches if the
> distance between words(or maybe phrases) is less than or equal to N.'
>
> currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance
> integer) the distaince is searching a fixed distance, is there way to
> search maximum distance so the search returns query1 followed by query2 up
> to a certain distance? like the AROUND(N) or <N, M> mentioned in the
> thread?
>
> Thank you!
>
>
>
> On Mon, Jul 22, 2019 at 9:13 AM Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
> wrote:
>
>> Hi everyone,
>>
>> I'd like to share some intermediate results. Here's what has changed:
>>
>>
>> 1. OR operator is now case-insensitive. Moreover, trailing whitespace is
>> no longer used to identify it:
>>
>> select websearch_to_tsquery('simple', 'abc or');
>> websearch_to_tsquery
>> ----------------------
>> 'abc' & 'or'
>> (1 row)
>>
>> select websearch_to_tsquery('simple', 'abc or(def)');
>> websearch_to_tsquery
>> ----------------------
>> 'abc' | 'def'
>> (1 row)
>>
>> select websearch_to_tsquery('simple', 'abc or!def');
>> websearch_to_tsquery
>> ----------------------
>> 'abc' | 'def'
>> (1 row)
>>
>>
>> 2. AROUND(N) has been dropped. I hope that <N, M> operator will allow us
>> to implement it with a few lines of code.
>>
>> 3. websearch_to_tsquery() now tolerates various syntax errors, for
>> instance:
>>
>> Misused operators:
>>
>> 'abc &'
>> '| abc'
>> '<- def'
>>
>> Missing parentheses:
>>
>> 'abc & (def <-> (cat or rat'
>>
>> Other sorts of nonsense:
>>
>> 'abc &--|| def' => 'abc' & !!'def'
>> 'abc:def' => 'abc':D & 'ef'
>>
>> This, however, doesn't mean that the result will always be adequate (who
>> would have thought?). Overall, current implementation follows the GIGO
>> principle. In theory, this would allow us to use user-supplied websearch
>> strings (but see gotchas), even if they don't make much sense. Better
>> then nothing, right?
>>
>> 4. A small refactoring: I've replaced all WAIT* macros with a enum for
>> better debugging (names look much nicer in GDB). Hope this is
>> acceptable.
>>
>> 5. Finally, I've added a few more comments and tests. I haven't checked
>> the code coverage, though.
>>
>>
>> A few gotchas:
>>
>> I haven't touched gettoken_tsvector() yet. As a result, the following
>> queries produce errors:
>>
>> select websearch_to_tsquery('simple', '''');
>> ERROR: syntax error in tsquery: "'"
>>
>> select websearch_to_tsquery('simple', '\');
>> ERROR: there is no escaped character: "\"
>>
>> Maybe there's more. The question is: should we fix those, or it's fine
>> as it is? I don't have a strong opinion about this.
>>
>> --
>> Dmitry Ivanov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>
>


From: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Wh isere <whisere(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: query1 followed by query2 at maximum distance vs current fixed distance
Date: 2019-07-29 14:01:15
Message-ID: d5c950d7-3282-40e0-6464-f9f339856cff@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

On 23.07.2019 09:55, Wh isere wrote:
> Is this possible with the current websearch_to_tsquery function?
>
> Thanks.
>
> Hello everyone, I am wondering if
> AROUND(N) or <N, M> is still possible? I found this thread below and
> the original post
> https://www.postgresql.org/message-id/fe931111ff7e9ad79196486ada79e268%40postgrespro.ru
> mentioned the proposed feature: 'New operator AROUND(N). It matches
> if the distance between words(or maybe phrases) is less than or
> equal to N.'
>
> currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance
> integer) the distaince is searching a fixed distance, is there way to
> search maximum distance so the search returns query1 followed by
> query2 up
> to a certain distance? like the AROUND(N) or <N, M> mentioned in the
> thread?
As far as I know AROUND(N) and <N, M> weren't committed, unfortunately.
And so you can search only using a fixed distance currently.

websearch_to_tsquery() can't help here. It just transforms search
pattern with OR, AND statements into tsquery syntax.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


From: Wh isere <whisere(at)gmail(dot)com>
To: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: query1 followed by query2 at maximum distance vs current fixed distance
Date: 2019-07-30 10:22:50
Message-ID: CAK3r-hNOs0OG4QMap3rVoOV=qK2NkEWnYJ7hcA2VicinDMn0JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks Arthur! I guess there is not other solution? I tried to create a
function to loop through all the distance but its very slow.

On Tuesday, July 30, 2019, Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> wrote:

> Hello,
>
> On 23.07.2019 09:55, Wh isere wrote:
>
>> Is this possible with the current websearch_to_tsquery function?
>>
>> Thanks.
>>
>> Hello everyone, I am wondering if
>> AROUND(N) or <N, M> is still possible? I found this thread below and
>> the original post
>> https://www.postgresql.org/message-id/fe931111ff7e9ad7919648
>> 6ada79e268%40postgrespro.ru
>> mentioned the proposed feature: 'New operator AROUND(N). It matches
>> if the distance between words(or maybe phrases) is less than or
>> equal to N.'
>>
>> currently in tsquery_phrase(query1 tsquery, query2 tsquery, distance
>> integer) the distaince is searching a fixed distance, is there way to
>> search maximum distance so the search returns query1 followed by
>> query2 up
>> to a certain distance? like the AROUND(N) or <N, M> mentioned in the
>> thread?
>>
> As far as I know AROUND(N) and <N, M> weren't committed, unfortunately.
> And so you can search only using a fixed distance currently.
>
> websearch_to_tsquery() can't help here. It just transforms search pattern
> with OR, AND statements into tsquery syntax.
>
> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>