Commit 1dffabe

Further fix pg_trgm's extraction of trigrams from regular expressions.
Commit 9e43e87 turns out to have been insufficient: not only is it necessary to track tentative parent links while considering a set of arc removals, but it's necessary to track tentative flag additions as well. This is because we always merge arc target states into arc source states; therefore, when considering a merge of the final state with some other, it is the other state that will acquire a new TSTATE_FIN bit. If there's another arc for the same color trigram that would cause merging of that state with the initial state, we failed to recognize the problem.

The test cases for the prior commit evidently only exercised situations where a tentative merge with the initial state occurs before one with the final state. If it goes the other way around, we'll happily merge the initial and final states, either producing a broken final graph that would never match anything, or triggering the Assert added by the prior commit.

It's tempting to consider switching the merge direction when the merge involves the final state, but I lack the time to analyze that idea in detail. Instead just keep track of the flag changes that would result from proposed merges, in the same way that the prior commit tracked proposed parent links.

Along the way, add some more debugging support, because I'm not entirely confident that this is the last bug here. And tweak matters so that the transformed.dot file uses small integers rather than pointer values to identify states; that makes it more readable if you're just eyeballing it rather than fooling with Graphviz. And rename a couple of identically named struct fields to reduce confusion.

Per report from Corey Csuhta. Add a test case based on his example. (Note: this case does not trigger the bug under 9.3, apparently because its different measurement of costs causes it to stop merging states before it hits the failure. I spent some time trying to find a variant that would fail in 9.3, without success; but I'm sure such cases exist.)

Like the previous patch, back-patch to 9.3 where this code was added.

Report: https://postgr.es/m/E2B01A4B-4530-406B-8D17-2F67CF9A16BA@csuhta.com
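
The C change itself is not shown on this page, so the following is only a minimal sketch of the idea the commit message describes: record the flags a state would gain from proposed merges alongside the proposed parent links, and reject the whole set of arc removals if any resolved state would end up both initial and final. The `State` and `Arc` types, the field names, the function `try_remove_arcs`, and the numeric flag values are all hypothetical simplifications, not the actual pg_trgm definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the commit message; the numeric values are assumed. */
#define TSTATE_INIT 0x01
#define TSTATE_FIN  0x02

/* Hypothetical simplified state, not the real pg_trgm struct. */
typedef struct State
{
    int           flags;        /* committed flags */
    int           tent_flags;   /* flags that proposed merges would add */
    struct State *parent;       /* committed merge parent, or NULL */
    struct State *tent_parent;  /* proposed merge parent, or NULL */
} State;

typedef struct Arc
{
    State *source;
    State *target;
} Arc;

/* Follow committed and tentative parent links to the current root. */
static State *
resolve(State *s)
{
    while (s->parent != NULL || s->tent_parent != NULL)
        s = (s->parent != NULL) ? s->parent : s->tent_parent;
    return s;
}

/*
 * Propose merging each arc's target into its source.  Returns false, and
 * leaves the committed graph untouched, if any merge would combine the
 * initial and final states.  Tentative fields are always cleared on exit.
 */
bool
try_remove_arcs(const Arc *arcs, size_t narcs, State *states, size_t nstates)
{
    const int both = TSTATE_INIT | TSTATE_FIN;
    bool      ok = true;
    size_t    i;

    for (i = 0; i < narcs && ok; i++)
    {
        State *src = resolve(arcs[i].source);
        State *dst = resolve(arcs[i].target);
        int    src_flags = src->flags | src->tent_flags;
        int    dst_flags = dst->flags | dst->tent_flags;

        if (((src_flags | dst_flags) & both) == both)
            ok = false;                     /* would merge init with fin */
        else if (src != dst)
        {
            dst->tent_parent = src;         /* tentative parent link */
            src->tent_flags |= dst_flags;   /* tentative flag additions */
        }
    }

    for (i = 0; i < nstates; i++)
    {
        if (ok)
        {
            /* Commit the proposal: make tentative changes permanent. */
            states[i].flags |= states[i].tent_flags;
            if (states[i].tent_parent != NULL)
                states[i].parent = states[i].tent_parent;
            assert((states[i].flags & both) != both);
        }
        states[i].tent_flags = 0;
        states[i].tent_parent = NULL;
    }
    return ok;
}
```

In this sketch, proposing the arc whose target is the final state first gives the source state a tentative TSTATE_FIN bit, so a later arc from the initial state to that same state is rejected. If only committed flags were consulted, that second check would pass, which is the ordering problem the commit message describes.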
1 parent 139eb96 commit 1dffabe

3 files changed: +136 -42 lines changed


contrib/pg_trgm/expected/pg_trgm.out

Lines changed: 17 additions & 2 deletions
@@ -3497,6 +3497,7 @@ create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
 insert into test2 values (' z foo bar');
+insert into test2 values ('/123/-45/');
 create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
 set enable_seqscan=off;
 explain (costs off)
@@ -3598,7 +3599,8 @@ select * from test2 where t ~ '(abc)*$';
 abcdef
 quark
 z foo bar
-(3 rows)
+/123/-45/
+(4 rows)
 
 select * from test2 where t ~* 'DEF';
 t
@@ -3690,6 +3692,12 @@ select * from test2 where t ~ 'qua(?!foo)';
 quark
 (1 row)
 
+select * from test2 where t ~ '/\d+/-\d';
+t
+-----------
+/123/-45/
+(1 row)
+
 drop index test2_idx_gin;
 create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
 set enable_seqscan=off;
@@ -3784,7 +3792,8 @@ select * from test2 where t ~ '(abc)*$';
 abcdef
 quark
 z foo bar
-(3 rows)
+/123/-45/
+(4 rows)
 
 select * from test2 where t ~* 'DEF';
 t
@@ -3876,6 +3885,12 @@ select * from test2 where t ~ 'qua(?!foo)';
 quark
 (1 row)
 
+select * from test2 where t ~ '/\d+/-\d';
+t
+-----------
+/123/-45/
+(1 row)
+
 -- Check similarity threshold (bug #14202)
 CREATE TEMP TABLE restaurants (city text);
 INSERT INTO restaurants SELECT 'Warsaw' FROM generate_series(1, 10000);

contrib/pg_trgm/sql/pg_trgm.sql

Lines changed: 3 additions & 0 deletions
@@ -52,6 +52,7 @@ create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
 insert into test2 values (' z foo bar');
+insert into test2 values ('/123/-45/');
 create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
 set enable_seqscan=off;
 explain (costs off)
@@ -87,6 +88,7 @@ select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo';
 select * from test2 where t ~ 'qua(?!foo)';
+select * from test2 where t ~ '/\d+/-\d';
 drop index test2_idx_gin;
 
 create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
@@ -124,6 +126,7 @@ select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo';
 select * from test2 where t ~ 'qua(?!foo)';
+select * from test2 where t ~ '/\d+/-\d';
 
 -- Check similarity threshold (bug #14202)
 