
Commit d512794

Further fix pg_trgm's extraction of trigrams from regular expressions.
Commit 9e43e87 turns out to have been insufficient: not only is it necessary to track tentative parent links while considering a set of arc removals, but it's necessary to track tentative flag additions as well. This is because we always merge arc target states into arc source states; therefore, when considering a merge of the final state with some other, it is the other state that will acquire a new TSTATE_FIN bit. If there's another arc for the same color trigram that would cause merging of that state with the initial state, we failed to recognize the problem.

The test cases for the prior commit evidently only exercised situations where a tentative merge with the initial state occurs before one with the final state. If it goes the other way around, we'll happily merge the initial and final states, either producing a broken final graph that would never match anything, or triggering the Assert added by the prior commit.

It's tempting to consider switching the merge direction when the merge involves the final state, but I lack the time to analyze that idea in detail. Instead just keep track of the flag changes that would result from proposed merges, in the same way that the prior commit tracked proposed parent links.

Along the way, add some more debugging support, because I'm not entirely confident that this is the last bug here. And tweak matters so that the transformed.dot file uses small integers rather than pointer values to identify states; that makes it more readable if you're just eyeballing it rather than fooling with Graphviz. And rename a couple of identically named struct fields to reduce confusion.

Per report from Corey Csuhta. Add a test case based on his example. (Note: this case does not trigger the bug under 9.3, apparently because its different measurement of costs causes it to stop merging states before it hits the failure. I spent some time trying to find a variant that would fail in 9.3, without success; but I'm sure such cases exist.)

Like the previous patch, back-patch to 9.3 where this code was added.

Report: https://postgr.es/m/E2B01A4B-4530-406B-8D17-2F67CF9A16BA@csuhta.com
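
The bookkeeping described in the commit message can be illustrated with a minimal C sketch. This is not the code from pg_trgm: the `State` struct, the `ST_INIT`/`ST_FIN` bits, and the `resolve`, `propose_merge`, and `reset_tentative` functions are hypothetical names chosen for illustration. It assumes each state carries its committed flags plus a tentative parent link and tentative flags that record what proposed merges would add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, standing in for "initial" and "final" markers. */
#define ST_INIT 0x01
#define ST_FIN  0x02

/* Hypothetical state record; field names are illustrative only. */
typedef struct State
{
    int           flags;        /* flags the state really has */
    int           tent_flags;   /* flags it would gain from proposed merges */
    struct State *tent_parent;  /* state it would be merged into, or NULL */
} State;

/*
 * Follow tentative parent links up to the representative state,
 * accumulating real and tentative flags along the way.
 */
State *
resolve(State *s, int *flags_out)
{
    int flags;

    assert(s != NULL);
    flags = s->flags | s->tent_flags;
    while (s->tent_parent != NULL)
    {
        s = s->tent_parent;
        flags |= s->flags | s->tent_flags;
    }
    *flags_out = flags;
    return s;
}

/*
 * Propose merging an arc's target state into its source state.
 * Returns false, recording nothing, if the merged state would be
 * both initial and final.
 */
bool
propose_merge(State *source, State *target)
{
    int sflags;
    int tflags;

    source = resolve(source, &sflags);
    target = resolve(target, &tflags);

    if (((sflags | tflags) & (ST_INIT | ST_FIN)) == (ST_INIT | ST_FIN))
        return false;

    if (source != target)
    {
        target->tent_parent = source;   /* tentative parent link */
        source->tent_flags |= tflags;   /* tentative flag additions */
    }
    return true;
}

/* Discard all tentative bookkeeping before evaluating the next proposal. */
void
reset_tentative(State *states, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        states[i].tent_flags = 0;
        states[i].tent_parent = NULL;
    }
}
```

In this sketch, proposing to merge a final state into a plain state succeeds and marks the plain state as tentatively final. A later proposal to merge that plain state into the initial state is then rejected, which is the ordering the commit message says was previously missed. If only the parent links were tracked, the plain state's own flags would still look empty and the second merge would be accepted.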
1 parent a70b18b commit d512794

3 files changed, +136 -42 lines changed

contrib/pg_trgm/expected/pg_trgm.out

+17 -2
@@ -3489,6 +3489,7 @@ create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
 insert into test2 values (' z foo bar');
+insert into test2 values ('/123/-45/');
 create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
 set enable_seqscan=off;
 explain (costs off)
@@ -3590,7 +3591,8 @@ select * from test2 where t ~ '(abc)*$';
 abcdef
 quark
 z foo bar
-(3 rows)
+/123/-45/
+(4 rows)
 
 select * from test2 where t ~* 'DEF';
 t
@@ -3682,6 +3684,12 @@ select * from test2 where t ~ 'qua(?!foo)';
 quark
 (1 row)
 
+select * from test2 where t ~ '/\d+/-\d';
+t
+-----------
+/123/-45/
+(1 row)
+
 drop index test2_idx_gin;
 create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
 set enable_seqscan=off;
@@ -3776,7 +3784,8 @@ select * from test2 where t ~ '(abc)*$';
 abcdef
 quark
 z foo bar
-(3 rows)
+/123/-45/
+(4 rows)
 
 select * from test2 where t ~* 'DEF';
 t
@@ -3868,6 +3877,12 @@ select * from test2 where t ~ 'qua(?!foo)';
 quark
 (1 row)
 
+select * from test2 where t ~ '/\d+/-\d';
+t
+-----------
+/123/-45/
+(1 row)
+
 -- Check similarity threshold (bug #14202)
 CREATE TEMP TABLE restaurants (city text);
 INSERT INTO restaurants SELECT 'Warsaw' FROM generate_series(1, 10000);

contrib/pg_trgm/sql/pg_trgm.sql

+3
@@ -47,6 +47,7 @@ create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
 insert into test2 values (' z foo bar');
+insert into test2 values ('/123/-45/');
 create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
 set enable_seqscan=off;
 explain (costs off)
@@ -82,6 +83,7 @@ select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo';
 select * from test2 where t ~ 'qua(?!foo)';
+select * from test2 where t ~ '/\d+/-\d';
 drop index test2_idx_gin;
 
 create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
@@ -119,6 +121,7 @@ select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo bar';
 select * from test2 where t ~ ' z foo';
 select * from test2 where t ~ 'qua(?!foo)';
+select * from test2 where t ~ '/\d+/-\d';
 
 -- Check similarity threshold (bug #14202)
 
