
Commit 1f18442

Avoid regressions in foreign-key-based selectivity estimates.
David Rowley found that the "use the smallest per-column selectivity" heuristic applied in some cases by get_foreign_key_join_selectivity() was badly off if the FK columns are independent, producing estimates much worse than we got before that code was added in 9.6.

One case where that heuristic was used was for LEFT and FULL outer joins with the referenced rel on the outside of the join. But we should not really need to special-case those here. eqjoinsel() never has had such a special case; the correction is applied by calc_joinrel_size_estimate() instead. Let's just estimate such cases like inner joins and rely on that later adjustment. (I think there was something of a thinko here, in that the comments seem to be thinking about the selectivity as defined for semi/anti joins; but that shouldn't apply to left/full joins.) Add a regression test exercising such a case to show that this is sane in at least some cases.

The other case where we used that heuristic was for SEMI/ANTI outer joins, either if the referenced rel was on the outside, or if it was on the inside but was part of a join within the RHS. In either case, the FK doesn't give us a lot of traction towards estimating the selectivity. To ensure that we don't have regressions from what happened before 9.6, let's punt by ignoring the FK in such cases and applying the traditional selectivity calculation. (We might be able to improve on that later, but for now I just want to be sure it's not worse than 9.5.)

Report and patch by David Rowley, simplified a bit by me. Back-patch to 9.6 where this code was added.

Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com
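
To make the "badly off if the FK columns are independent" point concrete, here is a minimal sketch comparing three ways of estimating the selectivity of a two-column FK join. This is not PostgreSQL code: the function names are hypothetical, and the sketch assumes each per-column equality clause has a selectivity of roughly 1/ndistinct. The figures mentioned afterwards are derived from the regression test data added by this commit (1000 referenced rows, with x/10 giving 101 distinct values of a and x%10 giving 10 distinct values of b).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: selectivity of one "x = y" clause, assumed ~ 1/ndistinct. */
double
sketch_column_selectivity(double ndistinct)
{
    assert(ndistinct >= 1.0);
    return 1.0 / ndistinct;
}

/* The removed heuristic: use the smallest selectivity of any one FK column. */
double
sketch_smallest_selectivity(const double *ndistinct, size_t ncols)
{
    double      sel = 1.0;

    for (size_t i = 0; i < ncols; i++)
    {
        double      csel = sketch_column_selectivity(ndistinct[i]);

        if (csel < sel)
            sel = csel;
    }
    return sel;
}

/* The traditional estimate: multiply the per-column selectivities together. */
double
sketch_product_selectivity(const double *ndistinct, size_t ncols)
{
    double      sel = 1.0;

    for (size_t i = 0; i < ncols; i++)
        sel *= sketch_column_selectivity(ndistinct[i]);
    return sel;
}

/* The FK-based inner-join estimate: 1 / (referenced-table tuples, at least 1). */
double
sketch_fk_inner_selectivity(double ref_tuples)
{
    return 1.0 / (ref_tuples > 1.0 ? ref_tuples : 1.0);
}
```

Under these assumptions, with ndistinct = {101, 10} the smallest-column estimate is 1/101, roughly ten times the FK-based 1/1000, while the plain product 1/1010 lands very close to it. That is the sense in which the heuristic could do worse than the pre-9.6 calculation when the columns are independent.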
1 parent 3ef40dc commit 1f18442

3 files changed, +93 -67 lines changed


src/backend/optimizer/path/costsize.c

+30 -67
@@ -4095,7 +4095,6 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
     {
         ForeignKeyOptInfo *fkinfo = (ForeignKeyOptInfo *) lfirst(lc);
         bool ref_is_outer;
-        bool use_smallest_selectivity = false;
         List *removedlist;
         ListCell *cell;
         ListCell *prev;
@@ -4114,6 +4113,22 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
         else
             continue;
 
+        /*
+         * If we're dealing with a semi/anti join, and the FK's referenced
+         * relation is on the outside, then knowledge of the FK doesn't help
+         * us figure out what we need to know (which is the fraction of outer
+         * rows that have matches). On the other hand, if the referenced rel
+         * is on the inside, then all outer rows must have matches in the
+         * referenced table (ignoring nulls). But any restriction or join
+         * clauses that filter that table will reduce the fraction of matches.
+         * We can account for restriction clauses, but it's too hard to guess
+         * how many table rows would get through a join that's inside the RHS.
+         * Hence, if either case applies, punt and ignore the FK.
+         */
+        if ((jointype == JOIN_SEMI || jointype == JOIN_ANTI) &&
+            (ref_is_outer || bms_membership(inner_relids) != BMS_SINGLETON))
+            continue;
+
         /*
          * Modify the restrictlist by removing clauses that match the FK (and
          * putting them into removedlist instead). It seems unsafe to modify
@@ -4214,10 +4229,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
          * However (1) if there are any strict restriction clauses for the
          * referencing column(s) elsewhere in the query, derating here would
          * be double-counting the null fraction, and (2) it's not very clear
-         * how to combine null fractions for multiple referencing columns.
-         *
-         * In the use_smallest_selectivity code below, null derating is done
-         * implicitly by relying on clause_selectivity(); in the other cases,
+         * how to combine null fractions for multiple referencing columns. So
          * we do nothing for now about correcting for nulls.
          *
          * XXX another point here is that if either side of an FK constraint
* XXX another point here is that if either side of an FK constraint
@@ -4230,52 +4242,23 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
          * work, it is uncommon in practice to have an FK referencing a parent
          * table. So, at least for now, disregard inheritance here.
          */
-        if (ref_is_outer && jointype != JOIN_INNER)
+        if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
         {
             /*
-             * When the referenced table is on the outer side of a non-inner
-             * join, knowing that each inner row has exactly one match is not
-             * as useful as one could wish, since we really need to know the
-             * fraction of outer rows with a match. Still, we can avoid the
-             * folly of multiplying the per-column estimates together. Take
-             * the smallest per-column selectivity, instead. (This should
-             * correspond to the FK column with the most nulls.)
+             * For JOIN_SEMI and JOIN_ANTI, we only get here when the FK's
+             * referenced table is exactly the inside of the join. The join
+             * selectivity is defined as the fraction of LHS rows that have
+             * matches. The FK implies that every LHS row has a match *in the
+             * referenced table*; but any restriction clauses on it will
+             * reduce the number of matches. Hence we take the join
+             * selectivity as equal to the selectivity of the table's
+             * restriction clauses, which is rows / tuples; but we must guard
+             * against tuples == 0.
              */
-            use_smallest_selectivity = true;
-        }
-        else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
-        {
-            /*
-             * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
-             * fraction of LHS rows that have matches. The referenced table
-             * is on the inner side (we already handled the other case above),
-             * so the FK implies that every LHS row has a match *in the
-             * referenced table*. But any restriction or join clauses below
-             * here will reduce the number of matches.
-             */
-            if (bms_membership(inner_relids) == BMS_SINGLETON)
-            {
-                /*
-                 * When the inner side of the semi/anti join is just the
-                 * referenced table, we may take the FK selectivity as equal
-                 * to the selectivity of the table's restriction clauses.
-                 */
-                RelOptInfo *ref_rel = find_base_rel(root, fkinfo->ref_relid);
-                double ref_tuples = Max(ref_rel->tuples, 1.0);
+            RelOptInfo *ref_rel = find_base_rel(root, fkinfo->ref_relid);
+            double ref_tuples = Max(ref_rel->tuples, 1.0);
 
-                fkselec *= ref_rel->rows / ref_tuples;
-            }
-            else
-            {
-                /*
-                 * When the inner side of the semi/anti join is itself a join,
-                 * it's hard to guess what fraction of the referenced table
-                 * will get through the join. But we still don't want to
-                 * multiply per-column estimates together. Take the smallest
-                 * per-column selectivity, instead.
-                 */
-                use_smallest_selectivity = true;
-            }
+            fkselec *= ref_rel->rows / ref_tuples;
         }
         else
         {
@@ -4289,26 +4272,6 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 
             fkselec *= 1.0 / ref_tuples;
         }
-
-        /*
-         * Common code for cases where we should use the smallest selectivity
-         * that would be computed for any one of the FK's clauses.
-         */
-        if (use_smallest_selectivity)
-        {
-            Selectivity thisfksel = 1.0;
-
-            foreach(cell, removedlist)
-            {
-                RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
-                Selectivity csel;
-
-                csel = clause_selectivity(root, (Node *) rinfo,
-                                          0, jointype, sjinfo);
-                thisfksel = Min(thisfksel, csel);
-            }
-            fkselec *= thisfksel;
-        }
     }
 
     *restrictlist = worklist;
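
The per-FK decision that remains after this change can be summarized in a small standalone sketch. It is a simplified model of the patched branch structure, not the real function: the enum, the function name, and the boolean parameters are hypothetical stand-ins for jointype, ref_is_outer, and the bms_membership(inner_relids) test, and the real code multiplies each factor into a running fkselec instead of returning it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's JoinType, limited to the cases discussed. */
typedef enum
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_FULL,
    SK_JOIN_SEMI,
    SK_JOIN_ANTI
} SketchJoinType;

/*
 * Returns false when the FK should be ignored (the "punt" case), leaving the
 * matching clauses to the traditional selectivity calculation.  Otherwise
 * stores this FK's selectivity factor in *fkselec and returns true.
 */
bool
sketch_fk_selectivity(SketchJoinType jointype,
                      bool ref_is_outer,
                      bool inner_is_single_rel,
                      double ref_rows,      /* rows after restriction clauses */
                      double ref_tuples,    /* raw tuples in referenced table */
                      double *fkselec)
{
    bool        is_semi_anti = (jointype == SK_JOIN_SEMI ||
                                jointype == SK_JOIN_ANTI);
    /* Guard against tuples == 0, as Max(ref_rel->tuples, 1.0) does. */
    double      tuples = (ref_tuples > 1.0) ? ref_tuples : 1.0;

    assert(fkselec != NULL);

    /* Semi/anti join where the FK gives no traction: ignore the FK. */
    if (is_semi_anti && (ref_is_outer || !inner_is_single_rel))
        return false;

    if (is_semi_anti)
        *fkselec = ref_rows / tuples;   /* fraction of LHS rows with a match */
    else
        *fkselec = 1.0 / tuples;        /* inner, left and full joins alike */
    return true;
}
```

In this model a LEFT or FULL join is estimated exactly like an inner join, which matches the commit message's intent of leaving the outer-join correction to calc_joinrel_size_estimate().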

src/test/regress/expected/join.out

+35
@@ -5323,3 +5323,38 @@ ERROR: invalid reference to FROM-clause entry for table "xx1"
 LINE 1: ...xx1 using lateral (select * from int4_tbl where f1 = x1) ss;
                                                                 ^
 HINT: There is an entry for table "xx1", but it cannot be referenced from this part of the query.
+--
+-- test that foreign key join estimation performs sanely for outer joins
+--
+begin;
+create table fkest (a int, b int, c int unique, primary key(a,b));
+create table fkest1 (a int, b int, primary key(a,b));
+insert into fkest select x/10, x%10, x from generate_series(1,1000) x;
+insert into fkest1 select x/10, x%10 from generate_series(1,1000) x;
+alter table fkest1
+  add constraint fkest1_a_b_fkey foreign key (a,b) references fkest;
+analyze fkest;
+analyze fkest1;
+explain (costs off)
+select *
+from fkest f
+  left join fkest1 f1 on f.a = f1.a and f.b = f1.b
+  left join fkest1 f2 on f.a = f2.a and f.b = f2.b
+  left join fkest1 f3 on f.a = f3.a and f.b = f3.b
+where f.c = 1;
+                            QUERY PLAN
+------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Nested Loop Left Join
+         ->  Nested Loop Left Join
+               ->  Index Scan using fkest_c_key on fkest f
+                     Index Cond: (c = 1)
+               ->  Index Only Scan using fkest1_pkey on fkest1 f1
+                     Index Cond: ((a = f.a) AND (b = f.b))
+         ->  Index Only Scan using fkest1_pkey on fkest1 f2
+               Index Cond: ((a = f.a) AND (b = f.b))
+   ->  Index Only Scan using fkest1_pkey on fkest1 f3
+         Index Cond: ((a = f.a) AND (b = f.b))
+(11 rows)
+
+rollback;

src/test/regress/sql/join.sql

+28
@@ -1730,3 +1730,31 @@ update xx1 set x2 = f1 from xx1, lateral (select * from int4_tbl where f1 = x1)
 delete from xx1 using (select * from int4_tbl where f1 = x1) ss;
 delete from xx1 using (select * from int4_tbl where f1 = xx1.x1) ss;
 delete from xx1 using lateral (select * from int4_tbl where f1 = x1) ss;
+
+--
+-- test that foreign key join estimation performs sanely for outer joins
+--
+
+begin;
+
+create table fkest (a int, b int, c int unique, primary key(a,b));
+create table fkest1 (a int, b int, primary key(a,b));
+
+insert into fkest select x/10, x%10, x from generate_series(1,1000) x;
+insert into fkest1 select x/10, x%10 from generate_series(1,1000) x;
+
+alter table fkest1
+  add constraint fkest1_a_b_fkey foreign key (a,b) references fkest;
+
+analyze fkest;
+analyze fkest1;
+
+explain (costs off)
+select *
+from fkest f
+  left join fkest1 f1 on f.a = f1.a and f.b = f1.b
+  left join fkest1 f2 on f.a = f2.a and f.b = f2.b
+  left join fkest1 f3 on f.a = f3.a and f.b = f3.b
+where f.c = 1;
+
+rollback;
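
The commit message says LEFT and FULL joins can be estimated like inner joins because a later adjustment in calc_joinrel_size_estimate() corrects them. A rough sketch of that kind of adjustment follows. The function names are hypothetical, and the clamping rule shown (a left join yields at least one row per outer row; a full join yields at least as many rows as either input) is an assumption about that adjustment, not a copy of the PostgreSQL source.

```c
#include <assert.h>

/* Assumed correction for LEFT joins: every outer row appears at least once. */
double
sketch_left_join_rows(double outer_rows, double inner_rows, double selec)
{
    double      nrows;

    assert(outer_rows >= 0.0 && inner_rows >= 0.0);
    assert(selec >= 0.0 && selec <= 1.0);

    nrows = outer_rows * inner_rows * selec;    /* inner-join style estimate */
    if (nrows < outer_rows)
        nrows = outer_rows;
    return nrows;
}

/* Assumed correction for FULL joins: every row of both inputs appears. */
double
sketch_full_join_rows(double outer_rows, double inner_rows, double selec)
{
    double      nrows;

    assert(outer_rows >= 0.0 && inner_rows >= 0.0);
    assert(selec >= 0.0 && selec <= 1.0);

    nrows = outer_rows * inner_rows * selec;
    if (nrows < outer_rows)
        nrows = outer_rows;
    if (nrows < inner_rows)
        nrows = inner_rows;
    return nrows;
}
```

Applied to the test above, one outer row (f.c = 1) joined to 1000 inner rows at the FK-based selectivity of 1/1000 gives about one row per left join. At a selectivity of 1/101 (what a smallest-per-column estimate would produce if each column's selectivity were about 1/ndistinct) the same join would be estimated at nearly ten rows, and that error would compound across the three joins.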
