Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 7fa93ee

Browse files
committed
Fix FK-based join selectivity estimation for semi/antijoins.
This case wasn't thought through sufficiently in commit 100340e. It's true that the FK proves that every outer row has a match in the inner table, but we forgot that some of the inner rows might be filtered away by WHERE conditions located within the semijoin's RHS. If the RHS is just one table, we can reasonably take the semijoin selectivity as equal to the fraction of the referenced table's rows that are expected to survive its restriction clauses. If the RHS is a join, it's not clear how much of the referenced table might get through the join, so fall back to the same rule we were already using for other outer-join cases: use the minimum of the regular per-clause selectivity estimates. This gives the same result as if we hadn't considered the FK at all when there's a single FK column, but it should still help for multi-column FKs, which is the case that 100340e is really meant to help with. Back-patch to 9.6 where the previous commit came in. Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us
1 parent b645a05 commit 7fa93ee

File tree

1 file changed

+53
-19
lines changed

1 file changed

+53
-19
lines changed

src/backend/optimizer/path/costsize.c

+53-19
Original file line numberDiff line numberDiff line change
@@ -4085,6 +4085,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
40854085
{
40864086
ForeignKeyOptInfo *fkinfo = (ForeignKeyOptInfo *) lfirst(lc);
40874087
bool ref_is_outer;
4088+
bool use_smallest_selectivity = false;
40884089
List *removedlist;
40894090
ListCell *cell;
40904091
ListCell *prev;
@@ -4205,9 +4206,9 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
42054206
* be double-counting the null fraction, and (2) it's not very clear
42064207
* how to combine null fractions for multiple referencing columns.
42074208
*
4208-
* In the first branch of the logic below, null derating is done
4209-
* implicitly by relying on clause_selectivity(); in the other two
4210-
* paths, we do nothing for now about correcting for nulls.
4209+
* In the use_smallest_selectivity code below, null derating is done
4210+
* implicitly by relying on clause_selectivity(); in the other cases,
4211+
* we do nothing for now about correcting for nulls.
42114212
*
42124213
* XXX another point here is that if either side of an FK constraint
42134214
* is an inheritance parent, we estimate as though the constraint
@@ -4230,28 +4231,41 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
42304231
* the smallest per-column selectivity, instead. (This should
42314232
* correspond to the FK column with the most nulls.)
42324233
*/
4233-
Selectivity thisfksel = 1.0;
4234-
4235-
foreach(cell, removedlist)
4236-
{
4237-
RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
4238-
Selectivity csel;
4239-
4240-
csel = clause_selectivity(root, (Node *) rinfo,
4241-
0, jointype, sjinfo);
4242-
thisfksel = Min(thisfksel, csel);
4243-
}
4244-
fkselec *= thisfksel;
4234+
use_smallest_selectivity = true;
42454235
}
42464236
else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
42474237
{
42484238
/*
42494239
* For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
4250-
* fraction of LHS rows that have matches. If the referenced
4251-
* table is on the inner side, that means the selectivity is 1.0
4252-
* (modulo nulls, which we're ignoring for now). We already
4253-
* covered the other case, so no work here.
4240+
* fraction of LHS rows that have matches. The referenced table
4241+
* is on the inner side (we already handled the other case above),
4242+
* so the FK implies that every LHS row has a match *in the
4243+
* referenced table*. But any restriction or join clauses below
4244+
* here will reduce the number of matches.
42544245
*/
4246+
if (bms_membership(inner_relids) == BMS_SINGLETON)
4247+
{
4248+
/*
4249+
* When the inner side of the semi/anti join is just the
4250+
* referenced table, we may take the FK selectivity as equal
4251+
* to the selectivity of the table's restriction clauses.
4252+
*/
4253+
RelOptInfo *ref_rel = find_base_rel(root, fkinfo->ref_relid);
4254+
double ref_tuples = Max(ref_rel->tuples, 1.0);
4255+
4256+
fkselec *= ref_rel->rows / ref_tuples;
4257+
}
4258+
else
4259+
{
4260+
/*
4261+
* When the inner side of the semi/anti join is itself a join,
4262+
* it's hard to guess what fraction of the referenced table
4263+
* will get through the join. But we still don't want to
4264+
* multiply per-column estimates together. Take the smallest
4265+
* per-column selectivity, instead.
4266+
*/
4267+
use_smallest_selectivity = true;
4268+
}
42554269
}
42564270
else
42574271
{
@@ -4265,6 +4279,26 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
42654279

42664280
fkselec *= 1.0 / ref_tuples;
42674281
}
4282+
4283+
/*
4284+
* Common code for cases where we should use the smallest selectivity
4285+
* that would be computed for any one of the FK's clauses.
4286+
*/
4287+
if (use_smallest_selectivity)
4288+
{
4289+
Selectivity thisfksel = 1.0;
4290+
4291+
foreach(cell, removedlist)
4292+
{
4293+
RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
4294+
Selectivity csel;
4295+
4296+
csel = clause_selectivity(root, (Node *) rinfo,
4297+
0, jointype, sjinfo);
4298+
thisfksel = Min(thisfksel, csel);
4299+
}
4300+
fkselec *= thisfksel;
4301+
}
42684302
}
42694303

42704304
*restrictlist = worklist;

0 commit comments

Comments
 (0)