Commit b9fc2d0

Improve planner's handling of set-returning functions in grouping columns.
Improve query_is_distinct_for() to accept SRFs in the targetlist when we can prove distinctness from a DISTINCT clause. In that case the de-duplication will surely happen after SRF expansion, so the proof still works. Continue to punt in the case where we'd try to prove distinctness from GROUP BY (or, in the future, source relations). To do that, we'd have to determine whether the SRFs were in the grouping columns or elsewhere in the tlist, and it still doesn't seem worth the trouble. But this trivial change allows us to recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces unique-ified output, which seems worth having.

Also, fix estimate_num_groups() to consider the possibility of SRFs in the grouping columns. Its failure to do so was masked before v10 because grouping_planner() scaled up plan rowcount estimates by the estimated SRF multiplier after performing grouping. That doesn't happen anymore, which is more correct, but it means we need an adjustment in the estimate for the number of groups. Failure to do this leads to an underestimate for the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)" compared to what 9.6 and earlier estimated, thus breaking plan choices in some cases.

Per report from Dmitry Shalashov. Back-patch to v10 to avoid degraded plan choices compared to previous releases.

Discussion: https://postgr.es/m/CAKPeCUGAeHgoh5O=SvcQxREVkoX7UdeJUMj1F5=aBNvoTa+O8w@mail.gmail.com
1 parent a1187c4 commit b9fc2d0

2 files changed, +41 -14 lines changed

src/backend/optimizer/plan/analyzejoins.c

Lines changed: 14 additions & 14 deletions
@@ -744,8 +744,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 bool
 query_supports_distinctness(Query *query)
 {
-	/* we don't cope with SRFs, see comment below */
-	if (query->hasTargetSRFs)
+	/* SRFs break distinctness except with DISTINCT, see below */
+	if (query->hasTargetSRFs && query->distinctClause == NIL)
 		return false;
 
 	/* check for features we can prove distinctness with */
@@ -786,21 +786,11 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 
 	Assert(list_length(colnos) == list_length(opids));
 
-	/*
-	 * A set-returning function in the query's targetlist can result in
-	 * returning duplicate rows, if the SRF is evaluated after the
-	 * de-duplication step; so we play it safe and say "no" if there are any
-	 * SRFs. (We could be certain that it's okay if SRFs appear only in the
-	 * specified columns, since those must be evaluated before de-duplication;
-	 * but it doesn't presently seem worth the complication to check that.)
-	 */
-	if (query->hasTargetSRFs)
-		return false;
-
 	/*
 	 * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
 	 * columns in the DISTINCT clause appear in colnos and operator semantics
-	 * match.
+	 * match. This is true even if there are SRFs in the DISTINCT columns or
+	 * elsewhere in the tlist.
 	 */
 	if (query->distinctClause)
 	{
@@ -819,6 +809,16 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 			return true;
 	}
 
+	/*
+	 * Otherwise, a set-returning function in the query's targetlist can
+	 * result in returning duplicate rows, despite any grouping that might
+	 * occur before tlist evaluation. (If all tlist SRFs are within GROUP BY
+	 * columns, it would be safe because they'd be expanded before grouping.
+	 * But it doesn't currently seem worth the effort to check for that.)
+	 */
+	if (query->hasTargetSRFs)
+		return false;
+
 	/*
 	 * Similarly, GROUP BY without GROUPING SETS guarantees uniqueness if all
 	 * the grouped columns appear in colnos and operator semantics match.

src/backend/utils/adt/selfuncs.c

Lines changed: 27 additions & 0 deletions
@@ -3270,6 +3270,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					List **pgset)
 {
 	List *varinfos = NIL;
+	double srf_multiplier = 1.0;
 	double numdistinct;
 	ListCell *l;
 	int i;
@@ -3303,6 +3304,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	foreach(l, groupExprs)
 	{
 		Node *groupexpr = (Node *) lfirst(l);
+		double this_srf_multiplier;
 		VariableStatData vardata;
 		List *varshere;
 		ListCell *l2;
@@ -3311,6 +3313,21 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 		if (pgset && !list_member_int(*pgset, i++))
 			continue;
 
+		/*
+		 * Set-returning functions in grouping columns are a bit problematic.
+		 * The code below will effectively ignore their SRF nature and come up
+		 * with a numdistinct estimate as though they were scalar functions.
+		 * We compensate by scaling up the end result by the largest SRF
+		 * rowcount estimate. (This will be an overestimate if the SRF
+		 * produces multiple copies of any output value, but it seems best to
+		 * assume the SRF's outputs are distinct. In any case, it's probably
+		 * pointless to worry too much about this without much better
+		 * estimates for SRF output rowcounts than we have today.)
+		 */
+		this_srf_multiplier = expression_returns_set_rows(groupexpr);
+		if (srf_multiplier < this_srf_multiplier)
+			srf_multiplier = this_srf_multiplier;
+
 		/* Short-circuit for expressions returning boolean */
 		if (exprType(groupexpr) == BOOLOID)
 		{
@@ -3376,9 +3393,15 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	 */
 	if (varinfos == NIL)
 	{
+		/* Apply SRF multiplier as we would do in the long path */
+		numdistinct *= srf_multiplier;
+		/* Round off */
+		numdistinct = ceil(numdistinct);
 		/* Guard against out-of-range answers */
 		if (numdistinct > input_rows)
 			numdistinct = input_rows;
+		if (numdistinct < 1.0)
+			numdistinct = 1.0;
 		return numdistinct;
 	}
 
@@ -3547,6 +3570,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 		varinfos = newvarinfos;
 	} while (varinfos != NIL);
 
+	/* Now we can account for the effects of any SRFs */
+	numdistinct *= srf_multiplier;
+
+	/* Round off */
 	numdistinct = ceil(numdistinct);
 
 	/* Guard against out-of-range answers */
