Commit 430a595

Defer remove_useless_groupby_columns() work until query_planner()

Traditionally, remove_useless_groupby_columns() was called during grouping_planner() directly after the call to preprocess_groupclause(). While in many ways, it made sense to populate the field and remove the functionally dependent columns from processed_groupClause at the same time, it's just that doing so had the disadvantage that remove_useless_groupby_columns() was being called before the RelOptInfos were populated for the relations mentioned in the query. Not having RelOptInfos available meant we needed to manually query the catalog tables to get the required details about the primary key constraint for the table.

Here we move the remove_useless_groupby_columns() call to query_planner() and put it directly after the RelOptInfos are populated. This is fine to do as processed_groupClause still isn't final at this point as it can still be modified inside standard_qp_callback() by make_pathkeys_for_sortclauses_extended().

This commit is just a refactor and simply moves remove_useless_groupby_columns() into initsplan.c. A planned follow-up commit will adjust that function so it uses RelOptInfo instead of doing catalog lookups and also teach it how to use unique indexes as proofs to expand the cases where we can remove functionally dependent columns from the GROUP BY.

Reviewed-by: Andrei Lepikhov, jian he
Discussion: https://postgr.es/m/CAApHDvqLezKwoEBBQd0dp4Y9MDkFBDbny0f3SzEeqOFoU7Z5+A@mail.gmail.com

1 parent 78c5e14 commit 430a595
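
For readers unfamiliar with the optimization being moved, the following minimal sketch illustrates why a functionally dependent column is redundant in GROUP BY. It is a toy model, not PostgreSQL code: `ToyRow`, `toy_rows`, `toy_count_groups` and `toy_same_grouping` are hypothetical names, and the rows stand in for a join result in which `id` is the primary key of one input table, so equal `id` values always carry the same `name`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row type: "id" plays the role of a primary-key column. */
typedef struct ToyRow
{
    int         id;
    const char *name;
} ToyRow;

/* Stand-in for a join result: id 1 appears twice, always with the same name. */
static const ToyRow toy_rows[] = {
    {1, "ann"}, {1, "ann"}, {2, "bob"}, {3, "ann"}
};

/*
 * Count distinct groups, comparing on id only (use_name == 0) or on
 * id plus name (use_name != 0).  Quadratic on purpose, for brevity.
 */
static size_t
toy_count_groups(const ToyRow *rows, size_t n, int use_name)
{
    size_t groups = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        int seen = 0;

        for (size_t j = 0; j < i && !seen; j++)
            seen = rows[i].id == rows[j].id &&
                (!use_name || strcmp(rows[i].name, rows[j].name) == 0);
        if (!seen)
            groups++;
    }
    return groups;
}

/* Returns 1 when grouping by (id) and by (id, name) give the same group count. */
static int
toy_same_grouping(void)
{
    size_t n = sizeof(toy_rows) / sizeof(toy_rows[0]);

    return toy_count_groups(toy_rows, n, 0) == toy_count_groups(toy_rows, n, 1);
}
```

Because `name` is determined by `id`, adding it to the grouping key cannot split any group, which is the property the planner relies on when it drops such columns.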

4 files changed, 170 additions and 165 deletions

src/backend/optimizer/plan/initsplan.c

Lines changed: 166 additions & 1 deletion
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * initsplan.c
- *    Target list, qualification, joininfo initialization routines
+ *    Target list, group by, qualification, joininfo initialization routines
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_constraint.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -386,6 +387,170 @@ add_vars_to_attr_needed(PlannerInfo *root, List *vars,
     }
 }
 
+/*****************************************************************************
+ *
+ *    GROUP BY
+ *
+ *****************************************************************************/
+
+/*
+ * remove_useless_groupby_columns
+ *      Remove any columns in the GROUP BY clause that are redundant due to
+ *      being functionally dependent on other GROUP BY columns.
+ *
+ * Since some other DBMSes do not allow references to ungrouped columns, it's
+ * not unusual to find all columns listed in GROUP BY even though listing the
+ * primary-key columns would be sufficient. Deleting such excess columns
+ * avoids redundant sorting work, so it's worth doing.
+ *
+ * Relcache invalidations will ensure that cached plans become invalidated
+ * when the underlying index of the pkey constraint is dropped.
+ *
+ * Currently, we only make use of pkey constraints for this, however, we may
+ * wish to take this further in the future and also use unique constraints
+ * which have NOT NULL columns. In that case, plan invalidation will still
+ * work since relations will receive a relcache invalidation when a NOT NULL
+ * constraint is dropped.
+ */
+void
+remove_useless_groupby_columns(PlannerInfo *root)
+{
+    Query *parse = root->parse;
+    Bitmapset **groupbyattnos;
+    Bitmapset **surplusvars;
+    ListCell *lc;
+    int relid;
+
+    /* No chance to do anything if there are less than two GROUP BY items */
+    if (list_length(root->processed_groupClause) < 2)
+        return;
+
+    /* Don't fiddle with the GROUP BY clause if the query has grouping sets */
+    if (parse->groupingSets)
+        return;
+
+    /*
+     * Scan the GROUP BY clause to find GROUP BY items that are simple Vars.
+     * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k
+     * that are GROUP BY items.
+     */
+    groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
+                                           (list_length(parse->rtable) + 1));
+    foreach(lc, root->processed_groupClause)
+    {
+        SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+        TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
+        Var *var = (Var *) tle->expr;
+
+        /*
+         * Ignore non-Vars and Vars from other query levels.
+         *
+         * XXX in principle, stable expressions containing Vars could also be
+         * removed, if all the Vars are functionally dependent on other GROUP
+         * BY items. But it's not clear that such cases occur often enough to
+         * be worth troubling over.
+         */
+        if (!IsA(var, Var) ||
+            var->varlevelsup > 0)
+            continue;
+
+        /* OK, remember we have this Var */
+        relid = var->varno;
+        Assert(relid <= list_length(parse->rtable));
+        groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],
+                                              var->varattno - FirstLowInvalidHeapAttributeNumber);
+    }
+
+    /*
+     * Consider each relation and see if it is possible to remove some of its
+     * Vars from GROUP BY. For simplicity and speed, we do the actual removal
+     * in a separate pass. Here, we just fill surplusvars[k] with a bitmapset
+     * of the column attnos of RTE k that are removable GROUP BY items.
+     */
+    surplusvars = NULL; /* don't allocate array unless required */
+    relid = 0;
+    foreach(lc, parse->rtable)
+    {
+        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+        Bitmapset *relattnos;
+        Bitmapset *pkattnos;
+        Oid constraintOid;
+
+        relid++;
+
+        /* Only plain relations could have primary-key constraints */
+        if (rte->rtekind != RTE_RELATION)
+            continue;
+
+        /*
+         * We must skip inheritance parent tables as some of the child rels
+         * may cause duplicate rows. This cannot happen with partitioned
+         * tables, however.
+         */
+        if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+            continue;
+
+        /* Nothing to do unless this rel has multiple Vars in GROUP BY */
+        relattnos = groupbyattnos[relid];
+        if (bms_membership(relattnos) != BMS_MULTIPLE)
+            continue;
+
+        /*
+         * Can't remove any columns for this rel if there is no suitable
+         * (i.e., nondeferrable) primary key constraint.
+         */
+        pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);
+        if (pkattnos == NULL)
+            continue;
+
+        /*
+         * If the primary key is a proper subset of relattnos then we have
+         * some items in the GROUP BY that can be removed.
+         */
+        if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1)
+        {
+            /*
+             * To easily remember whether we've found anything to do, we don't
+             * allocate the surplusvars[] array until we find something.
+             */
+            if (surplusvars == NULL)
+                surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
+                                                     (list_length(parse->rtable) + 1));
+
+            /* Remember the attnos of the removable columns */
+            surplusvars[relid] = bms_difference(relattnos, pkattnos);
+        }
+    }
+
+    /*
+     * If we found any surplus Vars, build a new GROUP BY clause without them.
+     * (Note: this may leave some TLEs with unreferenced ressortgroupref
+     * markings, but that's harmless.)
+     */
+    if (surplusvars != NULL)
+    {
+        List *new_groupby = NIL;
+
+        foreach(lc, root->processed_groupClause)
+        {
+            SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+            TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
+            Var *var = (Var *) tle->expr;
+
+            /*
+             * New list must include non-Vars, outer Vars, and anything not
+             * marked as surplus.
+             */
+            if (!IsA(var, Var) ||
+                var->varlevelsup > 0 ||
+                !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+                               surplusvars[var->varno]))
+                new_groupby = lappend(new_groupby, sgc);
+        }
+
+        root->processed_groupClause = new_groupby;
+    }
+}
 
 /*****************************************************************************
  *
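
The function above works in separate passes: collect the grouped columns per relation, decide which of them are surplus, then rebuild the clause. A minimal standalone sketch of that structure follows, assuming a 64-bit mask in place of `Bitmapset` and a plain array in place of the `List`. All `Toy*` and `toy_*` names are hypothetical, `TOY_ATTNO_OFFSET` is an assumed stand-in for the `FirstLowInvalidHeapAttributeNumber` shift that keeps negative system attribute numbers representable, and the real function's checks for grouping sets, outer-level Vars and inheritance parents are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ATTNO_OFFSET 7      /* assumed shift so negative attnos map to bits > 0 */
#define TOY_MAX_RELS 8          /* arbitrary limit for this sketch */

/* Hypothetical GROUP BY item: either a plain column or some expression. */
typedef struct ToyGroupItem
{
    int is_var;                 /* 0 for an expression, which is always kept */
    int relid;                  /* 1-based relation index, valid when is_var */
    int attno;                  /* column number, valid when is_var */
} ToyGroupItem;

/* Map a column number to its bit in a 64-bit column set. */
static uint64_t
toy_attno_bit(int attno)
{
    int bit = attno + TOY_ATTNO_OFFSET;

    assert(bit > 0 && bit < 64);
    return UINT64_C(1) << bit;
}

/*
 * Prune items in place.  pk[relid] is the primary-key column set of each
 * relation, or 0 when none is known.  Returns the new item count.
 */
static size_t
toy_prune_groupby(ToyGroupItem *items, size_t n, const uint64_t *pk)
{
    uint64_t grouped[TOY_MAX_RELS + 1] = {0};
    uint64_t surplus[TOY_MAX_RELS + 1] = {0};
    size_t   kept = 0;

    /* Pass 1: record which columns of each relation appear in GROUP BY. */
    for (size_t i = 0; i < n; i++)
    {
        if (!items[i].is_var)
            continue;
        assert(items[i].relid >= 1 && items[i].relid <= TOY_MAX_RELS);
        grouped[items[i].relid] |= toy_attno_bit(items[i].attno);
    }

    /* Pass 2: if the whole key is grouped, every other grouped column is surplus. */
    for (int relid = 1; relid <= TOY_MAX_RELS; relid++)
    {
        if (pk[relid] != 0 && (pk[relid] & ~grouped[relid]) == 0)
            surplus[relid] = grouped[relid] & ~pk[relid];
    }

    /* Pass 3: keep expressions and non-surplus columns, preserving order. */
    for (size_t i = 0; i < n; i++)
    {
        if (!items[i].is_var ||
            (surplus[items[i].relid] & toy_attno_bit(items[i].attno)) == 0)
            items[kept++] = items[i];
    }
    return kept;
}

/* Example: GROUP BY t1.id, t1.name, <expr>, t2.c3 with t1.id as primary key. */
static size_t
toy_demo_kept_count(void)
{
    ToyGroupItem items[] = {{1, 1, 1}, {1, 1, 2}, {0, 0, 0}, {1, 2, 3}};
    uint64_t     pk[TOY_MAX_RELS + 1] = {0};

    pk[1] = toy_attno_bit(1);   /* t1's key is fully present in GROUP BY */
    pk[2] = toy_attno_bit(1);   /* t2's key column is not grouped at all */
    return toy_prune_groupby(items, 4, pk);
}
```

In the example only `t1.name` is dropped: the expression is never a removal candidate, and `t2` keeps its column because its key is not covered by the GROUP BY items.
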

src/backend/optimizer/plan/planmain.c

Lines changed: 3 additions & 0 deletions
@@ -169,6 +169,9 @@ query_planner(PlannerInfo *root,
      */
     add_base_rels_to_query(root, (Node *) parse->jointree);
 
+    /* Remove any redundant GROUP BY columns */
+    remove_useless_groupby_columns(root);
+
     /*
      * Examine the targetlist and join tree, adding entries to baserel
      * targetlists for all referenced Vars, and generating PlaceHolderInfo
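
The placement of this call is the point of the commit: it now runs after the per-relation planner structures exist. The sketch below models only that ordering constraint, under stated assumptions. `ToyPlanner`, `ToyRelInfo` and the `toy_*` functions are hypothetical and are not PostgreSQL APIs. In this commit the moved function still performs a catalog lookup, and reading key details from planner state is what the commit message describes as a planned follow-up.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RELS 4          /* arbitrary limit for this sketch */

/* Hypothetical per-relation planner info. */
typedef struct ToyRelInfo
{
    int has_pkey;
} ToyRelInfo;

/* Hypothetical planner state: rels[] is only meaningful once rels_built is set. */
typedef struct ToyPlanner
{
    size_t     nrels;
    ToyRelInfo rels[TOY_MAX_RELS];
    int        rels_built;
} ToyPlanner;

/* Stand-in for the step that populates per-relation info. */
static void
toy_add_base_rels(ToyPlanner *p)
{
    assert(p->nrels <= TOY_MAX_RELS);
    for (size_t i = 0; i < p->nrels; i++)
        p->rels[i].has_pkey = 1;
    p->rels_built = 1;
}

/* 1 when planner state can answer key questions, 0 when a catalog lookup is needed. */
static int
toy_can_use_planner_state(const ToyPlanner *p)
{
    return p->rels_built;
}

/* Old ordering: GROUP BY pruning runs before per-relation info exists. */
static int
toy_old_order(void)
{
    ToyPlanner p = {2, {{0}}, 0};
    int        ok = toy_can_use_planner_state(&p);

    toy_add_base_rels(&p);
    return ok;
}

/* New ordering: per-relation info is built first, then pruning runs. */
static int
toy_new_order(void)
{
    ToyPlanner p = {2, {{0}}, 0};

    toy_add_base_rels(&p);
    return toy_can_use_planner_state(&p);
}
```

Here `toy_old_order()` reports that planner state is not yet usable, while `toy_new_order()` reports that it is, which mirrors why the call was moved next to `add_base_rels_to_query()`.
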

src/backend/optimizer/plan/planner.c

Lines changed: 0 additions & 164 deletions
@@ -23,7 +23,6 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_aggregate.h"
-#include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
@@ -139,7 +138,6 @@ static void preprocess_rowmarks(PlannerInfo *root);
 static double preprocess_limit(PlannerInfo *root,
                                double tuple_fraction,
                                int64 *offset_est, int64 *count_est);
-static void remove_useless_groupby_columns(PlannerInfo *root);
 static List *preprocess_groupclause(PlannerInfo *root, List *force);
 static List *extract_rollup_sets(List *groupingSets);
 static List *reorder_grouping_sets(List *groupingSets, List *sortclause);
@@ -1487,8 +1485,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction,
         {
             /* Preprocess regular GROUP BY clause, if any */
             root->processed_groupClause = preprocess_groupclause(root, NIL);
-            /* Remove any redundant GROUP BY columns */
-            remove_useless_groupby_columns(root);
         }
 
         /*
@@ -2724,166 +2720,6 @@ limit_needed(Query *parse)
     return false; /* don't need a Limit plan node */
 }
 
-
-/*
- * remove_useless_groupby_columns
- *      Remove any columns in the GROUP BY clause that are redundant due to
- *      being functionally dependent on other GROUP BY columns.
- *
- * Since some other DBMSes do not allow references to ungrouped columns, it's
- * not unusual to find all columns listed in GROUP BY even though listing the
- * primary-key columns would be sufficient. Deleting such excess columns
- * avoids redundant sorting work, so it's worth doing.
- *
- * Relcache invalidations will ensure that cached plans become invalidated
- * when the underlying index of the pkey constraint is dropped.
- *
- * Currently, we only make use of pkey constraints for this, however, we may
- * wish to take this further in the future and also use unique constraints
- * which have NOT NULL columns. In that case, plan invalidation will still
- * work since relations will receive a relcache invalidation when a NOT NULL
- * constraint is dropped.
- */
-static void
-remove_useless_groupby_columns(PlannerInfo *root)
-{
-    Query *parse = root->parse;
-    Bitmapset **groupbyattnos;
-    Bitmapset **surplusvars;
-    ListCell *lc;
-    int relid;
-
-    /* No chance to do anything if there are less than two GROUP BY items */
-    if (list_length(root->processed_groupClause) < 2)
-        return;
-
-    /* Don't fiddle with the GROUP BY clause if the query has grouping sets */
-    if (parse->groupingSets)
-        return;
-
-    /*
-     * Scan the GROUP BY clause to find GROUP BY items that are simple Vars.
-     * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k
-     * that are GROUP BY items.
-     */
-    groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
-                                           (list_length(parse->rtable) + 1));
-    foreach(lc, root->processed_groupClause)
-    {
-        SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
-        TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
-        Var *var = (Var *) tle->expr;
-
-        /*
-         * Ignore non-Vars and Vars from other query levels.
-         *
-         * XXX in principle, stable expressions containing Vars could also be
-         * removed, if all the Vars are functionally dependent on other GROUP
-         * BY items. But it's not clear that such cases occur often enough to
-         * be worth troubling over.
-         */
-        if (!IsA(var, Var) ||
-            var->varlevelsup > 0)
-            continue;
-
-        /* OK, remember we have this Var */
-        relid = var->varno;
-        Assert(relid <= list_length(parse->rtable));
-        groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],
-                                              var->varattno - FirstLowInvalidHeapAttributeNumber);
-    }
-
-    /*
-     * Consider each relation and see if it is possible to remove some of its
-     * Vars from GROUP BY. For simplicity and speed, we do the actual removal
-     * in a separate pass. Here, we just fill surplusvars[k] with a bitmapset
-     * of the column attnos of RTE k that are removable GROUP BY items.
-     */
-    surplusvars = NULL; /* don't allocate array unless required */
-    relid = 0;
-    foreach(lc, parse->rtable)
-    {
-        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-        Bitmapset *relattnos;
-        Bitmapset *pkattnos;
-        Oid constraintOid;
-
-        relid++;
-
-        /* Only plain relations could have primary-key constraints */
-        if (rte->rtekind != RTE_RELATION)
-            continue;
-
-        /*
-         * We must skip inheritance parent tables as some of the child rels
-         * may cause duplicate rows. This cannot happen with partitioned
-         * tables, however.
-         */
-        if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
-            continue;
-
-        /* Nothing to do unless this rel has multiple Vars in GROUP BY */
-        relattnos = groupbyattnos[relid];
-        if (bms_membership(relattnos) != BMS_MULTIPLE)
-            continue;
-
-        /*
-         * Can't remove any columns for this rel if there is no suitable
-         * (i.e., nondeferrable) primary key constraint.
-         */
-        pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);
-        if (pkattnos == NULL)
-            continue;
-
-        /*
-         * If the primary key is a proper subset of relattnos then we have
-         * some items in the GROUP BY that can be removed.
-         */
-        if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1)
-        {
-            /*
-             * To easily remember whether we've found anything to do, we don't
-             * allocate the surplusvars[] array until we find something.
-             */
-            if (surplusvars == NULL)
-                surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
-                                                     (list_length(parse->rtable) + 1));
-
-            /* Remember the attnos of the removable columns */
-            surplusvars[relid] = bms_difference(relattnos, pkattnos);
-        }
-    }
-
-    /*
-     * If we found any surplus Vars, build a new GROUP BY clause without them.
-     * (Note: this may leave some TLEs with unreferenced ressortgroupref
-     * markings, but that's harmless.)
-     */
-    if (surplusvars != NULL)
-    {
-        List *new_groupby = NIL;
-
-        foreach(lc, root->processed_groupClause)
-        {
-            SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
-            TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
-            Var *var = (Var *) tle->expr;
-
-            /*
-             * New list must include non-Vars, outer Vars, and anything not
-             * marked as surplus.
-             */
-            if (!IsA(var, Var) ||
-                var->varlevelsup > 0 ||
-                !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
-                               surplusvars[var->varno]))
-                new_groupby = lappend(new_groupby, sgc);
-        }
-
-        root->processed_groupClause = new_groupby;
-    }
-}
-
 /*
  * preprocess_groupclause - do preparatory work on GROUP BY clause
  *

src/include/optimizer/planmain.h

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
                                    Relids where_needed);
 extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
                                     Relids where_needed);
+extern void remove_useless_groupby_columns(PlannerInfo *root);
 extern void find_lateral_references(PlannerInfo *root);
 extern void rebuild_lateral_attr_needed(PlannerInfo *root);
 extern void create_lateral_join_info(PlannerInfo *root);
