
Commit db0d67d

Optimize order of GROUP BY keys
When evaluating a query with a multi-column GROUP BY clause using sort, the cost may be heavily dependent on the order in which the keys are compared when building the groups. Grouping does not imply any ordering, so we're allowed to compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order as specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of the keys, and leave the rest to the optimizer. But that might get very expensive, so we try to pick only a couple interesting orderings based on both local and global information.

When planning the grouping path, we explore statistics (number of distinct values, cost of the comparison function) for the keys and reorder them to minimize comparison costs. Intuitively, it may be better to perform more expensive comparisons (for complex data types etc.) last, because maybe the cheaper comparisons will be enough. Similarly, the higher the cardinality of a key, the lower the probability we’ll need to compare more keys. The patch generates and costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. E.g. there may be an explicit ORDER BY clause, or some other ordering-dependent operation, higher up in the query, and using the same ordering may allow using either incremental sort or even eliminate the sort entirely.

The patch generates orderings and picks those minimizing the comparison cost (for various pathkeys), and then adds orderings that might be useful for operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps the ordering specified in the query, on the assumption the user might have additional insights.

This introduces a new GUC enable_group_by_reordering, so that the optimization may be disabled if needed.

The original patch was proposed by Teodor Sigaev, and later improved and reworked by Dmitry Dolgov. Reviews by a number of people, including me, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu.

Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra
Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu
Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
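
The intuition in the commit message (cheap comparisons first, high-cardinality keys first) can be illustrated with a small sketch. This is not the patch's cost model: it is a simplified toy that assumes keys are independent and uniformly distributed, so two rows tie on a key with probability 1/ndistinct. The names Key, expected_cmp_cost and cheapest_order, and the example columns and numbers, are all hypothetical.

```python
from itertools import permutations
from typing import NamedTuple, Sequence, Tuple


class Key(NamedTuple):
    name: str
    ndistinct: float  # estimated number of distinct values (assumed statistic)
    cmp_cost: float   # relative cost of one comparison (assumed statistic)


def expected_cmp_cost(keys: Sequence[Key]) -> float:
    """Expected cost of comparing two rows, key by key, in the given order.

    A key is compared only if all earlier keys tied. Under the toy
    assumption of independent, uniform keys, a tie on a key happens
    with probability 1 / ndistinct.
    """
    total = 0.0
    p_reach = 1.0  # probability that every earlier key tied
    for k in keys:
        total += p_reach * k.cmp_cost
        p_reach *= 1.0 / k.ndistinct
    return total


def cheapest_order(keys: Sequence[Key]) -> Tuple[Key, ...]:
    """Brute-force all permutations; only viable for a handful of keys."""
    return min(permutations(keys), key=expected_cmp_cost)


# Hypothetical GROUP BY comment_text, status, user_id
keys = [
    Key("comment_text", ndistinct=1000, cmp_cost=10.0),  # expensive text compare
    Key("status", ndistinct=4, cmp_cost=1.0),            # cheap, low cardinality
    Key("user_id", ndistinct=10000, cmp_cost=1.0),       # cheap, high cardinality
]

best = cheapest_order(keys)
print("query order cost:", expected_cmp_cost(keys))
print("best order:", [k.name for k in best], "cost:", expected_cmp_cost(best))
```

In this toy model the cheap, high-cardinality key moves to the front and the expensive text comparison moves to the end. Enumerating every permutation is exactly what the commit message says would "get very expensive" in the planner, which is why the patch keeps only a few candidate orderings.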
1 parent 606948b commit db0d67d

24 files changed, +1882 -494 lines changed

contrib/postgres_fdw/expected/postgres_fdw.out

+6 -9
@@ -2741,16 +2741,13 @@ select c2 * (random() <= 1)::int as c2 from ft2 group by c2 * (random() <= 1)::i
 -- GROUP BY clause in various forms, cardinal, alias and constant expression
 explain (verbose, costs off)
 select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
- QUERY PLAN
----------------------------------------------------------------------------------------
- Sort
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Foreign Scan
    Output: (count(c2)), c2, 5, 7.0, 9
-   Sort Key: ft1.c2
-   -> Foreign Scan
-         Output: (count(c2)), c2, 5, 7.0, 9
-         Relations: Aggregate on (public.ft1)
-         Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY 2, 3, 5
-(7 rows)
+   Relations: Aggregate on (public.ft1)
+   Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY 2, 3, 5 ORDER BY c2 ASC NULLS LAST
+(4 rows)
 
 select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
 w | x | y | z

doc/src/sgml/config.sgml

+14
@@ -4967,6 +4967,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
        </listitem>
       </varlistentry>
 
+     <varlistentry id="guc-enable-groupby-reordering" xreflabel="enable_group_by_reordering">
+      <term><varname>enable_group_by_reordering</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_group_by_reordering</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables reodering of keys in <literal>GROUP BY</literal>
+        clause. The default is <literal>on</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-enable-hashagg" xreflabel="enable_hashagg">
       <term><varname>enable_hashagg</varname> (<type>boolean</type>)
       <indexterm>
