Commit 443df6e

Revert "Optimize order of GROUP BY keys".
This reverts commit db0d67d and several follow-on fixes. The idea of making a cost-based choice of the order of the sorting columns is not fundamentally unsound, but it requires cost information and data statistics that we don't really have. For example, relying on procost to distinguish the relative costs of different sort comparators is pretty pointless so long as most such comparator functions are labeled with cost 1.0. Moreover, estimating the number of comparisons done by Quicksort requires more than just an estimate of the number of distinct values in the input: you also need some idea of the sizes of the larger groups, if you want an estimate that's good to better than a factor of three or so. That's data that's often unknown or not very reliable. Worse, to arrive at estimates of the number of calls made to the lower-order-column comparison functions, the code needs to make estimates of the numbers of distinct values of multiple columns, which are necessarily even less trustworthy than per-column stats. Even if all the inputs are perfectly reliable, the cost algorithm as-implemented cannot offer useful information about how to order sorting columns beyond the point at which the average group size is estimated to drop to 1.

Close inspection of the code added by db0d67d shows that there are also multiple small bugs. These could have been fixed, but there's not much point if we don't trust the estimates to be accurate in-principle.

Finally, the changes in cost_sort's behavior made for very large changes (often a factor of 2 or so) in the cost estimates for all sorting operations, not only those for multi-column GROUP BY. That naturally changes plan choices in many situations, and there's precious little evidence to show that the changes are for the better. Given the above doubts about whether the new estimates are really trustworthy, it's hard to summon much confidence that these changes are better on the average.

Since we're hard up against the release deadline for v15, let's revert these changes for now. We can always try again later.

Note: in v15, I left T_PathKeyInfo in place in nodes.h even though it's unreferenced. Removing it would be an ABI break, and it seems a bit late in the release cycle for that.

Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
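
Editorial illustration (not part of the commit): the point that distinct-value counts alone do not determine how often a lower-order comparator runs can be sketched in a few lines of Python. This is a rough sketch under stated assumptions: Python's built-in sort stands in for PostgreSQL's Quicksort, and the function name and both synthetic inputs are hypothetical. The two inputs have the same row count and the same number of distinct leading-column values; only the group sizes differ.

```python
import random
from functools import cmp_to_key


def count_second_column_comparisons(group_sizes, seed=0):
    """Sort two-column rows and count comparisons that reach the second column.

    group_sizes[i] is the number of rows whose first column equals i.
    (Hypothetical helper for illustration; not PostgreSQL code.)
    """
    rng = random.Random(seed)
    rows = []
    for key, size in enumerate(group_sizes):
        # The second column is a random float, so ties on it are not expected.
        rows.extend((key, rng.random()) for _ in range(size))
    rng.shuffle(rows)

    calls = 0

    def compare(x, y):
        nonlocal calls
        if x[0] != y[0]:
            return -1 if x[0] < y[0] else 1
        # The first column tied, so the second column's comparator is needed.
        calls += 1
        return (x[1] > y[1]) - (x[1] < y[1])

    rows.sort(key=cmp_to_key(compare))
    return calls


# Both inputs: 10,000 rows, 5,000 distinct values in the first column.
uniform_sizes = [2] * 5000             # every group has exactly 2 rows
skewed_sizes = [5001] + [1] * 4999     # one large group plus singletons

uniform_calls = count_second_column_comparisons(uniform_sizes)
skewed_calls = count_second_column_comparisons(skewed_sizes)
print("uniform:", uniform_calls, "skewed:", skewed_calls)
```

The skewed input reaches the second-column comparator far more often than the uniform one, even though an estimate based only on the number of distinct values would treat the two inputs identically.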
1 parent b507a7a commit 443df6e

24 files changed: +453 -1867 lines changed

contrib/postgres_fdw/expected/postgres_fdw.out

Lines changed: 9 additions & 6 deletions
@@ -2840,13 +2840,16 @@ select c2 * (random() <= 1)::int as c2 from ft2 group by c2 * (random() <= 1)::i
 -- GROUP BY clause in various forms, cardinal, alias and constant expression
 explain (verbose, costs off)
 select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
-                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                      QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
    Output: (count(c2)), c2, 5, 7.0, 9
-   Relations: Aggregate on (public.ft1)
-   Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY 2, 3, 5 ORDER BY c2 ASC NULLS LAST
-(4 rows)
+   Sort Key: ft1.c2
+   ->  Foreign Scan
+         Output: (count(c2)), c2, 5, 7.0, 9
+         Relations: Aggregate on (public.ft1)
+         Remote SQL: SELECT count(c2), c2, 5, 7.0, 9 FROM "S 1"."T 1" GROUP BY 2, 3, 5
+(7 rows)
 
 select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
 w | x | y | z

doc/src/sgml/config.sgml

Lines changed: 0 additions & 14 deletions
@@ -5055,20 +5055,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-enable-groupby-reordering" xreflabel="enable_group_by_reordering">
-      <term><varname>enable_group_by_reordering</varname> (<type>boolean</type>)
-      <indexterm>
-       <primary><varname>enable_group_by_reordering</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Enables or disables reordering of keys in a <literal>GROUP BY</literal>
-        clause. The default is <literal>on</literal>.
-       </para>
-      </listitem>
-     </varlistentry>
-
      <varlistentry id="guc-enable-hashagg" xreflabel="enable_hashagg">
       <term><varname>enable_hashagg</varname> (<type>boolean</type>)
       <indexterm>
