
Commit f3d3118

Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows data to be aggregated by several different GROUP BY clauses at once. Each grouping set returns rows in which the columns that are grouped by only in other sets are set to NULL.

Previously this could be achieved by running each grouping as a separate query and combining the results with UNION ALL. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data.

The current implementation of grouping sets only supports sorted input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort and aggregation steps are performed. These additional passes are fed by the previous sort step, thus avoiding repeated scans of the source data.

The code is structured so that support for pure hash aggregation, or for a mix of hashing and sorting, can be added later. Sorting was chosen to be supported first, as it is the most generic method of implementation.

Instead of representing the chain of sort and aggregation steps as full-blown planner and executor nodes, as earlier versions of the patch did, all but the first sort are performed inside the aggregation node itself. This avoids the need for some unusual gymnastics to handle returning both aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg nodes to describe each phase, but they are not part of the plan tree; instead they are additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting.

The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing GROUP BY plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and makes it easy to skip the sorting step if the underlying data is already sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension, it has not been renamed, nor has cube been made a reserved keyword. Instead, precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
1 parent 6e4415c commit f3d3118
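
The two claims in the commit message (equivalence with separate queries joined by UNION ALL, and computing sets that share a sort order in one pass) can be illustrated with a small Python model. This is only a sketch under stated assumptions, not PostgreSQL's executor code: the function names `grouping_sets_union_all` and `rollup_one_pass` are hypothetical, only `sum` is modelled, NULL is represented by `None`, and the sample rows mirror the `items_sold` table used in the documentation changes of this commit.

```python
from collections import defaultdict

# Sample rows mirroring the items_sold example in the documentation diff.
ROWS = [
    {"brand": "Foo", "size": "L", "sales": 10},
    {"brand": "Foo", "size": "M", "sales": 20},
    {"brand": "Bar", "size": "M", "sales": 15},
    {"brand": "Bar", "size": "L", "sales": 5},
]


def grouping_sets_union_all(rows, columns, sets, value):
    """Older approach: one full scan per grouping set, results concatenated
    the way UNION ALL would. Columns not in a set come back as None."""
    out = []
    for gset in sets:
        sums = defaultdict(int)
        for row in rows:  # the input is scanned once for every grouping set
            key = tuple(row[c] if c in gset else None for c in columns)
            sums[key] += row[value]
        out.extend(sums.items())
    return out


def rollup_one_pass(rows, columns, value):
    """Model of ROLLUP(columns) over sorted input: every prefix of the sort
    key is aggregated during a single pass over the sorted rows."""
    n = len(columns)
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in columns))
    sums = [0] * (n + 1)  # sums[k] accumulates the open group of prefix length k
    out = []
    prev = None
    for row in ordered:
        cur = tuple(row[c] for c in columns)
        if prev is not None:
            # Index of the first column whose value changed (n if none did).
            diff = next((i for i in range(n) if cur[i] != prev[i]), n)
            # Close every group whose prefix is longer than the unchanged part.
            for k in range(n, diff, -1):
                out.append((prev[:k] + (None,) * (n - k), sums[k]))
                sums[k] = 0
        for k in range(n + 1):
            sums[k] += row[value]
        prev = cur
    if prev is not None:
        for k in range(n, 0, -1):
            out.append((prev[:k] + (None,) * (n - k), sums[k]))
    # The empty grouping set yields one row even for empty input; sum() of
    # no rows is NULL in SQL, modelled here as None.
    out.append(((None,) * n, sums[0] if prev is not None else None))
    return out
```

Calling `rollup_one_pass(ROWS, ("brand", "size"), "sales")` produces the same seven rows as `grouping_sets_union_all` with the sets `("brand", "size")`, `("brand",)` and `()`, but sorts and scans the input only once.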

63 files changed, with 5245 additions and 608 deletions.

contrib/pg_stat_statements/pg_stat_statements.c

Lines changed: 21 additions & 0 deletions
@@ -2267,6 +2267,7 @@ JumbleQuery(pgssJumbleState *jstate, Query *query)
     JumbleExpr(jstate, (Node *) query->onConflict);
     JumbleExpr(jstate, (Node *) query->returningList);
     JumbleExpr(jstate, (Node *) query->groupClause);
+    JumbleExpr(jstate, (Node *) query->groupingSets);
     JumbleExpr(jstate, query->havingQual);
     JumbleExpr(jstate, (Node *) query->windowClause);
     JumbleExpr(jstate, (Node *) query->distinctClause);
@@ -2397,6 +2398,13 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
                 JumbleExpr(jstate, (Node *) expr->aggfilter);
             }
             break;
+        case T_GroupingFunc:
+            {
+                GroupingFunc *grpnode = (GroupingFunc *) node;
+
+                JumbleExpr(jstate, (Node *) grpnode->refs);
+            }
+            break;
         case T_WindowFunc:
             {
                 WindowFunc *expr = (WindowFunc *) node;
@@ -2698,6 +2706,12 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
                 JumbleExpr(jstate, (Node *) lfirst(temp));
             }
             break;
+        case T_IntList:
+            foreach(temp, (List *) node)
+            {
+                APP_JUMB(lfirst_int(temp));
+            }
+            break;
         case T_SortGroupClause:
             {
                 SortGroupClause *sgc = (SortGroupClause *) node;
@@ -2708,6 +2722,13 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
                 APP_JUMB(sgc->nulls_first);
             }
             break;
+        case T_GroupingSet:
+            {
+                GroupingSet *gsnode = (GroupingSet *) node;
+
+                JumbleExpr(jstate, (Node *) gsnode->content);
+            }
+            break;
         case T_WindowClause:
             {
                 WindowClause *wc = (WindowClause *) node;

doc/src/sgml/func.sgml

Lines changed: 69 additions & 1 deletion
@@ -12228,7 +12228,9 @@ NULL baz</literallayout>(3 rows)</entry>
 <xref linkend="functions-aggregate-statistics-table">.
 The built-in ordered-set aggregate functions
 are listed in <xref linkend="functions-orderedset-table"> and
-<xref linkend="functions-hypothetical-table">.
+<xref linkend="functions-hypothetical-table">. Grouping operations,
+which are closely related to aggregate functions, are listed in
+<xref linkend="functions-grouping-table">.
 The special syntax considerations for aggregate
 functions are explained in <xref linkend="syntax-aggregates">.
 Consult <xref linkend="tutorial-agg"> for additional introductory
@@ -13326,6 +13328,72 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
 to the rule specified in the <literal>ORDER BY</> clause.
 </para>
 
+<table id="functions-grouping-table">
+<title>Grouping Operations</title>
+
+<tgroup cols="3">
+<thead>
+<row>
+<entry>Function</entry>
+<entry>Return Type</entry>
+<entry>Description</entry>
+</row>
+</thead>
+
+<tbody>
+
+<row>
+<entry>
+<indexterm>
+<primary>GROUPING</primary>
+</indexterm>
+<function>GROUPING(<replaceable class="parameter">args...</replaceable>)</function>
+</entry>
+<entry>
+<type>integer</type>
+</entry>
+<entry>
+Integer bitmask indicating which arguments are not being included in the current
+grouping set
+</entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+
+<para>
+Grouping operations are used in conjunction with grouping sets (see
+<xref linkend="queries-grouping-sets">) to distinguish result rows. The
+arguments to the <literal>GROUPING</> operation are not actually evaluated,
+but they must match exactly expressions given in the <literal>GROUP BY</>
+clause of the associated query level. Bits are assigned with the rightmost
+argument being the least-significant bit; each bit is 0 if the corresponding
+expression is included in the grouping criteria of the grouping set generating
+the result row, and 1 if it is not. For example:
+<screen>
+<prompt>=&gt;</> <userinput>SELECT * FROM items_sold;</>
+make | model | sales
+-------+-------+-------
+Foo | GT | 10
+Foo | Tour | 20
+Bar | City | 15
+Bar | Sport | 5
+(4 rows)
+
+<prompt>=&gt;</> <userinput>SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);</>
+make | model | grouping | sum
+-------+-------+----------+-----
+Foo | GT | 0 | 10
+Foo | Tour | 0 | 20
+Bar | City | 0 | 15
+Bar | Sport | 0 | 5
+Foo | | 1 | 30
+Bar | | 1 | 20
+| | 3 | 50
+(7 rows)
+</screen>
+</para>
+
 </sect1>
 
 <sect1 id="functions-window">
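
The bit assignment described for `GROUPING` in the added documentation can be checked with a short Python sketch. The helper name `grouping` and the representation of a grouping set as a tuple of column names are assumptions made for illustration; this is a model of the documented rule, not the server's implementation.

```python
def grouping(args, grouping_set):
    """Return the bitmask described for GROUPING(args...).

    The rightmost argument is the least-significant bit. A bit is 0 when the
    argument is part of the grouping set that produced the row, and 1 when
    it is not.
    """
    mask = 0
    for arg in args:
        # Shift previous bits left so the last argument ends up in bit 0.
        mask = (mask << 1) | (0 if arg in grouping_set else 1)
    return mask
```

For `ROLLUP(make, model)` the three grouping sets are `(make, model)`, `(make)` and `()`, for which `grouping(("make", "model"), ...)` gives 0, 1 and 3, matching the `grouping` column in the example output above.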

doc/src/sgml/queries.sgml

Lines changed: 175 additions & 0 deletions
@@ -1183,6 +1183,181 @@ SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
 </para>
 </sect2>
 
+<sect2 id="queries-grouping-sets">
+<title><literal>GROUPING SETS</>, <literal>CUBE</>, and <literal>ROLLUP</></title>
+
+<indexterm zone="queries-grouping-sets">
+<primary>GROUPING SETS</primary>
+</indexterm>
+<indexterm zone="queries-grouping-sets">
+<primary>CUBE</primary>
+</indexterm>
+<indexterm zone="queries-grouping-sets">
+<primary>ROLLUP</primary>
+</indexterm>
+
+<para>
+More complex grouping operations than those described above are possible
+using the concept of <firstterm>grouping sets</>. The data selected by
+the <literal>FROM</> and <literal>WHERE</> clauses is grouped separately
+by each specified grouping set, aggregates computed for each group just as
+for simple <literal>GROUP BY</> clauses, and then the results returned.
+For example:
+<screen>
+<prompt>=&gt;</> <userinput>SELECT * FROM items_sold;</>
+brand | size | sales
+-------+------+-------
+Foo | L | 10
+Foo | M | 20
+Bar | M | 15
+Bar | L | 5
+(4 rows)
+
+<prompt>=&gt;</> <userinput>SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());</>
+brand | size | sum
+-------+------+-----
+Foo | | 30
+Bar | | 20
+| L | 15
+| M | 35
+| | 50
+(5 rows)
+</screen>
+</para>
+
+<para>
+Each sublist of <literal>GROUPING SETS</> may specify zero or more columns
+or expressions and is interpreted the same way as though it were directly
+in the <literal>GROUP BY</> clause. An empty grouping set means that all
+rows are aggregated down to a single group (which is output even if no
+input rows were present), as described above for the case of aggregate
+functions with no <literal>GROUP BY</> clause.
+</para>
+
+<para>
+References to the grouping columns or expressions are replaced
+by <literal>NULL</> values in result rows for grouping sets in which those
+columns do not appear. To distinguish which grouping a particular output
+row resulted from, see <xref linkend="functions-grouping-table">.
+</para>
+
+<para>
+A shorthand notation is provided for specifying two common types of grouping set.
+A clause of the form
+<programlisting>
+ROLLUP ( <replaceable>e1</>, <replaceable>e2</>, <replaceable>e3</>, ... )
+</programlisting>
+represents the given list of expressions and all prefixes of the list including
+the empty list; thus it is equivalent to
+<programlisting>
+GROUPING SETS (
+( <replaceable>e1</>, <replaceable>e2</>, <replaceable>e3</>, ... ),
+...
+( <replaceable>e1</>, <replaceable>e2</> )
+( <replaceable>e1</> )
+( )
+)
+</programlisting>
+This is commonly used for analysis over hierarchical data; e.g. total
+salary by department, division, and company-wide total.
+</para>
+
+<para>
+A clause of the form
+<programlisting>
+CUBE ( <replaceable>e1</>, <replaceable>e2</>, ... )
+</programlisting>
+represents the given list and all of its possible subsets (i.e. the power
+set). Thus
+<programlisting>
+CUBE ( a, b, c )
+</programlisting>
+is equivalent to
+<programlisting>
+GROUPING SETS (
+( a, b, c ),
+( a, b ),
+( a, c ),
+( a ),
+( b, c ),
+( b ),
+( c ),
+( ),
+)
+</programlisting>
+</para>
+
+<para>
+The individual elements of a <literal>CUBE</> or <literal>ROLLUP</>
+clause may be either individual expressions, or sub-lists of elements in
+parentheses. In the latter case, the sub-lists are treated as single
+units for the purposes of generating the individual grouping sets.
+For example:
+<programlisting>
+CUBE ( (a,b), (c,d) )
+</programlisting>
+is equivalent to
+<programlisting>
+GROUPING SETS (
+( a, b, c, d )
+( a, b )
+( c, d )
+( )
+)
+</programlisting>
+and
+<programlisting>
+ROLLUP ( a, (b,c), d )
+</programlisting>
+is equivalent to
+<programlisting>
+GROUPING SETS (
+( a, b, c, d )
+( a, b, c )
+( a )
+( )
+)
+</programlisting>
+</para>
+
+<para>
+The <literal>CUBE</> and <literal>ROLLUP</> constructs can be used either
+directly in the <literal>GROUP BY</> clause, or nested inside a
+<literal>GROUPING SETS</> clause. If one <literal>GROUPING SETS</> clause
+is nested inside another, the effect is the same as if all the elements of
+the inner clause had been written directly in the outer clause.
+</para>
+
+<para>
+If multiple grouping items are specified in a single <literal>GROUP BY</>
+clause, then the final list of grouping sets is the cross product of the
+individual items. For example:
+<programlisting>
+GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))
+</programlisting>
+is equivalent to
+<programlisting>
+GROUP BY GROUPING SETS (
+(a,b,c,d), (a,b,c,e),
+(a,b,d), (a,b,e),
+(a,c,d), (a,c,e),
+(a,d), (a,e)
+)
+</programlisting>
+</para>
+
+<note>
+<para>
+The construct <literal>(a,b)</> is normally recognized in expressions as
+a <link linkend="sql-syntax-row-constructors">row constructor</link>.
+Within the <literal>GROUP BY</> clause, this does not apply at the top
+levels of expressions, and <literal>(a,b)</> is parsed as a list of
+expressions as described above. If for some reason you <emphasis>need</>
+a row constructor in a grouping expression, use <literal>ROW(a,b)</>.
+</para>
+</note>
+</sect2>
+
 <sect2 id="queries-window">
 <title>Window Function Processing</title>
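
The expansion rules documented in this hunk (prefixes for `ROLLUP`, the power set for `CUBE`, parenthesised sub-lists treated as single units, and the cross product of multiple grouping items) can be sketched in Python. The helper names `rollup`, `cube` and `cross_product` are hypothetical, a sub-list is modelled as a Python tuple and a single expression as a string; this is an illustration of the documented semantics, not the parser's or planner's actual code.

```python
from itertools import product


def _flatten(elements):
    """Concatenate elements into one grouping set; a tuple is a sub-list unit."""
    cols = []
    for el in elements:
        cols.extend(el if isinstance(el, tuple) else (el,))
    return tuple(cols)


def rollup(*elements):
    """All prefixes of the element list, longest first, ending with the empty set."""
    return [_flatten(elements[:i]) for i in range(len(elements), -1, -1)]


def cube(*elements):
    """Every subset of the element list (the power set), largest mask first."""
    n = len(elements)
    sets = []
    for mask in range(2 ** n - 1, -1, -1):
        # The first element maps to the highest bit of the mask.
        chosen = [el for i, el in enumerate(elements) if mask & (1 << (n - 1 - i))]
        sets.append(_flatten(chosen))
    return sets


def cross_product(*items):
    """Combine several GROUP BY items, each given as a list of grouping sets."""
    return [
        tuple(col for gset in combo for col in gset)
        for combo in product(*items)
    ]
```

For instance, `cross_product([("a",)], cube("b", "c"), [("d",), ("e",)])` models `GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))` and yields the eight grouping sets listed in the documentation above, in the same order.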

doc/src/sgml/ref/select.sgml

Lines changed: 27 additions & 6 deletions
@@ -37,7 +37,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
 [ * | <replaceable class="parameter">expression</replaceable> [ [ AS ] <replaceable class="parameter">output_name</replaceable> ] [, ...] ]
 [ FROM <replaceable class="parameter">from_item</replaceable> [, ...] ]
 [ WHERE <replaceable class="parameter">condition</replaceable> ]
-[ GROUP BY <replaceable class="parameter">expression</replaceable> [, ...] ]
+[ GROUP BY <replaceable class="parameter">grouping_element</replaceable> [, ...] ]
 [ HAVING <replaceable class="parameter">condition</replaceable> [, ...] ]
 [ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceable class="parameter">window_definition</replaceable> ) [, ...] ]
 [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] <replaceable class="parameter">select</replaceable> ]
@@ -60,6 +60,15 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
 [ WITH ORDINALITY ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
 <replaceable class="parameter">from_item</replaceable> [ NATURAL ] <replaceable class="parameter">join_type</replaceable> <replaceable class="parameter">from_item</replaceable> [ ON <replaceable class="parameter">join_condition</replaceable> | USING ( <replaceable class="parameter">join_column</replaceable> [, ...] ) ]
 
+<phrase>and <replaceable class="parameter">grouping_element</replaceable> can be one of:</phrase>
+
+( )
+<replaceable class="parameter">expression</replaceable>
+( <replaceable class="parameter">expression</replaceable> [, ...] )
+ROLLUP ( { <replaceable class="parameter">expression</replaceable> | ( <replaceable class="parameter">expression</replaceable> [, ...] ) } [, ...] )
+CUBE ( { <replaceable class="parameter">expression</replaceable> | ( <replaceable class="parameter">expression</replaceable> [, ...] ) } [, ...] )
+GROUPING SETS ( <replaceable class="parameter">grouping_element</replaceable> [, ...] )
+
 <phrase>and <replaceable class="parameter">with_query</replaceable> is:</phrase>
 
 <replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS ( <replaceable class="parameter">select</replaceable> | <replaceable class="parameter">values</replaceable> | <replaceable class="parameter">insert</replaceable> | <replaceable class="parameter">update</replaceable> | <replaceable class="parameter">delete</replaceable> )
@@ -665,22 +674,34 @@ WHERE <replaceable class="parameter">condition</replaceable>
 <para>
 The optional <literal>GROUP BY</literal> clause has the general form
 <synopsis>
-GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
+GROUP BY <replaceable class="parameter">grouping_element</replaceable> [, ...]
 </synopsis>
 </para>
 
 <para>
 <literal>GROUP BY</literal> will condense into a single row all
 selected rows that share the same values for the grouped
-expressions. <replaceable
-class="parameter">expression</replaceable> can be an input column
-name, or the name or ordinal number of an output column
-(<command>SELECT</command> list item), or an arbitrary
+expressions. An <replaceable
+class="parameter">expression</replaceable> used inside a
+<replaceable class="parameter">grouping_element</replaceable>
+can be an input column name, or the name or ordinal number of an
+output column (<command>SELECT</command> list item), or an arbitrary
 expression formed from input-column values. In case of ambiguity,
 a <literal>GROUP BY</literal> name will be interpreted as an
 input-column name rather than an output column name.
 </para>
 
+<para>
+If any of <literal>GROUPING SETS</>, <literal>ROLLUP</> or
+<literal>CUBE</> are present as grouping elements, then the
+<literal>GROUP BY</> clause as a whole defines some number of
+independent <replaceable>grouping sets</>. The effect of this is
+equivalent to constructing a <literal>UNION ALL</> between
+subqueries with the individual grouping sets as their
+<literal>GROUP BY</> clauses. For further details on the handling
+of grouping sets see <xref linkend="queries-grouping-sets">.
+</para>
+
 <para>
 Aggregate functions, if any are used, are computed across all rows
 making up each group, producing a separate value for each group.

src/backend/catalog/sql_features.txt

Lines changed: 3 additions & 3 deletions
@@ -467,9 +467,9 @@ T331 Basic roles YES
 T332 Extended roles NO mostly supported
 T341 Overloading of SQL-invoked functions and procedures YES
 T351 Bracketed SQL comments (/*...*/ comments) YES
-T431 Extended grouping capabilities NO
-T432 Nested and concatenated GROUPING SETS NO
-T433 Multiargument GROUPING function NO
+T431 Extended grouping capabilities YES
+T432 Nested and concatenated GROUPING SETS YES
+T433 Multiargument GROUPING function YES
 T434 GROUP BY DISTINCT NO
 T441 ABS and MOD functions YES
 T461 Symmetric BETWEEN predicate YES
