Comparing changes

Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan.

A plan with eager aggregation looks like:

  EXPLAIN (COSTS OFF)
  SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.i = b.j
   GROUP BY a.i;

   Finalize HashAggregate
     Group Key: a.i
     ->  Nested Loop
           ->  Partial HashAggregate
                 Group Key: b.j
                 ->  Seq Scan on b
           ->  Index Only Scan using a_pkey on a
                 Index Cond: (i = b.j)

During the construction of the join tree, we evaluate each base or join relation to determine if eager aggregation can be applied. If feasible, we create a separate RelOptInfo called a "grouped relation" and store it in a dedicated list.

Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths during this phase. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations does not seem to be very useful and is currently not supported.

For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys, using compatible operators. This is essential to ensure that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does. This ensures that all rows within the same partial group share the same 'destiny', which is crucial for maintaining correctness.

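As an illustration only, the following Python sketch mimics the example query above outside the planner. It is not PostgreSQL code: the function names, the sample rows, and the (sum, count) representation of avg's partial state are assumptions made for this sketch. It shows that grouping b by the join key b.j before the join, and then combining the partial states after the join, yields the same avg(b.y) per a.i as joining first and aggregating afterwards.

```python
from collections import defaultdict

# Hypothetical sample data: table a holds values of a.i, table b holds (b.j, b.y).
A_ROWS = [1, 2, 3]
B_ROWS = [(1, 10.0), (1, 20.0), (2, 5.0), (4, 7.0)]


def plain_plan(a_rows, b_rows):
    """Join a and b on a.i = b.j first, then compute avg(b.y) grouped by a.i."""
    acc = defaultdict(lambda: [0.0, 0])  # a.i -> [sum of y, row count]
    for i in a_rows:
        for j, y in b_rows:
            if i == j:
                acc[i][0] += y
                acc[i][1] += 1
    return {k: s / c for k, (s, c) in acc.items()}


def eager_plan(a_rows, b_rows):
    """Partially aggregate b by the join key, join, then finalize."""
    # Partial aggregation: one (sum, count) state per distinct b.j.
    # Grouping by the join key means every row of a group matches
    # the same rows of a, so the whole group joins or fails together.
    partial = defaultdict(lambda: [0.0, 0])
    for j, y in b_rows:
        partial[j][0] += y
        partial[j][1] += 1

    # Join: each partial group is compared with a once, not once per b row.
    final = defaultdict(lambda: [0.0, 0])  # a.i -> combined [sum, count]
    for j, (s, c) in partial.items():
        for i in a_rows:
            if i == j:
                final[i][0] += s
                final[i][1] += c

    # Finalize: turn the combined states into averages.
    return {k: s / c for k, (s, c) in final.items()}
```

With the sample rows, the join in eager_plan sees three partial groups instead of four rows of b, and the group for b.j = 4 is discarded as a unit because it has no match in a.
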
One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions.

If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning.

Since eager aggregation can generate many grouped relations, we introduce a RelInfoList structure, which encapsulates both a list and a hash table, so that we can leverage the hash table for faster lookups not only for join relations but also for grouped relations.

Eager aggregation can use significantly more CPU time and memory than regular planning when the query involves aggregates and many joining relations. However, in some cases, the resulting plan can be much better, justifying the additional planning effort. All the same, for now, turn this feature off by default.

The patch was originally proposed by Antonin Houska in 2017. This commit reworks various important aspects and rewrites most of the current code. However, the original patch and reviews were very useful.

Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com

This branch was automatically generated by a robot using patches from an email thread registered at:

https://commitfest.postgresql.org/patch/4881

The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch.

Patch(es): https://www.postgresql.org/message-id/CAMbWs48XdzvnwfTHWxQ7qK-yjvdrbwsPpqhJBuKDnO+hcbsVwA@mail.gmail.com
Author(s): Richard Guo

Commits on Mar 17, 2025
