Comparing changes

base repository: postgresql-cfbot/postgresql
base: cf/4881~1
head repository: postgresql-cfbot/postgresql
compare: cf/4881
  • 2 commits
  • 24 files changed
  • 2 contributors

Commits on Mar 17, 2025

  1. Implement Eager Aggregation

    Eager aggregation is a query optimization technique that partially
    pushes aggregation past a join, and finalizes it once all the
    relations are joined.  Eager aggregation may reduce the number of
    input rows to the join and thus could result in a better overall plan.
    
    A plan with eager aggregation looks like:
    
     EXPLAIN (COSTS OFF)
     SELECT a.i, avg(b.y)
     FROM a JOIN b ON a.i = b.j
     GROUP BY a.i;
    
     Finalize HashAggregate
       Group Key: a.i
       ->  Nested Loop
             ->  Partial HashAggregate
                   Group Key: b.j
                   ->  Seq Scan on b
             ->  Index Only Scan using a_pkey on a
                   Index Cond: (i = b.j)
    
    During the construction of the join tree, we evaluate each base or
    join relation to determine if eager aggregation can be applied.  If
    feasible, we create a separate RelOptInfo called a "grouped relation"
    and store it in a dedicated list.
    
    Grouped relation paths can be generated in two ways.  The first method
    involves adding sorted and hashed partial aggregation paths on top of
    the non-grouped paths.  To limit planning time, we only consider the
    cheapest or suitably-sorted non-grouped paths during this phase.
    
    Alternatively, grouped paths can be generated by joining a grouped
    relation with a non-grouped relation.  Joining two grouped relations
    does not seem to be very useful and is currently not supported.
    
    For the partial aggregation that is pushed down to a non-aggregated
    relation, we need to consider all expressions from this relation that
    are involved in upper join clauses and include them in the grouping
    keys, using compatible operators.  This is essential to ensure that an
    aggregated row from the partial aggregation matches the other side of
    the join if and only if each row in the partial group does.  In other
    words, all rows within the same partial group share the same
    'destiny', which is crucial for maintaining correctness.
    
    One restriction is that we cannot push partial aggregation down to a
    relation that is on the nullable side of an outer join, because the
    NULL-extended rows produced by the outer join would not be available
    when we perform the partial aggregation, while with a
    non-eager-aggregation plan these rows are available for the top-level
    aggregation.  Pushing partial aggregation in this case may result in
    the rows being grouped differently than expected, or produce incorrect
    values from the aggregate functions.
    
    If we have generated a grouped relation for the topmost join relation,
    we finalize its paths at the end.  The final paths will compete in the
    usual way with paths built from regular planning.
    
    Since eager aggregation can generate many grouped relations, we
    introduce a RelInfoList structure, which encapsulates both a list and
    a hash table, so that we can leverage the hash table for faster
    lookups not only for join relations but also for grouped relations.
    
    Eager aggregation can use significantly more CPU time and memory than
    regular planning when the query involves aggregates and many joining
    relations.  However, in some cases, the resulting plan can be much
    better, justifying the additional planning effort.  Nevertheless,
    turn this feature off by default for now.
    
    The patch was originally proposed by Antonin Houska in 2017.  This
    commit reworks various important aspects and rewrites most of the
    current code.  However, the original patch and reviews were very
    useful.
    
    Author: Richard Guo, Antonin Houska
    Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
    Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
    Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
    guofengrichard authored and Commitfest Bot committed Mar 17, 2025
    Commit: 87549a6
  2. [CF 4881] v16 - Eager aggregation, take 3

    This branch was automatically generated by a robot using patches from an
    email thread registered at:
    
    https://commitfest.postgresql.org/patch/4881
    
    The branch will be overwritten each time a new patch version is posted to
    the thread, and also periodically to check for bitrot caused by changes
    on the master branch.
    
    Patch(es): https://www.postgresql.org/message-id/CAMbWs48XdzvnwfTHWxQ7qK-yjvdrbwsPpqhJBuKDnO+hcbsVwA@mail.gmail.com
    Author(s): Richard Guo
    Commitfest Bot committed Mar 17, 2025
    Commit: aa665ff