@@ -284,44 +284,41 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
284
284
285
285
<para>
286
286
The driving table may be joined to one or more other tables using nested
287
- loops or hash joins. The outer side of the join may be any kind of
287
+ loops or hash joins. The inner side of the join may be any kind of
288
288
non-parallel plan that is otherwise supported by the planner provided that
289
289
it is safe to run within a parallel worker. For example, it may be an
290
- index scan which looks up a value based on a column taken from the inner
291
- table. Each worker will execute the outer side of the plan in full, which
292
- is why merge joins are not supported here. The outer side of a merge join
293
- will often involve sorting the entire inner table; even if it involves an
294
- index, it is unlikely to be productive to have multiple processes each
295
- conduct a full index scan of the inner table.
290
+ index scan which looks up a value taken from the outer side of the join.
291
+ Each worker will execute the inner side of the join in full, which for
292
+ hash join means that an identical hash table is built in each worker
293
+ process.
296
294
</para>
297
295
</sect2>
298
296
299
297
<sect2 id="parallel-aggregation">
300
298
<title>Parallel Aggregation</title>
301
299
<para>
302
- It is not possible to perform the aggregation portion of a query entirely
303
- in parallel. For example, if a query involves selecting
304
- <literal>COUNT(*)</>, each worker could compute a total, but those totals
305
- would need to combined in order to produce a final answer. If the query
306
- involved a <literal>GROUP BY</> clause, a separate total would need to
307
- be computed for each group. Even though aggregation can't be done entirely
308
- in parallel, queries involving aggregation are often excellent candidates
309
- for parallel query, because they typically read many rows but return only
310
- a few rows to the client. Queries that return many rows to the client
311
- are often limited by the speed at which the client can read the data,
312
- in which case parallel query cannot help very much.
313
- </para>
314
-
315
- <para>
316
- <productname>PostgreSQL</> supports parallel aggregation by aggregating
317
- twice. First, each process participating in the parallel portion of the
318
- query performs an aggregation step, producing a partial result for each
319
- group of which that process is aware. This is reflected in the plan as
320
- a <literal>PartialAggregate</> node. Second, the partial results are
300
+ <productname>PostgreSQL</> supports parallel aggregation by aggregating in
301
+ two stages. First, each process participating in the parallel portion of
302
+ the query performs an aggregation step, producing a partial result for
303
+ each group of which that process is aware. This is reflected in the plan
304
+ as a <literal>Partial Aggregate</> node. Second, the partial results are
321
305
transferred to the leader via the <literal>Gather</> node. Finally, the
322
306
leader re-aggregates the results across all workers in order to produce
323
307
the final result. This is reflected in the plan as a
324
- <literal>FinalizeAggregate</> node.
308
+ <literal>Finalize Aggregate</> node.
309
+ </para>
310
+
311
+ <para>
312
+ Because the <literal>Finalize Aggregate</> node runs on the leader
313
+ process, queries which produce a relatively large number of groups in
314
+ comparison to the number of input rows will appear less favorable to the
315
+ query planner. For example, in the worst-case scenario the number of
316
+ groups seen by the <literal>Finalize Aggregate</> node could be as many as
317
+ the number of input rows which were seen by all worker processes in the
318
+ <literal>Partial Aggregate</> stage. For such cases, there is clearly
319
+ going to be no performance benefit to using parallel aggregation. The
320
+ query planner takes this into account during the planning process and is
321
+ unlikely to choose parallel aggregate in this scenario.
325
322
</para>
326
323
327
324
<para>
@@ -330,10 +327,11 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
330
327
have a combine function. If the aggregate has a transition state of type
331
328
<literal>internal</>, it must have serialization and deserialization
332
329
functions. See <xref linkend="sql-createaggregate"> for more details.
333
- Parallel aggregation is not supported for ordered set aggregates or when
334
- the query involves <literal>GROUPING SETS</>. It can only be used when
335
- all joins involved in the query are also part of the parallel portion
336
- of the plan.
330
+ Parallel aggregation is not supported if any aggregate function call
331
+ contains <literal>DISTINCT</> or <literal>ORDER BY</> clause and is also
332
+ not supported for ordered set aggregates or when the query involves
333
+ <literal>GROUPING SETS</>. It can only be used when all joins involved in
334
+ the query are also part of the parallel portion of the plan.
337
335
</para>
338
336
339
337
</sect2>
0 commit comments