Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 0ede57a

Browse files
committed
Corrections and improvements to generic parallel query documentation.
David Rowley, reviewed by Brad DeJong, Amit Kapila, and me. Discussion: http://postgr.es/m/CAKJS1f81fob-M6RJyTVv3SCasxMuQpj37ReNOJ=tprhwd7hAVg@mail.gmail.com
1 parent f10637e commit 0ede57a

File tree

1 file changed

+29
-31
lines changed

1 file changed

+29
-31
lines changed

doc/src/sgml/parallel.sgml

Lines changed: 29 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -284,44 +284,41 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
284284

285285
<para>
286286
The driving table may be joined to one or more other tables using nested
287-
loops or hash joins. The outer side of the join may be any kind of
287+
loops or hash joins. The inner side of the join may be any kind of
288288
non-parallel plan that is otherwise supported by the planner provided that
289289
it is safe to run within a parallel worker. For example, it may be an
290-
index scan which looks up a value based on a column taken from the inner
291-
table. Each worker will execute the outer side of the plan in full, which
292-
is why merge joins are not supported here. The outer side of a merge join
293-
will often involve sorting the entire inner table; even if it involves an
294-
index, it is unlikely to be productive to have multiple processes each
295-
conduct a full index scan of the inner table.
290+
index scan which looks up a value taken from the outer side of the join.
291+
Each worker will execute the inner side of the join in full, which for
292+
hash join means that an identical hash table is built in each worker
293+
process.
296294
</para>
297295
</sect2>
298296

299297
<sect2 id="parallel-aggregation">
300298
<title>Parallel Aggregation</title>
301299
<para>
302-
It is not possible to perform the aggregation portion of a query entirely
303-
in parallel. For example, if a query involves selecting
304-
<literal>COUNT(*)</>, each worker could compute a total, but those totals
305-
would need to combined in order to produce a final answer. If the query
306-
involved a <literal>GROUP BY</> clause, a separate total would need to
307-
be computed for each group. Even though aggregation can't be done entirely
308-
in parallel, queries involving aggregation are often excellent candidates
309-
for parallel query, because they typically read many rows but return only
310-
a few rows to the client. Queries that return many rows to the client
311-
are often limited by the speed at which the client can read the data,
312-
in which case parallel query cannot help very much.
313-
</para>
314-
315-
<para>
316-
<productname>PostgreSQL</> supports parallel aggregation by aggregating
317-
twice. First, each process participating in the parallel portion of the
318-
query performs an aggregation step, producing a partial result for each
319-
group of which that process is aware. This is reflected in the plan as
320-
a <literal>PartialAggregate</> node. Second, the partial results are
300+
<productname>PostgreSQL</> supports parallel aggregation by aggregating in
301+
two stages. First, each process participating in the parallel portion of
302+
the query performs an aggregation step, producing a partial result for
303+
each group of which that process is aware. This is reflected in the plan
304+
as a <literal>Partial Aggregate</> node. Second, the partial results are
321305
transferred to the leader via the <literal>Gather</> node. Finally, the
322306
leader re-aggregates the results across all workers in order to produce
323307
the final result. This is reflected in the plan as a
324-
<literal>FinalizeAggregate</> node.
308+
<literal>Finalize Aggregate</> node.
309+
</para>
310+
311+
<para>
312+
Because the <literal>Finalize Aggregate</> node runs on the leader
313+
process, queries which produce a relatively large number of groups in
314+
comparison to the number of input rows will appear less favorable to the
315+
query planner. For example, in the worst-case scenario the number of
316+
groups seen by the <literal>Finalize Aggregate</> node could be as many as
317+
the number of input rows which were seen by all worker processes in the
318+
<literal>Partial Aggregate</> stage. For such cases, there is clearly
319+
going to be no performance benefit to using parallel aggregation. The
320+
query planner takes this into account during the planning process and is
321+
unlikely to choose parallel aggregate in this scenario.
325322
</para>
326323

327324
<para>
@@ -330,10 +327,11 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
330327
have a combine function. If the aggregate has a transition state of type
331328
<literal>internal</>, it must have serialization and deserialization
332329
functions. See <xref linkend="sql-createaggregate"> for more details.
333-
Parallel aggregation is not supported for ordered set aggregates or when
334-
the query involves <literal>GROUPING SETS</>. It can only be used when
335-
all joins involved in the query are also part of the parallel portion
336-
of the plan.
330+
Parallel aggregation is not supported if any aggregate function call
331+
contains <literal>DISTINCT</> or <literal>ORDER BY</> clause and is also
332+
not supported for ordered set aggregates or when the query involves
333+
<literal>GROUPING SETS</>. It can only be used when all joins involved in
334+
the query are also part of the parallel portion of the plan.
337335
</para>
338336

339337
</sect2>

0 commit comments

Comments
 (0)