
Commit c45bf57

Fix planner crash from pfree'ing a partial path that a GatherPath uses.
We mustn't run generate_gather_paths() during add_paths_to_joinrel(), because that function can be invoked multiple times for the same target joinrel. Not only is it wasteful to build GatherPaths repeatedly, but a later add_partial_path() could delete the partial path that a previously created GatherPath depends on. Instead establish the convention that we do generate_gather_paths() for a rel only just before set_cheapest().

The code was accidentally not broken for baserels, because as of today there never is more than one partial path for a baserel. But that assumption obviously has a pretty short half-life, so move the generate_gather_paths() calls for those cases as well.

Also add some generic comments explaining how and why this all works.

Per fuzz testing by Andreas Seltenreich.

Report: <871t5pgwdt.fsf@credativ.de>
1 parent 17d5db3 commit c45bf57

File tree: 5 files changed, +71 −34 lines

src/backend/optimizer/README (+13 −2)

@@ -167,6 +167,16 @@ all the ways to produce the same set of joined base rels will share the
 same RelOptInfo, so the paths produced from different join combinations
 that produce equivalent joinrels will compete in add_path().
 
+The dynamic-programming approach has an important property that's not
+immediately obvious: we will finish constructing all paths for a given
+relation before we construct any paths for relations containing that rel.
+This means that we can reliably identify the "cheapest path" for each rel
+before higher-level relations need to know that.  Also, we can safely
+discard a path when we find that another path for the same rel is better,
+without worrying that maybe there is already a reference to that path in
+some higher-level join path.  Without this, memory management for paths
+would be much more complicated.
+
 Once we have built the final join rel, we use either the cheapest path
 for it or the cheapest path with the desired ordering (if that's cheaper
 than applying a sort to the cheapest other path).

@@ -321,8 +331,9 @@ set up for recursive handling of subqueries
 For each joinrel of the prior level, do make_rels_by_clause_joins()
 if it has join clauses, or make_rels_by_clauseless_joins() if not.
 Also generate "bushy plan" joins between joinrels of lower levels.
-Back at standard_join_search(), apply set_cheapest() to extract the
-cheapest path for each newly constructed joinrel.
+Back at standard_join_search(), generate gather paths if needed for
+each newly constructed joinrel, then apply set_cheapest() to extract
+the cheapest path for it.
 Loop back if this wasn't the top join level.
 Back at grouping_planner:
 do grouping (GROUP BY) and aggregation

src/backend/optimizer/geqo/geqo_eval.c (+3)

@@ -266,6 +266,9 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 		/* Keep searching if join order is not valid */
 		if (joinrel)
 		{
+			/* Create GatherPaths for any useful partial paths for rel */
+			generate_gather_paths(root, joinrel);
+
 			/* Find and save the cheapest paths for this joinrel */
 			set_cheapest(joinrel);
 
src/backend/optimizer/path/allpaths.c (+30 −20)

@@ -73,7 +73,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				 Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
 				   RangeTblEntry *rte);
-static void create_parallel_paths(PlannerInfo *root, RelOptInfo *rel);
+static void create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel);
 static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
 						  RangeTblEntry *rte);
 static bool function_rte_parallel_ok(RangeTblEntry *rte);

@@ -447,6 +447,16 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
+	/*
+	 * If this is a baserel, consider gathering any partial paths we may have
+	 * created for it.  (If we tried to gather inheritance children, we could
+	 * end up with a very large number of gather nodes, each trying to grab
+	 * its own pool of workers, so don't do this for otherrels.  Instead,
+	 * we'll consider gathering partial paths for the parent appendrel.)
+	 */
+	if (rel->reloptkind == RELOPT_BASEREL)
+		generate_gather_paths(root, rel);
+
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
 	 * relation.  It could add new paths (such as CustomPaths) by calling

@@ -643,7 +653,7 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* If appropriate, consider parallel sequential scan */
 	if (rel->consider_parallel && required_outer == NULL)
-		create_parallel_paths(root, rel);
+		create_plain_partial_paths(root, rel);
 
 	/* Consider index scans */
 	create_index_paths(root, rel);

@@ -653,11 +663,11 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 }
 
 /*
- * create_parallel_paths
- *	  Build parallel access paths for a plain relation
+ * create_plain_partial_paths
+ *	  Build partial access paths for parallel scan of a plain relation
  */
 static void
-create_parallel_paths(PlannerInfo *root, RelOptInfo *rel)
+create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	int			parallel_degree = 1;
 

@@ -712,16 +722,6 @@ create_parallel_paths(PlannerInfo *root, RelOptInfo *rel)
 
 	/* Add an unordered partial path based on a parallel sequential scan. */
 	add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_degree));
-
-	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * just created.  If we gathered an inheritance child, we could end up
-	 * with a very large number of gather nodes, each trying to grab its own
-	 * pool of workers, so don't do this in that case.  Instead, we'll
-	 * consider gathering partial paths for the appendrel.
-	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
 }
 
 /*

@@ -1262,9 +1262,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		appendpath = create_append_path(rel, partial_subpaths, NULL,
 										parallel_degree);
 		add_partial_path(rel, (Path *) appendpath);
-
-		/* Consider gathering it. */
-		generate_gather_paths(root, rel);
 	}
 
 	/*

@@ -1970,6 +1967,10 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * generate_gather_paths
  *		Generate parallel access paths for a relation by pushing a Gather on
  *		top of a partial path.
+ *
+ * This must not be called until after we're done creating all partial paths
+ * for the specified relation.  (Otherwise, add_partial_path might delete a
+ * path that some GatherPath has a reference to.)
  */
 void
 generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)

@@ -1983,7 +1984,9 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 
 	/*
 	 * The output of Gather is currently always unsorted, so there's only one
-	 * partial path of interest: the cheapest one.
+	 * partial path of interest: the cheapest one.  That will be the one at
+	 * the front of partial_pathlist because of the way add_partial_path
+	 * works.
 	 *
 	 * Eventually, we should have a Gather Merge operation that can merge
	 * multiple tuple streams together while preserving their ordering.  We

@@ -2148,12 +2151,19 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		join_search_one_level(root, lev);
 
 		/*
-		 * Do cleanup work on each just-processed rel.
+		 * Run generate_gather_paths() for each just-processed joinrel.  We
+		 * could not do this earlier because both regular and partial paths
+		 * can get added to a particular joinrel at multiple times within
+		 * join_search_one_level.  After that, we're done creating paths
+		 * for the joinrel, so run set_cheapest().
 		 */
 		foreach(lc, root->join_rel_level[lev])
 		{
 			rel = (RelOptInfo *) lfirst(lc);
 
+			/* Create GatherPaths for any useful partial paths for rel */
+			generate_gather_paths(root, rel);
+
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);

src/backend/optimizer/path/joinpath.c (+1 −6)

@@ -223,12 +223,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 jointype, &extra);
 
 	/*
-	 * 6. Consider gathering partial paths.
-	 */
-	generate_gather_paths(root, joinrel);
-
-	/*
-	 * 7. Finally, give extensions a chance to manipulate the path list.
+	 * 6. Finally, give extensions a chance to manipulate the path list.
 	 */
 	if (set_join_pathlist_hook)
 		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,

src/backend/optimizer/util/pathnode.c (+24 −6)

@@ -394,8 +394,14 @@ set_cheapest(RelOptInfo *parent_rel)
 *	  but just recycling discarded Path nodes is a very useful savings in
 *	  a large join tree.  We can recycle the List nodes of pathlist, too.
 *
- *	  BUT: we do not pfree IndexPath objects, since they may be referenced as
- *	  children of BitmapHeapPaths as well as being paths in their own right.
+ *	  As noted in optimizer/README, deleting a previously-accepted Path is
+ *	  safe because we know that Paths of this rel cannot yet be referenced
+ *	  from any other rel, such as a higher-level join.  However, in some cases
+ *	  it is possible that a Path is referenced by another Path for its own
+ *	  rel; we must not delete such a Path, even if it is dominated by the new
+ *	  Path.  Currently this occurs only for IndexPath objects, which may be
+ *	  referenced as children of BitmapHeapPaths as well as being paths in
+ *	  their own right.  Hence, we don't pfree IndexPaths when rejecting them.
 *
 * 'parent_rel' is the relation entry to which the path corresponds.
 * 'new_path' is a potential path for parent_rel.

@@ -711,6 +717,10 @@ add_path_precheck(RelOptInfo *parent_rel,
 *	  parallel such that each worker will generate a subset of the path's
 *	  overall result.
 *
+ *	  As in add_path, the partial_pathlist is kept sorted with the cheapest
+ *	  total path in front.  This is depended on by multiple places, which
+ *	  just take the front entry as the cheapest path without searching.
+ *
 *	  We don't generate parameterized partial paths for several reasons.  Most
 *	  importantly, they're not safe to execute, because there's nothing to
 *	  make sure that a parallel scan within the parameterized portion of the

@@ -721,15 +731,23 @@ add_path_precheck(RelOptInfo *parent_rel,
 *	  side of the plan will be small anyway.  There could be rare cases where
 *	  this wins big - e.g. if join order constraints put a 1-row relation on
 *	  the outer side of the topmost join with a parameterized plan on the inner
- *	  side - but we'll have to be content not to handle such cases until somebody
- *	  builds an executor infrastructure that can cope with them.
+ *	  side - but we'll have to be content not to handle such cases until
+ *	  somebody builds an executor infrastructure that can cope with them.
 *
 *	  Because we don't consider parameterized paths here, we also don't
 *	  need to consider the row counts as a measure of quality: every path will
 *	  produce the same number of rows.  Neither do we need to consider startup
 *	  costs: parallelism is only used for plans that will be run to completion.
 *	  Therefore, this routine is much simpler than add_path: it needs to
 *	  consider only pathkeys and total cost.
+ *
+ *	  As with add_path, we pfree paths that are found to be dominated by
+ *	  another partial path; this requires that there be no other references to
+ *	  such paths yet.  Hence, GatherPaths must not be created for a rel until
+ *	  we're done creating all partial paths for it.  We do not currently build
+ *	  partial indexscan paths, so there is no need for an exception for
+ *	  IndexPaths here; for safety, we instead Assert that a path to be freed
+ *	  isn't an IndexPath.
 */
void
add_partial_path(RelOptInfo *parent_rel, Path *new_path)

@@ -808,7 +826,7 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 		{
 			parent_rel->partial_pathlist =
 				list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev);
-			/* add_path has a special case for IndexPath; we don't need it */
+			/* we should not see IndexPaths here, so always safe to delete */
 			Assert(!IsA(old_path, IndexPath));
 			pfree(old_path);
 			/* p1_prev does not advance */

@@ -842,7 +860,7 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 	}
 	else
 	{
-		/* add_path has a special case for IndexPath; we don't need it */
+		/* we should not see IndexPaths here, so always safe to delete */
 		Assert(!IsA(new_path, IndexPath));
 		/* Reject and recycle the new path */
 		pfree(new_path);

0 commit comments