Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit b58e8ca

Browse files
committed
Fix a PlaceHolderVar-related oversight in star-schema planning patch.
In commit b514a74, I changed the planner so that it would allow nestloop paths to remain partially parameterized, ie the inner relation might need parameters from both the current outer relation and some upper-level outer relation. That's fine so long as we're talking about distinct parameters; but the patch also allowed creation of nestloop paths for cases where the inner relation's parameter was a PlaceHolderVar whose eval_at set included the current outer relation and some upper-level one. That does *not* work. In principle we could allow such a PlaceHolderVar to be evaluated at the lower join node using values passed down from the upper relation along with values from the join's own outer relation. However, nodeNestloop.c only supports simple Vars not arbitrary expressions as nestloop parameters. createplan.c is also a few bricks shy of being able to handle such cases; it misplaces the PlaceHolderVar parameters in the plan tree, which is why the visible symptoms of this bug are "plan should not reference subplan's variable" and "failed to assign all NestLoopParams to plan nodes" planner errors. Adding the necessary complexity to make this work doesn't seem like it would be repaid in significantly better plans, because in cases where such a PHV exists, there is probably a corresponding join order constraint that would allow a good plan to be found without using the star-schema exception. Furthermore, adding complexity to nodeNestloop.c would create a run-time penalty even for plans where this whole consideration is irrelevant. So let's just reject such paths instead. Per fuzz testing by Andreas Seltenreich; the added regression test is based on his example query. Back-patch to 9.2, like the previous patch.
1 parent 3a35ca5 commit b58e8ca

File tree

3 files changed

+158
-26
lines changed

3 files changed

+158
-26
lines changed

src/backend/optimizer/path/joinpath.c

Lines changed: 76 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -120,7 +120,7 @@ add_paths_to_joinrel(PlannerInfo *root,
120120
* is usually no need to create a parameterized result path unless there
121121
* is a join order restriction that prevents joining one of our input rels
122122
* directly to the parameter source rel instead of joining to the other
123-
* input rel. (But see exception in try_nestloop_path.) This restriction
123+
* input rel. (But see allow_star_schema_join().) This restriction
124124
* reduces the number of parameterized paths we have to deal with at
125125
* higher join levels, without compromising the quality of the resulting
126126
* plan. We express the restriction as a Relids set that must overlap the
@@ -262,6 +262,74 @@ add_paths_to_joinrel(PlannerInfo *root,
262262
param_source_rels, extra_lateral_rels);
263263
}
264264

265+
/*
266+
* We override the param_source_rels heuristic to accept nestloop paths in
267+
* which the outer rel satisfies some but not all of the inner path's
268+
* parameterization. This is necessary to get good plans for star-schema
269+
* scenarios, in which a parameterized path for a large table may require
270+
* parameters from multiple small tables that will not get joined directly to
271+
* each other. We can handle that by stacking nestloops that have the small
272+
* tables on the outside; but this breaks the rule the param_source_rels
273+
* heuristic is based on, namely that parameters should not be passed down
274+
* across joins unless there's a join-order-constraint-based reason to do so.
275+
* So we ignore the param_source_rels restriction when this case applies.
276+
*
277+
* However, there's a pitfall: suppose the inner rel (call it A) has a
278+
* parameter that is a PlaceHolderVar, and that PHV's minimum eval_at set
279+
* includes the outer rel (B) and some third rel (C). If we treat this as a
280+
* star-schema case and create a B/A nestloop join that's parameterized by C,
281+
* we would end up with a plan in which the PHV's expression has to be
282+
* evaluated as a nestloop parameter at the B/A join; and the executor is only
283+
* set up to handle simple Vars as NestLoopParams. Rather than add complexity
284+
* and overhead to the executor for such corner cases, it seems better to
285+
* forbid the join. (Note that existence of such a PHV probably means there
286+
* is a join order constraint that will cause us to consider joining B and C
287+
* directly; so we can still make use of A's parameterized path, and there is
288+
* no need for the star-schema exception.) To implement this exception to the
289+
* exception, we check whether any PHVs used in the query could pose such a
290+
* hazard. We don't have any simple way of checking whether a risky PHV would
291+
* actually be used in the inner plan, and the case is so unusual that it
292+
* doesn't seem worth working very hard on it.
293+
*
294+
* allow_star_schema_join() returns TRUE if the param_source_rels restriction
295+
* should be overridden, ie, it's okay to perform this join.
296+
*/
297+
static bool
298+
allow_star_schema_join(PlannerInfo *root,
299+
Path *outer_path,
300+
Path *inner_path)
301+
{
302+
Relids innerparams = PATH_REQ_OUTER(inner_path);
303+
Relids outerrelids = outer_path->parent->relids;
304+
ListCell *lc;
305+
306+
/*
307+
* It's not a star-schema case unless the outer rel provides some but not
308+
* all of the inner rel's parameterization.
309+
*/
310+
if (!(bms_overlap(innerparams, outerrelids) &&
311+
bms_nonempty_difference(innerparams, outerrelids)))
312+
return false;
313+
314+
/* Check for hazardous PHVs */
315+
foreach(lc, root->placeholder_list)
316+
{
317+
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
318+
319+
if (!bms_is_subset(phinfo->ph_eval_at, innerparams))
320+
continue; /* ignore, could not be a nestloop param */
321+
if (!bms_overlap(phinfo->ph_eval_at, outerrelids))
322+
continue; /* ignore, not relevant to this join */
323+
if (bms_is_subset(phinfo->ph_eval_at, outerrelids))
324+
continue; /* safe, it can be eval'd within outerrel */
325+
/* Otherwise, it's potentially unsafe, so reject the join */
326+
return false;
327+
}
328+
329+
/* OK to perform the join */
330+
return true;
331+
}
332+
265333
/*
266334
* try_nestloop_path
267335
* Consider a nestloop join path; if it appears useful, push it into
@@ -285,36 +353,18 @@ try_nestloop_path(PlannerInfo *root,
285353

286354
/*
287355
* Check to see if proposed path is still parameterized, and reject if the
288-
* parameterization wouldn't be sensible.
356+
* parameterization wouldn't be sensible --- unless allow_star_schema_join
357+
* says to allow it anyway.
289358
*/
290359
required_outer = calc_nestloop_required_outer(outer_path,
291360
inner_path);
292361
if (required_outer &&
293-
!bms_overlap(required_outer, param_source_rels))
362+
!bms_overlap(required_outer, param_source_rels) &&
363+
!allow_star_schema_join(root, outer_path, inner_path))
294364
{
295-
/*
296-
* We override the param_source_rels heuristic to accept nestloop
297-
* paths in which the outer rel satisfies some but not all of the
298-
* inner path's parameterization. This is necessary to get good plans
299-
* for star-schema scenarios, in which a parameterized path for a
300-
* large table may require parameters from multiple small tables that
301-
* will not get joined directly to each other. We can handle that by
302-
* stacking nestloops that have the small tables on the outside; but
303-
* this breaks the rule the param_source_rels heuristic is based on,
304-
* namely that parameters should not be passed down across joins
305-
* unless there's a join-order-constraint-based reason to do so. So
306-
* ignore the param_source_rels restriction when this case applies.
307-
*/
308-
Relids outerrelids = outer_path->parent->relids;
309-
Relids innerparams = PATH_REQ_OUTER(inner_path);
310-
311-
if (!(bms_overlap(innerparams, outerrelids) &&
312-
bms_nonempty_difference(innerparams, outerrelids)))
313-
{
314-
/* Waste no memory when we reject a path here */
315-
bms_free(required_outer);
316-
return;
317-
}
365+
/* Waste no memory when we reject a path here */
366+
bms_free(required_outer);
367+
return;
318368
}
319369

320370
/*

src/test/regress/expected/join.out

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2909,6 +2909,57 @@ where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
29092909
Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
29102910
(8 rows)
29112911

2912+
--
2913+
-- test a corner case in which we shouldn't apply the star-schema optimization
2914+
--
2915+
explain (costs off)
2916+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
2917+
tenk1 t1
2918+
inner join int4_tbl i1
2919+
left join (select v1.x2, v2.y1, 11 AS d1
2920+
from (values(1,0)) v1(x1,x2)
2921+
left join (values(3,1)) v2(y1,y2)
2922+
on v1.x1 = v2.y2) subq1
2923+
on (i1.f1 = subq1.x2)
2924+
on (t1.unique2 = subq1.d1)
2925+
left join tenk1 t2
2926+
on (subq1.y1 = t2.unique1)
2927+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
2928+
QUERY PLAN
2929+
------------------------------------------------------------------------------
2930+
Nested Loop
2931+
Join Filter: (t1.stringu1 > t2.stringu2)
2932+
-> Nested Loop
2933+
Join Filter: ("*VALUES*".column2 = i1.f1)
2934+
-> Nested Loop
2935+
-> Nested Loop
2936+
Join Filter: ("*VALUES*".column1 = "*VALUES*_1".column2)
2937+
-> Values Scan on "*VALUES*"
2938+
-> Values Scan on "*VALUES*_1"
2939+
-> Index Scan using tenk1_unique2 on tenk1 t1
2940+
Index Cond: ((unique2 = (11)) AND (unique2 < 42))
2941+
-> Seq Scan on int4_tbl i1
2942+
-> Index Scan using tenk1_unique1 on tenk1 t2
2943+
Index Cond: (unique1 = "*VALUES*_1".column1)
2944+
(14 rows)
2945+
2946+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
2947+
tenk1 t1
2948+
inner join int4_tbl i1
2949+
left join (select v1.x2, v2.y1, 11 AS d1
2950+
from (values(1,0)) v1(x1,x2)
2951+
left join (values(3,1)) v2(y1,y2)
2952+
on v1.x1 = v2.y2) subq1
2953+
on (i1.f1 = subq1.x2)
2954+
on (t1.unique2 = subq1.d1)
2955+
left join tenk1 t2
2956+
on (subq1.y1 = t2.unique1)
2957+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
2958+
unique2 | stringu1 | unique1 | stringu2
2959+
---------+----------+---------+----------
2960+
11 | WFAAAA | 3 | LKIAAA
2961+
(1 row)
2962+
29122963
--
29132964
-- test extraction of restriction OR clauses from join OR clause
29142965
-- (we used to only do this for indexable clauses)

src/test/regress/sql/join.sql

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -827,6 +827,37 @@ select * from
827827
tenk1, int8_tbl a, int8_tbl b
828828
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
829829

830+
--
831+
-- test a corner case in which we shouldn't apply the star-schema optimization
832+
--
833+
834+
explain (costs off)
835+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
836+
tenk1 t1
837+
inner join int4_tbl i1
838+
left join (select v1.x2, v2.y1, 11 AS d1
839+
from (values(1,0)) v1(x1,x2)
840+
left join (values(3,1)) v2(y1,y2)
841+
on v1.x1 = v2.y2) subq1
842+
on (i1.f1 = subq1.x2)
843+
on (t1.unique2 = subq1.d1)
844+
left join tenk1 t2
845+
on (subq1.y1 = t2.unique1)
846+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
847+
848+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
849+
tenk1 t1
850+
inner join int4_tbl i1
851+
left join (select v1.x2, v2.y1, 11 AS d1
852+
from (values(1,0)) v1(x1,x2)
853+
left join (values(3,1)) v2(y1,y2)
854+
on v1.x1 = v2.y2) subq1
855+
on (i1.f1 = subq1.x2)
856+
on (t1.unique2 = subq1.d1)
857+
left join tenk1 t2
858+
on (subq1.y1 = t2.unique1)
859+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
860+
830861
--
831862
-- test extraction of restriction OR clauses from join OR clause
832863
-- (we used to only do this for indexable clauses)

0 commit comments

Comments
 (0)