Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 529ebb2

Browse files
committed
Generate EquivalenceClass members for partitionwise child join rels.
Commit d25ea01 got rid of what I thought were entirely unnecessary derived child expressions in EquivalenceClasses for EC members that mention multiple baserels. But it turns out that some of the child expressions that code created are necessary for partitionwise joins, else we fail to find matching pathkeys for Sort nodes. (This happens only for certain shapes of the resulting plan; it may be that partitionwise aggregation is also necessary to show the failure, though I'm not sure of that.) Reverting that commit entirely would be quite painful performance-wise for large partition sets. So instead, add code that explicitly generates child expressions that match only partitionwise child join rels we have actually generated. Per report from Justin Pryzby. (Amit Langote noticed the problem earlier, though it's not clear if he recognized then that it could result in a planner error, not merely failure to exploit partitionwise join, in the code as-committed.) Back-patch to v12 where commit d25ea01 came in. Amit Langote, with lots of kibitzing from me Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com Discussion: https://postgr.es/m/20191011143703.GN10470@telsasoft.com
1 parent 2a4d96e commit 529ebb2

File tree

5 files changed

+259
-14
lines changed

5 files changed

+259
-14
lines changed

src/backend/optimizer/path/equivclass.c

+148-13
Original file line numberDiff line numberDiff line change
@@ -1132,7 +1132,7 @@ generate_join_implied_equalities(PlannerInfo *root,
11321132
Relids inner_relids = inner_rel->relids;
11331133
Relids nominal_inner_relids;
11341134
Relids nominal_join_relids;
1135-
Bitmapset * matching_ecs;
1135+
Bitmapset *matching_ecs;
11361136
int i;
11371137

11381138
/* If inner rel is a child, extra setup work is needed */
@@ -2209,21 +2209,29 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
22092209

22102210
/*
22112211
* add_child_rel_equivalences
2212-
* Search for EC members that reference the parent_rel, and
2212+
* Search for EC members that reference the root parent of child_rel, and
22132213
* add transformed members referencing the child_rel.
22142214
*
22152215
* Note that this function won't be called at all unless we have at least some
22162216
* reason to believe that the EC members it generates will be useful.
22172217
*
22182218
* parent_rel and child_rel could be derived from appinfo, but since the
22192219
* caller has already computed them, we might as well just pass them in.
2220+
*
2221+
* The passed-in AppendRelInfo is not used when the parent_rel is not a
2222+
* top-level baserel, since it shows the mapping from the parent_rel but
2223+
* we need to translate EC expressions that refer to the top-level parent.
2224+
* Using it is faster than using adjust_appendrel_attrs_multilevel(), though,
2225+
* so we prefer it when we can.
22202226
*/
22212227
void
22222228
add_child_rel_equivalences(PlannerInfo *root,
22232229
AppendRelInfo *appinfo,
22242230
RelOptInfo *parent_rel,
22252231
RelOptInfo *child_rel)
22262232
{
2233+
Relids top_parent_relids = child_rel->top_parent_relids;
2234+
Relids child_relids = child_rel->relids;
22272235
int i;
22282236

22292237
/*
@@ -2248,7 +2256,7 @@ add_child_rel_equivalences(PlannerInfo *root,
22482256
continue;
22492257

22502258
/* Sanity check eclass_indexes only contain ECs for parent_rel */
2251-
Assert(bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids));
2259+
Assert(bms_is_subset(top_parent_relids, cur_ec->ec_relids));
22522260

22532261
/*
22542262
* We don't use foreach() here because there's no point in scanning
@@ -2268,13 +2276,14 @@ add_child_rel_equivalences(PlannerInfo *root,
22682276
* already-transformed child members. Otherwise, if some original
22692277
* member expression references more than one appendrel, we'd get
22702278
* an O(N^2) explosion of useless derived expressions for
2271-
* combinations of children.
2279+
* combinations of children. (But add_child_join_rel_equivalences
2280+
* may add targeted combinations for partitionwise-join purposes.)
22722281
*/
22732282
if (cur_em->em_is_child)
22742283
continue; /* ignore children here */
22752284

22762285
/* Does this member reference child's topmost parent rel? */
2277-
if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
2286+
if (bms_overlap(cur_em->em_relids, top_parent_relids))
22782287
{
22792288
/* Yes, generate transformed child version */
22802289
Expr *child_expr;
@@ -2295,8 +2304,8 @@ add_child_rel_equivalences(PlannerInfo *root,
22952304
child_expr = (Expr *)
22962305
adjust_appendrel_attrs_multilevel(root,
22972306
(Node *) cur_em->em_expr,
2298-
child_rel->relids,
2299-
child_rel->top_parent_relids);
2307+
child_relids,
2308+
top_parent_relids);
23002309
}
23012310

23022311
/*
@@ -2306,21 +2315,20 @@ add_child_rel_equivalences(PlannerInfo *root,
23062315
* don't want the child member to be marked as constant.
23072316
*/
23082317
new_relids = bms_difference(cur_em->em_relids,
2309-
child_rel->top_parent_relids);
2310-
new_relids = bms_add_members(new_relids, child_rel->relids);
2318+
top_parent_relids);
2319+
new_relids = bms_add_members(new_relids, child_relids);
23112320

23122321
/*
23132322
* And likewise for nullable_relids. Note this code assumes
23142323
* parent and child relids are singletons.
23152324
*/
23162325
new_nullable_relids = cur_em->em_nullable_relids;
2317-
if (bms_overlap(new_nullable_relids,
2318-
child_rel->top_parent_relids))
2326+
if (bms_overlap(new_nullable_relids, top_parent_relids))
23192327
{
23202328
new_nullable_relids = bms_difference(new_nullable_relids,
2321-
child_rel->top_parent_relids);
2329+
top_parent_relids);
23222330
new_nullable_relids = bms_add_members(new_nullable_relids,
2323-
child_rel->relids);
2331+
child_relids);
23242332
}
23252333

23262334
(void) add_eq_member(cur_ec, child_expr,
@@ -2334,6 +2342,133 @@ add_child_rel_equivalences(PlannerInfo *root,
23342342
}
23352343
}
23362344

2345+
/*
2346+
* add_child_join_rel_equivalences
2347+
* Like add_child_rel_equivalences(), but for joinrels
2348+
*
2349+
* Here we find the ECs relevant to the top parent joinrel and add transformed
2350+
* member expressions that refer to this child joinrel.
2351+
*
2352+
* Note that this function won't be called at all unless we have at least some
2353+
* reason to believe that the EC members it generates will be useful.
2354+
*/
2355+
void
2356+
add_child_join_rel_equivalences(PlannerInfo *root,
2357+
int nappinfos, AppendRelInfo **appinfos,
2358+
RelOptInfo *parent_joinrel,
2359+
RelOptInfo *child_joinrel)
2360+
{
2361+
Relids top_parent_relids = child_joinrel->top_parent_relids;
2362+
Relids child_relids = child_joinrel->relids;
2363+
Bitmapset *matching_ecs;
2364+
int i;
2365+
2366+
Assert(IS_JOIN_REL(child_joinrel) && IS_JOIN_REL(parent_joinrel));
2367+
2368+
/* We need consider only ECs that mention the parent joinrel */
2369+
matching_ecs = get_eclass_indexes_for_relids(root, top_parent_relids);
2370+
2371+
i = -1;
2372+
while ((i = bms_next_member(matching_ecs, i)) >= 0)
2373+
{
2374+
EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
2375+
int num_members;
2376+
2377+
/*
2378+
* If this EC contains a volatile expression, then generating child
2379+
* EMs would be downright dangerous, so skip it. We rely on a
2380+
* volatile EC having only one EM.
2381+
*/
2382+
if (cur_ec->ec_has_volatile)
2383+
continue;
2384+
2385+
/* Sanity check on get_eclass_indexes_for_relids result */
2386+
Assert(bms_overlap(top_parent_relids, cur_ec->ec_relids));
2387+
2388+
/*
2389+
* We don't use foreach() here because there's no point in scanning
2390+
* newly-added child members, so we can stop after the last
2391+
* pre-existing EC member.
2392+
*/
2393+
num_members = list_length(cur_ec->ec_members);
2394+
for (int pos = 0; pos < num_members; pos++)
2395+
{
2396+
EquivalenceMember *cur_em = (EquivalenceMember *) list_nth(cur_ec->ec_members, pos);
2397+
2398+
if (cur_em->em_is_const)
2399+
continue; /* ignore consts here */
2400+
2401+
/*
2402+
* We consider only original EC members here, not
2403+
* already-transformed child members.
2404+
*/
2405+
if (cur_em->em_is_child)
2406+
continue; /* ignore children here */
2407+
2408+
/*
2409+
* We may ignore expressions that reference a single baserel,
2410+
* because add_child_rel_equivalences should have handled them.
2411+
*/
2412+
if (bms_membership(cur_em->em_relids) != BMS_MULTIPLE)
2413+
continue;
2414+
2415+
/* Does this member reference child's topmost parent rel? */
2416+
if (bms_overlap(cur_em->em_relids, top_parent_relids))
2417+
{
2418+
/* Yes, generate transformed child version */
2419+
Expr *child_expr;
2420+
Relids new_relids;
2421+
Relids new_nullable_relids;
2422+
2423+
if (parent_joinrel->reloptkind == RELOPT_JOINREL)
2424+
{
2425+
/* Simple single-level transformation */
2426+
child_expr = (Expr *)
2427+
adjust_appendrel_attrs(root,
2428+
(Node *) cur_em->em_expr,
2429+
nappinfos, appinfos);
2430+
}
2431+
else
2432+
{
2433+
/* Must do multi-level transformation */
2434+
Assert(parent_joinrel->reloptkind == RELOPT_OTHER_JOINREL);
2435+
child_expr = (Expr *)
2436+
adjust_appendrel_attrs_multilevel(root,
2437+
(Node *) cur_em->em_expr,
2438+
child_relids,
2439+
top_parent_relids);
2440+
}
2441+
2442+
/*
2443+
* Transform em_relids to match. Note we do *not* do
2444+
* pull_varnos(child_expr) here, as for example the
2445+
* transformation might have substituted a constant, but we
2446+
* don't want the child member to be marked as constant.
2447+
*/
2448+
new_relids = bms_difference(cur_em->em_relids,
2449+
top_parent_relids);
2450+
new_relids = bms_add_members(new_relids, child_relids);
2451+
2452+
/*
2453+
* For nullable_relids, we must selectively replace parent
2454+
* nullable relids with child ones.
2455+
*/
2456+
new_nullable_relids = cur_em->em_nullable_relids;
2457+
if (bms_overlap(new_nullable_relids, top_parent_relids))
2458+
new_nullable_relids =
2459+
adjust_child_relids_multilevel(root,
2460+
new_nullable_relids,
2461+
child_relids,
2462+
top_parent_relids);
2463+
2464+
(void) add_eq_member(cur_ec, child_expr,
2465+
new_relids, new_nullable_relids,
2466+
true, cur_em->em_datatype);
2467+
}
2468+
}
2469+
}
2470+
}
2471+
23372472

23382473
/*
23392474
* generate_implied_equalities_for_column

src/backend/optimizer/util/relnode.c

+14-1
Original file line numberDiff line numberDiff line change
@@ -843,6 +843,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
843843
/* Compute information relevant to foreign relations. */
844844
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
845845

846+
/* Compute information needed for mapping Vars to the child rel */
846847
appinfos = find_appinfos_by_relids(root, joinrel->relids, &nappinfos);
847848

848849
/* Set up reltarget struct */
@@ -854,7 +855,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
854855
(Node *) parent_joinrel->joininfo,
855856
nappinfos,
856857
appinfos);
857-
pfree(appinfos);
858858

859859
/*
860860
* Lateral relids referred in child join will be same as that referred in
@@ -886,6 +886,19 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
886886
/* Add the relation to the PlannerInfo. */
887887
add_join_rel(root, joinrel);
888888

889+
/*
890+
* We might need EquivalenceClass members corresponding to the child join,
891+
* so that we can represent sort pathkeys for it. As with children of
892+
* baserels, we shouldn't need this unless there are relevant eclass joins
893+
* (implying that a merge join might be possible) or pathkeys to sort by.
894+
*/
895+
if (joinrel->has_eclass_joins || has_useful_pathkeys(root, parent_joinrel))
896+
add_child_join_rel_equivalences(root,
897+
nappinfos, appinfos,
898+
parent_joinrel, joinrel);
899+
900+
pfree(appinfos);
901+
889902
return joinrel;
890903
}
891904

src/include/optimizer/paths.h

+5
Original file line numberDiff line numberDiff line change
@@ -153,6 +153,11 @@ extern void add_child_rel_equivalences(PlannerInfo *root,
153153
AppendRelInfo *appinfo,
154154
RelOptInfo *parent_rel,
155155
RelOptInfo *child_rel);
156+
extern void add_child_join_rel_equivalences(PlannerInfo *root,
157+
int nappinfos,
158+
AppendRelInfo **appinfos,
159+
RelOptInfo *parent_rel,
160+
RelOptInfo *child_rel);
156161
extern List *generate_implied_equalities_for_column(PlannerInfo *root,
157162
RelOptInfo *rel,
158163
ec_matches_callback_type callback,

src/test/regress/expected/partition_join.out

+77
Original file line numberDiff line numberDiff line change
@@ -459,6 +459,83 @@ SELECT t1.a, ss.t2a, ss.t2c FROM prt1 t1 LEFT JOIN LATERAL
459459
550 | |
460460
(12 rows)
461461

462+
-- bug with inadequate sort key representation
463+
SET enable_partitionwise_aggregate TO true;
464+
SET enable_hashjoin TO false;
465+
EXPLAIN (COSTS OFF)
466+
SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
467+
WHERE a BETWEEN 490 AND 510
468+
GROUP BY 1, 2 ORDER BY 1, 2;
469+
QUERY PLAN
470+
-------------------------------------------------------------------------------------------------------------------
471+
Group
472+
Group Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
473+
-> Merge Append
474+
Sort Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
475+
-> Group
476+
Group Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
477+
-> Sort
478+
Sort Key: (COALESCE(prt1_p1.a, p2.a)), (COALESCE(prt1_p1.b, p2.b))
479+
-> Merge Full Join
480+
Merge Cond: ((prt1_p1.a = p2.a) AND (prt1_p1.b = p2.b))
481+
Filter: ((COALESCE(prt1_p1.a, p2.a) >= 490) AND (COALESCE(prt1_p1.a, p2.a) <= 510))
482+
-> Sort
483+
Sort Key: prt1_p1.a, prt1_p1.b
484+
-> Seq Scan on prt1_p1
485+
-> Sort
486+
Sort Key: p2.a, p2.b
487+
-> Seq Scan on prt2_p1 p2
488+
-> Group
489+
Group Key: (COALESCE(prt1_p2.a, p2_1.a)), (COALESCE(prt1_p2.b, p2_1.b))
490+
-> Sort
491+
Sort Key: (COALESCE(prt1_p2.a, p2_1.a)), (COALESCE(prt1_p2.b, p2_1.b))
492+
-> Merge Full Join
493+
Merge Cond: ((prt1_p2.a = p2_1.a) AND (prt1_p2.b = p2_1.b))
494+
Filter: ((COALESCE(prt1_p2.a, p2_1.a) >= 490) AND (COALESCE(prt1_p2.a, p2_1.a) <= 510))
495+
-> Sort
496+
Sort Key: prt1_p2.a, prt1_p2.b
497+
-> Seq Scan on prt1_p2
498+
-> Sort
499+
Sort Key: p2_1.a, p2_1.b
500+
-> Seq Scan on prt2_p2 p2_1
501+
-> Group
502+
Group Key: (COALESCE(prt1_p3.a, p2_2.a)), (COALESCE(prt1_p3.b, p2_2.b))
503+
-> Sort
504+
Sort Key: (COALESCE(prt1_p3.a, p2_2.a)), (COALESCE(prt1_p3.b, p2_2.b))
505+
-> Merge Full Join
506+
Merge Cond: ((prt1_p3.a = p2_2.a) AND (prt1_p3.b = p2_2.b))
507+
Filter: ((COALESCE(prt1_p3.a, p2_2.a) >= 490) AND (COALESCE(prt1_p3.a, p2_2.a) <= 510))
508+
-> Sort
509+
Sort Key: prt1_p3.a, prt1_p3.b
510+
-> Seq Scan on prt1_p3
511+
-> Sort
512+
Sort Key: p2_2.a, p2_2.b
513+
-> Seq Scan on prt2_p3 p2_2
514+
(43 rows)
515+
516+
SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
517+
WHERE a BETWEEN 490 AND 510
518+
GROUP BY 1, 2 ORDER BY 1, 2;
519+
a | b
520+
-----+----
521+
490 | 15
522+
492 | 17
523+
494 | 19
524+
495 | 20
525+
496 | 21
526+
498 | 23
527+
500 | 0
528+
501 | 1
529+
502 | 2
530+
504 | 4
531+
506 | 6
532+
507 | 7
533+
508 | 8
534+
510 | 10
535+
(14 rows)
536+
537+
RESET enable_partitionwise_aggregate;
538+
RESET enable_hashjoin;
462539
--
463540
-- partitioned by expression
464541
--

src/test/regress/sql/partition_join.sql

+15
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,21 @@ SELECT t1.a, ss.t2a, ss.t2c FROM prt1 t1 LEFT JOIN LATERAL
9191
(SELECT t2.a AS t2a, t3.a AS t3a, t2.b t2b, t2.c t2c, least(t1.a,t2.a,t3.a) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
9292
ON t1.c = ss.t2c WHERE (t1.b + coalesce(ss.t2b, 0)) = 0 ORDER BY t1.a;
9393

94+
-- bug with inadequate sort key representation
95+
SET enable_partitionwise_aggregate TO true;
96+
SET enable_hashjoin TO false;
97+
98+
EXPLAIN (COSTS OFF)
99+
SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
100+
WHERE a BETWEEN 490 AND 510
101+
GROUP BY 1, 2 ORDER BY 1, 2;
102+
SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
103+
WHERE a BETWEEN 490 AND 510
104+
GROUP BY 1, 2 ORDER BY 1, 2;
105+
106+
RESET enable_partitionwise_aggregate;
107+
RESET enable_hashjoin;
108+
94109
--
95110
-- partitioned by expression
96111
--

0 commit comments

Comments
 (0)