Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit d70c147

Browse files
committed
Avoid crash in partitionwise join planning under GEQO.
While trying to plan a partitionwise join, we may be faced with cases where one or both input partitions for a particular segment of the join have been pruned away. In HEAD and v11, this is problematic because earlier processing didn't bother to make a pruned RelOptInfo fully valid. With an upcoming patch to make partition pruning more efficient, this'll be even more problematic because said RelOptInfo won't exist at all. The existing code attempts to deal with this by retroactively making the RelOptInfo fully valid, but that causes crashes under GEQO because join planning is done in a short-lived memory context. In v11 we could probably have fixed this by switching to the planner's main context while fixing up the RelOptInfo, but that idea doesn't scale well to the upcoming patch. It would be better not to mess with the base-relation data structures during join planning, anyway --- that's just a recipe for order-of-operations bugs. In many cases, though, we don't actually need the child RelOptInfo, because if the input is certainly empty then the join segment's result is certainly empty, so we can skip making a join plan altogether. (The existing code ultimately arrives at the same conclusion, but only after doing a lot more work.) This approach works except when the pruned-away partition is on the nullable side of a LEFT, ANTI, or FULL join, and the other side isn't pruned. But in those cases the existing code leaves a lot to be desired anyway --- the correct output is just the result of the unpruned side of the join, but we were emitting a useless outer join against a dummy Result. Pending somebody writing code to handle that more nicely, let's just abandon the partitionwise-join optimization in such cases. When the modified code skips making a join plan, it doesn't make a join RelOptInfo either; this requires some upper-level code to cope with nulls in part_rels[] arrays. We would have had to have that anyway after the upcoming patch. Back-patch to v11 since the crash is demonstrable there. Discussion: https://postgr.es/m/8305.1553884377@sss.pgh.pa.us
1 parent 26d4fda commit d70c147

File tree

7 files changed

+157
-163
lines changed

7 files changed

+157
-163
lines changed

src/backend/optimizer/path/allpaths.c

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1166,11 +1166,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
11661166
* for partitioned child rels.
11671167
*
11681168
* Note: here we abuse the consider_partitionwise_join flag by setting
1169-
* it *even* for child rels that are not partitioned. In that case,
1170-
* we set it to tell try_partitionwise_join() that it doesn't need to
1171-
* generate their targetlists and EC entries as they have already been
1172-
* generated here, as opposed to the dummy child rels for which the
1173-
* flag is left set to false so that it will generate them.
1169+
* it for child rels that are not themselves partitioned. We do so to
1170+
* tell try_partitionwise_join() that the child rel is sufficiently
1171+
* valid to be used as a per-partition input, even if it later gets
1172+
* proven to be dummy. (It's not usable until we've set up the
1173+
* reltarget and EC entries, which we just did.)
11741174
*/
11751175
if (rel->consider_partitionwise_join)
11761176
childrel->consider_partitionwise_join = true;
@@ -3551,7 +3551,9 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
35513551
{
35523552
RelOptInfo *child_rel = part_rels[cnt_parts];
35533553

3554-
Assert(child_rel != NULL);
3554+
/* If it's been pruned entirely, it's certainly dummy. */
3555+
if (child_rel == NULL)
3556+
continue;
35553557

35563558
/* Add partitionwise join paths for partitioned child-joins. */
35573559
generate_partitionwise_join_paths(root, child_rel);

src/backend/optimizer/path/joinrels.c

Lines changed: 56 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -43,8 +43,6 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
4343
RelOptInfo *rel2, RelOptInfo *joinrel,
4444
SpecialJoinInfo *parent_sjinfo,
4545
List *parent_restrictlist);
46-
static void update_child_rel_info(PlannerInfo *root,
47-
RelOptInfo *rel, RelOptInfo *childrel);
4846
static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
4947
bool strict_op);
5048

@@ -1401,6 +1399,10 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
14011399
{
14021400
RelOptInfo *child_rel1 = rel1->part_rels[cnt_parts];
14031401
RelOptInfo *child_rel2 = rel2->part_rels[cnt_parts];
1402+
bool rel1_empty = (child_rel1 == NULL ||
1403+
IS_DUMMY_REL(child_rel1));
1404+
bool rel2_empty = (child_rel2 == NULL ||
1405+
IS_DUMMY_REL(child_rel2));
14041406
SpecialJoinInfo *child_sjinfo;
14051407
List *child_restrictlist;
14061408
RelOptInfo *child_joinrel;
@@ -1409,24 +1411,69 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
14091411
int nappinfos;
14101412

14111413
/*
1412-
* If a child table has consider_partitionwise_join=false, it means
1414+
* Check for cases where we can prove that this segment of the join
1415+
* returns no rows, due to one or both inputs being empty (including
1416+
* inputs that have been pruned away entirely). If so just ignore it.
1417+
* These rules are equivalent to populate_joinrel_with_paths's rules
1418+
* for dummy input relations.
1419+
*/
1420+
switch (parent_sjinfo->jointype)
1421+
{
1422+
case JOIN_INNER:
1423+
case JOIN_SEMI:
1424+
if (rel1_empty || rel2_empty)
1425+
continue; /* ignore this join segment */
1426+
break;
1427+
case JOIN_LEFT:
1428+
case JOIN_ANTI:
1429+
if (rel1_empty)
1430+
continue; /* ignore this join segment */
1431+
break;
1432+
case JOIN_FULL:
1433+
if (rel1_empty && rel2_empty)
1434+
continue; /* ignore this join segment */
1435+
break;
1436+
default:
1437+
/* other values not expected here */
1438+
elog(ERROR, "unrecognized join type: %d",
1439+
(int) parent_sjinfo->jointype);
1440+
break;
1441+
}
1442+
1443+
/*
1444+
* If a child has been pruned entirely then we can't generate paths
1445+
* for it, so we have to reject partitionwise joining unless we were
1446+
* able to eliminate this partition above.
1447+
*/
1448+
if (child_rel1 == NULL || child_rel2 == NULL)
1449+
{
1450+
/*
1451+
* Mark the joinrel as unpartitioned so that later functions treat
1452+
* it correctly.
1453+
*/
1454+
joinrel->nparts = 0;
1455+
return;
1456+
}
1457+
1458+
/*
1459+
* If a leaf relation has consider_partitionwise_join=false, it means
14131460
* that it's a dummy relation for which we skipped setting up tlist
1414-
* expressions and adding EC members in set_append_rel_size(), so do
1415-
* that now for use later.
1461+
* expressions and adding EC members in set_append_rel_size(), so
1462+
* again we have to fail here.
14161463
*/
14171464
if (rel1_is_simple && !child_rel1->consider_partitionwise_join)
14181465
{
14191466
Assert(child_rel1->reloptkind == RELOPT_OTHER_MEMBER_REL);
14201467
Assert(IS_DUMMY_REL(child_rel1));
1421-
update_child_rel_info(root, rel1, child_rel1);
1422-
child_rel1->consider_partitionwise_join = true;
1468+
joinrel->nparts = 0;
1469+
return;
14231470
}
14241471
if (rel2_is_simple && !child_rel2->consider_partitionwise_join)
14251472
{
14261473
Assert(child_rel2->reloptkind == RELOPT_OTHER_MEMBER_REL);
14271474
Assert(IS_DUMMY_REL(child_rel2));
1428-
update_child_rel_info(root, rel2, child_rel2);
1429-
child_rel2->consider_partitionwise_join = true;
1475+
joinrel->nparts = 0;
1476+
return;
14301477
}
14311478

14321479
/* We should never try to join two overlapping sets of rels. */
@@ -1470,28 +1517,6 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
14701517
}
14711518
}
14721519

1473-
/*
1474-
* Set up tlist expressions for the childrel, and add EC members referencing
1475-
* the childrel.
1476-
*/
1477-
static void
1478-
update_child_rel_info(PlannerInfo *root,
1479-
RelOptInfo *rel, RelOptInfo *childrel)
1480-
{
1481-
AppendRelInfo *appinfo = root->append_rel_array[childrel->relid];
1482-
1483-
/* Make child tlist expressions */
1484-
childrel->reltarget->exprs = (List *)
1485-
adjust_appendrel_attrs(root,
1486-
(Node *) rel->reltarget->exprs,
1487-
1, &appinfo);
1488-
1489-
/* Make child entries in the EquivalenceClass as well */
1490-
if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
1491-
add_child_rel_equivalences(root, appinfo, rel, childrel);
1492-
childrel->has_eclass_joins = rel->has_eclass_joins;
1493-
}
1494-
14951520
/*
14961521
* Returns true if there exists an equi-join condition for each pair of
14971522
* partition keys from given relations being joined.

src/backend/optimizer/plan/planner.c

Lines changed: 7 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -6963,6 +6963,10 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
69636963
List *child_scanjoin_targets = NIL;
69646964
ListCell *lc;
69656965

6966+
/* Pruned or dummy children can be ignored. */
6967+
if (child_rel == NULL || IS_DUMMY_REL(child_rel))
6968+
continue;
6969+
69666970
/* Translate scan/join targets for this child. */
69676971
appinfos = find_appinfos_by_relids(root, child_rel->relids,
69686972
&nappinfos);
@@ -7063,8 +7067,9 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
70637067
RelOptInfo *child_grouped_rel;
70647068
RelOptInfo *child_partially_grouped_rel;
70657069

7066-
/* Input child rel must have a path */
7067-
Assert(child_input_rel->pathlist != NIL);
7070+
/* Pruned or dummy children can be ignored. */
7071+
if (child_input_rel == NULL || IS_DUMMY_REL(child_input_rel))
7072+
continue;
70687073

70697074
/*
70707075
* Copy the given "extra" structure as is and then override the
@@ -7106,14 +7111,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
71067111
extra->target_parallel_safe,
71077112
child_extra.havingQual);
71087113

7109-
/* Ignore empty children. They contribute nothing. */
7110-
if (IS_DUMMY_REL(child_input_rel))
7111-
{
7112-
mark_dummy_rel(child_grouped_rel);
7113-
7114-
continue;
7115-
}
7116-
71177114
/* Create grouping paths for this child relation. */
71187115
create_ordinary_grouping_paths(root, child_input_rel,
71197116
child_grouped_rel,

src/test/regress/expected/partition_aggregate.out

Lines changed: 39 additions & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -716,37 +716,33 @@ SELECT a.x, sum(b.x) FROM pagg_tab1 a FULL OUTER JOIN pagg_tab2 b ON a.x = b.y G
716716
| 500
717717
(16 rows)
718718

719-
-- LEFT JOIN, with dummy relation on right side,
719+
-- LEFT JOIN, with dummy relation on right side, ideally
720720
-- should produce full partitionwise aggregation plan as GROUP BY is on
721-
-- non-nullable columns
721+
-- non-nullable columns.
722+
-- But right now we are unable to do partitionwise join in this case.
722723
EXPLAIN (COSTS OFF)
723724
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
724-
QUERY PLAN
725-
-----------------------------------------------------------------------------
725+
QUERY PLAN
726+
-----------------------------------------------------------------------
726727
Sort
727-
Sort Key: pagg_tab1_p1.x, y
728-
-> Append
729-
-> HashAggregate
730-
Group Key: pagg_tab1_p1.x, y
731-
-> Hash Left Join
732-
Hash Cond: (pagg_tab1_p1.x = y)
733-
Filter: ((pagg_tab1_p1.x > 5) OR (y < 20))
728+
Sort Key: pagg_tab1_p1.x, pagg_tab2_p2.y
729+
-> HashAggregate
730+
Group Key: pagg_tab1_p1.x, pagg_tab2_p2.y
731+
-> Hash Left Join
732+
Hash Cond: (pagg_tab1_p1.x = pagg_tab2_p2.y)
733+
Filter: ((pagg_tab1_p1.x > 5) OR (pagg_tab2_p2.y < 20))
734+
-> Append
734735
-> Seq Scan on pagg_tab1_p1
735736
Filter: (x < 20)
736-
-> Hash
737-
-> Result
738-
One-Time Filter: false
739-
-> HashAggregate
740-
Group Key: pagg_tab1_p2.x, pagg_tab2_p2.y
741-
-> Hash Left Join
742-
Hash Cond: (pagg_tab1_p2.x = pagg_tab2_p2.y)
743-
Filter: ((pagg_tab1_p2.x > 5) OR (pagg_tab2_p2.y < 20))
744737
-> Seq Scan on pagg_tab1_p2
745738
Filter: (x < 20)
746-
-> Hash
739+
-> Hash
740+
-> Append
747741
-> Seq Scan on pagg_tab2_p2
748742
Filter: (y > 10)
749-
(23 rows)
743+
-> Seq Scan on pagg_tab2_p3
744+
Filter: (y > 10)
745+
(18 rows)
750746

751747
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
752748
x | y | count
@@ -760,49 +756,33 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOI
760756
18 | 18 | 100
761757
(7 rows)
762758

763-
-- FULL JOIN, with dummy relations on both sides,
759+
-- FULL JOIN, with dummy relations on both sides, ideally
764760
-- should produce partial partitionwise aggregation plan as GROUP BY is on
765-
-- nullable columns
761+
-- nullable columns.
762+
-- But right now we are unable to do partitionwise join in this case.
766763
EXPLAIN (COSTS OFF)
767764
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
768-
QUERY PLAN
769-
-----------------------------------------------------------------------------------
770-
Finalize GroupAggregate
771-
Group Key: pagg_tab1_p1.x, y
772-
-> Sort
773-
Sort Key: pagg_tab1_p1.x, y
774-
-> Append
775-
-> Partial HashAggregate
776-
Group Key: pagg_tab1_p1.x, y
777-
-> Hash Full Join
778-
Hash Cond: (pagg_tab1_p1.x = y)
779-
Filter: ((pagg_tab1_p1.x > 5) OR (y < 20))
780-
-> Seq Scan on pagg_tab1_p1
781-
Filter: (x < 20)
782-
-> Hash
783-
-> Result
784-
One-Time Filter: false
785-
-> Partial HashAggregate
786-
Group Key: pagg_tab1_p2.x, pagg_tab2_p2.y
787-
-> Hash Full Join
788-
Hash Cond: (pagg_tab1_p2.x = pagg_tab2_p2.y)
789-
Filter: ((pagg_tab1_p2.x > 5) OR (pagg_tab2_p2.y < 20))
790-
-> Seq Scan on pagg_tab1_p2
791-
Filter: (x < 20)
792-
-> Hash
793-
-> Seq Scan on pagg_tab2_p2
794-
Filter: (y > 10)
795-
-> Partial HashAggregate
796-
Group Key: x, pagg_tab2_p3.y
797-
-> Hash Full Join
798-
Hash Cond: (pagg_tab2_p3.y = x)
799-
Filter: ((x > 5) OR (pagg_tab2_p3.y < 20))
765+
QUERY PLAN
766+
-----------------------------------------------------------------------
767+
Sort
768+
Sort Key: pagg_tab1_p1.x, pagg_tab2_p2.y
769+
-> HashAggregate
770+
Group Key: pagg_tab1_p1.x, pagg_tab2_p2.y
771+
-> Hash Full Join
772+
Hash Cond: (pagg_tab1_p1.x = pagg_tab2_p2.y)
773+
Filter: ((pagg_tab1_p1.x > 5) OR (pagg_tab2_p2.y < 20))
774+
-> Append
775+
-> Seq Scan on pagg_tab1_p1
776+
Filter: (x < 20)
777+
-> Seq Scan on pagg_tab1_p2
778+
Filter: (x < 20)
779+
-> Hash
780+
-> Append
781+
-> Seq Scan on pagg_tab2_p2
782+
Filter: (y > 10)
800783
-> Seq Scan on pagg_tab2_p3
801784
Filter: (y > 10)
802-
-> Hash
803-
-> Result
804-
One-Time Filter: false
805-
(35 rows)
785+
(18 rows)
806786

807787
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
808788
x | y | count

0 commit comments

Comments
 (0)