Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 9404879

Browse files
author
Etsuro Fujita
committed
Disable support for partitionwise joins in problematic cases.
Commit f49842d, which added support for partitionwise joins, built the child's tlist by applying adjust_appendrel_attrs() to the parent's. So in the case where the parent's included a whole-row Var for the parent, the child's contained a ConvertRowtypeExpr. To cope with that, that commit added code to the planner, such as setrefs.c, but some code paths still assumed that the tlist for a scan (or join) rel would only include Vars and PlaceHolderVars, which was true before that commit, causing errors: * When creating an explicit sort node for an input path for a mergejoin path for a child join, prepare_sort_from_pathkeys() threw the 'could not find pathkey item to sort' error. * When deparsing a relation participating in a pushed down child join as a subquery in contrib/postgres_fdw, get_relation_column_alias_ids() threw the 'unexpected expression in subquery output' error. * When performing set_plan_references() on a local join plan generated by contrib/postgres_fdw for EvalPlanQual support for a pushed down child join, fix_join_expr() threw the 'variable not found in subplan target lists' error. To fix these, two approaches have been proposed: one by Ashutosh Bapat and one by me. While the former keeps building the child's tlist with a ConvertRowtypeExpr, the latter builds it with a whole-row Var for the child not to violate the planner assumption, and tries to fix it up later, But both approaches need more work, so refuse to generate partitionwise join paths when whole-row Vars are involved, instead. We don't need to handle ConvertRowtypeExprs in the child's tlists for now, so this commit also removes the changes to the planner. Previously, partitionwise join computed attr_needed data for each child separately, and built the child join's tlist using that data, which also required an extra step for adding PlaceHolderVars to that tlist, but it would be more efficient to build it from the parent join's tlist through the adjust_appendrel_attrs() transformation. So this commit builds that list that way, and simplifies build_joinrel_tlist() and placeholder.c as well as part of set_append_rel_size() to basically what they were before partitionwise join went in. Back-patch to PG11 where partitionwise join was introduced. Report by Rajkumar Raghuwanshi. Analysis by Ashutosh Bapat, who also provided some of regression tests. Patch by me, reviewed by Robert Haas. Discussion: https://postgr.es/m/CAKcux6ktu-8tefLWtQuuZBYFaZA83vUzuRd7c1YHC-yEWyYFpg@mail.gmail.com
1 parent 762a165 commit 9404879

File tree

13 files changed

+266
-256
lines changed

13 files changed

+266
-256
lines changed

contrib/postgres_fdw/expected/postgres_fdw.out

Lines changed: 57 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -8218,8 +8218,9 @@ ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
82188218
ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
82198219
INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
82208220
INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
8221-
CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
8221+
CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
82228222
SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
8223+
ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
82238224
CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
82248225
SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
82258226
ANALYZE fprt2;
@@ -8270,28 +8271,42 @@ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
82708271
8 | |
82718272
(5 rows)
82728273

8273-
-- with whole-row reference
8274+
-- with whole-row reference; partitionwise join does not apply
82748275
EXPLAIN (COSTS OFF)
8275-
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
8276-
QUERY PLAN
8277-
---------------------------------------------------------------------------------
8276+
SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
8277+
QUERY PLAN
8278+
--------------------------------------------------------
82788279
Sort
82798280
Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
8280-
-> Append
8281-
-> Foreign Scan
8282-
Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
8283-
-> Foreign Scan
8284-
Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
8285-
(7 rows)
8281+
-> Hash Full Join
8282+
Hash Cond: (t1.a = t2.b)
8283+
-> Append
8284+
-> Foreign Scan on ftprt1_p1 t1
8285+
-> Foreign Scan on ftprt1_p2 t1_1
8286+
-> Hash
8287+
-> Append
8288+
-> Foreign Scan on ftprt2_p1 t2
8289+
-> Foreign Scan on ftprt2_p2 t2_1
8290+
(11 rows)
82868291

8287-
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
8288-
t1 | t2
8292+
SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
8293+
wr | wr
82898294
----------------+----------------
82908295
(0,0,0000) | (0,0,0000)
8296+
(50,50,0001) |
8297+
(100,100,0002) |
82918298
(150,150,0003) | (150,150,0003)
8299+
(200,200,0004) |
82928300
(250,250,0005) | (250,250,0005)
8301+
(300,300,0006) |
8302+
(350,350,0007) |
82938303
(400,400,0008) | (400,400,0008)
8294-
(4 rows)
8304+
(450,450,0009) |
8305+
| (75,75,0001)
8306+
| (225,225,0004)
8307+
| (325,325,0006)
8308+
| (475,475,0009)
8309+
(14 rows)
82958310

82968311
-- join with lateral reference
82978312
EXPLAIN (COSTS OFF)
@@ -8355,6 +8370,34 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
83558370
| | 475 | t2_phv
83568371
(14 rows)
83578372

8373+
-- test FOR UPDATE; partitionwise join does not apply
8374+
EXPLAIN (COSTS OFF)
8375+
SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
8376+
QUERY PLAN
8377+
--------------------------------------------------------------
8378+
LockRows
8379+
-> Sort
8380+
Sort Key: t1.a
8381+
-> Hash Join
8382+
Hash Cond: (t2.b = t1.a)
8383+
-> Append
8384+
-> Foreign Scan on ftprt2_p1 t2
8385+
-> Foreign Scan on ftprt2_p2 t2_1
8386+
-> Hash
8387+
-> Append
8388+
-> Foreign Scan on ftprt1_p1 t1
8389+
-> Foreign Scan on ftprt1_p2 t1_1
8390+
(12 rows)
8391+
8392+
SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
8393+
a | b
8394+
-----+-----
8395+
0 | 0
8396+
150 | 150
8397+
250 | 250
8398+
400 | 400
8399+
(4 rows)
8400+
83588401
RESET enable_partitionwise_join;
83598402
-- ===================================================================
83608403
-- test partitionwise aggregates

contrib/postgres_fdw/sql/postgres_fdw.sql

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2216,8 +2216,9 @@ ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
22162216
ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
22172217
INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
22182218
INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
2219-
CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
2219+
CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
22202220
SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
2221+
ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
22212222
CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
22222223
SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
22232224
ANALYZE fprt2;
@@ -2234,10 +2235,10 @@ EXPLAIN (COSTS OFF)
22342235
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
22352236
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
22362237

2237-
-- with whole-row reference
2238+
-- with whole-row reference; partitionwise join does not apply
22382239
EXPLAIN (COSTS OFF)
2239-
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
2240-
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
2240+
SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
2241+
SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
22412242

22422243
-- join with lateral reference
22432244
EXPLAIN (COSTS OFF)
@@ -2249,6 +2250,11 @@ EXPLAIN (COSTS OFF)
22492250
SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
22502251
SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
22512252

2253+
-- test FOR UPDATE; partitionwise join does not apply
2254+
EXPLAIN (COSTS OFF)
2255+
SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
2256+
SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
2257+
22522258
RESET enable_partitionwise_join;
22532259

22542260

src/backend/nodes/outfuncs.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2366,6 +2366,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
23662366
WRITE_UINT_FIELD(baserestrict_min_security);
23672367
WRITE_NODE_FIELD(joininfo);
23682368
WRITE_BOOL_FIELD(has_eclass_joins);
2369+
WRITE_BOOL_FIELD(consider_partitionwise_join);
23692370
WRITE_BITMAPSET_FIELD(top_parent_relids);
23702371
WRITE_NODE_FIELD(partitioned_child_rels);
23712372
}

src/backend/optimizer/path/allpaths.c

Lines changed: 30 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -909,6 +909,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
909909
did_pruning = true;
910910
}
911911

912+
/*
913+
* If this is a partitioned baserel, set the consider_partitionwise_join
914+
* flag; currently, we only consider partitionwise joins with the baserel
915+
* if its targetlist doesn't contain a whole-row Var.
916+
*/
917+
if (enable_partitionwise_join &&
918+
rel->reloptkind == RELOPT_BASEREL &&
919+
rte->relkind == RELKIND_PARTITIONED_TABLE &&
920+
rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL)
921+
rel->consider_partitionwise_join = true;
922+
912923
/*
913924
* Initialize to compute size estimates for whole append relation.
914925
*
@@ -956,54 +967,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
956967
childrel = find_base_rel(root, childRTindex);
957968
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
958969

959-
if (rel->part_scheme)
960-
{
961-
AttrNumber attno;
962-
963-
/*
964-
* We need attr_needed data for building targetlist of a join
965-
* relation representing join between matching partitions for
966-
* partitionwise join. A given attribute of a child will be needed
967-
* in the same highest joinrel where the corresponding attribute
968-
* of parent is needed. Hence it suffices to use the same Relids
969-
* set for parent and child.
970-
*/
971-
for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
972-
{
973-
int index = attno - rel->min_attr;
974-
Relids attr_needed = rel->attr_needed[index];
975-
976-
/* System attributes do not need translation. */
977-
if (attno <= 0)
978-
{
979-
Assert(rel->min_attr == childrel->min_attr);
980-
childrel->attr_needed[index] = attr_needed;
981-
}
982-
else
983-
{
984-
Var *var = list_nth_node(Var,
985-
appinfo->translated_vars,
986-
attno - 1);
987-
int child_index;
988-
989-
/*
990-
* Ignore any column dropped from the parent.
991-
* Corresponding Var won't have any translation. It won't
992-
* have attr_needed information, since it can not be
993-
* referenced in the query.
994-
*/
995-
if (var == NULL)
996-
{
997-
Assert(attr_needed == NULL);
998-
continue;
999-
}
1000-
1001-
child_index = var->varattno - childrel->min_attr;
1002-
childrel->attr_needed[child_index] = attr_needed;
1003-
}
1004-
}
1005-
}
1006-
1007970
/*
1008971
* Copy/Modify targetlist. Even if this child is deemed empty, we need
1009972
* its targetlist in case it falls on nullable side in a child-join
@@ -1180,6 +1143,22 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
11801143
(Node *) rel->joininfo,
11811144
1, &appinfo);
11821145

1146+
/*
1147+
* Note: we could compute appropriate attr_needed data for the child's
1148+
* variables, by transforming the parent's attr_needed through the
1149+
* translated_vars mapping. However, currently there's no need
1150+
* because attr_needed is only examined for base relations not
1151+
* otherrels. So we just leave the child's attr_needed empty.
1152+
*/
1153+
1154+
/*
1155+
* If we consider partitionwise joins with the parent rel, do the same
1156+
* for partitioned child rels.
1157+
*/
1158+
if (rel->consider_partitionwise_join &&
1159+
childRTE->relkind == RELKIND_PARTITIONED_TABLE)
1160+
childrel->consider_partitionwise_join = true;
1161+
11831162
/*
11841163
* If parallelism is allowable for this query in general, see whether
11851164
* it's allowable for this childrel in particular. But if we've
@@ -3538,6 +3517,9 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
35383517
if (!IS_PARTITIONED_REL(rel))
35393518
return;
35403519

3520+
/* The relation should have consider_partitionwise_join set. */
3521+
Assert(rel->consider_partitionwise_join);
3522+
35413523
/* Guard against stack overflow due to overly deep partition hierarchy. */
35423524
check_stack_depth();
35433525

src/backend/optimizer/path/joinrels.c

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1322,6 +1322,9 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
13221322
if (!IS_PARTITIONED_REL(joinrel))
13231323
return;
13241324

1325+
/* The join relation should have consider_partitionwise_join set. */
1326+
Assert(joinrel->consider_partitionwise_join);
1327+
13251328
/*
13261329
* Since this join relation is partitioned, all the base relations
13271330
* participating in this join must be partitioned and so are all the
@@ -1330,6 +1333,10 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
13301333
Assert(IS_PARTITIONED_REL(rel1) && IS_PARTITIONED_REL(rel2));
13311334
Assert(REL_HAS_ALL_PART_PROPS(rel1) && REL_HAS_ALL_PART_PROPS(rel2));
13321335

1336+
/* The joining relations should have consider_partitionwise_join set. */
1337+
Assert(rel1->consider_partitionwise_join &&
1338+
rel2->consider_partitionwise_join);
1339+
13331340
/*
13341341
* The partition scheme of the join relation should match that of the
13351342
* joining relations.

0 commit comments

Comments
 (0)