Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit e540b97

Browse files
committed
Fix an oversight in the 8.2 patch that improved mergejoin performance by
inserting a materialize node above an inner-side sort node, when the sort is expected to spill to disk. (The materialize protects the sort from having to support mark/restore, allowing it to do its final merge pass on-the-fly.) We neglected to teach cost_mergejoin about that hack, so it was failing to include the materialize's costs in the estimated cost of the mergejoin. The materialize's costs are generally going to be pretty negligible in comparison to the sort's, so this is only a small error and probably not worth back-patching; but it's still wrong. In the similar case where a materialize is inserted to protect an inner-side node that can't do mark/restore at all, it's still true that the materialize should not spill to disk, and so we should cost it cheaply rather than expensively. Noted while thinking about a question from Tom Raney.
1 parent 5882958 commit e540b97

File tree

3 files changed

+50
-12
lines changed

3 files changed

+50
-12
lines changed

src/backend/optimizer/path/costsize.c

+14-1
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@
5454
* Portions Copyright (c) 1994, Regents of the University of California
5555
*
5656
* IDENTIFICATION
57-
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.196 2008/08/25 22:42:32 tgl Exp $
57+
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.197 2008/09/05 21:07:29 tgl Exp $
5858
*
5959
*-------------------------------------------------------------------------
6060
*/
@@ -1584,6 +1584,19 @@ cost_mergejoin(MergePath *path, PlannerInfo *root, SpecialJoinInfo *sjinfo)
15841584
* innerstartsel * rescanratio;
15851585
run_cost += (sort_path.total_cost - sort_path.startup_cost)
15861586
* (innerendsel - innerstartsel) * rescanratio;
1587+
1588+
/*
1589+
* If the inner sort is expected to spill to disk, we want to add a
1590+
* materialize node to shield it from the need to handle mark/restore.
1591+
* This will allow it to perform the last merge pass on-the-fly, while
1592+
* in most cases not requiring the materialize to spill to disk.
1593+
* Charge an extra cpu_tuple_cost per tuple to account for the
1594+
* materialize node. (Keep this estimate in sync with similar ones in
1595+
* create_mergejoin_path and create_mergejoin_plan.)
1596+
*/
1597+
if (relation_byte_size(inner_path_rows, inner_path->parent->width) >
1598+
(work_mem * 1024L))
1599+
run_cost += cpu_tuple_cost * inner_path_rows;
15871600
}
15881601
else
15891602
{

src/backend/optimizer/plan/createplan.c

+3-2
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
*
1111
*
1212
* IDENTIFICATION
13-
* $PostgreSQL: pgsql/src/backend/optimizer/plan/createplan.c,v 1.247 2008/08/28 23:09:46 tgl Exp $
13+
* $PostgreSQL: pgsql/src/backend/optimizer/plan/createplan.c,v 1.248 2008/09/05 21:07:29 tgl Exp $
1414
*
1515
*-------------------------------------------------------------------------
1616
*/
@@ -1545,7 +1545,8 @@ create_mergejoin_plan(PlannerInfo *root,
15451545

15461546
/*
15471547
* We assume the materialize will not spill to disk, and therefore
1548-
* charge just cpu_tuple_cost per tuple.
1548+
* charge just cpu_tuple_cost per tuple. (Keep this estimate in sync
1549+
* with similar ones in cost_mergejoin and create_mergejoin_path.)
15491550
*/
15501551
copy_plan_costsize(matplan, inner_plan);
15511552
matplan->total_cost += cpu_tuple_cost * matplan->plan_rows;

src/backend/optimizer/util/pathnode.c

+33-9
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/backend/optimizer/util/pathnode.c,v 1.146 2008/08/14 18:47:59 tgl Exp $
11+
* $PostgreSQL: pgsql/src/backend/optimizer/util/pathnode.c,v 1.147 2008/09/05 21:07:29 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -1304,19 +1304,43 @@ create_mergejoin_path(PlannerInfo *root,
13041304

13051305
/*
13061306
* If we are not sorting the inner path, we may need a materialize node to
1307-
* ensure it can be marked/restored. (Sort does support mark/restore, so
1308-
* no materialize is needed in that case.)
1307+
* ensure it can be marked/restored.
13091308
*
13101309
* Since the inner side must be ordered, and only Sorts and IndexScans can
1311-
* create order to begin with, you might think there's no problem --- but
1312-
* you'd be wrong. Nestloop and merge joins can *preserve* the order of
1313-
* their inputs, so they can be selected as the input of a mergejoin, and
1314-
* they don't support mark/restore at present.
1310+
* create order to begin with, and they both support mark/restore, you
1311+
* might think there's no problem --- but you'd be wrong. Nestloop and
1312+
* merge joins can *preserve* the order of their inputs, so they can be
1313+
* selected as the input of a mergejoin, and they don't support
1314+
* mark/restore at present.
1315+
*
1316+
* Note: Sort supports mark/restore, so no materialize is really needed
1317+
* in that case; but one may be desirable anyway to optimize the sort.
1318+
* However, since we aren't representing the sort step separately in
1319+
* the Path tree, we can't explicitly represent the materialize either.
1320+
* So that case is not handled here. Instead, cost_mergejoin has to
1321+
* factor in the cost and create_mergejoin_plan has to add the plan node.
13151322
*/
13161323
if (innersortkeys == NIL &&
13171324
!ExecSupportsMarkRestore(inner_path->pathtype))
1318-
inner_path = (Path *)
1319-
create_material_path(inner_path->parent, inner_path);
1325+
{
1326+
Path *mpath;
1327+
1328+
mpath = (Path *) create_material_path(inner_path->parent, inner_path);
1329+
1330+
/*
1331+
* We expect the materialize won't spill to disk (it could only do
1332+
* so if there were a whole lot of duplicate tuples, which is a case
1333+
* cost_mergejoin will avoid choosing anyway). Therefore
1334+
* cost_material's cost estimate is bogus and we should charge
1335+
* just cpu_tuple_cost per tuple. (Keep this estimate in sync with
1336+
* similar ones in cost_mergejoin and create_mergejoin_plan.)
1337+
*/
1338+
mpath->startup_cost = inner_path->startup_cost;
1339+
mpath->total_cost = inner_path->total_cost;
1340+
mpath->total_cost += cpu_tuple_cost * inner_path->parent->rows;
1341+
1342+
inner_path = mpath;
1343+
}
13201344

13211345
pathnode->jpath.path.pathtype = T_MergeJoin;
13221346
pathnode->jpath.path.parent = joinrel;

0 commit comments

Comments
 (0)