
Commit caf9c83

Improve planning of Materialize nodes inserted atop the inner input of a
mergejoin to shield it from doing mark/restore and refetches. Put an explicit flag in MergePath so we can centralize the logic that knows about this, and add costing logic that considers using Materialize even when it's not forced by the previously-existing considerations. This is in response to a discussion back in August that suggested that materializing an inner indexscan can be helpful when the refetch percentage is high enough.
1 parent 29faadc commit caf9c83


7 files changed (+125, -116 lines)


src/backend/nodes/outfuncs.c

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.371 2009/10/28 14:55:38 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.372 2009/11/15 02:45:34 tgl Exp $
  *
  * NOTES
  *    Every node type that can appear in stored rules' parsetrees *must*
@@ -1501,6 +1501,7 @@ _outMergePath(StringInfo str, MergePath *node)
     WRITE_NODE_FIELD(path_mergeclauses);
     WRITE_NODE_FIELD(outersortkeys);
     WRITE_NODE_FIELD(innersortkeys);
+    WRITE_BOOL_FIELD(materialize_inner);
 }

 static void
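
The added `WRITE_BOOL_FIELD(materialize_inner)` line makes the new flag show up in the text form that outfuncs.c produces for a MergePath. A minimal sketch of the idea, using the preprocessor's `#` operator to turn the field name into text; `DemoMergePath`, `DEMO_WRITE_BOOL_FIELD` and the exact output layout are hypothetical stand-ins, not PostgreSQL's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cut-down MergePath: only the new flag is modeled. */
typedef struct DemoMergePath
{
    bool        materialize_inner;
} DemoMergePath;

#define booltostr(x) ((x) ? "true" : "false")

/*
 * Hypothetical analogue of WRITE_BOOL_FIELD: appends
 * " :<fieldname> true|false" to the end of the caller's buffer.
 */
#define DEMO_WRITE_BOOL_FIELD(buf, size, node, fldname) \
    do { \
        size_t used_ = strlen(buf); \
        assert(used_ < (size)); \
        snprintf((buf) + used_, (size) - used_, " :" #fldname " %s", \
                 booltostr((node)->fldname)); \
    } while (0)

/* Serialize the demo node in a simplified, outfuncs-like text form. */
static void
demo_out_merge_path(char *buf, size_t size, const DemoMergePath *node)
{
    snprintf(buf, size, "MERGEPATH");
    DEMO_WRITE_BOOL_FIELD(buf, size, node, materialize_inner);
}
```

For a node with the flag set, `demo_out_merge_path` fills the buffer with `MERGEPATH :materialize_inner true`.
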

src/backend/optimizer/path/allpaths.c

Lines changed: 7 additions & 8 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/optimizer/path/allpaths.c,v 1.188 2009/10/26 02:26:33 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/optimizer/path/allpaths.c,v 1.189 2009/11/15 02:45:35 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -1443,13 +1443,12 @@ print_path(PlannerInfo *root, Path *path, int indent)
         {
             MergePath *mp = (MergePath *) path;

-            if (mp->outersortkeys || mp->innersortkeys)
-            {
-                for (i = 0; i < indent; i++)
-                    printf("\t");
-                printf(" sortouter=%d sortinner=%d\n",
-                       ((mp->outersortkeys) ? 1 : 0),
-                       ((mp->innersortkeys) ? 1 : 0));
+            for (i = 0; i < indent; i++)
+                printf("\t");
+            printf(" sortouter=%d sortinner=%d materializeinner=%d\n",
+                   ((mp->outersortkeys) ? 1 : 0),
+                   ((mp->innersortkeys) ? 1 : 0),
+                   ((mp->materialize_inner) ? 1 : 0));
         }
     }

src/backend/optimizer/path/costsize.c

Lines changed: 89 additions & 44 deletions
@@ -54,7 +54,7 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.211 2009/09/12 22:12:03 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.212 2009/11/15 02:45:35 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -1166,23 +1166,6 @@ cost_sort(Path *path, PlannerInfo *root,
     path->total_cost = startup_cost + run_cost;
 }

-/*
- * sort_exceeds_work_mem
- *    Given a finished Sort plan node, detect whether it is expected to
- *    spill to disk (ie, will need more than work_mem workspace)
- *
- * This assumes there will be no available LIMIT.
- */
-bool
-sort_exceeds_work_mem(Sort *sort)
-{
-    double input_bytes = relation_byte_size(sort->plan.plan_rows,
-                                            sort->plan.plan_width);
-    long work_mem_bytes = work_mem * 1024L;
-
-    return (input_bytes > work_mem_bytes);
-}
-
 /*
  * cost_material
  *    Determines and returns the cost of materializing a relation, including
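
The deleted `sort_exceeds_work_mem()` compared the estimated sort input size with `work_mem`, which is measured in kilobytes; the same comparison now lives inside `cost_mergejoin`. A minimal sketch of that test, assuming `relation_byte_size()` charges each row its MAXALIGN'd width plus a MAXALIGN'd tuple header. The 8-byte alignment and 24-byte header below are assumed values for a typical 64-bit build, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed platform constants (hypothetical; real values come from the build). */
#define ASSUMED_MAXIMUM_ALIGNOF   8L
#define ASSUMED_TUPLE_HEADER_SIZE 24L

/* Round len up to a multiple of the assumed alignment, like MAXALIGN. */
#define ASSUMED_MAXALIGN(len) \
    (((long) (len) + (ASSUMED_MAXIMUM_ALIGNOF - 1)) & ~(ASSUMED_MAXIMUM_ALIGNOF - 1))

/* Estimated bytes needed to hold 'tuples' rows of average width 'width'. */
static double
approx_relation_byte_size(double tuples, int width)
{
    assert(tuples >= 0.0 && width >= 0);
    return tuples * (ASSUMED_MAXALIGN(width) +
                     ASSUMED_MAXALIGN(ASSUMED_TUPLE_HEADER_SIZE));
}

/* Same shape as the removed test: does the sort input exceed work_mem (kB)? */
static bool
sort_would_spill(double rows, int width, int work_mem_kb)
{
    return approx_relation_byte_size(rows, width) > work_mem_kb * 1024L;
}
```

With `work_mem` set to 1024 (1 MB) and 40-byte rows (64 bytes each after the assumed header), 10,000 rows fit in memory while 100,000 rows would spill.
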
@@ -1543,7 +1526,18 @@ cost_nestloop(NestPath *path, PlannerInfo *root, SpecialJoinInfo *sjinfo)
  *    Determines and returns the cost of joining two relations using the
  *    merge join algorithm.
  *
- * 'path' is already filled in except for the cost fields
+ * Unlike other costsize functions, this routine makes one actual decision:
+ * whether we should materialize the inner path. We do that either because
+ * the inner path can't support mark/restore, or because it's cheaper to
+ * use an interposed Material node to handle mark/restore. When the decision
+ * is cost-based it would be logically cleaner to build and cost two separate
+ * paths with and without that flag set; but that would require repeating most
+ * of the calculations here, which are not all that cheap. Since the choice
+ * will not affect output pathkeys or startup cost, only total cost, there is
+ * no possibility of wanting to keep both paths. So it seems best to make
+ * the decision here and record it in the path's materialize_inner field.
+ *
+ * 'path' is already filled in except for the cost fields and materialize_inner
  * 'sjinfo' is extra info about the join for selectivity estimation
  *
  * Notes: path's mergeclauses should be a subset of the joinrestrictinfo list;
@@ -1561,7 +1555,10 @@ cost_mergejoin(MergePath *path, PlannerInfo *root, SpecialJoinInfo *sjinfo)
     List *innersortkeys = path->innersortkeys;
     Cost startup_cost = 0;
     Cost run_cost = 0;
-    Cost cpu_per_tuple;
+    Cost cpu_per_tuple,
+         inner_run_cost,
+         bare_inner_cost,
+         mat_inner_cost;
     QualCost merge_qual_cost;
     QualCost qp_qual_cost;
     double outer_path_rows = PATH_ROWS(outer_path);
@@ -1606,10 +1603,7 @@ cost_mergejoin(MergePath *path, PlannerInfo *root, SpecialJoinInfo *sjinfo)
     /*
      * When there are equal merge keys in the outer relation, the mergejoin
      * must rescan any matching tuples in the inner relation. This means
-     * re-fetching inner tuples. Our cost model for this is that a re-fetch
-     * costs the same as an original fetch, which is probably an overestimate;
-     * but on the other hand we ignore the bookkeeping costs of mark/restore.
-     * Not clear if it's worth developing a more refined model.
+     * re-fetching inner tuples; we have to estimate how often that happens.
      *
      * For regular inner and outer joins, the number of re-fetches can be
      * estimated approximately as size of merge join output minus size of
@@ -1641,7 +1635,7 @@ cost_mergejoin(MergePath *path, PlannerInfo *root, SpecialJoinInfo *sjinfo)
         if (rescannedtuples < 0)
             rescannedtuples = 0;
     }
-    /* We'll inflate inner run cost this much to account for rescanning */
+    /* We'll inflate various costs this much to account for rescanning */
     rescanratio = 1.0 + (rescannedtuples / inner_path_rows);

     /*
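
Why "merge join output minus inner relation size" counts re-fetches: if key value k occurs m_k times in the outer input and n_k times in the inner input, the join emits the sum of m_k * n_k rows, and every outer duplicate after the first rescans the n_k matching inner rows, giving the sum of (m_k - 1) * n_k re-fetches. That equals join size minus inner size when every key appears on both sides, which is the assumption this sketch makes. The helper names are hypothetical; the clamp at zero and the `rescanratio` formula mirror the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Join output size: sum over keys of outer_count * inner_count. */
static double
join_size(const double *outer, const double *inner, size_t nkeys)
{
    double      total = 0.0;

    for (size_t k = 0; k < nkeys; k++)
        total += outer[k] * inner[k];
    return total;
}

/* Exact re-fetch count: each extra outer duplicate rescans its inner matches. */
static double
exact_rescanned(const double *outer, const double *inner, size_t nkeys)
{
    double      total = 0.0;

    for (size_t k = 0; k < nkeys; k++)
        if (outer[k] > 0)
            total += (outer[k] - 1) * inner[k];
    return total;
}

/* Planner-style estimate turned into the rescan multiplier. */
static double
estimate_rescanratio(double join_rows, double inner_rows)
{
    double      rescanned;

    assert(inner_rows > 0);
    rescanned = join_rows - inner_rows;
    if (rescanned < 0)          /* clamp: estimates can undershoot */
        rescanned = 0;
    return 1.0 + rescanned / inner_rows;
}
```

With outer counts {1, 3, 2} and inner counts {4, 2, 5}, the join emits 20 rows, the inner input has 11, and exactly 9 inner rows are re-fetched, so `rescanratio` is 1 + 9/11.
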
@@ -1778,32 +1772,83 @@ cost_mergejoin(MergePath *path, PlannerInfo *root, SpecialJoinInfo *sjinfo)
                   -1.0);
         startup_cost += sort_path.startup_cost;
         startup_cost += (sort_path.total_cost - sort_path.startup_cost)
-            * innerstartsel * rescanratio;
-        run_cost += (sort_path.total_cost - sort_path.startup_cost)
-            * (innerendsel - innerstartsel) * rescanratio;
-
-        /*
-         * If the inner sort is expected to spill to disk, we want to add a
-         * materialize node to shield it from the need to handle mark/restore.
-         * This will allow it to perform the last merge pass on-the-fly, while
-         * in most cases not requiring the materialize to spill to disk.
-         * Charge an extra cpu_tuple_cost per tuple to account for the
-         * materialize node. (Keep this estimate in sync with similar ones in
-         * create_mergejoin_path and create_mergejoin_plan.)
-         */
-        if (relation_byte_size(inner_path_rows, inner_path->parent->width) >
-            (work_mem * 1024L))
-            run_cost += cpu_tuple_cost * inner_path_rows;
+            * innerstartsel;
+        inner_run_cost = (sort_path.total_cost - sort_path.startup_cost)
+            * (innerendsel - innerstartsel);
     }
     else
     {
         startup_cost += inner_path->startup_cost;
         startup_cost += (inner_path->total_cost - inner_path->startup_cost)
-            * innerstartsel * rescanratio;
-        run_cost += (inner_path->total_cost - inner_path->startup_cost)
-            * (innerendsel - innerstartsel) * rescanratio;
+            * innerstartsel;
+        inner_run_cost = (inner_path->total_cost - inner_path->startup_cost)
+            * (innerendsel - innerstartsel);
     }

+    /*
+     * Decide whether we want to materialize the inner input to shield it from
+     * mark/restore and performing re-fetches. Our cost model for regular
+     * re-fetches is that a re-fetch costs the same as an original fetch,
+     * which is probably an overestimate; but on the other hand we ignore the
+     * bookkeeping costs of mark/restore. Not clear if it's worth developing
+     * a more refined model. So we just need to inflate the inner run cost
+     * by rescanratio.
+     */
+    bare_inner_cost = inner_run_cost * rescanratio;
+    /*
+     * When we interpose a Material node the re-fetch cost is assumed to be
+     * just cpu_tuple_cost per tuple, independently of the underlying plan's
+     * cost; but we have to charge an extra cpu_tuple_cost per original fetch
+     * as well. Note that we're assuming the materialize node will never
+     * spill to disk, since it only has to remember tuples back to the last
+     * mark. (If there are a huge number of duplicates, our other cost
+     * factors will make the path so expensive that it probably won't get
+     * chosen anyway.) So we don't use cost_rescan here.
+     *
+     * Note: keep this estimate in sync with create_mergejoin_plan's labeling
+     * of the generated Material node.
+     */
+    mat_inner_cost = inner_run_cost +
+        cpu_tuple_cost * inner_path_rows * rescanratio;
+
+    /* Prefer materializing if it looks cheaper */
+    if (mat_inner_cost < bare_inner_cost)
+        path->materialize_inner = true;
+    /*
+     * Even if materializing doesn't look cheaper, we *must* do it if the
+     * inner path is to be used directly (without sorting) and it doesn't
+     * support mark/restore.
+     *
+     * Since the inner side must be ordered, and only Sorts and IndexScans can
+     * create order to begin with, and they both support mark/restore, you
+     * might think there's no problem --- but you'd be wrong. Nestloop and
+     * merge joins can *preserve* the order of their inputs, so they can be
+     * selected as the input of a mergejoin, and they don't support
+     * mark/restore at present.
+     */
+    else if (innersortkeys == NIL &&
+             !ExecSupportsMarkRestore(inner_path->pathtype))
+        path->materialize_inner = true;
+    /*
+     * Also, force materializing if the inner path is to be sorted and the
+     * sort is expected to spill to disk. This is because the final merge
+     * pass can be done on-the-fly if it doesn't have to support mark/restore.
+     * We don't try to adjust the cost estimates for this consideration,
+     * though.
+     */
+    else if (innersortkeys != NIL &&
+             relation_byte_size(inner_path_rows, inner_path->parent->width) >
+             (work_mem * 1024L))
+        path->materialize_inner = true;
+    else
+        path->materialize_inner = false;
+
+    /* Charge the right incremental cost for the chosen case */
+    if (path->materialize_inner)
+        run_cost += mat_inner_cost;
+    else
+        run_cost += bare_inner_cost;
+
     /* CPU costs */

     /*
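
The decision added to `cost_mergejoin` can be sketched outside the planner. This is a simplification: the struct and function names are hypothetical, the inputs are plain numbers and flags rather than Path nodes, and `cpu_tuple_cost` is passed in (the test uses 0.01, PostgreSQL's default). The branch order follows the hunk above: cheaper first, then required-for-correctness, then the spilling-sort case.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified inputs mirroring the quantities cost_mergejoin works with. */
typedef struct MergeInnerInputs
{
    double      inner_run_cost;         /* one pass over the needed inner range */
    double      inner_rows;             /* inner_path_rows */
    double      rescanratio;            /* 1.0 + rescanned / inner rows */
    bool        inner_is_sorted;        /* innersortkeys != NIL */
    bool        supports_mark_restore;  /* ExecSupportsMarkRestore(pathtype) */
    bool        sort_spills;            /* inner sort exceeds work_mem */
} MergeInnerInputs;

typedef struct MergeInnerChoice
{
    bool        materialize_inner;
    double      inner_charge;           /* amount added to run_cost */
} MergeInnerChoice;

static MergeInnerChoice
decide_materialize(const MergeInnerInputs *in, double cpu_tuple_cost)
{
    MergeInnerChoice out;
    double      bare, mat;

    assert(in->rescanratio >= 1.0);
    /* Re-fetches repeat the inner plan's work ... */
    bare = in->inner_run_cost * in->rescanratio;
    /* ... or cost cpu_tuple_cost per fetch once a Material node is interposed. */
    mat = in->inner_run_cost + cpu_tuple_cost * in->inner_rows * in->rescanratio;

    if (mat < bare)
        out.materialize_inner = true;   /* cheaper */
    else if (!in->inner_is_sorted && !in->supports_mark_restore)
        out.materialize_inner = true;   /* required for correctness */
    else if (in->inner_is_sorted && in->sort_spills)
        out.materialize_inner = true;   /* lets the final merge pass run on the fly */
    else
        out.materialize_inner = false;

    out.inner_charge = out.materialize_inner ? mat : bare;
    return out;
}
```

Rearranging `mat < bare` shows materializing wins on cost once the inner run cost per row exceeds cpu_tuple_cost * r / (r - 1), where r is `rescanratio`. With r = 1.5 and cpu_tuple_cost = 0.01 the threshold is 0.03 per row, so an index scan costing 1000 for 10,000 rows (0.1 per row) is materialized, in line with the commit's point about inner index scans with a high re-fetch percentage.
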

src/backend/optimizer/plan/createplan.c

Lines changed: 8 additions & 14 deletions
@@ -10,7 +10,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/optimizer/plan/createplan.c,v 1.266 2009/10/26 02:26:33 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/optimizer/plan/createplan.c,v 1.267 2009/11/15 02:45:35 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -1664,9 +1664,8 @@ create_mergejoin_plan(PlannerInfo *root,
                              best_path->jpath.outerjoinpath->parent->relids);

     /*
-     * Create explicit sort nodes for the outer and inner join paths if
-     * necessary. The sort cost was already accounted for in the path. Make
-     * sure there are no excess columns in the inputs if sorting.
+     * Create explicit sort nodes for the outer and inner paths if necessary.
+     * Make sure there are no excess columns in the inputs if sorting.
      */
     if (best_path->outersortkeys)
     {
@@ -1695,23 +1694,17 @@ create_mergejoin_plan(PlannerInfo *root,
         innerpathkeys = best_path->jpath.innerjoinpath->pathkeys;

     /*
-     * If inner plan is a sort that is expected to spill to disk, add a
-     * materialize node to shield it from the need to handle mark/restore.
-     * This will allow it to perform the last merge pass on-the-fly, while in
-     * most cases not requiring the materialize to spill to disk.
-     *
-     * XXX really, Sort oughta do this for itself, probably, to avoid the
-     * overhead of a separate plan node.
+     * If specified, add a materialize node to shield the inner plan from
+     * the need to handle mark/restore.
      */
-    if (IsA(inner_plan, Sort) &&
-        sort_exceeds_work_mem((Sort *) inner_plan))
+    if (best_path->materialize_inner)
     {
         Plan *matplan = (Plan *) make_material(inner_plan);

         /*
          * We assume the materialize will not spill to disk, and therefore
          * charge just cpu_tuple_cost per tuple. (Keep this estimate in sync
-         * with similar ones in cost_mergejoin and create_mergejoin_path.)
+         * with cost_mergejoin.)
          */
         copy_plan_costsize(matplan, inner_plan);
         matplan->total_cost += cpu_tuple_cost * matplan->plan_rows;
@@ -1887,6 +1880,7 @@ create_mergejoin_plan(PlannerInfo *root,
                               inner_plan,
                               best_path->jpath.jointype);

+    /* Costs of sort and material steps are included in path cost already */
     copy_path_costsize(&join_plan->join.plan, &best_path->jpath.path);

     return join_plan;
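
In `create_mergejoin_plan` the Material node takes over the inner plan's cost fields and then adds `cpu_tuple_cost` per row, on the assumption that it never spills to disk. A minimal sketch with a hypothetical `DemoPlanCosts` struct; it assumes `copy_plan_costsize()` copies startup cost, total cost, row count and width, which is how the hunk above uses it.

```c
#include <assert.h>

/* Hypothetical stand-in for the cost fields of a Plan node. */
typedef struct DemoPlanCosts
{
    double      startup_cost;
    double      total_cost;
    double      plan_rows;
    int         plan_width;
} DemoPlanCosts;

/* Label a Material node placed on top of 'inner', assuming it never spills. */
static DemoPlanCosts
demo_label_material(const DemoPlanCosts *inner, double cpu_tuple_cost)
{
    DemoPlanCosts mat = *inner;     /* like copy_plan_costsize() */

    assert(cpu_tuple_cost >= 0.0);
    mat.total_cost += cpu_tuple_cost * mat.plan_rows;   /* one CPU charge per row */
    return mat;
}
```

For an inner plan with startup cost 5, total cost 105 and 1,000 rows, the Material node keeps startup cost 5 and is labeled with total cost 115 when `cpu_tuple_cost` is 0.01.
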

src/backend/optimizer/util/pathnode.c

Lines changed: 2 additions & 43 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/optimizer/util/pathnode.c,v 1.154 2009/09/17 20:49:29 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/optimizer/util/pathnode.c,v 1.155 2009/11/15 02:45:35 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -17,7 +17,6 @@
 #include <math.h>

 #include "catalog/pg_operator.h"
-#include "executor/executor.h"
 #include "miscadmin.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
@@ -1414,47 +1413,6 @@ create_mergejoin_path(PlannerInfo *root,
         pathkeys_contained_in(innersortkeys, inner_path->pathkeys))
         innersortkeys = NIL;

-    /*
-     * If we are not sorting the inner path, we may need a materialize node to
-     * ensure it can be marked/restored.
-     *
-     * Since the inner side must be ordered, and only Sorts and IndexScans can
-     * create order to begin with, and they both support mark/restore, you
-     * might think there's no problem --- but you'd be wrong. Nestloop and
-     * merge joins can *preserve* the order of their inputs, so they can be
-     * selected as the input of a mergejoin, and they don't support
-     * mark/restore at present.
-     *
-     * Note: Sort supports mark/restore, so no materialize is really needed in
-     * that case; but one may be desirable anyway to optimize the sort.
-     * However, since we aren't representing the sort step separately in the
-     * Path tree, we can't explicitly represent the materialize either. So
-     * that case is not handled here. Instead, cost_mergejoin has to factor
-     * in the cost and create_mergejoin_plan has to add the plan node.
-     */
-    if (innersortkeys == NIL &&
-        !ExecSupportsMarkRestore(inner_path->pathtype))
-    {
-        Path *mpath;
-
-        mpath = (Path *) create_material_path(inner_path->parent, inner_path);
-
-        /*
-         * We expect the materialize won't spill to disk (it could only do so
-         * if there were a whole lot of duplicate tuples, which is a case
-         * cost_mergejoin will avoid choosing anyway). Therefore
-         * cost_material's cost estimate is bogus and we should charge just
-         * cpu_tuple_cost per tuple. (Keep this estimate in sync with similar
-         * ones in cost_mergejoin and create_mergejoin_plan; also see
-         * cost_rescan.)
-         */
-        mpath->startup_cost = inner_path->startup_cost;
-        mpath->total_cost = inner_path->total_cost;
-        mpath->total_cost += cpu_tuple_cost * inner_path->parent->rows;
-
-        inner_path = mpath;
-    }
-
     pathnode->jpath.path.pathtype = T_MergeJoin;
     pathnode->jpath.path.parent = joinrel;
     pathnode->jpath.jointype = jointype;
@@ -1465,6 +1423,7 @@ create_mergejoin_path(PlannerInfo *root,
     pathnode->path_mergeclauses = mergeclauses;
     pathnode->outersortkeys = outersortkeys;
     pathnode->innersortkeys = innersortkeys;
+    /* pathnode->materialize_inner will be set by cost_mergejoin */

     cost_mergejoin(pathnode, root, sjinfo);
