Commit 3969f29

Revise GEQO planner to make use of some heuristic knowledge about SQL, namely that it's good to join where there are join clauses rather than where there are not. Also enable it to generate bushy plans at need, so that it doesn't fail in the presence of multiple IN clauses containing sub-joins. These changes appear to improve the behavior enough that we can substantially reduce the default pool size and generations count, thereby decreasing the runtime, and yet get as good or better plans as we were getting in 7.4. Consequently, adjust the default GEQO parameters. I also modified the way geqo_effort is used so that it affects both population size and number of generations; it's now useful as a single control to adjust the GEQO runtime-vs-plan-quality tradeoff. Bump geqo_threshold to 12, since even with these changes GEQO seems to be slower than the regular planner at 11 relations.
1 parent 81c554b commit 3969f29

9 files changed: +326 -168 lines changed

doc/src/sgml/runtime.sgml

Lines changed: 20 additions & 12 deletions
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.231 2004/01/21 23:33:34 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.232 2004/01/23 23:54:20 tgl Exp $
 -->
 
 <Chapter Id="runtime">
@@ -1396,33 +1396,41 @@ SET ENABLE_SEQSCAN TO OFF;
 Use genetic query optimization to plan queries with at least
 this many <literal>FROM</> items involved. (Note that an outer
 <literal>JOIN</> construct counts as only one <literal>FROM</>
-item.) The default is 11. For simpler queries it is usually best
+item.) The default is 12. For simpler queries it is usually best
 to use the deterministic, exhaustive planner, but for queries with
 many tables the deterministic planner takes too long.
 </para>
 </listitem>
 </varlistentry>
 
 <varlistentry>
+<term><varname>geqo_effort</varname> (<type>integer</type>)</term>
 <term><varname>geqo_pool_size</varname> (<type>integer</type>)</term>
 <term><varname>geqo_generations</varname> (<type>integer</type>)</term>
-<term><varname>geqo_effort</varname> (<type>integer</type>)</term>
 <term><varname>geqo_selection_bias</varname> (<type>floating point</type>)</term>
 <listitem>
 <para>
 Various tuning parameters for the genetic query optimization
-algorithm. The pool size is the number of individuals in one
-population. Valid values are between 128 and 1024. If it is set
-to 0 (the default) a pool size of 2^(QS+1), where QS is the
-number of <literal>FROM</> items in the query, is used.
+algorithm. The recommended one to modify is
+<varname>geqo_effort</varname>, which can range from 1 to 10 with
+a default of 5. Larger values increase the time spent in planning
+but make it more likely that a good plan will be found.
+<varname>geqo_effort</varname> doesn't actually do anything directly,
+it is just used to compute the default values for the other
+parameters. If you prefer, you can set the other parameters by hand
+instead.
+The pool size is the number of individuals in the genetic population.
+It must be at least two, and useful values are typically 100 to 1000.
+If it is set to zero (the default setting) then a suitable default
+is chosen based on <varname>geqo_effort</varname> and the number of
+tables in the query.
 Generations specifies the number of iterations of the algorithm.
-The value must be a positive integer. If 0 is specified then
-<literal>Effort * Log2(PoolSize)</literal> is used.
+It must be at least one, and useful values are in the same range
+as the pool size.
+If it is set to zero (the default setting) then a suitable default
+is chosen based on the pool size.
 The run time of the algorithm is roughly proportional to the sum of
 pool size and generations.
-<varname>geqo_effort</varname> is only used in computing the default
-generations setting, as just described. The default value is 40,
-and the allowed range 1 to 100.
 The selection bias is the selective pressure within the
 population. Values can be from 1.50 to 2.00; the latter is the
 default.
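
The revised documentation says that a zero geqo_pool_size or geqo_generations is replaced by a default derived from geqo_effort, but this part of the diff does not show the computation. The C fragment below is only a sketch of how such a derivation could look. The 2^(n+1) starting point comes from the removed documentation text; the clamp bounds (10 and 50 times the effort), the rule that generations defaults to the pool size, and the function names are all assumptions made for illustration, not the committed code.

```c
#include <assert.h>

/*
 * Sketch only: derive a default pool size from geqo_effort.
 * ASSUMPTION: the 10x/50x clamp bounds are illustrative placeholders;
 * the diff shown here does not include the real computation.
 */
static int
sketch_pool_size(int configured, int effort, int num_rels)
{
    int  minsize;
    int  maxsize;
    long size;

    assert(effort >= 1 && effort <= 10);    /* documented range of geqo_effort */
    assert(num_rels >= 1);

    if (configured != 0)
        return configured;      /* set by hand; validation assumed elsewhere */

    minsize = 10 * effort;
    maxsize = 50 * effort;

    /* Start from 2^(num_rels + 1), as in the removed documentation text. */
    if (num_rels + 1 >= 20)
        return maxsize;         /* already far above the cap; skip the shift */
    size = 1L << (num_rels + 1);

    if (size < minsize)
        return minsize;
    if (size > maxsize)
        return maxsize;
    return (int) size;
}

/*
 * Sketch only: a zero setting falls back to a value based on the pool size.
 * ASSUMPTION: "based on the pool size" is modeled here as "equal to it".
 */
static int
sketch_generations(int configured, int pool_size)
{
    return configured != 0 ? configured : pool_size;
}
```

Under these assumed bounds, a 12-relation query at the default effort of 5 would get a pool far smaller than 2^13, which matches the commit message's stated goal of reducing the default pool size and generation count.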

src/backend/optimizer/geqo/geqo_eval.c

Lines changed: 136 additions & 39 deletions
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_eval.c,v 1.66 2003/11/29 19:51:50 pgsql Exp $
+ * $PostgreSQL: pgsql/src/backend/optimizer/geqo/geqo_eval.c,v 1.67 2004/01/23 23:54:21 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -31,13 +31,17 @@
 #include "utils/memutils.h"
 
 
+static bool desirable_join(Query *root,
+               RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+
+
 /*
  * geqo_eval
  *
  * Returns cost of a query tree as an individual of the population.
  */
 Cost
-geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
+geqo_eval(Gene *tour, int num_gene, GeqoEvalData *evaldata)
 {
     MemoryContext mycontext;
     MemoryContext oldcxt;
@@ -52,9 +56,9 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
      * redundant cost calculations, we simply reject tours where tour[0] >
      * tour[1], assigning them an artificially bad fitness.
      *
-     * (It would be better to tweak the GEQO logic to not generate such tours
-     * in the first place, but I'm not sure of all the implications in the
-     * mutation logic.)
+     * init_tour() is aware of this rule and so we should never reject a
+     * tour during the initial filling of the pool. It seems difficult to
+     * persuade the recombination logic never to break the rule, however.
      */
     if (num_gene >= 2 && tour[0] > tour[1])
         return DBL_MAX;
@@ -80,10 +84,10 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
      * this, it'll be pointing at recycled storage after the
      * MemoryContextDelete below.
      */
-    savelist = root->join_rel_list;
+    savelist = evaldata->root->join_rel_list;
 
     /* construct the best path for the given combination of relations */
-    joinrel = gimme_tree(root, initial_rels, tour, num_gene);
+    joinrel = gimme_tree(tour, num_gene, evaldata);
 
     /*
      * compute fitness
@@ -97,7 +101,7 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
         fitness = DBL_MAX;
 
     /* restore join_rel_list */
-    root->join_rel_list = savelist;
+    evaldata->root->join_rel_list = savelist;
 
     /* release all the memory acquired within gimme_tree */
     MemoryContextSwitchTo(oldcxt);
@@ -111,63 +115,156 @@ geqo_eval(Query *root, List *initial_rels, Gene *tour, int num_gene)
  * Form planner estimates for a join tree constructed in the specified
  * order.
  *
- * 'root' is the Query
- * 'initial_rels' is the list of initial relations (FROM-list items)
  * 'tour' is the proposed join order, of length 'num_gene'
+ * 'evaldata' contains the context we need
  *
  * Returns a new join relation whose cheapest path is the best plan for
  * this join order. NB: will return NULL if join order is invalid.
  *
- * Note that at each step we consider using the next rel as both left and
- * right side of a join. However, we cannot build general ("bushy") plan
- * trees this way, only left-sided and right-sided trees.
+ * The original implementation of this routine always joined in the specified
+ * order, and so could only build left-sided plans (and right-sided and
+ * mixtures, as a byproduct of the fact that make_join_rel() is symmetric).
+ * It could never produce a "bushy" plan. This had a couple of big problems,
+ * of which the worst was that as of 7.4, there are situations involving IN
+ * subqueries where the only valid plans are bushy.
+ *
+ * The present implementation takes the given tour as a guideline, but
+ * postpones joins that seem unsuitable according to some heuristic rules.
+ * This allows correct bushy plans to be generated at need, and as a nice
+ * side-effect it seems to materially improve the quality of the generated
+ * plans.
  */
 RelOptInfo *
-gimme_tree(Query *root, List *initial_rels,
-           Gene *tour, int num_gene)
+gimme_tree(Gene *tour, int num_gene, GeqoEvalData *evaldata)
 {
+    RelOptInfo **stack;
+    int stack_depth;
     RelOptInfo *joinrel;
-    int cur_rel_index;
     int rel_count;
 
     /*
-     * Start with the first relation ...
+     * Create a stack to hold not-yet-joined relations.
      */
-    cur_rel_index = (int) tour[0];
-
-    joinrel = (RelOptInfo *) nth(cur_rel_index - 1, initial_rels);
+    stack = (RelOptInfo **) palloc(num_gene * sizeof(RelOptInfo *));
+    stack_depth = 0;
 
     /*
-     * And add on each relation in the specified order ...
+     * Push each relation onto the stack in the specified order. After
+     * pushing each relation, see whether the top two stack entries are
+     * joinable according to the desirable_join() heuristics. If so,
+     * join them into one stack entry, and try again to combine with the
+     * next stack entry down (if any). When the stack top is no longer
+     * joinable, continue to the next input relation. After we have pushed
+     * the last input relation, the heuristics are disabled and we force
+     * joining all the remaining stack entries.
+     *
+     * If desirable_join() always returns true, this produces a straight
+     * left-to-right join just like the old code. Otherwise we may produce
+     * a bushy plan or a left/right-sided plan that really corresponds to
+     * some tour other than the one given. To the extent that the heuristics
+     * are helpful, however, this will be a better plan than the raw tour.
+     *
+     * Also, when a join attempt fails (because of IN-clause constraints),
+     * we may be able to recover and produce a workable plan, where the old
+     * code just had to give up. This case acts the same as a false result
+     * from desirable_join().
      */
-    for (rel_count = 1; rel_count < num_gene; rel_count++)
+    for (rel_count = 0; rel_count < num_gene; rel_count++)
     {
-        RelOptInfo *inner_rel;
-        RelOptInfo *new_rel;
+        int cur_rel_index;
 
+        /* Get the next input relation and push it */
         cur_rel_index = (int) tour[rel_count];
-
-        inner_rel = (RelOptInfo *) nth(cur_rel_index - 1, initial_rels);
+        stack[stack_depth] = (RelOptInfo *) nth(cur_rel_index - 1,
+                                                evaldata->initial_rels);
+        stack_depth++;
 
         /*
-         * Construct a RelOptInfo representing the previous joinrel joined
-         * to inner_rel. These are always inner joins. Note that we
-         * expect the joinrel not to exist in root->join_rel_list yet, and
-         * so the paths constructed for it will only include the ones we
-         * want.
+         * While it's feasible, pop the top two stack entries and replace
+         * with their join.
          */
-        new_rel = make_join_rel(root, joinrel, inner_rel, JOIN_INNER);
+        while (stack_depth >= 2)
+        {
+            RelOptInfo *outer_rel = stack[stack_depth - 2];
+            RelOptInfo *inner_rel = stack[stack_depth - 1];
+
+            /*
+             * Don't pop if heuristics say not to join now. However,
+             * once we have exhausted the input, the heuristics can't
+             * prevent popping.
+             */
+            if (rel_count < num_gene - 1 &&
+                !desirable_join(evaldata->root, outer_rel, inner_rel))
+                break;
 
-        /* Fail if join order is not valid */
-        if (new_rel == NULL)
-            return NULL;
+            /*
+             * Construct a RelOptInfo representing the join of these
+             * two input relations. These are always inner joins.
+             * Note that we expect the joinrel not to exist in
+             * root->join_rel_list yet, and so the paths constructed for it
+             * will only include the ones we want.
+             */
+            joinrel = make_join_rel(evaldata->root, outer_rel, inner_rel,
+                                    JOIN_INNER);
 
-        /* Find and save the cheapest paths for this rel */
-        set_cheapest(new_rel);
+            /* Can't pop stack here if join order is not valid */
+            if (!joinrel)
+                break;
 
-        /* and repeat... */
-        joinrel = new_rel;
+            /* Find and save the cheapest paths for this rel */
+            set_cheapest(joinrel);
+
+            /* Pop the stack and replace the inputs with their join */
+            stack_depth--;
+            stack[stack_depth - 1] = joinrel;
+        }
     }
 
+    /* Did we succeed in forming a single join relation? */
+    if (stack_depth == 1)
+        joinrel = stack[0];
+    else
+        joinrel = NULL;
+
+    pfree(stack);
+
     return joinrel;
 }
+
+/*
+ * Heuristics for gimme_tree: do we want to join these two relations?
+ */
+static bool
+desirable_join(Query *root,
+               RelOptInfo *outer_rel, RelOptInfo *inner_rel)
+{
+    List *i;
+
+    /*
+     * Join if there is an applicable join clause.
+     */
+    foreach(i, outer_rel->joininfo)
+    {
+        JoinInfo *joininfo = (JoinInfo *) lfirst(i);
+
+        if (bms_is_subset(joininfo->unjoined_relids, inner_rel->relids))
+            return true;
+    }
+
+    /*
+     * Join if the rels are members of the same IN sub-select. This is
+     * needed to improve the odds that we will find a valid solution in
+     * a case where an IN sub-select has a clauseless join.
+     */
+    foreach(i, root->in_info_list)
+    {
+        InClauseInfo *ininfo = (InClauseInfo *) lfirst(i);
+
+        if (bms_is_subset(outer_rel->relids, ininfo->righthand) &&
+            bms_is_subset(inner_rel->relids, ininfo->righthand))
+            return true;
+    }
+
+    /* Otherwise postpone the join till later. */
+    return false;
+}
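
The push-and-combine loop in the new gimme_tree() can be tried outside the planner with a toy model. The fragment below is a simplified sketch, not the committed code. Relation sets are plain bitmasks standing in for RelOptInfo. ToyQuery, ToyJoin, toy_desirable_join() and toy_gimme_tree() are hypothetical names invented for this illustration. The heuristic only checks whether any join clause links the two sets, and the IN sub-select rule and the possibility of make_join_rel() failing are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RELS 16

/* Hypothetical stand-in for RelOptInfo: a bitmask of base relations. */
typedef uint32_t RelSet;

/*
 * Toy join-clause graph: clause[i] is the set of relations that relation i
 * shares a join clause with. Relations are numbered from 1, like tour entries.
 */
typedef struct
{
    int    num_rels;
    RelSet clause[MAX_RELS + 1];
} ToyQuery;

/* One recorded join step: which two sets were combined. */
typedef struct
{
    RelSet outer;
    RelSet inner;
} ToyJoin;

/* Simplified heuristic: join only if some clause links the two sets. */
static bool
toy_desirable_join(const ToyQuery *q, RelSet outer, RelSet inner)
{
    for (int i = 1; i <= q->num_rels; i++)
    {
        if ((outer & ((RelSet) 1 << i)) && (q->clause[i] & inner))
            return true;
    }
    return false;
}

/*
 * Walk the tour with a stack, joining the top two entries whenever the
 * heuristic allows it, and forcing the remaining joins once the last
 * relation has been pushed. Records each join in 'joins' and returns
 * the number of joins performed.
 */
static int
toy_gimme_tree(const ToyQuery *q, const int *tour, int num_gene,
               ToyJoin *joins)
{
    RelSet stack[MAX_RELS];
    int    stack_depth = 0;
    int    num_joins = 0;

    assert(num_gene >= 1 && num_gene <= MAX_RELS);

    for (int rel_count = 0; rel_count < num_gene; rel_count++)
    {
        /* Push the next relation named by the tour. */
        stack[stack_depth++] = (RelSet) 1 << tour[rel_count];

        /* While feasible, replace the top two entries with their join. */
        while (stack_depth >= 2)
        {
            RelSet outer = stack[stack_depth - 2];
            RelSet inner = stack[stack_depth - 1];

            /* The heuristic may postpone a join until input is exhausted. */
            if (rel_count < num_gene - 1 &&
                !toy_desirable_join(q, outer, inner))
                break;

            joins[num_joins].outer = outer;
            joins[num_joins].inner = inner;
            num_joins++;

            /* Pop the stack and replace the inputs with their join. */
            stack_depth--;
            stack[stack_depth - 1] = outer | inner;
        }
    }
    return num_joins;
}
```

For example, with join clauses between relations 1 and 2, 3 and 4, and 1 and 4, the tour 1, 2, 3, 4 first joins 1 with 2, postpones relation 3 because no clause links it to that pair, then joins 3 with 4 once the last relation arrives, and finally joins the two pairs. That is a bushy shape, which a strict left-to-right walk of the tour could not produce.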
