Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit e1fad50

Browse files
committed
Revise generation of hashjoin paths: generate one path per
hashjoinable clause, not one path for a randomly-chosen element of each set of clauses with the same join operator. That is, if you wrote SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4, and both '=' ops were the same opcode (say, all four fields are int4), then the system would either consider hashing on f1=f2 or on f3=f4, but it would *not* consider both possibilities. Boo hiss. Also, revise estimation of hashjoin costs to include a penalty when the inner join var has a high disbursion --- ie, the most common value is pretty common. This tends to lead to badly skewed hash bucket occupancy and way more comparisons than you'd expect on average. I imagine that the cost calculation still needs tweaking, but at least it generates a more reasonable plan than before on George Young's example.
1 parent b7883d7 commit e1fad50

File tree

5 files changed

+201
-118
lines changed

5 files changed

+201
-118
lines changed

src/backend/optimizer/path/costsize.c

Lines changed: 47 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -3,18 +3,30 @@
33
* costsize.c
44
* Routines to compute (and set) relation sizes and path costs
55
*
6-
* Copyright (c) 1994, Regents of the University of California
6+
* Path costs are measured in units of disk accesses: one page fetch
7+
* has cost 1. The other primitive unit is the CPU time required to
8+
* process one tuple, which we set at "_cpu_page_weight_" of a page
9+
* fetch. Obviously, the CPU time per tuple depends on the query
10+
* involved, but the relative CPU and disk speeds of a given platform
11+
* are so variable that we are lucky if we can get useful numbers
12+
* at all. _cpu_page_weight_ is user-settable, in case a particular
13+
* user is clueful enough to have a better-than-default estimate
14+
* of the ratio for his platform. There is also _cpu_index_page_weight_,
15+
* the cost to process a tuple of an index during an index scan.
716
*
17+
*
18+
* Copyright (c) 1994, Regents of the University of California
819
*
920
* IDENTIFICATION
10-
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.43 1999/07/16 04:59:14 momjian Exp $
21+
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.44 1999/08/06 04:00:15 tgl Exp $
1122
*
1223
*-------------------------------------------------------------------------
1324
*/
1425

1526
#include <math.h>
1627

1728
#include "postgres.h"
29+
1830
#ifdef HAVE_LIMITS_H
1931
#include <limits.h>
2032
#ifndef MAXINT
@@ -26,25 +38,24 @@
2638
#endif
2739
#endif
2840

29-
41+
#include "miscadmin.h"
3042
#include "optimizer/cost.h"
3143
#include "optimizer/internal.h"
3244
#include "optimizer/tlist.h"
3345
#include "utils/lsyscache.h"
3446

35-
extern int NBuffers;
3647

48+
static int compute_targetlist_width(List *targetlist);
3749
static int compute_attribute_width(TargetEntry *tlistentry);
3850
static double relation_byte_size(int tuples, int width);
3951
static double base_log(double x, double b);
40-
static int compute_targetlist_width(List *targetlist);
52+
4153

4254
int _disable_cost_ = 30000000;
4355

4456
bool _enable_seqscan_ = true;
4557
bool _enable_indexscan_ = true;
4658
bool _enable_sort_ = true;
47-
bool _enable_hash_ = true;
4859
bool _enable_nestloop_ = true;
4960
bool _enable_mergejoin_ = true;
5061
bool _enable_hashjoin_ = true;
@@ -316,61 +327,68 @@ cost_mergejoin(Cost outercost,
316327
}
317328

318329
/*
319-
* cost_hashjoin-- XXX HASH
330+
* cost_hashjoin
331+
*
320332
* 'outercost' and 'innercost' are the (disk+cpu) costs of scanning the
321333
* outer and inner relations
322-
* 'outerkeys' and 'innerkeys' are lists of the keys to be used
323-
* to hash the outer and inner relations
324334
* 'outersize' and 'innersize' are the number of tuples in the outer
325335
* and inner relations
326336
* 'outerwidth' and 'innerwidth' are the (typical) widths (in bytes)
327337
* of the tuples of the outer and inner relations
338+
* 'innerdisbursion' is an estimate of the disbursion statistic
339+
* for the inner hash key.
328340
*
329341
* Returns a flonum.
330342
*/
331343
Cost
332344
cost_hashjoin(Cost outercost,
333345
Cost innercost,
334-
List *outerkeys,
335-
List *innerkeys,
336346
int outersize,
337347
int innersize,
338348
int outerwidth,
339-
int innerwidth)
349+
int innerwidth,
350+
Cost innerdisbursion)
340351
{
341352
Cost temp = 0;
342-
int outerpages = page_size(outersize, outerwidth);
343-
int innerpages = page_size(innersize, innerwidth);
353+
double outerbytes = relation_byte_size(outersize, outerwidth);
354+
double innerbytes = relation_byte_size(innersize, innerwidth);
355+
long hashtablebytes = SortMem * 1024L;
344356

345357
if (!_enable_hashjoin_)
346358
temp += _disable_cost_;
347359

348-
/*
349-
* Bias against putting larger relation on inside.
350-
*
351-
* Code used to use "outerpages < innerpages" but that has poor
352-
* resolution when both relations are small.
353-
*/
354-
if (relation_byte_size(outersize, outerwidth) <
355-
relation_byte_size(innersize, innerwidth))
356-
temp += _disable_cost_;
357-
358360
/* cost of source data */
359361
temp += outercost + innercost;
360362

361363
/* cost of computing hash function: must do it once per tuple */
362364
temp += _cpu_page_weight_ * (outersize + innersize);
363365

364-
/* cost of main-memory hashtable */
365-
temp += (innerpages < NBuffers) ? innerpages : NBuffers;
366+
/* the number of tuple comparisons needed is the number of outer
367+
* tuples times the typical hash bucket size, which we estimate
368+
* conservatively as the inner disbursion times the inner tuple
369+
* count. The cost per comparison is set at _cpu_index_page_weight_;
370+
* is that reasonable, or do we need another basic parameter?
371+
*/
372+
temp += _cpu_index_page_weight_ * outersize *
373+
(innersize * innerdisbursion);
366374

367375
/*
368376
* if inner relation is too big then we will need to "batch" the join,
369377
* which implies writing and reading most of the tuples to disk an
370-
* extra time.
378+
* extra time. Charge one cost unit per page of I/O.
379+
*/
380+
if (innerbytes > hashtablebytes)
381+
temp += 2 * (page_size(outersize, outerwidth) +
382+
page_size(innersize, innerwidth));
383+
384+
/*
385+
* Bias against putting larger relation on inside. We don't want
386+
* an absolute prohibition, though, since larger relation might have
387+
* better disbursion --- and we can't trust the size estimates
388+
* unreservedly, anyway.
371389
*/
372-
if (innerpages > NBuffers)
373-
temp += 2 * (outerpages + innerpages);
390+
if (innerbytes > outerbytes)
391+
temp *= 1.1; /* is this an OK fudge factor? */
374392

375393
Assert(temp >= 0);
376394

0 commit comments

Comments
 (0)