Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 25442d8

Browse files
committed
Correct oversight in hashjoin cost estimation: nodeHash sizes its hash
table for an average of NTUP_PER_BUCKET tuples/bucket, but cost_hashjoin was assuming a target load of one tuple/bucket. This was causing a noticeable underestimate of hashjoin costs.
1 parent 24864d0 commit 25442d8

File tree

3 files changed

+16
-8
lines changed

3 files changed

+16
-8
lines changed

src/backend/executor/nodeHash.c

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
* Portions Copyright (c) 1994, Regents of the University of California
88
*
99
*
10-
* $Id: nodeHash.c,v 1.44 2000/01/26 05:56:22 momjian Exp $
10+
* $Id: nodeHash.c,v 1.45 2000/04/18 05:43:01 tgl Exp $
1111
*
1212
*-------------------------------------------------------------------------
1313
*/
@@ -221,7 +221,6 @@ ExecEndHash(Hash *node)
221221
* create a hashtable in shared memory for hashjoin.
222222
* ----------------------------------------------------------------
223223
*/
224-
#define NTUP_PER_BUCKET 10
225224
#define FUDGE_FAC 2.0
226225

227226
HashJoinTable

src/backend/optimizer/path/costsize.c

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,7 @@
4242
* Portions Copyright (c) 1994, Regents of the University of California
4343
*
4444
* IDENTIFICATION
45-
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.57 2000/04/12 17:15:19 momjian Exp $
45+
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v 1.58 2000/04/18 05:43:02 tgl Exp $
4646
*
4747
*-------------------------------------------------------------------------
4848
*/
@@ -51,6 +51,7 @@
5151

5252
#include <math.h>
5353

54+
#include "executor/nodeHash.h"
5455
#include "miscadmin.h"
5556
#include "nodes/plannodes.h"
5657
#include "optimizer/clauses.h"
@@ -604,12 +605,17 @@ cost_hashjoin(Path *path,
604605
run_cost += cpu_operator_cost * outer_path->parent->rows;
605606

606607
/*
607-
* the number of tuple comparisons needed is the number of outer
608-
* tuples times the typical hash bucket size, which we estimate
609-
* conservatively as the inner disbursion times the inner tuple count.
608+
* The number of tuple comparisons needed is the number of outer
609+
* tuples times the typical hash bucket size. nodeHash.c tries for
610+
* average bucket loading of NTUP_PER_BUCKET, but that goal will
611+
* be reached only if data values are uniformly distributed among
612+
* the buckets. To be conservative, we scale up the target bucket
613+
* size by the number of inner rows times inner disbursion, giving
614+
* an estimate of the typical number of duplicates of each value.
615+
* We then charge one cpu_operator_cost per tuple comparison.
610616
*/
611617
run_cost += cpu_operator_cost * outer_path->parent->rows *
612-
ceil(inner_path->parent->rows * innerdisbursion);
618+
NTUP_PER_BUCKET * ceil(inner_path->parent->rows * innerdisbursion);
613619

614620
/*
615621
* Estimate the number of tuples that get through the hashing filter

src/include/executor/nodeHash.h

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
* Portions Copyright (c) 1996-2000, PostgreSQL, Inc
88
* Portions Copyright (c) 1994, Regents of the University of California
99
*
10-
* $Id: nodeHash.h,v 1.15 2000/01/26 05:58:05 momjian Exp $
10+
* $Id: nodeHash.h,v 1.16 2000/04/18 05:43:00 tgl Exp $
1111
*
1212
*-------------------------------------------------------------------------
1313
*/
@@ -16,6 +16,9 @@
1616

1717
#include "nodes/plannodes.h"
1818

19+
/* NTUP_PER_BUCKET is exported because planner wants to see it */
20+
#define NTUP_PER_BUCKET 10
21+
1922
extern TupleTableSlot *ExecHash(Hash *node);
2023
extern bool ExecInitHash(Hash *node, EState *estate, Plan *parent);
2124
extern int ExecCountSlotsHash(Hash *node);

0 commit comments

Comments
 (0)