
Commit 3454876
Fix hash table size estimation error in choose_hashed_distinct().
We should account for the per-group hashtable entry overhead when considering whether to use a hash aggregate to implement DISTINCT. The comparable logic in choose_hashed_grouping() gets this right, but I think I omitted it here in the mistaken belief that there would be no overhead if there were no aggregate functions to be evaluated. This can result in a more than 2X underestimate of the hash table size, if the tuples being aggregated aren't very wide. Per report from Tomas Vondra.

This bug is of long standing, but per discussion we'll only back-patch into 9.3. Changing the estimation behavior in stable branches seems to carry too much risk of destabilizing plan choices for already-tuned applications.
1 parent 5dcc48c commit 3454876

File tree

1 file changed: +4 −0 lines

src/backend/optimizer/plan/planner.c

@@ -2848,7 +2848,11 @@ choose_hashed_distinct(PlannerInfo *root,
 	 * Don't do it if it doesn't look like the hashtable will fit into
 	 * work_mem.
 	 */
+
+	/* Estimate per-hash-entry space at tuple width... */
 	hashentrysize = MAXALIGN(path_width) + MAXALIGN(sizeof(MinimalTupleData));
+	/* plus the per-hash-entry overhead */
+	hashentrysize += hash_agg_entry_size(0);
 
 	if (hashentrysize * dNumDistinctRows > work_mem * 1024L)
 		return false;
