Commit 391159e

Partially revert commit 3d3bf62.
On reflection, the pre-existing logic in ANALYZE is specifically meant to compare the frequency of a candidate MCV (most common value) against the estimated frequency of a random distinct value across the whole table. The change to compare it against the average frequency of values actually seen in the sample doesn't seem very principled, and if anything it would make us less likely, not more likely, to consider a value an MCV. So revert that, but keep the aspect of considering only nonnull values, which definitely is correct.

In passing, rename the local variables in these stanzas to "ndistinct_table", to avoid confusion with the "ndistinct" that appears at an outer scope in compute_scalar_stats.
1 parent c9ff752 commit 391159e
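
To make the reasoning in the commit message concrete, here is a minimal standalone sketch (not code from analyze.c) that computes the minimum MCV count both ways. The helper names `mincount_from_sample` and `mincount_from_table` are hypothetical, and the clamp to 2 is an assumption about the body of the `if (mincount < 2)` test, which the diff context cuts off. The numbers in the usage note are made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: threshold as in the reverted logic, dividing by the
 * number of distinct values actually seen in the sample.
 */
static double
mincount_from_sample(int nonnull_cnt, int d_sample)
{
    double avgcount = (double) nonnull_cnt / (double) d_sample;
    double mincount = avgcount * 1.25;

    /* Assumed clamp: the diff shows only the test, not its body. */
    return (mincount < 2) ? 2 : mincount;
}

/*
 * Hypothetical helper: threshold as in the restored logic, dividing by the
 * estimated number of distinct nonnull values in the whole table.
 */
static double
mincount_from_table(int nonnull_cnt, double ndistinct_table)
{
    double avgcount = (double) nonnull_cnt / ndistinct_table;
    double mincount = avgcount * 1.25;

    /* Same assumed clamp as above. */
    return (mincount < 2) ? 2 : mincount;
}

/* Print both thresholds side by side for one set of inputs. */
static void
print_thresholds(int nonnull_cnt, int d_sample, double ndistinct_table)
{
    printf("sample-based mincount: %.3f\n",
           mincount_from_sample(nonnull_cnt, d_sample));
    printf("table-based mincount:  %.3f\n",
           mincount_from_table(nonnull_cnt, ndistinct_table));
}
```

For example, with 30000 nonnull sample rows, 10000 distinct values seen in the sample, and an estimated 100000 distinct values in the table, the sample-based threshold is 3.75 while the table-based one is clamped to 2. The sample normally contains no more distinct values than the table is estimated to hold, so the sample-based divisor yields the higher threshold, which is why the commit message says the reverted change made a value less likely to qualify as an MCV.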

1 file changed, 10 additions and 6 deletions.


src/backend/commands/analyze.c

@@ -2133,13 +2133,15 @@ compute_distinct_stats(VacAttrStatsP stats,
         }
         else
         {
-            /* d here is the same as d in the Haas-Stokes formula */
-            int d = nonnull_cnt - summultiple + nmultiple;
+            double ndistinct_table = stats->stadistinct;
             double avgcount,
                    mincount;
 
+            /* Re-extract estimate of # distinct nonnull values in table */
+            if (ndistinct_table < 0)
+                ndistinct_table = -ndistinct_table * totalrows;
             /* estimate # occurrences in sample of a typical nonnull value */
-            avgcount = (double) nonnull_cnt / (double) d;
+            avgcount = (double) nonnull_cnt / ndistinct_table;
             /* set minimum threshold count to store a value */
             mincount = avgcount * 1.25;
             if (mincount < 2)
@@ -2493,14 +2495,16 @@ compute_scalar_stats(VacAttrStatsP stats,
         }
         else
         {
-            /* d here is the same as d in the Haas-Stokes formula */
-            int d = ndistinct + toowide_cnt;
+            double ndistinct_table = stats->stadistinct;
             double avgcount,
                    mincount,
                    maxmincount;
 
+            /* Re-extract estimate of # distinct nonnull values in table */
+            if (ndistinct_table < 0)
+                ndistinct_table = -ndistinct_table * totalrows;
             /* estimate # occurrences in sample of a typical nonnull value */
-            avgcount = (double) values_cnt / (double) d;
+            avgcount = (double) nonnull_cnt / ndistinct_table;
             /* set minimum threshold count to store a value */
             mincount = avgcount * 1.25;
             if (mincount < 2)
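
Both hunks add the same "re-extract" step, which relies on the sign convention of `stadistinct`: a positive value is an absolute count of distinct values, while a negative value is the negated fraction of table rows that are distinct. The sketch below isolates that step in a hypothetical helper, `ndistinct_in_table`, which does not exist in analyze.c; it only mirrors the three added lines and is meant as an illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the added lines in the diff: turn a
 * stadistinct value into an absolute estimate of distinct nonnull values.
 * A negative stadistinct is a negated multiplier of the table's row count.
 */
static double
ndistinct_in_table(double stadistinct, double totalrows)
{
    double ndistinct_table = stadistinct;

    if (ndistinct_table < 0)
        ndistinct_table = -ndistinct_table * totalrows;
    return ndistinct_table;
}

/* Print the decoded estimate for one input pair. */
static void
print_ndistinct(double stadistinct, double totalrows)
{
    printf("stadistinct=%g, totalrows=%g -> ndistinct_table=%g\n",
           stadistinct, totalrows,
           ndistinct_in_table(stadistinct, totalrows));
}
```

With `stadistinct = -0.5` and one million rows, the helper returns 500000; with `stadistinct = 200` it returns 200 regardless of the row count. Keeping the result as a `double` matches the new declaration in the diff and avoids the integer cast the removed `d` required.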
