
Commit 52b6053

Fix tsmatchsel() to account properly for null rows.
ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents.

Back-patch to 8.4 where this stuff was introduced.

Jesper Krogh
1 parent de623f3 commit 52b6053

2 files changed: +8 -0 lines changed


src/backend/tsearch/ts_selfuncs.c

Lines changed: 6 additions & 0 deletions
@@ -189,11 +189,17 @@ tsquerysel(VariableStatData *vardata, Datum constval)
 			/* No most-common-elements info, so do without */
 			selec = tsquery_opr_selec_no_stats(query);
 		}
+
+		/*
+		 * MCE stats count only non-null rows, so adjust for null rows.
+		 */
+		selec *= (1.0 - stats->stanullfrac);
 	}
 	else
 	{
 		/* No stats at all, so do without */
 		selec = tsquery_opr_selec_no_stats(query);
+		/* we assume no nulls here, so no stanullfrac correction */
 	}
 
 	return selec;
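
To make the size of the overestimate concrete, the correction can be sketched outside PostgreSQL as plain arithmetic. This is only an illustration: the helper name `adjust_for_nulls` is hypothetical and not part of the patch. Only the formula `selec *= (1.0 - stanullfrac)` is taken from the hunk above.

```c
/*
 * Hypothetical helper (not PostgreSQL code): convert a selectivity that is
 * expressed as a fraction of the non-null rows into a fraction of all rows.
 * This mirrors the "selec *= (1.0 - stats->stanullfrac)" line in the patch.
 */
static double
adjust_for_nulls(double selec_nonnull, double nullfrac)
{
	return selec_nonnull * (1.0 - nullfrac);
}
```

For example, if 80% of the documents are null and a lexeme appears in half of the non-null ones, `adjust_for_nulls(0.5, 0.8)` gives about 0.1, whereas the uncorrected estimate would have been 0.5.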

src/include/catalog/pg_statistic.h

Lines changed: 2 additions & 0 deletions
@@ -246,6 +246,8 @@ typedef FormData_pg_statistic *Form_pg_statistic;
  * type with identifiable elements (for instance, tsvector). staop contains
  * the equality operator appropriate to the element type. stavalues contains
  * the most common element values, and stanumbers their frequencies. Unlike
+ * MCV slots, frequencies are measured as the fraction of non-null rows the
+ * element value appears in, not the frequency of all rows. Also unlike
  * MCV slots, the values are sorted into order (to support binary search
  * for a particular value). Since this puts the minimum and maximum
  * frequencies at unpredictable spots in stanumbers, there are two extra
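
The comment above describes two properties of the slot: values sorted for binary search, and frequencies relative to non-null rows. A rough standalone sketch of a lookup that relies on both might look like the following. The struct `McelemEntry`, the function `mcelem_freq_all_rows`, and the use of `strcmp` as the ordering are all assumptions made for illustration. PostgreSQL's own code uses its own data structures and comparator.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for one most-common-element entry: the element
 * value and its frequency among the non-null rows.
 */
typedef struct
{
	const char *element;
	double		freq_nonnull;
} McelemEntry;

/* bsearch comparator: the first argument is the key, the second an entry */
static int
cmp_entry(const void *key, const void *member)
{
	return strcmp((const char *) key,
				  ((const McelemEntry *) member)->element);
}

/*
 * Binary-search an array sorted by element (strcmp order assumed) and
 * return the element's frequency as a fraction of all rows, or -1.0 if
 * the element is not among the most common ones.
 */
static double
mcelem_freq_all_rows(const McelemEntry *entries, size_t n,
					 const char *element, double nullfrac)
{
	const McelemEntry *hit = bsearch(element, entries, n,
									 sizeof(McelemEntry), cmp_entry);

	if (hit == NULL)
		return -1.0;
	return hit->freq_nonnull * (1.0 - nullfrac);
}
```

Because the array is ordered by element rather than by frequency, the lookup is logarithmic, but the minimum and maximum frequencies can no longer be read off the ends of the array.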
