Commit 88b0898

Improve estimation of OR clauses using multiple extended statistics.
When estimating an OR clause using multiple extended statistics objects, treat the estimates for each set of clauses for each statistics object as independent of one another. The overlap estimates produced for each statistics object do not apply to clauses covered by other statistics objects.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
1 parent f2a69b3 commit 88b0898
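
The independence rule described in the commit message can be illustrated with a small standalone sketch. It is not PostgreSQL code: the helper names `or_independent` and `combine_stat_estimates` are hypothetical, and the inputs are assumed to be per-statistics-object OR estimates that have already been computed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: P(A OR B) when A and B are assumed independent,
 * i.e. P(A) + P(B) - P(A) * P(B).
 */
static double
or_independent(double p, double q)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(q >= 0.0 && q <= 1.0);
    return p + q - p * q;
}

/*
 * Fold the OR estimates of several statistics objects into one overall
 * estimate, treating each object's estimate as independent of the others.
 * With no objects the result is 0.0, the identity for OR.
 */
static double
combine_stat_estimates(const double *stat_sel, size_t n)
{
    double      sel = 0.0;

    for (size_t i = 0; i < n; i++)
        sel = or_independent(sel, stat_sel[i]);

    return sel;
}
```

For two objects that each estimate 0.5, this yields 0.75 rather than the 1.0 that plain addition would give. The result always stays within [0, 1] when the inputs do, which is presumably why no extra clamp follows the combining step in the patch.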

2 files changed: +18 -9 lines changed

src/backend/statistics/extended_stats.c

+17 -8
@@ -1356,17 +1356,19 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
         if (is_or)
         {
             bool       *or_matches = NULL;
-            Selectivity simple_or_sel = 0.0;
+            Selectivity simple_or_sel = 0.0,
+                        stat_sel = 0.0;
             MCVList    *mcv_list;
 
             /* Load the MCV list stored in the statistics object */
             mcv_list = statext_mcv_load(stat->statOid);
 
             /*
-             * Compute the selectivity of the ORed list of clauses by
-             * estimating each in turn and combining them using the formula
-             * P(A OR B) = P(A) + P(B) - P(A AND B).  This allows us to use
-             * the multivariate MCV stats to better estimate each term.
+             * Compute the selectivity of the ORed list of clauses covered by
+             * this statistics object by estimating each in turn and combining
+             * them using the formula P(A OR B) = P(A) + P(B) - P(A AND B).
+             * This allows us to use the multivariate MCV stats to better
+             * estimate the individual terms and their overlap.
              *
              * Each time we iterate this formula, the clause "A" above is
              * equal to all the clauses processed so far, combined with "OR".
@@ -1437,12 +1439,19 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
                                                 overlap_basesel,
                                                 mcv_totalsel);
 
-                /* Factor these into the overall result */
-                sel += clause_sel - overlap_sel;
-                CLAMP_PROBABILITY(sel);
+                /* Factor these into the result for this statistics object */
+                stat_sel += clause_sel - overlap_sel;
+                CLAMP_PROBABILITY(stat_sel);
 
                 listidx++;
             }
+
+            /*
+             * Factor the result for this statistics object into the overall
+             * result.  We treat the results from each separate statistics
+             * object as independent of one another.
+             */
+            sel = sel + stat_sel - sel * stat_sel;
         }
         else /* Implicitly-ANDed list of clauses */
         {
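
To make the effect of the two hunks above concrete, here is a minimal standalone sketch contrasting the accumulation before and after the change. It is an illustration under stated assumptions, not the PostgreSQL implementation: `ClauseEst`, `StatObj`, `clamp_probability`, `or_sel_before` and `or_sel_after` are hypothetical names, and the per-clause values are supplied directly instead of being derived from an MCV list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-clause inputs; the real code derives these from MCV stats. */
typedef struct ClauseEst
{
    double      clause_sel;     /* P(B): estimate for this clause */
    double      overlap_sel;    /* P(A AND B): overlap with earlier clauses
                                 * covered by the same statistics object */
} ClauseEst;

/* Hypothetical stand-in for one statistics object and the clauses it covers. */
typedef struct StatObj
{
    const ClauseEst *clauses;
    size_t      nclauses;
} StatObj;

static double
clamp_probability(double p)
{
    if (p < 0.0)
        return 0.0;
    if (p > 1.0)
        return 1.0;
    return p;
}

/* Before: every clause of every object is added into one running total. */
static double
or_sel_before(const StatObj *objs, size_t nobjs)
{
    double      sel = 0.0;

    assert(objs != NULL || nobjs == 0);
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].nclauses; j++)
            sel = clamp_probability(sel + objs[i].clauses[j].clause_sel
                                    - objs[i].clauses[j].overlap_sel);
    return sel;
}

/* After: accumulate per object, then combine objects as independent events. */
static double
or_sel_after(const StatObj *objs, size_t nobjs)
{
    double      sel = 0.0;

    assert(objs != NULL || nobjs == 0);
    for (size_t i = 0; i < nobjs; i++)
    {
        double      stat_sel = 0.0;

        for (size_t j = 0; j < objs[i].nclauses; j++)
            stat_sel = clamp_probability(stat_sel + objs[i].clauses[j].clause_sel
                                         - objs[i].clauses[j].overlap_sel);

        sel = sel + stat_sel - sel * stat_sel;
    }
    return sel;
}
```

With two objects that each cover clauses `{0.25, 0.0}` and `{0.25, 0.125}`, each object's own estimate is 0.375. The earlier accumulation adds them to 0.75, while the newer one gives 0.375 + 0.375 - 0.375 * 0.375 = 0.609375, because overlap between clauses covered by different objects is no longer ignored. These numbers are made up for illustration and are unrelated to the regression test output.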

src/test/regress/expected/stats_ext.out

+1 -1
@@ -1706,7 +1706,7 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE (a = 0 A
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0');
  estimated | actual
 -----------+--------
-      1714 |   1572
+      1571 |   1572
 (1 row)
 
 DROP TABLE mcv_lists_multi;
