Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit c676e65

Browse files
committed
Fix choose_best_statistics to check clauses individually
When picking the best extended statistics object for a list of clauses, it's not enough to look at attnums extracted from the clause list as a whole. Consider for example this query with OR clauses: SELECT * FROM t WHERE (t.a = 1) OR (t.b = 1) OR (t.c = 1) with a statistics defined on columns (a,b). Relying on attnums extracted from the whole OR clause, we'd consider the statistics usable. That does not work, as we see the conditions as a single OR-clause, referencing an attribute not covered by the statistic, leading to empty list of clauses to be estimated using the statistics and an assert failure. This changes choose_best_statistics to check which clauses are actually covered, and only using attributes from the fully covered ones. For the previous example this means the statistics object will not be considered as compatible with the OR-clause. Backpatch to 12, where MCVs were introduced. The issue does not affect older versions because functional dependencies don't handle OR clauses. Author: Tomas Vondra Reviewed-by: Dean Rasheed Reported-By: Manuel Rigger Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com Backpatch-through: 12
1 parent 3974c4a commit c676e65

File tree

5 files changed

+81
-24
lines changed

5 files changed

+81
-24
lines changed

src/backend/statistics/dependencies.c

+15-11
Original file line numberDiff line numberDiff line change
@@ -951,14 +951,14 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
951951
Bitmapset *clauses_attnums = NULL;
952952
StatisticExtInfo *stat;
953953
MVDependencies *dependencies;
954-
AttrNumber *list_attnums;
954+
Bitmapset **list_attnums;
955955
int listidx;
956956

957957
/* check if there's any stats that might be useful for us. */
958958
if (!has_stats_of_kind(rel->statlist, STATS_EXT_DEPENDENCIES))
959959
return 1.0;
960960

961-
list_attnums = (AttrNumber *) palloc(sizeof(AttrNumber) *
961+
list_attnums = (Bitmapset **) palloc(sizeof(Bitmapset *) *
962962
list_length(clauses));
963963

964964
/*
@@ -981,11 +981,11 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
981981
if (!bms_is_member(listidx, *estimatedclauses) &&
982982
dependency_is_compatible_clause(clause, rel->relid, &attnum))
983983
{
984-
list_attnums[listidx] = attnum;
984+
list_attnums[listidx] = bms_make_singleton(attnum);
985985
clauses_attnums = bms_add_member(clauses_attnums, attnum);
986986
}
987987
else
988-
list_attnums[listidx] = InvalidAttrNumber;
988+
list_attnums[listidx] = NULL;
989989

990990
listidx++;
991991
}
@@ -1002,8 +1002,8 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
10021002
}
10031003

10041004
/* find the best suited statistics object for these attnums */
1005-
stat = choose_best_statistics(rel->statlist, clauses_attnums,
1006-
STATS_EXT_DEPENDENCIES);
1005+
stat = choose_best_statistics(rel->statlist, STATS_EXT_DEPENDENCIES,
1006+
list_attnums, list_length(clauses));
10071007

10081008
/* if no matching stats could be found then we've nothing to do */
10091009
if (!stat)
@@ -1043,15 +1043,21 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
10431043
foreach(l, clauses)
10441044
{
10451045
Node *clause;
1046+
AttrNumber attnum;
10461047

10471048
listidx++;
10481049

10491050
/*
10501051
* Skip incompatible clauses, and ones we've already estimated on.
10511052
*/
1052-
if (list_attnums[listidx] == InvalidAttrNumber)
1053+
if (!list_attnums[listidx])
10531054
continue;
10541055

1056+
/*
1057+
* We expect the bitmaps ton contain a single attribute number.
1058+
*/
1059+
attnum = bms_singleton_member(list_attnums[listidx]);
1060+
10551061
/*
10561062
* Technically we could find more than one clause for a given
10571063
* attnum. Since these clauses must be equality clauses, we choose
@@ -1061,8 +1067,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
10611067
* anyway. If it happens to be compared to the same Const, then
10621068
* ignoring the additional clause is just the thing to do.
10631069
*/
1064-
if (dependency_implies_attribute(dependency,
1065-
list_attnums[listidx]))
1070+
if (dependency_implies_attribute(dependency, attnum))
10661071
{
10671072
clause = (Node *) lfirst(l);
10681073

@@ -1077,8 +1082,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
10771082
* We'll want to ignore this when looking for the next
10781083
* strongest dependency above.
10791084
*/
1080-
clauses_attnums = bms_del_member(clauses_attnums,
1081-
list_attnums[listidx]);
1085+
clauses_attnums = bms_del_member(clauses_attnums, attnum);
10821086
}
10831087
}
10841088

src/backend/statistics/extended_stats.c

+29-11
Original file line numberDiff line numberDiff line change
@@ -844,15 +844,20 @@ has_stats_of_kind(List *stats, char requiredkind)
844844
* there's no match.
845845
*
846846
* The current selection criteria is very simple - we choose the statistics
847-
* object referencing the most of the requested attributes, breaking ties
848-
* in favor of objects with fewer keys overall.
847+
* object referencing the most attributes in covered (and still unestimated
848+
* clauses), breaking ties in favor of objects with fewer keys overall.
849+
*
850+
* The clause_attnums is an array of bitmaps, storing attnums for individual
851+
* clauses. A NULL element means the clause is either incompatible or already
852+
* estimated.
849853
*
850854
* XXX If multiple statistics objects tie on both criteria, then which object
851855
* is chosen depends on the order that they appear in the stats list. Perhaps
852856
* further tiebreakers are needed.
853857
*/
854858
StatisticExtInfo *
855-
choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
859+
choose_best_statistics(List *stats, char requiredkind,
860+
Bitmapset **clause_attnums, int nclauses)
856861
{
857862
ListCell *lc;
858863
StatisticExtInfo *best_match = NULL;
@@ -861,17 +866,33 @@ choose_best_statistics(List *stats, Bitmapset *attnums, char requiredkind)
861866

862867
foreach(lc, stats)
863868
{
869+
int i;
864870
StatisticExtInfo *info = (StatisticExtInfo *) lfirst(lc);
871+
Bitmapset *matched = NULL;
865872
int num_matched;
866873
int numkeys;
867-
Bitmapset *matched;
868874

869875
/* skip statistics that are not of the correct type */
870876
if (info->kind != requiredkind)
871877
continue;
872878

873-
/* determine how many attributes of these stats can be matched to */
874-
matched = bms_intersect(attnums, info->keys);
879+
/*
880+
* Collect attributes in remaining (unestimated) clauses fully covered
881+
* by this statistic object.
882+
*/
883+
for (i = 0; i < nclauses; i++)
884+
{
885+
/* ignore incompatible/estimated clauses */
886+
if (!clause_attnums[i])
887+
continue;
888+
889+
/* ignore clauses that are not covered by this object */
890+
if (!bms_is_subset(clause_attnums[i], info->keys))
891+
continue;
892+
893+
matched = bms_add_members(matched, clause_attnums[i]);
894+
}
895+
875896
num_matched = bms_num_members(matched);
876897
bms_free(matched);
877898

@@ -1233,12 +1254,9 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
12331254
listidx++;
12341255
}
12351256

1236-
/* We need at least two attributes for multivariate statistics. */
1237-
if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
1238-
return 1.0;
1239-
12401257
/* find the best suited statistics object for these attnums */
1241-
stat = choose_best_statistics(rel->statlist, clauses_attnums, STATS_EXT_MCV);
1258+
stat = choose_best_statistics(rel->statlist, STATS_EXT_MCV,
1259+
list_attnums, list_length(clauses));
12421260

12431261
/* if no matching stats could be found then we've nothing to do */
12441262
if (!stat)

src/include/statistics/statistics.h

+3-2
Original file line numberDiff line numberDiff line change
@@ -118,7 +118,8 @@ extern Selectivity statext_clauselist_selectivity(PlannerInfo *root,
118118
RelOptInfo *rel,
119119
Bitmapset **estimatedclauses);
120120
extern bool has_stats_of_kind(List *stats, char requiredkind);
121-
extern StatisticExtInfo *choose_best_statistics(List *stats,
122-
Bitmapset *attnums, char requiredkind);
121+
extern StatisticExtInfo *choose_best_statistics(List *stats, char requiredkind,
122+
Bitmapset **clause_attnums,
123+
int nclauses);
123124

124125
#endif /* STATISTICS_H */

src/test/regress/expected/stats_ext.out

+25
Original file line numberDiff line numberDiff line change
@@ -534,6 +534,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <
534534
1 | 50
535535
(1 row)
536536

537+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1');
538+
estimated | actual
539+
-----------+--------
540+
343 | 200
541+
(1 row)
542+
543+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
544+
estimated | actual
545+
-----------+--------
546+
343 | 200
547+
(1 row)
548+
537549
-- create statistics
538550
CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
539551
ANALYZE mcv_lists;
@@ -573,6 +585,19 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <
573585
50 | 50
574586
(1 row)
575587

588+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1');
589+
estimated | actual
590+
-----------+--------
591+
200 | 200
592+
(1 row)
593+
594+
-- we can't use the statistic for OR clauses that are not fully covered (missing 'd' attribute)
595+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
596+
estimated | actual
597+
-----------+--------
598+
343 | 200
599+
(1 row)
600+
576601
-- check change of unrelated column type does not reset the MCV statistics
577602
ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
578603
SELECT d.stxdmcv IS NOT NULL

src/test/regress/sql/stats_ext.sql

+9
Original file line numberDiff line numberDiff line change
@@ -342,6 +342,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < 5 AND b <
342342

343343
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <= ''0'' AND c <= 4');
344344

345+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1');
346+
347+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
348+
345349
-- create statistics
346350
CREATE STATISTICS mcv_lists_stats (mcv) ON a, b, c FROM mcv_lists;
347351

@@ -359,6 +363,11 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a < 5 AND b <
359363

360364
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <= ''0'' AND c <= 4');
361365

366+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1');
367+
368+
-- we can't use the statistic for OR clauses that are not fully covered (missing 'd' attribute)
369+
SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a = 1 OR b = ''1'' OR c = 1 OR d IS NOT NULL');
370+
362371
-- check change of unrelated column type does not reset the MCV statistics
363372
ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
364373

0 commit comments

Comments
 (0)