
Commit 2b74303

Make the planner assume that the entries in a VALUES list are distinct.

Previously, if we had to estimate the number of distinct values in a VALUES column, we fell back on the default behavior used whenever we lack statistics, which effectively is that there are Min(# of entries, 200) distinct values. This can be very badly off with a large VALUES list, as noted by Jeff Janes.

We could consider actually running an ANALYZE-like scan on the VALUES, but that seems unduly expensive, and anyway it could not deliver reliable info if the entries are not all constants. What seems like a better choice is to assume that the values are all distinct. This will sometimes be just as wrong as the old code, but it seems more likely to be more nearly right in many common cases. Also, it is more consistent with what happens in some related cases, for example WHERE x = ANY(ARRAY[1,2,3,...,n]) and WHERE x = ANY(VALUES (1),(2),(3),...,(n)) now are estimated similarly.

This was discussed some time ago, but consensus was it'd be better to slip it in at the start of a development cycle not near the end. (It should've gone into v10, really, but I forgot about it.)

Discussion: https://postgr.es/m/CAMkU=1xHkyPa8VQgGcCNg3RMFFvVxUdOpus1gKcFuvVi0w6Acg@mail.gmail.com

1 parent ac883ac commit 2b74303

2 files changed, +13 -2 lines changed


src/backend/utils/adt/selfuncs.c

@@ -5009,6 +5009,17 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
 		 */
 		stadistinct = 2.0;
 	}
+	else if (vardata->rel && vardata->rel->rtekind == RTE_VALUES)
+	{
+		/*
+		 * If the Var represents a column of a VALUES RTE, assume it's unique.
+		 * This could of course be very wrong, but it should tend to be true
+		 * in well-written queries.  We could consider examining the VALUES'
+		 * contents to get some real statistics; but that only works if the
+		 * entries are all constants, and it would be pretty expensive anyway.
+		 */
+		stadistinct = -1.0;		/* unique (and all non null) */
+	}
 	else
 	{
 		/*
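
To make the effect of the new branch concrete, here is a minimal standalone sketch of how a stadistinct-style value could be turned into a distinct-value estimate. It is not the code from selfuncs.c: the names sketch_numdistinct, values_ndistinct_old, values_ndistinct_new and SKETCH_DEFAULT_NUM_DISTINCT are hypothetical, the real function's row-count clamping is left out, and the only facts taken from the commit are the Min(# of entries, 200) fallback and the use of -1.0 to mean "unique".

```c
#include <stdbool.h>

/* Assumed stand-in for the planner's default of 200 distinct values. */
#define SKETCH_DEFAULT_NUM_DISTINCT 200.0

/*
 * Hypothetical helper: convert a stadistinct-style value into an estimated
 * number of distinct values for a relation with ntuples rows.
 *   > 0  : absolute number of distinct values
 *   < 0  : negated fraction of the row count (-1.0 means every row differs)
 *   == 0 : unknown, so fall back to Min(ntuples, 200)
 * *isdefault is set to true only when the fixed default of 200 is returned.
 */
double
sketch_numdistinct(double stadistinct, double ntuples, bool *isdefault)
{
	*isdefault = false;

	if (stadistinct > 0.0)
		return stadistinct;

	if (ntuples <= 0.0)
	{
		/* No usable row count: nothing better than the default. */
		*isdefault = true;
		return SKETCH_DEFAULT_NUM_DISTINCT;
	}

	if (stadistinct < 0.0)
		return -stadistinct * ntuples;

	/* Unknown statistics: small relations are treated as all-distinct. */
	if (ntuples < SKETCH_DEFAULT_NUM_DISTINCT)
		return ntuples;

	*isdefault = true;
	return SKETCH_DEFAULT_NUM_DISTINCT;
}

/* Estimate for a VALUES column before the change (no statistics known). */
double
values_ndistinct_old(double nentries)
{
	bool		isdefault;

	return sketch_numdistinct(0.0, nentries, &isdefault);
}

/* Estimate for a VALUES column after the change (stadistinct = -1.0). */
double
values_ndistinct_new(double nentries)
{
	bool		isdefault;

	return sketch_numdistinct(-1.0, nentries, &isdefault);
}
```

Under these assumptions, a VALUES list of 10000 entries would be estimated at 200 distinct values by the old path and 10000 by the new one, while a list shorter than 200 entries gets the same estimate either way.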

src/include/nodes/relation.h

@@ -407,7 +407,7 @@ typedef struct PlannerInfo
  *
  *		relid - RTE index (this is redundant with the relids field, but
  *			is provided for convenience of access)
- *		rtekind - distinguishes plain relation, subquery, or function RTE
+ *		rtekind - copy of RTE's rtekind field
  *		min_attr, max_attr - range of valid AttrNumbers for rel
  *		attr_needed - array of bitmapsets indicating the highest joinrel
  *				in which each attribute is needed; if bit 0 is set then
@@ -552,7 +552,7 @@ typedef struct RelOptInfo
 	/* information about a base rel (not set for join rels!) */
 	Index		relid;
 	Oid			reltablespace;	/* containing tablespace */
-	RTEKind		rtekind;		/* RELATION, SUBQUERY, or FUNCTION */
+	RTEKind		rtekind;		/* RELATION, SUBQUERY, FUNCTION, etc */
 	AttrNumber	min_attr;		/* smallest attrno of rel (often <0) */
 	AttrNumber	max_attr;		/* largest attrno of rel */
 	Relids	   *attr_needed;	/* array indexed [min_attr .. max_attr] */
