
Commit 3f5d2fe

Be more wary of missing statistics in eqjoinsel_semi().
In particular, if we don't have real ndistinct estimates for both sides, fall back to assuming that half of the left-hand rows have join partners. This is what was done in 8.2 and 8.3 (cf. nulltestsel() in those versions). It's pretty stupid, but it won't lead us to think that an antijoin produces no rows out, as seen in a recent example from Uwe Schroeder.
1 parent 921b993 commit 3f5d2fe


1 file changed: +32 / -17 lines


src/backend/utils/adt/selfuncs.c

@@ -2342,7 +2342,9 @@ eqjoinsel_semi(Oid operator,
 		bool	   *hasmatch1;
 		bool	   *hasmatch2;
 		double		nullfrac1 = stats1->stanullfrac;
-		double		matchfreq1;
+		double		matchfreq1,
+					uncertainfrac,
+					uncertain;
 		int			i,
 					nmatches;
 

@@ -2396,18 +2398,26 @@ eqjoinsel_semi(Oid operator,
 		 * the uncertain rows that a fraction nd2/nd1 have join partners. We
 		 * can discount the known-matched MCVs from the distinct-values counts
 		 * before doing the division.
+		 *
+		 * Crude as the above is, it's completely useless if we don't have
+		 * reliable ndistinct values for both sides. Hence, if either nd1
+		 * or nd2 is default, punt and assume half of the uncertain rows
+		 * have join partners.
 		 */
-		nd1 -= nmatches;
-		nd2 -= nmatches;
-		if (nd1 <= nd2 || nd2 <= 0)
-			selec = Max(matchfreq1, 1.0 - nullfrac1);
-		else
+		if (nd1 != DEFAULT_NUM_DISTINCT && nd2 != DEFAULT_NUM_DISTINCT)
 		{
-			double		uncertain = 1.0 - matchfreq1 - nullfrac1;
-
-			CLAMP_PROBABILITY(uncertain);
-			selec = matchfreq1 + (nd2 / nd1) * uncertain;
+			nd1 -= nmatches;
+			nd2 -= nmatches;
+			if (nd1 <= nd2 || nd2 <= 0)
+				uncertainfrac = 1.0;
+			else
+				uncertainfrac = nd2 / nd1;
 		}
+		else
+			uncertainfrac = 0.5;
+		uncertain = 1.0 - matchfreq1 - nullfrac1;
+		CLAMP_PROBABILITY(uncertain);
+		selec = matchfreq1 + uncertainfrac * uncertain;
 	}
 	else
 	{
@@ -2417,15 +2427,20 @@ eqjoinsel_semi(Oid operator,
 		 */
 		double		nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
 
-		if (vardata1->rel)
-			nd1 = Min(nd1, vardata1->rel->rows);
-		if (vardata2->rel)
-			nd2 = Min(nd2, vardata2->rel->rows);
+		if (nd1 != DEFAULT_NUM_DISTINCT && nd2 != DEFAULT_NUM_DISTINCT)
+		{
+			if (vardata1->rel)
+				nd1 = Min(nd1, vardata1->rel->rows);
+			if (vardata2->rel)
+				nd2 = Min(nd2, vardata2->rel->rows);
 
-		if (nd1 <= nd2 || nd2 <= 0)
-			selec = 1.0 - nullfrac1;
+			if (nd1 <= nd2 || nd2 <= 0)
+				selec = 1.0 - nullfrac1;
+			else
+				selec = (nd2 / nd1) * (1.0 - nullfrac1);
+		}
 		else
-			selec = (nd2 / nd1) * (1.0 - nullfrac1);
+			selec = 0.5 * (1.0 - nullfrac1);
 	}
 
 	if (have_mcvs1)
