
Commit 1bfb567

When updating reltuples after ANALYZE, just extrapolate from our sample.
The existing logic for updating pg_class.reltuples trusted the sampling results only for the pages ANALYZE actually visited, preferring to believe the previous tuple density estimate for all the unvisited pages. While there's some rationale for doing that for VACUUM (first, that VACUUM is likely to visit a very nonrandom subset of pages, and second, that we know for sure that the unvisited pages did not change), there's no such rationale for ANALYZE: by assumption, it's looked at an unbiased random sample of the table's pages. Furthermore, in a very large table ANALYZE will have examined only a tiny fraction of the table's pages, meaning it cannot slew the overall density estimate very far at all. In a table that is physically growing, this causes reltuples to increase nearly proportionally to the change in relpages, regardless of what is actually happening in the table. This has been observed to cause reltuples to become so much larger than reality that it effectively shuts off autovacuum, whose threshold for doing anything is a fraction of reltuples. (Getting to the point where that would happen seems to require some additional, not well understood, conditions. But it's undeniable that if reltuples is seriously off in a large table, ANALYZE alone will not fix it in any reasonable number of iterations, especially not if the table is continuing to grow.)

Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone, and in ANALYZE, just extrapolate from the sample pages on the assumption that they provide an accurate model of the whole table. If, by very bad luck, they don't, at least another ANALYZE will fix it; in the old logic a single bad estimate could cause problems indefinitely.

In HEAD, let's remove vac_estimate_reltuples' is_analyze argument altogether; it was never used for anything and now it's totally pointless. But keep it in the back branches, in case any third-party code is calling this function.

Per bug #15005. Back-patch to all supported branches.

David Gould, reviewed by Alexander Kuzmenkov; cosmetic changes by me

Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels
1 parent 4460964 commit 1bfb567

2 files changed (+24 −39 lines changed)


src/backend/commands/analyze.c

Lines changed: 11 additions & 8 deletions
@@ -1215,19 +1215,22 @@ acquire_sample_rows(Relation onerel, int elevel,
 	qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
 
 	/*
-	 * Estimate total numbers of rows in relation.  For live rows, use
-	 * vac_estimate_reltuples; for dead rows, we have no source of old
-	 * information, so we have to assume the density is the same in unseen
-	 * pages as in the pages we scanned.
+	 * Estimate total numbers of live and dead rows in relation, extrapolating
+	 * on the assumption that the average tuple density in pages we didn't
+	 * scan is the same as in the pages we did scan.  Since what we scanned is
+	 * a random sample of the pages in the relation, this should be a good
+	 * assumption.
 	 */
-	*totalrows = vac_estimate_reltuples(onerel, true,
-										totalblocks,
-										bs.m,
-										liverows);
 	if (bs.m > 0)
+	{
+		*totalrows = floor((liverows / bs.m) * totalblocks + 0.5);
 		*totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
+	}
 	else
+	{
+		*totalrows = 0.0;
 		*totaldeadrows = 0.0;
+	}
 
 	/*
 	 * Emit some interesting relation info
src/backend/commands/vacuum.c

Lines changed: 13 additions & 31 deletions
@@ -685,13 +685,13 @@ vacuum_set_xid_limits(Relation rel,
  * vac_estimate_reltuples() -- estimate the new value for pg_class.reltuples
  *
  * If we scanned the whole relation then we should just use the count of
- * live tuples seen; but if we did not, we should not trust the count
- * unreservedly, especially not in VACUUM, which may have scanned a quite
- * nonrandom subset of the table.  When we have only partial information,
- * we take the old value of pg_class.reltuples as a measurement of the
+ * live tuples seen; but if we did not, we should not blindly extrapolate
+ * from that number, since VACUUM may have scanned a quite nonrandom
+ * subset of the table.  When we have only partial information, we take
+ * the old value of pg_class.reltuples as a measurement of the
  * tuple density in the unscanned pages.
  *
- * This routine is shared by VACUUM and ANALYZE.
+ * The is_analyze argument is historical.
  */
 double
 vac_estimate_reltuples(Relation relation, bool is_analyze,
@@ -702,9 +702,8 @@ vac_estimate_reltuples(Relation relation, bool is_analyze,
 	BlockNumber old_rel_pages = relation->rd_rel->relpages;
 	double		old_rel_tuples = relation->rd_rel->reltuples;
 	double		old_density;
-	double		new_density;
-	double		multiplier;
-	double		updated_density;
+	double		unscanned_pages;
+	double		total_tuples;
 
 	/* If we did scan the whole table, just use the count as-is */
 	if (scanned_pages >= total_pages)
@@ -728,31 +727,14 @@ vac_estimate_reltuples(Relation relation, bool is_analyze,
 
 	/*
 	 * Okay, we've covered the corner cases.  The normal calculation is to
-	 * convert the old measurement to a density (tuples per page), then update
-	 * the density using an exponential-moving-average approach, and finally
-	 * compute reltuples as updated_density * total_pages.
-	 *
-	 * For ANALYZE, the moving average multiplier is just the fraction of the
-	 * table's pages we scanned.  This is equivalent to assuming that the
-	 * tuple density in the unscanned pages didn't change.  Of course, it
-	 * probably did, if the new density measurement is different.  But over
-	 * repeated cycles, the value of reltuples will converge towards the
-	 * correct value, if repeated measurements show the same new density.
-	 *
-	 * For VACUUM, the situation is a bit different: we have looked at a
-	 * nonrandom sample of pages, but we know for certain that the pages we
-	 * didn't look at are precisely the ones that haven't changed lately.
-	 * Thus, there is a reasonable argument for doing exactly the same thing
-	 * as for the ANALYZE case, that is use the old density measurement as the
-	 * value for the unscanned pages.
-	 *
-	 * This logic could probably use further refinement.
+	 * convert the old measurement to a density (tuples per page), then
+	 * estimate the number of tuples in the unscanned pages using that figure,
+	 * and finally add on the number of tuples in the scanned pages.
 	 */
 	old_density = old_rel_tuples / old_rel_pages;
-	new_density = scanned_tuples / scanned_pages;
-	multiplier = (double) scanned_pages / (double) total_pages;
-	updated_density = old_density + (new_density - old_density) * multiplier;
-	return floor(updated_density * total_pages + 0.5);
+	unscanned_pages = (double) total_pages - (double) scanned_pages;
+	total_tuples = old_density * unscanned_pages + scanned_tuples;
+	return floor(total_tuples + 0.5);
 }