Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit f6af9c7

Browse files
committed
Ignore old stats file timestamps when starting the stats collector.
The stats collector disregards inquiry messages that bear a cutoff_time before when it last wrote the relevant stats file. That's fine, but at startup when it reads the "permanent" stats files, it absorbed their timestamps as if they were the times at which the corresponding temporary stats files had been written. In reality, of course, there's no data out there at all. This led to disregarding inquiry messages soon after startup if the postmaster had been shut down and restarted within less than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for testing and in the field. Requesting backends would hang for 10 seconds and then report failure to read statistics, unless they got bailed out by some other backend coming along and making a newer request within that interval. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there because the autovacuum launcher hangs for 10 seconds when it can't get statistics at startup, thus preventing a second shutdown from occurring promptly. We might want to do some things in the autovac code to make it less prone to getting stuck that way, but this change is a good bug fix regardless. In passing, also fix pgstat_read_statsfiles() to ensure that it re-zeroes its global stats variables if they are corrupted by a short read from the stats file. (Other reads in that function go into temp variables, so that the issue doesn't arise.) This has been broken since we created the separation between permanent and temporary stats files in 8.4, so back-patch to all supported branches. Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
1 parent 9bfb1f2 commit f6af9c7

File tree

1 file changed

+21
-0
lines changed

1 file changed

+21
-0
lines changed

src/backend/postmaster/pgstat.c

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4278,16 +4278,28 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
42784278
{
42794279
ereport(pgStatRunningInCollector ? LOG : WARNING,
42804280
(errmsg("corrupted statistics file \"%s\"", statfile)));
4281+
memset(&globalStats, 0, sizeof(globalStats));
42814282
goto done;
42824283
}
42834284

4285+
/*
4286+
* In the collector, disregard the timestamp we read from the permanent
4287+
* stats file; we should be willing to write a temp stats file immediately
4288+
* upon the first request from any backend. This only matters if the old
4289+
* file's timestamp is less than PGSTAT_STAT_INTERVAL ago, but that's not
4290+
* an unusual scenario.
4291+
*/
4292+
if (pgStatRunningInCollector)
4293+
globalStats.stats_timestamp = 0;
4294+
42844295
/*
42854296
* Read archiver stats struct
42864297
*/
42874298
if (fread(&archiverStats, 1, sizeof(archiverStats), fpin) != sizeof(archiverStats))
42884299
{
42894300
ereport(pgStatRunningInCollector ? LOG : WARNING,
42904301
(errmsg("corrupted statistics file \"%s\"", statfile)));
4302+
memset(&archiverStats, 0, sizeof(archiverStats));
42914303
goto done;
42924304
}
42934305

@@ -4332,6 +4344,15 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
43324344
dbentry->tables = NULL;
43334345
dbentry->functions = NULL;
43344346

4347+
/*
4348+
* In the collector, disregard the timestamp we read from the
4349+
* permanent stats file; we should be willing to write a temp
4350+
* stats file immediately upon the first request from any
4351+
* backend.
4352+
*/
4353+
if (pgStatRunningInCollector)
4354+
dbentry->stats_timestamp = 0;
4355+
43354356
/*
43364357
* Don't create tables/functions hashtables for uninteresting
43374358
* databases.

0 commit comments

Comments
 (0)