Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 8498d29

Browse files
Atsushi TorikoshiCommitfest Bot
Atsushi Torikoshi
authored and
Commitfest Bot
committed
Add storage I/O tracking to 'BUFFERS' option
The 'BUFFERS' option currently indicates whether a block hit the shared buffer, but does not distinguish between a cache hit in the OS cache or a storage I/O operation. While shared buffers and OS cache offer similar performance, storage I/O is significantly slower in comparison in general. By measuring the numbers of storage I/O read and write, we can better identify if storage I/O is a bottleneck in performance. Added tracking of storage I/O usage by calling getrusage(2) at both the planning and execution phase start and end points. A more granular approach as well as current BUFFERS option(tracking at each plan node) was considered but found to be impractical due to the high performance cost of frequent getrusage() calls. This output is shown when io_method=worker, since asynchronous workers handle I/O for multiple processes, and isolating the EXPLAIN target's I/O is difficult. TODO: I believe this information is mainly useful when used in auto_explain. I'm going to implement it if this patch is merged.
1 parent c362370 commit 8498d29

File tree

16 files changed

+1148
-34
lines changed

16 files changed

+1148
-34
lines changed

doc/src/sgml/ref/explain.sgml

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -204,6 +204,30 @@ ROLLBACK;
204204
format, only non-zero values are printed. Buffers information is
205205
automatically included when <literal>ANALYZE</literal> is used.
206206
</para>
207+
<para>
208+
If possible, this option also displays the number of read and write
209+
operations performed on storage during the planning and execution phases,
210+
shown at the end of the plan. These values are obtained from the
211+
<function>getrusage()</function> system call. Note that on platforms that
212+
do not support <function>getrusage()</function>, such as Windows, no output
213+
will be shown, even if reads or writes actually occur. Additionally, even
214+
on platforms where <function>getrusage()</function> is supported, if the
215+
kernel is built without the necessary options to track storage read and
216+
write operations, no output will be shown. Also, When
217+
<varname>io_method</varname> is set to <literal>worker</literal>, no output
218+
will be shown, as I/O handled by asynchronous workers cannot be measured
219+
accurately.
220+
The timing and unit of measurement for read and write operations may vary
221+
depending on the platform. For example, on Linux, a read is counted only
222+
if this process caused data to be fetched from the storage layer, and a
223+
write is counted at the page-dirtying time. On Linux, the unit of
224+
measurement for read and write operations is 512 bytes.
225+
</para>
226+
<para>
227+
Buffers information is included by default when <literal>ANALYZE</literal>
228+
is used but otherwise is not included by default, but can be enabled using
229+
this option.
230+
</para>
207231
</listitem>
208232
</varlistentry>
209233

src/backend/access/brin/brin.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2557,7 +2557,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
25572557
* or we might get incomplete data.)
25582558
*/
25592559
for (i = 0; i < brinleader->pcxt->nworkers_launched; i++)
2560-
InstrAccumParallelQuery(&brinleader->bufferusage[i], &brinleader->walusage[i]);
2560+
InstrAccumParallelQuery(&brinleader->bufferusage[i], NULL, &brinleader->walusage[i]);
25612561

25622562
/* Free last reference to MVCC snapshot, if one was used */
25632563
if (IsMVCCSnapshot(brinleader->snapshot))
@@ -2919,7 +2919,7 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
29192919
tuplesort_attach_shared(sharedsort, seg);
29202920

29212921
/* Prepare to track buffer usage during parallel execution */
2922-
InstrStartParallelQuery();
2922+
InstrStartParallelQuery(NULL);
29232923

29242924
/*
29252925
* Might as well use reliable figure when doling out maintenance_work_mem
@@ -2934,8 +2934,8 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
29342934
/* Report WAL/buffer usage during parallel execution */
29352935
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
29362936
walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
2937-
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
2938-
&walusage[ParallelWorkerNumber]);
2937+
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
2938+
&walusage[ParallelWorkerNumber], NULL);
29392939

29402940
index_close(indexRel, indexLockmode);
29412941
table_close(heapRel, heapLockmode);

src/backend/access/gin/gininsert.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1084,7 +1084,7 @@ _gin_end_parallel(GinLeader *ginleader, GinBuildState *state)
10841084
* or we might get incomplete data.)
10851085
*/
10861086
for (i = 0; i < ginleader->pcxt->nworkers_launched; i++)
1087-
InstrAccumParallelQuery(&ginleader->bufferusage[i], &ginleader->walusage[i]);
1087+
InstrAccumParallelQuery(&ginleader->bufferusage[i], NULL, &ginleader->walusage[i]);
10881088

10891089
/* Free last reference to MVCC snapshot, if one was used */
10901090
if (IsMVCCSnapshot(ginleader->snapshot))
@@ -2147,7 +2147,7 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
21472147
tuplesort_attach_shared(sharedsort, seg);
21482148

21492149
/* Prepare to track buffer usage during parallel execution */
2150-
InstrStartParallelQuery();
2150+
InstrStartParallelQuery(NULL);
21512151

21522152
/*
21532153
* Might as well use reliable figure when doling out maintenance_work_mem
@@ -2162,8 +2162,8 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
21622162
/* Report WAL/buffer usage during parallel execution */
21632163
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
21642164
walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
2165-
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
2166-
&walusage[ParallelWorkerNumber]);
2165+
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
2166+
&walusage[ParallelWorkerNumber], NULL);
21672167

21682168
index_close(indexRel, indexLockmode);
21692169
table_close(heapRel, heapLockmode);

src/backend/access/nbtree/nbtsort.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1618,7 +1618,7 @@ _bt_end_parallel(BTLeader *btleader)
16181618
* or we might get incomplete data.)
16191619
*/
16201620
for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
1621-
InstrAccumParallelQuery(&btleader->bufferusage[i], &btleader->walusage[i]);
1621+
InstrAccumParallelQuery(&btleader->bufferusage[i], NULL, &btleader->walusage[i]);
16221622

16231623
/* Free last reference to MVCC snapshot, if one was used */
16241624
if (IsMVCCSnapshot(btleader->snapshot))
@@ -1826,7 +1826,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
18261826
}
18271827

18281828
/* Prepare to track buffer usage during parallel execution */
1829-
InstrStartParallelQuery();
1829+
InstrStartParallelQuery(NULL);
18301830

18311831
/* Perform sorting of spool, and possibly a spool2 */
18321832
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
@@ -1836,8 +1836,8 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
18361836
/* Report WAL/buffer usage during parallel execution */
18371837
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
18381838
walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
1839-
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
1840-
&walusage[ParallelWorkerNumber]);
1839+
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
1840+
&walusage[ParallelWorkerNumber], NULL);
18411841

18421842
#ifdef BTREE_BUILD_STATS
18431843
if (log_btree_build_stats)

src/backend/commands/explain.c

Lines changed: 120 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@
3232
#include "parser/analyze.h"
3333
#include "parser/parsetree.h"
3434
#include "rewrite/rewriteHandler.h"
35+
#include "storage/aio_subsys.h"
3536
#include "storage/bufmgr.h"
3637
#include "tcop/tcopprot.h"
3738
#include "utils/builtins.h"
@@ -144,6 +145,8 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
144145
static const char *explain_get_index_name(Oid indexId);
145146
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
146147
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
148+
static bool peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
149+
static void show_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
147150
static void show_wal_usage(ExplainState *es, const WalUsage *usage);
148151
static void show_memory_counters(ExplainState *es,
149152
const MemoryContextCounters *mem_counters);
@@ -325,6 +328,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
325328
planduration;
326329
BufferUsage bufusage_start,
327330
bufusage;
331+
StorageIOUsage storageio,
332+
storageio_start;
328333
MemoryContextCounters mem_counters;
329334
MemoryContext planner_ctx = NULL;
330335
MemoryContext saved_ctx = NULL;
@@ -346,7 +351,10 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
346351
}
347352

348353
if (es->buffers)
354+
{
349355
bufusage_start = pgBufferUsage;
356+
GetStorageIOUsage(&storageio_start);
357+
}
350358
INSTR_TIME_SET_CURRENT(planstart);
351359

352360
/* plan the query */
@@ -361,16 +369,20 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
361369
MemoryContextMemConsumed(planner_ctx, &mem_counters);
362370
}
363371

364-
/* calc differences of buffer counters. */
372+
/* calc differences of buffer and storage I/O counters. */
365373
if (es->buffers)
366374
{
367375
memset(&bufusage, 0, sizeof(BufferUsage));
368376
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
377+
378+
GetStorageIOUsage(&storageio);
379+
StorageIOUsageDiff(&storageio, &storageio_start);
369380
}
370381

371382
/* run it (if needed) and produce output */
372383
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
373384
&planduration, (es->buffers ? &bufusage : NULL),
385+
es->buffers ? &storageio : NULL,
374386
es->memory ? &mem_counters : NULL);
375387
}
376388

@@ -494,7 +506,7 @@ void
494506
ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
495507
const char *queryString, ParamListInfo params,
496508
QueryEnvironment *queryEnv, const instr_time *planduration,
497-
const BufferUsage *bufusage,
509+
const BufferUsage *bufusage, const StorageIOUsage *planstorageio,
498510
const MemoryContextCounters *mem_counters)
499511
{
500512
DestReceiver *dest;
@@ -504,6 +516,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
504516
int eflags;
505517
int instrument_option = 0;
506518
SerializeMetrics serializeMetrics = {0};
519+
StorageIOUsage storageio_start;
507520

508521
Assert(plannedstmt->commandType != CMD_UTILITY);
509522

@@ -513,7 +526,19 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
513526
instrument_option |= INSTRUMENT_ROWS;
514527

515528
if (es->buffers)
529+
{
530+
GetStorageIOUsage(&storageio_start);
531+
532+
/*
533+
* Initialize global variable counters for parallel query workers.
534+
* Even if the query is cancelled on the way, the EXPLAIN execution
535+
* always passes here, so it can be initialized here.
536+
*/
537+
pgStorageIOUsageParallel.inblock = 0;
538+
pgStorageIOUsageParallel.outblock = 0;
539+
516540
instrument_option |= INSTRUMENT_BUFFERS;
541+
}
517542
if (es->wal)
518543
instrument_option |= INSTRUMENT_WAL;
519544

@@ -597,8 +622,9 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
597622
/* Create textual dump of plan tree */
598623
ExplainPrintPlan(es, queryDesc);
599624

600-
/* Show buffer and/or memory usage in planning */
601-
if (peek_buffer_usage(es, bufusage) || mem_counters)
625+
/* Show buffer, storage I/O, and/or memory usage in planning */
626+
if (peek_buffer_usage(es, bufusage) || peek_storageio_usage(es, planstorageio) ||
627+
mem_counters)
602628
{
603629
ExplainOpenGroup("Planning", "Planning", true, es);
604630

@@ -610,8 +636,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
610636
}
611637

612638
if (bufusage)
639+
{
613640
show_buffer_usage(es, bufusage);
614-
641+
show_storageio_usage(es, planstorageio);
642+
}
615643
if (mem_counters)
616644
show_memory_counters(es, mem_counters);
617645

@@ -668,6 +696,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
668696

669697
totaltime += elapsed_time(&starttime);
670698

699+
/* Show storage I/O usage in execution */
700+
if (es->buffers)
701+
{
702+
StorageIOUsage storageio;
703+
704+
GetStorageIOUsage(&storageio);
705+
StorageIOUsageDiff(&storageio, &storageio_start);
706+
StorageIOUsageAdd(&storageio, &pgStorageIOUsageParallel);
707+
708+
if (peek_storageio_usage(es, &storageio))
709+
{
710+
ExplainOpenGroup("Execution", "Execution", true, es);
711+
712+
if (es->format == EXPLAIN_FORMAT_TEXT)
713+
{
714+
ExplainIndentText(es);
715+
appendStringInfoString(es->str, "Execution:\n");
716+
es->indent++;
717+
}
718+
show_storageio_usage(es, &storageio);
719+
720+
if (es->format == EXPLAIN_FORMAT_TEXT)
721+
es->indent--;
722+
723+
ExplainCloseGroup("Execution", "Execution", true, es);
724+
}
725+
}
726+
671727
/*
672728
* We only report execution time if we actually ran the query (that is,
673729
* the user specified ANALYZE), and if summary reporting is enabled (the
@@ -4252,6 +4308,65 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
42524308
}
42534309
}
42544310

4311+
/*
4312+
* Return whether show_storageio_usage would have anything to print, if given
4313+
* the same 'usage' data. Note that when the format is anything other than
4314+
* text, we print even if the counters are all zeroes.
4315+
*/
4316+
static bool
4317+
peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
4318+
{
4319+
if (usage == NULL)
4320+
return false;
4321+
4322+
/*
4323+
* Since showing only the I/O excluding AIO workers underestimates the
4324+
* total I/O, treat this case as having nothing to print.
4325+
*/
4326+
if (pgaio_workers_enabled())
4327+
return false;
4328+
4329+
if (es->format != EXPLAIN_FORMAT_TEXT)
4330+
return true;
4331+
4332+
return usage->inblock > 0 || usage->outblock > 0;
4333+
}
4334+
4335+
/*
4336+
* Show storage I/O usage.
4337+
*/
4338+
static void
4339+
show_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
4340+
{
4341+
/*
4342+
* Since showing only the I/O excluding AIO workers underestimates the
4343+
* total I/O, do not show anything.
4344+
*/
4345+
if (pgaio_workers_enabled())
4346+
return;
4347+
4348+
if (es->format == EXPLAIN_FORMAT_TEXT)
4349+
{
4350+
/* Show only positive counter values. */
4351+
if (usage->inblock <= 0 && usage->outblock <= 0)
4352+
return;
4353+
4354+
ExplainIndentText(es);
4355+
appendStringInfoString(es->str, "Storage I/O:");
4356+
appendStringInfo(es->str, " read=%ld times", (long) usage->inblock);
4357+
appendStringInfo(es->str, " write=%ld times", (long) usage->outblock);
4358+
4359+
appendStringInfoChar(es->str, '\n');
4360+
}
4361+
else
4362+
{
4363+
ExplainPropertyInteger("Storage I/O Read", NULL,
4364+
usage->inblock, es);
4365+
ExplainPropertyInteger("Storage I/O Read", NULL,
4366+
usage->outblock, es);
4367+
}
4368+
}
4369+
42554370
/*
42564371
* Show WAL usage details.
42574372
*/

src/backend/commands/prepare.c

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -582,6 +582,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
582582
instr_time planduration;
583583
BufferUsage bufusage_start,
584584
bufusage;
585+
StorageIOUsage storageio,
586+
storageio_start;
585587
MemoryContextCounters mem_counters;
586588
MemoryContext planner_ctx = NULL;
587589
MemoryContext saved_ctx = NULL;
@@ -597,7 +599,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
597599
}
598600

599601
if (es->buffers)
602+
{
600603
bufusage_start = pgBufferUsage;
604+
GetStorageIOUsage(&storageio_start);
605+
}
606+
601607
INSTR_TIME_SET_CURRENT(planstart);
602608

603609
/* Look it up in the hash table */
@@ -647,6 +653,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
647653
{
648654
memset(&bufusage, 0, sizeof(BufferUsage));
649655
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
656+
GetStorageIOUsage(&storageio);
650657
}
651658

652659
plan_list = cplan->stmt_list;
@@ -659,6 +666,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
659666
if (pstmt->commandType != CMD_UTILITY)
660667
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
661668
&planduration, (es->buffers ? &bufusage : NULL),
669+
es->buffers ? &storageio : NULL,
662670
es->memory ? &mem_counters : NULL);
663671
else
664672
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);

0 commit comments

Comments
 (0)