
Commit 8cc139b

Introduce pg_shmem_allocations_numa view
Introduce a new pg_shmem_allocations_numa view with information about how
shared memory is distributed across NUMA nodes. For each shared memory
segment, the view returns one row for each NUMA node backing it, with the
total amount of memory allocated from that node.

The view may be relatively expensive, especially when executed for the
first time in a backend, as it has to touch all memory pages to get
reliable information about the NUMA node. This may also force allocation
of the shared memory.

Unlike pg_shmem_allocations, the view does not show anonymous shared
memory allocations. It also does not show memory allocated using the
dynamic shared memory infrastructure.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
1 parent 65c298f commit 8cc139b

File tree

12 files changed: 322 additions, 6 deletions

doc/src/sgml/system-views.sgml

Lines changed: 95 additions & 0 deletions
@@ -181,6 +181,11 @@
       <entry>shared memory allocations</entry>
      </row>

+     <row>
+      <entry><link linkend="view-pg-shmem-allocations-numa"><structname>pg_shmem_allocations_numa</structname></link></entry>
+      <entry>NUMA node mappings for shared memory allocations</entry>
+     </row>
+
      <row>
       <entry><link linkend="view-pg-stats"><structname>pg_stats</structname></link></entry>
       <entry>planner statistics</entry>

@@ -4051,6 +4056,96 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
  </para>
 </sect1>

+ <sect1 id="view-pg-shmem-allocations-numa">
+  <title><structname>pg_shmem_allocations_numa</structname></title>
+
+  <indexterm zone="view-pg-shmem-allocations-numa">
+   <primary>pg_shmem_allocations_numa</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_shmem_allocations_numa</structname> view shows how
+   shared memory allocations in the server's main shared memory segment are
+   distributed across NUMA nodes. This includes both memory allocated by
+   <productname>PostgreSQL</productname> itself and memory allocated
+   by extensions using the mechanisms detailed in
+   <xref linkend="xfunc-shared-addin" />. The view outputs multiple rows
+   for each shared memory segment when that segment is spread across
+   multiple NUMA nodes. This view should not be queried by monitoring
+   systems, as it is very slow and may end up allocating shared memory if it
+   was not used earlier.
+   A current limitation of this view is that it won't show anonymous shared
+   memory allocations.
+  </para>
+
+  <para>
+   Note that this view does not include memory allocated using the dynamic
+   shared memory infrastructure.
+  </para>
+
+  <warning>
+   <para>
+    When determining the <acronym>NUMA</acronym> node, the view touches
+    all memory pages for the shared memory segment. This will force
+    allocation of the shared memory, if it wasn't allocated already,
+    and the memory may get allocated in a single <acronym>NUMA</acronym>
+    node (depending on system configuration).
+   </para>
+  </warning>

+  <table>
+   <title><structname>pg_shmem_allocations_numa</structname> Columns</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>name</structfield> <type>text</type>
+      </para>
+      <para>
+       The name of the shared memory allocation.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>numa_node</structfield> <type>int4</type>
+      </para>
+      <para>
+       ID of the <acronym>NUMA</acronym> node.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>size</structfield> <type>int8</type>
+      </para>
+      <para>
+       Size of the allocation on this particular <acronym>NUMA</acronym>
+       memory node, in bytes.
+      </para></entry>
+     </row>
+
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   By default, the <structname>pg_shmem_allocations_numa</structname> view
+   can be read only by superusers or roles with privileges of the
+   <literal>pg_read_all_stats</literal> role.
+  </para>
+ </sect1>
+
 <sect1 id="view-pg-stats">
  <title><structname>pg_stats</structname></title>

src/backend/catalog/system_views.sql

Lines changed: 8 additions & 0 deletions
@@ -658,6 +658,14 @@ GRANT SELECT ON pg_shmem_allocations TO pg_read_all_stats;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 GRANT EXECUTE ON FUNCTION pg_get_shmem_allocations() TO pg_read_all_stats;

+CREATE VIEW pg_shmem_allocations_numa AS
+    SELECT * FROM pg_get_shmem_allocations_numa();
+
+REVOKE ALL ON pg_shmem_allocations_numa FROM PUBLIC;
+GRANT SELECT ON pg_shmem_allocations_numa TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations_numa() FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_shmem_allocations_numa() TO pg_read_all_stats;
+
 CREATE VIEW pg_backend_memory_contexts AS
     SELECT * FROM pg_get_backend_memory_contexts();

src/backend/storage/ipc/shmem.c

Lines changed: 159 additions & 0 deletions
@@ -68,6 +68,7 @@
 #include "fmgr.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "port/pg_numa.h"
 #include "storage/lwlock.h"
 #include "storage/pg_shmem.h"
 #include "storage/shmem.h"

@@ -89,6 +90,8 @@ slock_t *ShmemLock; /* spinlock for shared memory and LWLock

 static HTAB *ShmemIndex = NULL; /* primary index hashtable for shmem */

+/* To get reliable results for NUMA inquiry we need to "touch pages" once */
+static bool firstNumaTouch = true;

 /*
  * InitShmemAccess() --- set up basic pointers to shared memory.

@@ -568,3 +571,159 @@ pg_get_shmem_allocations(PG_FUNCTION_ARGS)

     return (Datum) 0;
 }
+
+/*
+ * SQL SRF showing NUMA memory nodes for allocated shared memory
+ *
+ * Compared to pg_get_shmem_allocations(), this function does not return
+ * information about shared anonymous allocations and unused shared memory.
+ */
+Datum
+pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
+{
+#define PG_GET_SHMEM_NUMA_SIZES_COLS 3
+    ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+    HASH_SEQ_STATUS hstat;
+    ShmemIndexEnt *ent;
+    Datum       values[PG_GET_SHMEM_NUMA_SIZES_COLS];
+    bool        nulls[PG_GET_SHMEM_NUMA_SIZES_COLS];
+    Size        os_page_size;
+    void      **page_ptrs;
+    int        *pages_status;
+    uint64      shm_total_page_count,
+                shm_ent_page_count,
+                max_nodes;
+    Size       *nodes;
+
+    if (pg_numa_init() == -1)
+        elog(ERROR, "libnuma initialization failed or NUMA is not supported on this platform");
+
+    InitMaterializedSRF(fcinfo, 0);
+
+    max_nodes = pg_numa_get_max_node();
+    nodes = palloc(sizeof(Size) * (max_nodes + 1));
+
+    /*
+     * Different database block sizes (4kB, 8kB, ..., 32kB) can be used, while
+     * the OS may have different memory page sizes.
+     *
+     * To correctly map between them, we need to: 1. Determine the OS memory
+     * page size 2. Calculate how many OS pages are used by all buffer blocks
+     * 3. Calculate how many OS pages are contained within each database
+     * block.
+     *
+     * This information is needed before calling move_pages() for NUMA memory
+     * node inquiry.
+     */
+    os_page_size = pg_numa_get_pagesize();
+
+    /*
+     * Allocate memory for page pointers and status based on total shared
+     * memory size. This simplified approach allocates enough space for all
+     * pages in shared memory rather than calculating the exact requirements
+     * for each segment.
+     *
+     * Add 1, because we don't know how exactly the segments align to OS
+     * pages, so the allocation might use one more memory page. In practice
+     * this is not very likely, and moreover we have more entries, each of
+     * them using only fraction of the total pages.
+     */
+    shm_total_page_count = (ShmemSegHdr->totalsize / os_page_size) + 1;
+    page_ptrs = palloc0(sizeof(void *) * shm_total_page_count);
+    pages_status = palloc(sizeof(int) * shm_total_page_count);
+
+    if (firstNumaTouch)
+        elog(DEBUG1, "NUMA: page-faulting shared memory segments for proper NUMA readouts");
+
+    LWLockAcquire(ShmemIndexLock, LW_SHARED);
+
+    hash_seq_init(&hstat, ShmemIndex);
+
+    /* output all allocated entries */
+    memset(nulls, 0, sizeof(nulls));
+    while ((ent = (ShmemIndexEnt *) hash_seq_search(&hstat)) != NULL)
+    {
+        int         i;
+        char       *startptr,
+                   *endptr;
+        Size        total_len;
+
+        /*
+         * Calculate the range of OS pages used by this segment. The segment
+         * may start / end half-way through a page, we want to count these
+         * pages too. So we align the start/end pointers down/up, and then
+         * calculate the number of pages from that.
+         */
+        startptr = (char *) TYPEALIGN_DOWN(os_page_size, ent->location);
+        endptr = (char *) TYPEALIGN(os_page_size,
+                                    (char *) ent->location + ent->allocated_size);
+        total_len = (endptr - startptr);
+
+        shm_ent_page_count = total_len / os_page_size;
+
+        /*
+         * If we ever get 0xff (-1) back from kernel inquiry, then we probably
+         * have a bug in mapping buffers to OS pages.
+         */
+        memset(pages_status, 0xff, sizeof(int) * shm_ent_page_count);
+
+        /*
+         * Setup page_ptrs[] with pointers to all OS pages for this segment,
+         * and get the NUMA status using pg_numa_query_pages.
+         *
+         * In order to get reliable results we also need to touch memory
+         * pages, so that inquiry about NUMA memory node doesn't return -2
+         * (ENOENT, which indicates unmapped/unallocated pages).
+         */
+        for (i = 0; i < shm_ent_page_count; i++)
+        {
+            volatile uint64 touch pg_attribute_unused();
+
+            page_ptrs[i] = startptr + (i * os_page_size);
+
+            if (firstNumaTouch)
+                pg_numa_touch_mem_if_required(touch, page_ptrs[i]);
+
+            CHECK_FOR_INTERRUPTS();
+        }
+
+        if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, pages_status) == -1)
+            elog(ERROR, "failed NUMA pages inquiry status: %m");
+
+        /* Count number of NUMA nodes used for this shared memory entry */
+        memset(nodes, 0, sizeof(Size) * (max_nodes + 1));
+
+        for (i = 0; i < shm_ent_page_count; i++)
+        {
+            int         s = pages_status[i];
+
+            /* Ensure we are adding only valid index to the array */
+            if (s < 0 || s > max_nodes)
+            {
+                elog(ERROR, "invalid NUMA node id outside of allowed range "
+                     "[0, " UINT64_FORMAT "]: %d", max_nodes, s);
+            }
+
+            nodes[s]++;
+        }
+
+        /*
+         * Add one entry for each NUMA node, including those without allocated
+         * memory for this segment.
+         */
+        for (i = 0; i <= max_nodes; i++)
+        {
+            values[0] = CStringGetTextDatum(ent->key);
+            values[1] = i;
+            values[2] = Int64GetDatum(nodes[i] * os_page_size);
+
+            tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+                                 values, nulls);
+        }
+    }
+
+    LWLockRelease(ShmemIndexLock);
+    firstNumaTouch = false;
+
+    return (Datum) 0;
+}

src/include/catalog/catversion.h

Lines changed: 1 addition & 1 deletion
@@ -57,6 +57,6 @@
 */

 /*                          yyyymmddN */
-#define CATALOG_VERSION_NO  202504072
+#define CATALOG_VERSION_NO  202504073

 #endif

src/include/catalog/pg_proc.dat

Lines changed: 8 additions & 0 deletions
@@ -8546,6 +8546,14 @@
   proname => 'pg_numa_available', provolatile => 's', prorettype => 'bool',
   proargtypes => '', prosrc => 'pg_numa_available' },

+# shared memory usage with NUMA info
+{ oid => '4100', descr => 'NUMA mappings for the main shared memory segment',
+  proname => 'pg_get_shmem_allocations_numa', prorows => '50', proretset => 't',
+  provolatile => 'v', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,int4,int8}', proargmodes => '{o,o,o}',
+  proargnames => '{name,numa_node,size}',
+  prosrc => 'pg_get_shmem_allocations_numa' },
+
 # memory context of local backend
 { oid => '2282',
   descr => 'information about all memory contexts of local backend',

src/test/regress/expected/numa.out

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+SELECT NOT(pg_numa_available()) AS skip_test \gset
+\if :skip_test
+SELECT COUNT(*) = 0 AS ok FROM pg_shmem_allocations_numa;
+\quit
+\endif
+-- switch to superuser
+\c -
+SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
+ ok
+----
+ t
+(1 row)
+

src/test/regress/expected/numa_1.out

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+SELECT NOT(pg_numa_available()) AS skip_test \gset
+\if :skip_test
+SELECT COUNT(*) = 0 AS ok FROM pg_shmem_allocations_numa;
+ERROR:  libnuma initialization failed or NUMA is not supported on this platform
+\quit

src/test/regress/expected/privileges.out

Lines changed: 14 additions & 2 deletions
@@ -3219,8 +3219,8 @@ REVOKE MAINTAIN ON lock_table FROM regress_locktable_user;
 -- clean up
 DROP TABLE lock_table;
 DROP USER regress_locktable_user;
--- test to check privileges of system views pg_shmem_allocations and
--- pg_backend_memory_contexts.
+-- test to check privileges of system views pg_shmem_allocations,
+-- pg_shmem_allocations_numa and pg_backend_memory_contexts.
 -- switch to superuser
 \c -
 CREATE ROLE regress_readallstats;

@@ -3242,6 +3242,12 @@ SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations','SELECT
  f
 (1 row)

+SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations_numa','SELECT'); -- no
+ has_table_privilege
+---------------------
+ f
+(1 row)
+
 GRANT pg_read_all_stats TO regress_readallstats;
 SELECT has_table_privilege('regress_readallstats','pg_aios','SELECT'); -- yes
  has_table_privilege

@@ -3261,6 +3267,12 @@ SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations','SELECT
  t
 (1 row)

+SELECT has_table_privilege('regress_readallstats','pg_shmem_allocations_numa','SELECT'); -- yes
+ has_table_privilege
+---------------------
+ t
+(1 row)
+
 -- run query to ensure that functions within views can be executed
 SET ROLE regress_readallstats;
 SELECT COUNT(*) >= 0 AS ok FROM pg_aios;

src/test/regress/expected/rules.out

Lines changed: 4 additions & 0 deletions
@@ -1757,6 +1757,10 @@ pg_shmem_allocations| SELECT name,
     size,
     allocated_size
    FROM pg_get_shmem_allocations() pg_get_shmem_allocations(name, off, size, allocated_size);
+pg_shmem_allocations_numa| SELECT name,
+    numa_node,
+    size
+   FROM pg_get_shmem_allocations_numa() pg_get_shmem_allocations_numa(name, numa_node, size);
 pg_stat_activity| SELECT s.datid,
     d.datname,
     s.pid,
