Commit d2bddc2

Add huge_page_size setting for use on Linux.
This allows the huge page size to be set explicitly. The default is 0, meaning it will use the system default, as before.

Author: Odin Ugedal <odin@ugedal.com>
Discussion: https://postgr.es/m/20200608154639.20254-1-odin%40ugedal.com
1 parent d66b23b commit d2bddc2

6 files changed: +141 -38 lines changed

doc/src/sgml/config.sgml
Lines changed: 27 additions & 0 deletions
@@ -1582,6 +1582,33 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-huge-page-size" xreflabel="huge_page_size">
+      <term><varname>huge_page_size</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>huge_page_size</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Controls the size of huge pages, when they are enabled with
+        <xref linkend="guc-huge-pages"/>.
+        The default is zero (<literal>0</literal>).
+        When set to <literal>0</literal>, the default huge page size on the
+        system will be used.
+       </para>
+       <para>
+        Some commonly available page sizes on modern 64 bit server architectures include:
+        <literal>2MB</literal> and <literal>1GB</literal> (Intel and AMD), <literal>16MB</literal> and
+        <literal>16GB</literal> (IBM POWER), and <literal>64kB</literal>, <literal>2MB</literal>,
+        <literal>32MB</literal> and <literal>1GB</literal> (ARM). For more information
+        about usage and support, see <xref linkend="linux-huge-pages"/>.
+       </para>
+       <para>
+        Non-default settings are currently supported only on Linux.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-temp-buffers" xreflabel="temp_buffers">
       <term><varname>temp_buffers</varname> (<type>integer</type>)
       <indexterm>

doc/src/sgml/runtime.sgml
Lines changed: 35 additions & 20 deletions
@@ -1391,41 +1391,55 @@ export PG_OOM_ADJUST_VALUE=0
     using large values of <xref linkend="guc-shared-buffers"/>. To use this
     feature in <productname>PostgreSQL</productname> you need a kernel
     with <varname>CONFIG_HUGETLBFS=y</varname> and
-    <varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to adjust
-    the kernel setting <varname>vm.nr_hugepages</varname>. To estimate the
-    number of huge pages needed, start <productname>PostgreSQL</productname>
-    without huge pages enabled and check the
-    postmaster's anonymous shared memory segment size, as well as the system's
-    huge page size, using the <filename>/proc</filename> file system. This might
-    look like:
+    <varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
+    the operating system to provide enough huge pages of the desired size.
+    To estimate the number of huge pages needed, start
+    <productname>PostgreSQL</productname> without huge pages enabled and check
+    the postmaster's anonymous shared memory segment size, as well as the
+    system's default and supported huge page sizes, using the
+    <filename>/proc</filename> and <filename>/sys</filename> file systems.
+    This might look like:
 <programlisting>
 $ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
 4170
 $ <userinput>pmap 4170 | awk '/rw-s/ &amp;&amp; /zero/ {print $2}'</userinput>
 6490428K
 $ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
 Hugepagesize: 2048 kB
+$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
+hugepages-1048576kB hugepages-2048kB
 </programlisting>
+
+    In this example the default is 2MB, but you can also explicitly request
+    either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
+
+    Assuming <literal>2MB</literal> huge pages,
     <literal>6490428</literal> / <literal>2048</literal> gives approximately
     <literal>3169.154</literal>, so in this example we need at
-    least <literal>3170</literal> huge pages, which we can set with:
+    least <literal>3170</literal> huge pages. A larger setting would be
+    appropriate if other programs on the machine also need huge pages.
+    We can set this with:
+<programlisting>
+# <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
+</programlisting>
+    Don't forget to add this setting to <filename>/etc/sysctl.conf</filename>
+    so that it is reapplied after reboots. For non-default huge page sizes,
+    we can instead use:
 <programlisting>
-$ <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
+# <userinput>echo 3170 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</userinput>
 </programlisting>
-    A larger setting would be appropriate if other programs on the machine
-    also need huge pages. Don't forget to add this setting
-    to <filename>/etc/sysctl.conf</filename> so that it will be reapplied
-    after reboots.
+    It is also possible to provide these settings at boot time using
+    kernel parameters such as <literal>hugepagesz=2M hugepages=3170</literal>.
    </para>
 
    <para>
     Sometimes the kernel is not able to allocate the desired number of huge
-    pages immediately, so it might be necessary to repeat the command or to
-    reboot. (Immediately after a reboot, most of the machine's memory
-    should be available to convert into huge pages.) To verify the huge
-    page allocation situation, use:
+    pages immediately due to fragmentation, so it might be necessary
+    to repeat the command or to reboot. (Immediately after a reboot, most of
+    the machine's memory should be available to convert into huge pages.)
+    To verify the huge page allocation situation for a given size, use:
 <programlisting>
-$ <userinput>grep Huge /proc/meminfo</userinput>
+$ <userinput>cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</userinput>
 </programlisting>
    </para>
 
@@ -1438,8 +1452,9 @@ $ <userinput>grep Huge /proc/meminfo</userinput>
 
    <para>
     The default behavior for huge pages in
-    <productname>PostgreSQL</productname> is to use them when possible and
-    to fall back to normal pages when failing. To enforce the use of huge
+    <productname>PostgreSQL</productname> is to use them when possible, with
+    the system's default huge page size, and
+    to fall back to normal pages on failure. To enforce the use of huge
     pages, you can set <xref linkend="guc-huge-pages"/>
     to <literal>on</literal> in <filename>postgresql.conf</filename>.
     Note that with this setting <productname>PostgreSQL</productname> will fail to
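
Reviewer note: the page-count estimate in the runtime.sgml text is a round-up division of the shared memory segment size by the huge page size. A minimal C sketch of that arithmetic follows; `huge_pages_needed` is a hypothetical helper name used only for illustration and is not part of this commit or of PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): number of huge pages of
 * page_kb kilobytes needed to cover a segment of segment_kb kilobytes.
 * Any partial page at the end still needs a whole huge page, so the
 * division rounds up.
 */
uint64_t
huge_pages_needed(uint64_t segment_kb, uint64_t page_kb)
{
	assert(page_kb > 0);
	return (segment_kb + page_kb - 1) / page_kb;
}
```

With the figures from the documentation example, `huge_pages_needed(6490428, 2048)` gives 3170, matching the text. For the 1GB size listed under /sys/kernel/mm/hugepages, the same segment would need `huge_pages_needed(6490428, 1048576)`, which is 7 pages.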

src/backend/port/sysv_shmem.c
Lines changed: 45 additions & 17 deletions
@@ -32,6 +32,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "port/pg_bitutils.h"
 #include "portability/mem.h"
 #include "storage/dsm.h"
 #include "storage/fd.h"
@@ -448,7 +449,7 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
 #ifdef MAP_HUGETLB
 
 /*
- * Identify the huge page size to use.
+ * Identify the huge page size to use, and compute the related mmap flags.
  *
  * Some Linux kernel versions have a bug causing mmap() to fail on requests
  * that are not a multiple of the hugepage size. Versions without that bug
@@ -464,25 +465,13 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
  * hugepage sizes, we might want to think about more invasive strategies,
  * such as increasing shared_buffers to absorb the extra space.
  *
- * Returns the (real or assumed) page size into *hugepagesize,
+ * Returns the (real, assumed or config provided) page size into *hugepagesize,
  * and the hugepage-related mmap flags to use into *mmap_flags.
- *
- * Currently *mmap_flags is always just MAP_HUGETLB. Someday, on systems
- * that support it, we might OR in additional bits to specify a particular
- * non-default huge page size.
  */
 static void
 GetHugePageSize(Size *hugepagesize, int *mmap_flags)
 {
-	/*
-	 * If we fail to find out the system's default huge page size, assume it
-	 * is 2MB. This will work fine when the actual size is less. If it's
-	 * more, we might get mmap() or munmap() failures due to unaligned
-	 * requests; but at this writing, there are no reports of any non-Linux
-	 * systems being picky about that.
-	 */
-	*hugepagesize = 2 * 1024 * 1024;
-	*mmap_flags = MAP_HUGETLB;
+	Size default_hugepagesize = 0;
 
 	/*
 	 * System-dependent code to find out the default huge page size.
@@ -491,6 +480,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
 	 * nnnn kB". Ignore any failures, falling back to the preset default.
 	 */
 #ifdef __linux__
+
 	{
 		FILE *fp = AllocateFile("/proc/meminfo", "r");
 		char buf[128];
@@ -505,7 +495,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
 				{
 					if (ch == 'k')
 					{
-						*hugepagesize = sz * (Size) 1024;
+						default_hugepagesize = sz * (Size) 1024;
 						break;
 					}
 					/* We could accept other units besides kB, if needed */
@@ -515,6 +505,44 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
 		}
 	}
 #endif /* __linux__ */
+
+	if (huge_page_size != 0)
+	{
+		/* If huge page size is requested explicitly, use that. */
+		*hugepagesize = (Size) huge_page_size * 1024;
+	}
+	else if (default_hugepagesize != 0)
+	{
+		/* Otherwise use the system default, if we have it. */
+		*hugepagesize = default_hugepagesize;
+	}
+	else
+	{
+		/*
+		 * If we fail to find out the system's default huge page size, or no
+		 * huge page size is requested explicitly, assume it is 2MB. This will
+		 * work fine when the actual size is less. If it's more, we might get
+		 * mmap() or munmap() failures due to unaligned requests; but at this
+		 * writing, there are no reports of any non-Linux systems being picky
+		 * about that.
+		 */
+		*hugepagesize = 2 * 1024 * 1024;
+	}
+
+	*mmap_flags = MAP_HUGETLB;
+
+	/*
+	 * On recent enough Linux, also include the explicit page size, if
+	 * necessary.
+	 */
+#if defined(MAP_HUGE_MASK) && defined(MAP_HUGE_SHIFT)
+	if (*hugepagesize != default_hugepagesize)
+	{
+		int shift = pg_ceil_log2_64(*hugepagesize);
+
+		*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
+	}
+#endif
 }
 
 #endif /* MAP_HUGETLB */
@@ -583,7 +611,7 @@ CreateAnonymousSegment(Size *size)
 						 "(currently %zu bytes), reduce PostgreSQL's shared "
 						 "memory usage, perhaps by reducing shared_buffers or "
 						 "max_connections.",
-						 *size) : 0));
+						 allocsize) : 0));
 	}
 
 	*size = allocsize;
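
Reviewer note: the new logic in GetHugePageSize() does two things: it picks a size (explicit setting first, then the system default, then a 2MB fallback), and it encodes log2 of that size into the mmap flags when the size differs from the system default. The standalone C sketch below mirrors that flow under some assumptions. The function names are hypothetical, the `SKETCH_` constants are local copies of the values Linux uses for `MAP_HUGE_SHIFT` and `MAP_HUGE_MASK` so that the example builds anywhere, and the log2 helper only accepts exact powers of two, whereas the patch calls `pg_ceil_log2_64()`, which rounds up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values: local stand-ins for Linux's MAP_HUGE_SHIFT and
 * MAP_HUGE_MASK, renamed so this sketch does not need <sys/mman.h>.
 */
#define SKETCH_MAP_HUGE_SHIFT 26
#define SKETCH_MAP_HUGE_MASK  0x3f

/*
 * Hypothetical helper: choose the huge page size in bytes.
 * requested_kb plays the role of the huge_page_size setting (in kB);
 * default_bytes is the system default, or 0 if it could not be read.
 */
uint64_t
choose_huge_page_size(int requested_kb, uint64_t default_bytes)
{
	if (requested_kb != 0)
		return (uint64_t) requested_kb * 1024;	/* explicit request wins */
	if (default_bytes != 0)
		return default_bytes;					/* system default */
	return 2 * 1024 * 1024;						/* assumed 2MB fallback */
}

/* Hypothetical helper: log2 of a power-of-two size, e.g. 2MB gives 21. */
int
log2_pow2(uint64_t size)
{
	int shift = 0;

	assert(size != 0 && (size & (size - 1)) == 0);
	while (size > 1)
	{
		size >>= 1;
		shift++;
	}
	return shift;
}

/*
 * Hypothetical helper: extra flag bits to OR into the mmap flags.
 * Returns 0 when the chosen size equals the system default, since no
 * explicit size needs to be requested in that case.
 */
unsigned int
huge_size_flag_bits(uint64_t size, uint64_t default_bytes)
{
	if (size == default_bytes)
		return 0;
	return ((unsigned int) log2_pow2(size) & SKETCH_MAP_HUGE_MASK)
		<< SKETCH_MAP_HUGE_SHIFT;
}
```

For example, on a system whose default is 2MB, a setting of 1GB (1048576 kB) would select 1073741824 bytes and add `30 << 26` to the flags, while a setting of 0 adds nothing. As in the patch, an unknown default (0) combined with the 2MB fallback still produces explicit size bits, because the chosen size differs from the recorded default.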

src/backend/utils/misc/guc.c
Lines changed: 31 additions & 1 deletion
@@ -20,11 +20,14 @@
 #include <float.h>
 #include <math.h>
 #include <limits.h>
-#include <unistd.h>
+#ifndef WIN32
+#include <sys/mman.h>
+#endif
 #include <sys/stat.h>
 #ifdef HAVE_SYSLOG
 #include <syslog.h>
 #endif
+#include <unistd.h>
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
@@ -198,6 +201,7 @@ static bool check_max_wal_senders(int *newval, void **extra, GucSource source);
 static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource source);
 static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
 static bool check_maintenance_io_concurrency(int *newval, void **extra, GucSource source);
+static bool check_huge_page_size(int *newval, void **extra, GucSource source);
 static void assign_pgstat_temp_directory(const char *newval, void *extra);
 static bool check_application_name(char **newval, void **extra, GucSource source);
 static void assign_application_name(const char *newval, void *extra);
@@ -576,6 +580,7 @@ int ssl_renegotiation_limit;
  * need to be duplicated in all the different implementations of pg_shmem.c.
  */
 int huge_pages;
+int huge_page_size;
 
 /*
  * These variables are all dummies that don't do anything, except in some
@@ -3381,6 +3386,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, assign_tcp_user_timeout, show_tcp_user_timeout
 	},
 
+	{
+		{"huge_page_size", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("The size of huge page that should be requested."),
+			NULL,
+			GUC_UNIT_KB
+		},
+		&huge_page_size,
+		0, 0, INT_MAX,
+		check_huge_page_size, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
@@ -11565,6 +11581,20 @@ check_maintenance_io_concurrency(int *newval, void **extra, GucSource source)
 	return true;
 }
 
+static bool
+check_huge_page_size(int *newval, void **extra, GucSource source)
+{
+#if !(defined(MAP_HUGE_MASK) && defined(MAP_HUGE_SHIFT))
+	/* Recent enough Linux only, for now. See GetHugePageSize(). */
+	if (*newval != 0)
+	{
+		GUC_check_errdetail("huge_page_size must be 0 on this platform.");
+		return false;
+	}
+#endif
+	return true;
+}
+
 static void
 assign_pgstat_temp_directory(const char *newval, void *extra)
 {
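
Reviewer note: two properties of the new setting are easy to miss in the diff. The value is stored as an `int` in kilobytes (`GUC_UNIT_KB`) with an upper bound of `INT_MAX`, and the check hook rejects any non-zero value on platforms that lack `MAP_HUGE_MASK` and `MAP_HUGE_SHIFT`. The C sketch below restates both as plain functions; the names are hypothetical, and the compile-time `#if` test from the patch is modeled as a run-time boolean parameter purely so the behavior can be exercised on any platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical restatement of the check hook: a non-zero size is only
 * acceptable where explicit huge page sizes can be requested.  The
 * have_map_huge_flags parameter stands in for the patch's
 * defined(MAP_HUGE_MASK) && defined(MAP_HUGE_SHIFT) test.
 */
bool
huge_page_size_acceptable(int newval_kb, bool have_map_huge_flags)
{
	if (!have_map_huge_flags && newval_kb != 0)
		return false;
	return true;
}

/*
 * Hypothetical helper: convert the kB setting to bytes.  The value is
 * widened to 64 bits before multiplying, as the patch does with its
 * (Size) cast, so that large settings do not overflow int arithmetic.
 */
uint64_t
huge_page_size_kb_to_bytes(int size_kb)
{
	assert(size_kb >= 0);
	return (uint64_t) size_kb * 1024;
}
```

A 16GB page, the largest size mentioned in the documentation change, is 16777216 kB and converts to 17179869184 bytes, which would overflow a 32-bit multiplication but sits comfortably within the kB-based `INT_MAX` limit.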

src/backend/utils/misc/postgresql.conf.sample
Lines changed: 2 additions & 0 deletions
@@ -122,6 +122,8 @@
 					# (change requires restart)
 #huge_pages = try			# on, off, or try
 					# (change requires restart)
+#huge_page_size = 0			# zero for system default
+					# (change requires restart)
 #temp_buffers = 8MB			# min 800kB
 #max_prepared_transactions = 0		# zero disables the feature
 					# (change requires restart)

src/include/storage/pg_shmem.h
Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */
 /* GUC variables */
 extern int shared_memory_type;
 extern int huge_pages;
+extern int huge_page_size;
 
 /* Possible values for huge_pages */
 typedef enum
