
Commit df8b7bc

Improve our mechanism for controlling the Linux out-of-memory killer.
Arrange for postmaster child processes to respond to two environment variables, PG_OOM_ADJUST_FILE and PG_OOM_ADJUST_VALUE, to determine whether they reset their OOM score adjustments and if so to what. This is superior to the previous design involving #ifdef's in several ways. The behavior is now available in a default build, and both ends of the adjustment --- the original adjustment of the postmaster's level and the subsequent readjustment by child processes --- can now be controlled in one place, namely the postmaster launch script. So it's no longer necessary for the launch script to act on faith that the server was compiled with the appropriate options. In addition, if someone wants to use an OOM score other than zero for the child processes, that doesn't take a recompile anymore; and we no longer have to cater separately to the two different historical kernel APIs for this adjustment.

Gurjeet Singh, somewhat revised by me
1 parent 9606619 commit df8b7bc
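
To make the "both ends in one place" point concrete, here is a minimal sketch of a launcher that sets the postmaster-side adjustment and passes the child-side settings through the environment. The function name `launch_with_oom_protection` and its argument order are hypothetical and not part of this commit; the adjustment file is a parameter so the sketch can be tried against an ordinary file without root.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit.
#   $1 = adjustment file (e.g. /proc/self/oom_score_adj)
#   $2 = value for the launcher and thus the postmaster (e.g. -1000)
#   $3 = value child processes should reset themselves to (e.g. 0)
#   remaining arguments = command to run
launch_with_oom_protection() {
    adj_file=$1
    postmaster_adj=$2
    child_adj=$3
    shift 3

    # Writing a negative value to the real /proc file requires root.
    printf '%s\n' "$postmaster_adj" > "$adj_file" || return 1

    # The two variables named in the commit message are passed to the
    # command through its environment.
    PG_OOM_ADJUST_FILE=$adj_file PG_OOM_ADJUST_VALUE=$child_adj "$@"
}
```

This only works when the command inherits the caller's environment. A login shell started with `su -` normally discards it, which is presumably why the start script in this commit places `$DAEMON_ENV` inside the `-c` string rather than exporting the variables beforehand.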

3 files changed: +44 -54 lines changed

contrib/start-scripts/linux

+9 -6
@@ -43,14 +43,17 @@ PGLOG="$PGDATA/serverlog"
 # It's often a good idea to protect the postmaster from being killed by the
 # OOM killer (which will tend to preferentially kill the postmaster because
 # of the way it accounts for shared memory). Setting the OOM_SCORE_ADJ value
-# to -1000 will disable OOM kill altogether. If you enable this, you probably
-# want to compile PostgreSQL with "-DLINUX_OOM_SCORE_ADJ=0", so that
-# individual backends can still be killed by the OOM killer.
+# to -1000 will disable OOM kill altogether, which is a good thing for the
+# postmaster, but not so much for individual backends. If you enable this,
+# also uncomment the DAEMON_ENV line, which will instruct backends to set
+# their OOM adjustments back to the default setting of zero.
 #OOM_SCORE_ADJ=-1000
+#DAEMON_ENV="PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj"
 # Older Linux kernels may not have /proc/self/oom_score_adj, but instead
 # /proc/self/oom_adj, which works similarly except the disable value is -17.
-# For such a system, enable this and compile with "-DLINUX_OOM_ADJ=0".
+# For such a system, uncomment these two lines instead.
 #OOM_ADJ=-17
+#DAEMON_ENV="PG_OOM_ADJUST_FILE=/proc/self/oom_adj"
 
 ## STOP EDITING HERE
 
@@ -84,7 +87,7 @@ case $1 in
     echo -n "Starting PostgreSQL: "
     test x"$OOM_SCORE_ADJ" != x && echo "$OOM_SCORE_ADJ" > /proc/self/oom_score_adj
     test x"$OOM_ADJ" != x && echo "$OOM_ADJ" > /proc/self/oom_adj
-    su - $PGUSER -c "$DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
+    su - $PGUSER -c "$DAEMON_ENV $DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
     echo "ok"
     ;;
   stop)
@@ -97,7 +100,7 @@ case $1 in
     su - $PGUSER -c "$PGCTL stop -D '$PGDATA' -s -m fast -w"
     test x"$OOM_SCORE_ADJ" != x && echo "$OOM_SCORE_ADJ" > /proc/self/oom_score_adj
     test x"$OOM_ADJ" != x && echo "$OOM_ADJ" > /proc/self/oom_adj
-    su - $PGUSER -c "$DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
+    su - $PGUSER -c "$DAEMON_ENV $DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
     echo "ok"
     ;;
   reload)
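
The script comments above ask the administrator to pick between the two kernel interfaces by hand. As an illustration only, the choice could be automated along the following lines. The helper `detect_oom_interface` is hypothetical and not part of the start script; the only facts taken from the diff are the two file names and their disable values (-1000 and -17).

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <disable-value>" for whichever OOM
# adjustment interface exists under directory $1 (default /proc/self).
# Returns non-zero if neither file is present.
detect_oom_interface() {
    dir=${1:-/proc/self}
    if [ -e "$dir/oom_score_adj" ]; then
        # Newer interface: -1000 disables OOM kill for the process.
        printf '%s %s\n' "$dir/oom_score_adj" -1000
    elif [ -e "$dir/oom_adj" ]; then
        # Older interface: the disable value is -17.
        printf '%s %s\n' "$dir/oom_adj" -17
    else
        return 1
    fi
}
```

The optional directory argument exists so the function can be exercised against a scratch directory; a real script would call it with no argument.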

doc/src/sgml/runtime.sgml

+17 -9
@@ -1275,7 +1275,7 @@ sysctl -w vm.overcommit_memory=2
 <para>
 Another approach, which can be used with or without altering
 <varname>vm.overcommit_memory</>, is to set the process-specific
-<varname>oom_score_adj</> value for the postmaster process to
+<firstterm>OOM score adjustment</> value for the postmaster process to
 <literal>-1000</>, thereby guaranteeing it will not be targeted by the OOM
 killer. The simplest way to do this is to execute
 <programlisting>
@@ -1284,20 +1284,28 @@ echo -1000 > /proc/self/oom_score_adj
 in the postmaster's startup script just before invoking the postmaster.
 Note that this action must be done as root, or it will have no effect;
 so a root-owned startup script is the easiest place to do it. If you
-do this, you may also wish to build <productname>PostgreSQL</>
-with <literal>-DLINUX_OOM_SCORE_ADJ=0</> added to <varname>CPPFLAGS</>.
-That will cause postmaster child processes to run with the normal
-<varname>oom_score_adj</> value of zero, so that the OOM killer can still
-target them at need.
+do this, you should also set these environment variables in the startup
+script before invoking the postmaster:
+<programlisting>
+export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
+export PG_OOM_ADJUST_VALUE=0
+</programlisting>
+These settings will cause postmaster child processes to run with the
+normal OOM score adjustment of zero, so that the OOM killer can still
+target them at need. You could use some other value for
+<envar>PG_OOM_ADJUST_VALUE</> if you want the child processes to run
+with some other OOM score adjustment. (<envar>PG_OOM_ADJUST_VALUE</>
+can also be omitted, in which case it defaults to zero.) If you do not
+set <envar>PG_OOM_ADJUST_FILE</>, the child processes will run with the
+same OOM score adjustment as the postmaster, which is unwise since the
+whole point is to ensure that the postmaster has a preferential setting.
 </para>
 
 <para>
 Older Linux kernels do not offer <filename>/proc/self/oom_score_adj</>,
 but may have a previous version of the same functionality called
 <filename>/proc/self/oom_adj</>. This works the same except the disable
-value is <literal>-17</> not <literal>-1000</>. The corresponding
-build flag for <productname>PostgreSQL</> is
-<literal>-DLINUX_OOM_ADJ=0</>.
+value is <literal>-17</> not <literal>-1000</>.
 </para>
 
 <note>
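
One way to check that the documented settings took effect is to read the adjustment of the postmaster and of one of its children and compare them. The sketch below assumes the newer `oom_score_adj` interface; the helper name `show_oom_adjust` and its optional second argument are hypothetical and exist only to make the example testable outside `/proc`.

```shell
#!/bin/sh
# Hypothetical helper: print the OOM score adjustment of process $1.
# $2 optionally overrides the /proc root, for trying this on plain files.
show_oom_adjust() {
    pid=$1
    root=${2:-/proc}
    cat "$root/$pid/oom_score_adj"
}
```

With the settings from the documentation change, calling this with the postmaster's PID would be expected to print -1000, and calling it with a backend's PID would be expected to print 0.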

src/backend/postmaster/fork_process.c

+18 -39
@@ -31,6 +31,7 @@ pid_t
 fork_process(void)
 {
     pid_t result;
+    const char *oomfilename;
 
 #ifdef LINUX_PROFILE
     struct itimerval prof_itimer;
@@ -71,62 +72,40 @@ fork_process(void)
          * process sizes *including shared memory*. (This is unbelievably
          * stupid, but the kernel hackers seem uninterested in improving it.)
          * Therefore it's often a good idea to protect the postmaster by
-         * setting its oom_score_adj value negative (which has to be done in a
-         * root-owned startup script). If you just do that much, all child
-         * processes will also be protected against OOM kill, which might not
-         * be desirable. You can then choose to build with
-         * LINUX_OOM_SCORE_ADJ #defined to 0, or to some other value that you
-         * want child processes to adopt here.
+         * setting its OOM score adjustment negative (which has to be done in
+         * a root-owned startup script). Since the adjustment is inherited by
+         * child processes, this would ordinarily mean that all the
+         * postmaster's children are equally protected against OOM kill, which
+         * is not such a good idea. So we provide this code to allow the
+         * children to change their OOM score adjustments again. Both the
+         * file name to write to and the value to write are controlled by
+         * environment variables, which can be set by the same startup script
+         * that did the original adjustment.
          */
-#ifdef LINUX_OOM_SCORE_ADJ
-        {
-            /*
-             * Use open() not stdio, to ensure we control the open flags. Some
-             * Linux security environments reject anything but O_WRONLY.
-             */
-            int fd = open("/proc/self/oom_score_adj", O_WRONLY, 0);
-
-            /* We ignore all errors */
-            if (fd >= 0)
-            {
-                char buf[16];
-                int rc;
+        oomfilename = getenv("PG_OOM_ADJUST_FILE");
 
-                snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_SCORE_ADJ);
-                rc = write(fd, buf, strlen(buf));
-                (void) rc;
-                close(fd);
-            }
-        }
-#endif /* LINUX_OOM_SCORE_ADJ */
-
-        /*
-         * Older Linux kernels have oom_adj not oom_score_adj. This works
-         * similarly except with a different scale of adjustment values. If
-         * it's necessary to build Postgres to work with either API, you can
-         * define both LINUX_OOM_SCORE_ADJ and LINUX_OOM_ADJ.
-         */
-#ifdef LINUX_OOM_ADJ
+        if (oomfilename != NULL)
         {
             /*
              * Use open() not stdio, to ensure we control the open flags. Some
              * Linux security environments reject anything but O_WRONLY.
              */
-            int fd = open("/proc/self/oom_adj", O_WRONLY, 0);
+            int fd = open(oomfilename, O_WRONLY, 0);
 
             /* We ignore all errors */
             if (fd >= 0)
             {
-                char buf[16];
+                const char *oomvalue = getenv("PG_OOM_ADJUST_VALUE");
                 int rc;
 
-                snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
-                rc = write(fd, buf, strlen(buf));
+                if (oomvalue == NULL) /* supply a useful default */
+                    oomvalue = "0";
+
+                rc = write(fd, oomvalue, strlen(oomvalue));
                 (void) rc;
                 close(fd);
             }
         }
-#endif /* LINUX_OOM_ADJ */
 
         /*
          * Make sure processes do not share OpenSSL randomness state.
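
For experimenting with the child-side behavior without building the server, the new logic in `fork_process()` can be approximated in shell. This is a rough rendering rather than the actual implementation: the function name `reset_oom_adjust` is hypothetical, and shell redirection truncates the target file whereas the C code opens it with `O_WRONLY` only, a difference that should not matter for the `/proc` files involved.

```shell
#!/bin/sh
# Hypothetical shell approximation of the child-side logic in the diff.
reset_oom_adjust() {
    # Do nothing unless PG_OOM_ADJUST_FILE is set.
    [ -n "${PG_OOM_ADJUST_FILE-}" ] || return 0

    # The C code opens without O_CREAT, so only write to an existing,
    # writable file; otherwise silently give up.
    [ -w "$PG_OOM_ADJUST_FILE" ] || return 0

    # PG_OOM_ADJUST_VALUE defaults to 0 when unset; errors are ignored,
    # as in the C code.
    printf '%s' "${PG_OOM_ADJUST_VALUE-0}" > "$PG_OOM_ADJUST_FILE" 2>/dev/null || true
}
```

Pointing `PG_OOM_ADJUST_FILE` at a scratch file and calling the function shows the defaulting rule: with `PG_OOM_ADJUST_VALUE` unset the file ends up containing `0`.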
