Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit b9515b6

Browse files
committed
Use pselect(2) not select(2), if available, to wait in postmaster's loop.
Traditionally we've unblocked signals, called select(2), and then blocked signals again. The code expects that the select() will be cancelled with EINTR if an interrupt occurs; but there's a race condition, which is that an already-pending signal will be delivered as soon as we unblock, and then when we reach select() there will be nothing preventing it from waiting. This can result in a long delay before we perform any action that ServerLoop was supposed to have taken in response to the signal. As with the somewhat-similar symptoms fixed by commit 8939020, the main practical problem is slow launching of parallel workers. The window for trouble is usually pretty short, corresponding to one iteration of ServerLoop; but it's not negligible. To fix, use pselect(2) in place of select(2) where available, as that's designed to solve exactly this problem. Where not available, we continue to use the old way, and are no worse off than before. pselect(2) has been required by POSIX since about 2001, so most modern platforms should have it. A bigger portability issue is that some implementations are said to be non-atomic, ie pselect() isn't really any different from unblock/select/reblock. Still, we're no worse off than before on such a platform. There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point this could be reverted, since we'd be using a self-pipe to solve the race condition. But that's not happening before v11 at the earliest. Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches. Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
1 parent dfa4baf commit b9515b6

File tree

5 files changed

+54
-19
lines changed

5 files changed

+54
-19
lines changed

configure

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12504,7 +12504,7 @@ fi
1250412504
LIBS_including_readline="$LIBS"
1250512505
LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
1250612506

12507-
for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
12507+
for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pselect pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
1250812508
do :
1250912509
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
1251012510
ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"

configure.in

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1456,7 +1456,7 @@ PGAC_FUNC_WCSTOMBS_L
14561456
LIBS_including_readline="$LIBS"
14571457
LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
14581458

1459-
AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
1459+
AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pselect pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
14601460

14611461
AC_REPLACE_FUNCS(fseeko)
14621462
case $host_os in

src/backend/postmaster/postmaster.c

Lines changed: 46 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -606,9 +606,9 @@ PostmasterMain(int argc, char *argv[])
606606
* In the postmaster, we want to install non-ignored handlers *without*
607607
* SA_RESTART. This is because they'll be blocked at all times except
608608
* when ServerLoop is waiting for something to happen, and during that
609-
* window, we want signals to exit the select(2) wait so that ServerLoop
609+
* window, we want signals to exit the pselect(2) wait so that ServerLoop
610610
* can respond if anything interesting happened. On some platforms,
611-
* signals marked SA_RESTART would not cause the select() wait to end.
611+
* signals marked SA_RESTART would not cause the pselect() wait to end.
612612
* Child processes will generally want SA_RESTART, but we expect them to
613613
* set up their own handlers before unblocking signals.
614614
*
@@ -1642,6 +1642,8 @@ ServerLoop(void)
16421642
for (;;)
16431643
{
16441644
fd_set rmask;
1645+
fd_set *rmask_p;
1646+
struct timeval timeout;
16451647
int selres;
16461648
time_t now;
16471649

@@ -1651,37 +1653,64 @@ ServerLoop(void)
16511653
* We block all signals except while sleeping. That makes it safe for
16521654
* signal handlers, which again block all signals while executing, to
16531655
* do nontrivial work.
1654-
*
1655-
* If we are in PM_WAIT_DEAD_END state, then we don't want to accept
1656-
* any new connections, so we don't call select(), and just sleep.
16571656
*/
1658-
memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
1659-
16601657
if (pmState == PM_WAIT_DEAD_END)
16611658
{
1662-
PG_SETMASK(&UnBlockSig);
1663-
1664-
pg_usleep(100000L); /* 100 msec seems reasonable */
1665-
selres = 0;
1666-
1667-
PG_SETMASK(&BlockSig);
1659+
/*
1660+
* If we are in PM_WAIT_DEAD_END state, then we don't want to
1661+
* accept any new connections, so pass a null rmask.
1662+
*/
1663+
rmask_p = NULL;
1664+
timeout.tv_sec = 0;
1665+
timeout.tv_usec = 100000; /* 100 msec seems reasonable */
16681666
}
16691667
else
16701668
{
1671-
/* must set timeout each time; some OSes change it! */
1672-
struct timeval timeout;
1669+
/* Normal case: check sockets, and compute a suitable timeout */
1670+
memcpy(&rmask, &readmask, sizeof(fd_set));
1671+
rmask_p = &rmask;
16731672

16741673
/* Needs to run with blocked signals! */
16751674
DetermineSleepTime(&timeout);
1675+
}
16761676

1677+
/*
1678+
* We prefer to wait with pselect(2) if available, as using that,
1679+
* together with *not* using SA_RESTART for signals, guarantees that
1680+
* we will get kicked off the wait if a signal occurs.
1681+
*
1682+
* If we lack pselect(2), fake it with select(2). This has a race
1683+
* condition: a signal that was already pending will be delivered
1684+
* before we reach the select(), and therefore the select() will wait,
1685+
* even though we might wish to do something in response. Therefore,
1686+
* beware of putting any time-critical signal response logic into
1687+
* ServerLoop rather than into the signal handler itself. It will run
1688+
* eventually, but maybe not till after a timeout delay.
1689+
*
1690+
* Some implementations of pselect() are reportedly not atomic, making
1691+
* the first alternative here functionally equivalent to the second.
1692+
* Not much we can do about that though.
1693+
*/
1694+
{
1695+
#ifdef HAVE_PSELECT
1696+
/* pselect uses a randomly different timeout API, sigh */
1697+
struct timespec ptimeout;
1698+
1699+
ptimeout.tv_sec = timeout.tv_sec;
1700+
ptimeout.tv_nsec = timeout.tv_usec * 1000;
1701+
1702+
selres = pselect(nSockets, rmask_p, NULL, NULL,
1703+
&ptimeout, &UnBlockSig);
1704+
#else
16771705
PG_SETMASK(&UnBlockSig);
16781706

1679-
selres = select(nSockets, &rmask, NULL, NULL, &timeout);
1707+
selres = select(nSockets, rmask_p, NULL, NULL, &timeout);
16801708

16811709
PG_SETMASK(&BlockSig);
1710+
#endif
16821711
}
16831712

1684-
/* Now check the select() result */
1713+
/* Now check the select()/pselect() result */
16851714
if (selres < 0)
16861715
{
16871716
if (errno != EINTR && errno != EWOULDBLOCK)

src/include/pg_config.h.in

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -397,6 +397,9 @@
397397
/* Define to 1 if the assembler supports PPC's LWARX mutex hint bit. */
398398
#undef HAVE_PPC_LWARX_MUTEX_HINT
399399

400+
/* Define to 1 if you have the `pselect' function. */
401+
#undef HAVE_PSELECT
402+
400403
/* Define to 1 if you have the `pstat' function. */
401404
#undef HAVE_PSTAT
402405

src/include/pg_config.h.win32

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -264,6 +264,9 @@
264264
/* Define to 1 if you have the <poll.h> header file. */
265265
/* #undef HAVE_POLL_H */
266266

267+
/* Define to 1 if you have the `pselect' function. */
268+
/* #undef HAVE_PSELECT */
269+
267270
/* Define to 1 if you have the `pstat' function. */
268271
/* #undef HAVE_PSTAT */
269272

0 commit comments

Comments
 (0)