Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 6753333

Browse files
committed
Move deadlock and other interrupt handling in proc.c out of signal handlers.
Deadlock checking was performed inside signal handlers up to now. While it's a remarkable feat to have made this work reliably, it's quite complex to understand why that is the case. Partially it worked due to the assumption that semaphores are signal safe - which is not actually documented to be the case for sysv semaphores. The reason we had to rely on performing this work inside signal handlers is that semaphores aren't guaranteed to be interruptable by signals on all platforms. But now that latches provide a somewhat similar API, which actually has the guarantee of being interruptible, we can avoid doing so. Signalling between ProcSleep, ProcWakeup, ProcWaitForSignal and ProcSendSignal is now done using latches. This increases the likelihood of spurious wakeups. As spurious wakeup already were possible and aren't likely to be frequent enough to be an actual problem, this seems acceptable. This change would allow for further simplification of the deadlock checking, now that it doesn't have to run in a signal handler. But even if I were motivated to do so right now, it would still be better to do that separately. Such a cleanup shouldn't have to be reviewed a the same time as the more fundamental changes in this commit. There is one possible usability regression due to this commit. Namely it is more likely than before that log_lock_waits messages are output more than once. Reviewed-By: Heikki Linnakangas
1 parent 6647248 commit 6753333

File tree

3 files changed

+63
-72
lines changed

3 files changed

+63
-72
lines changed

src/backend/storage/lmgr/proc.c

+61-70
Original file line numberDiff line numberDiff line change
@@ -80,13 +80,15 @@ PGPROC *PreparedXactProcs = NULL;
8080
/* If we are waiting for a lock, this points to the associated LOCALLOCK */
8181
static LOCALLOCK *lockAwaited = NULL;
8282

83-
/* Mark this volatile because it can be changed by signal handler */
84-
static volatile DeadLockState deadlock_state = DS_NOT_YET_CHECKED;
83+
static DeadLockState deadlock_state = DS_NOT_YET_CHECKED;
8584

85+
/* Is a deadlock check pending? */
86+
static volatile sig_atomic_t got_deadlock_timeout;
8687

8788
static void RemoveProcFromArray(int code, Datum arg);
8889
static void ProcKill(int code, Datum arg);
8990
static void AuxiliaryProcKill(int code, Datum arg);
91+
static void CheckDeadLock(void);
9092

9193

9294
/*
@@ -705,16 +707,6 @@ LockErrorCleanup(void)
705707

706708
LWLockRelease(partitionLock);
707709

708-
/*
709-
* We used to do PGSemaphoreReset() here to ensure that our proc's wait
710-
* semaphore is reset to zero. This prevented a leftover wakeup signal
711-
* from remaining in the semaphore if someone else had granted us the lock
712-
* we wanted before we were able to remove ourselves from the wait-list.
713-
* However, now that ProcSleep loops until waitStatus changes, a leftover
714-
* wakeup signal isn't harmful, and it seems not worth expending cycles to
715-
* get rid of a signal that most likely isn't there.
716-
*/
717-
718710
RESUME_INTERRUPTS();
719711
}
720712

@@ -942,9 +934,6 @@ ProcQueueInit(PROC_QUEUE *queue)
942934
* we release the partition lock.
943935
*
944936
* NOTES: The process queue is now a priority queue for locking.
945-
*
946-
* P() on the semaphore should put us to sleep. The process
947-
* semaphore is normally zero, so when we try to acquire it, we sleep.
948937
*/
949938
int
950939
ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
@@ -1084,6 +1073,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
10841073

10851074
/* Reset deadlock_state before enabling the timeout handler */
10861075
deadlock_state = DS_NOT_YET_CHECKED;
1076+
got_deadlock_timeout = false;
10871077

10881078
/*
10891079
* Set timer so we can wake up after awhile and check for a deadlock. If a
@@ -1113,32 +1103,37 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
11131103
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
11141104

11151105
/*
1116-
* If someone wakes us between LWLockRelease and PGSemaphoreLock,
1117-
* PGSemaphoreLock will not block. The wakeup is "saved" by the semaphore
1118-
* implementation. While this is normally good, there are cases where a
1119-
* saved wakeup might be leftover from a previous operation (for example,
1120-
* we aborted ProcWaitForSignal just before someone did ProcSendSignal).
1121-
* So, loop to wait again if the waitStatus shows we haven't been granted
1122-
* nor denied the lock yet.
1106+
* If somebody wakes us between LWLockRelease and WaitLatch, the latch
1107+
* will not wait. But a set latch does not necessarily mean that the lock
1108+
* is free now, as there are many other sources for latch sets than
1109+
* somebody releasing the lock.
11231110
*
1124-
* We pass interruptOK = true, which eliminates a window in which
1125-
* cancel/die interrupts would be held off undesirably. This is a promise
1126-
* that we don't mind losing control to a cancel/die interrupt here. We
1127-
* don't, because we have no shared-state-change work to do after being
1128-
* granted the lock (the grantor did it all). We do have to worry about
1129-
* canceling the deadlock timeout and updating the locallock table, but if
1130-
* we lose control to an error, LockErrorCleanup will fix that up.
1111+
* We process interrupts whenever the latch has been set, so cancel/die
1112+
* interrupts are processed quickly. This means we must not mind losing
1113+
* control to a cancel/die interrupt here. We don't, because we have no
1114+
* shared-state-change work to do after being granted the lock (the
1115+
* grantor did it all). We do have to worry about canceling the deadlock
1116+
* timeout and updating the locallock table, but if we lose control to an
1117+
* error, LockErrorCleanup will fix that up.
11311118
*/
11321119
do
11331120
{
1134-
PGSemaphoreLock(&MyProc->sem, true);
1121+
WaitLatch(MyLatch, WL_LATCH_SET, 0);
1122+
ResetLatch(MyLatch);
1123+
/* check for deadlocks first, as that's probably log-worthy */
1124+
if (got_deadlock_timeout)
1125+
{
1126+
CheckDeadLock();
1127+
got_deadlock_timeout = false;
1128+
}
1129+
CHECK_FOR_INTERRUPTS();
11351130

11361131
/*
11371132
* waitStatus could change from STATUS_WAITING to something else
11381133
* asynchronously. Read it just once per loop to prevent surprising
11391134
* behavior (such as missing log messages).
11401135
*/
1141-
myWaitStatus = MyProc->waitStatus;
1136+
myWaitStatus = *((volatile int *) &MyProc->waitStatus);
11421137

11431138
/*
11441139
* If we are not deadlocked, but are waiting on an autovacuum-induced
@@ -1435,7 +1430,7 @@ ProcWakeup(PGPROC *proc, int waitStatus)
14351430
proc->waitStatus = waitStatus;
14361431

14371432
/* And awaken it */
1438-
PGSemaphoreUnlock(&proc->sem);
1433+
SetLatch(&proc->procLatch);
14391434

14401435
return retProc;
14411436
}
@@ -1502,17 +1497,13 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
15021497
/*
15031498
* CheckDeadLock
15041499
*
1505-
* We only get to this routine if the DEADLOCK_TIMEOUT fired
1506-
* while waiting for a lock to be released by some other process. Look
1507-
* to see if there's a deadlock; if not, just return and continue waiting.
1508-
* (But signal ProcSleep to log a message, if log_lock_waits is true.)
1509-
* If we have a real deadlock, remove ourselves from the lock's wait queue
1510-
* and signal an error to ProcSleep.
1511-
*
1512-
* NB: this is run inside a signal handler, so be very wary about what is done
1513-
* here or in called routines.
1500+
* We only get to this routine, if DEADLOCK_TIMEOUT fired while waiting for a
1501+
* lock to be released by some other process. Check if there's a deadlock; if
1502+
* not, just return. (But signal ProcSleep to log a message, if
1503+
* log_lock_waits is true.) If we have a real deadlock, remove ourselves from
1504+
* the lock's wait queue and signal an error to ProcSleep.
15141505
*/
1515-
void
1506+
static void
15161507
CheckDeadLock(void)
15171508
{
15181509
int i;
@@ -1570,12 +1561,6 @@ CheckDeadLock(void)
15701561
Assert(MyProc->waitLock != NULL);
15711562
RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)));
15721563

1573-
/*
1574-
* Unlock my semaphore so that the interrupted ProcSleep() call can
1575-
* finish.
1576-
*/
1577-
PGSemaphoreUnlock(&MyProc->sem);
1578-
15791564
/*
15801565
* We're done here. Transaction abort caused by the error that
15811566
* ProcSleep will raise will cause any other locks we hold to be
@@ -1587,19 +1572,6 @@ CheckDeadLock(void)
15871572
* RemoveFromWaitQueue took care of waking up any such processes.
15881573
*/
15891574
}
1590-
else if (log_lock_waits || deadlock_state == DS_BLOCKED_BY_AUTOVACUUM)
1591-
{
1592-
/*
1593-
* Unlock my semaphore so that the interrupted ProcSleep() call can
1594-
* print the log message (we daren't do it here because we are inside
1595-
* a signal handler). It will then sleep again until someone releases
1596-
* the lock.
1597-
*
1598-
* If blocked by autovacuum, this wakeup will enable ProcSleep to send
1599-
* the canceling signal to the autovacuum worker.
1600-
*/
1601-
PGSemaphoreUnlock(&MyProc->sem);
1602-
}
16031575

16041576
/*
16051577
* And release locks. We do this in reverse order for two reasons: (1)
@@ -1613,22 +1585,39 @@ CheckDeadLock(void)
16131585
LWLockRelease(LockHashPartitionLockByIndex(i));
16141586
}
16151587

1588+
/*
1589+
* CheckDeadLockAlert - Handle the expiry of deadlock_timeout.
1590+
*
1591+
* NB: Runs inside a signal handler, be careful.
1592+
*/
1593+
void
1594+
CheckDeadLockAlert(void)
1595+
{
1596+
int save_errno = errno;
1597+
1598+
got_deadlock_timeout = true;
1599+
/*
1600+
* Have to set the latch again, even if handle_sig_alarm already did. Back
1601+
* then got_deadlock_timeout wasn't yet set... It's unlikely that this
1602+
* ever would be a problem, but setting a set latch again is cheap.
1603+
*/
1604+
SetLatch(MyLatch);
1605+
errno = save_errno;
1606+
}
16161607

16171608
/*
16181609
* ProcWaitForSignal - wait for a signal from another backend.
16191610
*
1620-
* This can share the semaphore normally used for waiting for locks,
1621-
* since a backend could never be waiting for a lock and a signal at
1622-
* the same time. As with locks, it's OK if the signal arrives just
1623-
* before we actually reach the waiting state. Also as with locks,
1624-
* it's necessary that the caller be robust against bogus wakeups:
1625-
* always check that the desired state has occurred, and wait again
1626-
* if not. This copes with possible "leftover" wakeups.
1611+
* As this uses the generic process latch the caller has to be robust against
1612+
* unrelated wakeups: Always check that the desired state has occurred, and
1613+
* wait again if not.
16271614
*/
16281615
void
16291616
ProcWaitForSignal(void)
16301617
{
1631-
PGSemaphoreLock(&MyProc->sem, true);
1618+
WaitLatch(MyLatch, WL_LATCH_SET, 0);
1619+
ResetLatch(MyLatch);
1620+
CHECK_FOR_INTERRUPTS();
16321621
}
16331622

16341623
/*
@@ -1664,5 +1653,7 @@ ProcSendSignal(int pid)
16641653
proc = BackendPidGetProc(pid);
16651654

16661655
if (proc != NULL)
1667-
PGSemaphoreUnlock(&proc->sem);
1656+
{
1657+
SetLatch(&proc->procLatch);
1658+
}
16681659
}

src/backend/utils/init/postinit.c

+1-1
Original file line numberDiff line numberDiff line change
@@ -578,7 +578,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
578578
*/
579579
if (!bootstrap)
580580
{
581-
RegisterTimeout(DEADLOCK_TIMEOUT, CheckDeadLock);
581+
RegisterTimeout(DEADLOCK_TIMEOUT, CheckDeadLockAlert);
582582
RegisterTimeout(STATEMENT_TIMEOUT, StatementTimeoutHandler);
583583
RegisterTimeout(LOCK_TIMEOUT, LockTimeoutHandler);
584584
}

src/include/storage/proc.h

+1-1
Original file line numberDiff line numberDiff line change
@@ -251,7 +251,7 @@ extern void ProcQueueInit(PROC_QUEUE *queue);
251251
extern int ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
252252
extern PGPROC *ProcWakeup(PGPROC *proc, int waitStatus);
253253
extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
254-
extern void CheckDeadLock(void);
254+
extern void CheckDeadLockAlert(void);
255255
extern bool IsWaitingForLock(void);
256256
extern void LockErrorCleanup(void);
257257

0 commit comments

Comments
 (0)