
Commit 8d7d219

Amit Kapila committed
Fix an undetected deadlock due to apply worker.

The apply worker needs to update the state of the subscription tables to
'READY' during the synchronization phase which requires locking the
corresponding subscription. The apply worker also waits for the
subscription tables to reach the 'SYNCDONE' state after holding the locks
on the subscription and the wait is done using WaitLatch. The 'SYNCDONE'
state is changed by tablesync workers again by locking the corresponding
subscription. Both the state updates use AccessShareLock mode to lock the
subscription, so they can't block each other. However, a backend can
simultaneously try to acquire a lock on the same subscription using
AccessExclusiveLock mode to alter the subscription. Now, the backend's
wait on a lock can sneak in between the apply worker and table sync worker
causing deadlock.

In other words, apply_worker waits for tablesync worker which waits for
backend, and backend waits for apply worker. This is not detected by the
deadlock detector because apply worker uses WaitLatch.

The fix is to release existing locks in apply worker before it starts to
wait for tablesync worker to change the state.

Reported-by: Tomas Vondra
Author: Shlok Kyal
Reviewed-by: Amit Kapila, Peter Smith
Backpatch-through: 12
Discussion: https://postgr.es/m/d291bb50-12c4-e8af-2af2-7bb9bb4d8e3e@enterprisedb.com
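
The wait cycle described in the commit message can be illustrated with a toy wait-for graph. The sketch below is not PostgreSQL code: the names `wait_edge`, `finds_cycle`, and `build_scenario` are hypothetical, and the real deadlock detector is considerably more involved. It only assumes what the message states, namely that a detector which follows lock waits cannot see a wait done through a latch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the three processes in the commit message. */
enum { APPLY_WORKER, TABLESYNC_WORKER, BACKEND, NUM_PROCS };

enum wait_kind { WAIT_NONE, WAIT_LOCK, WAIT_LATCH };

struct wait_edge
{
    int            target;  /* process being waited on, -1 if none */
    enum wait_kind kind;    /* how the wait is performed */
};

/*
 * Follow wait edges starting at 'start' and report whether they lead
 * back to 'start'. Latch waits are followed only when see_latch_waits
 * is true; a lock-only detector corresponds to see_latch_waits = false.
 */
static bool
finds_cycle(const struct wait_edge edges[NUM_PROCS], int start,
            bool see_latch_waits)
{
    int cur = start;

    for (int steps = 0; steps < NUM_PROCS; steps++)
    {
        assert(cur >= 0 && cur < NUM_PROCS);
        const struct wait_edge *e = &edges[cur];

        if (e->kind == WAIT_NONE)
            return false;
        if (e->kind == WAIT_LATCH && !see_latch_waits)
            return false;
        cur = e->target;
        if (cur == start)
            return true;
    }
    return false;
}

/*
 * Build the scenario from the commit message. With apply_holds_lock
 * set, the backend's AccessExclusiveLock request waits on the apply
 * worker; with it cleared (the fix), the backend waits on nobody.
 */
static void
build_scenario(struct wait_edge edges[NUM_PROCS], bool apply_holds_lock)
{
    edges[APPLY_WORKER] = (struct wait_edge) { TABLESYNC_WORKER, WAIT_LATCH };
    edges[TABLESYNC_WORKER] = (struct wait_edge) { BACKEND, WAIT_LOCK };
    if (apply_holds_lock)
        edges[BACKEND] = (struct wait_edge) { APPLY_WORKER, WAIT_LOCK };
    else
        edges[BACKEND] = (struct wait_edge) { -1, WAIT_NONE };
}
```

In this model, `finds_cycle(edges, BACKEND, false)` stays false even though a cycle exists, because the walk stops at the apply worker's latch wait. Once the apply worker no longer holds a lock while it waits, no cycle remains at all.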
1 parent 90834ce commit 8d7d219

1 file changed, +15 -5 lines changed

src/backend/replication/logical/tablesync.c

@@ -541,15 +541,25 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
                     /* Now safe to release the LWLock */
                     LWLockRelease(LogicalRepWorkerLock);
 
+                    if (started_tx)
+                    {
+                        /*
+                         * We must commit the existing transaction to release
+                         * the existing locks before entering a busy loop.
+                         * This is required to avoid any undetected deadlocks
+                         * due to any existing lock as deadlock detector won't
+                         * be able to detect the waits on the latch.
+                         */
+                        CommitTransactionCommand();
+                        pgstat_report_stat(false);
+                    }
+
                     /*
                      * Enter busy loop and wait for synchronization worker to
                      * reach expected state (or die trying).
                      */
-                    if (!started_tx)
-                    {
-                        StartTransactionCommand();
-                        started_tx = true;
-                    }
+                    StartTransactionCommand();
+                    started_tx = true;
 
                     wait_for_relation_state_change(rstate->relid,
                                                    SUBREL_STATE_SYNCDONE);
