author     Amit Kapila  2023-01-24 03:55:36 +0000
committer  Amit Kapila  2023-01-24 03:55:36 +0000
commit     6c6d6ba3ee2c160b53f727cf8e612014b316d6e4
tree       e2b20e5a6ad953cabaa6377eaba91d8d9e3ecea3  /src/backend
parent     728f86fec65537eade8d9e751961782ddb527934
Fix the Drop Database hang.
The DROP DATABASE command waits for the logical replication tablesync worker to accept a ProcSignalBarrier, while the worker's slot creation waits for DROP DATABASE to finish, which leads to a deadlock. This happens because the tablesync worker holds interrupts while creating a slot.

We prevented cancel/die interrupts while creating a slot in the tablesync worker because it is possible that, before the server finishes this command, a concurrent DROP SUBSCRIPTION happens which would complete without removing this slot, so the slot would exist until the walsender ends. However, the slot will eventually get dropped when the walsender exits, so there is no danger of a dangling slot.

This patch re-allows cancel/die interrupts while creating a slot and modifies the test to wait for the number of slots to reach zero, so that it does not find an ephemeral slot.

The reported hang doesn't happen in PG14 because DROP DATABASE only started waiting for ProcSignalBarrier in PG15 (commits 4eb2176318 and e2f65f4255), but it is good to backpatch this through PG14, as it is not a good idea to prevent interrupts during a network call that could block indefinitely.

Reported-by: Lakshmi Narayanan Sreethar
Diagnosed-by: Andres Freund
Author: Hou Zhijie
Reviewed-by: Vignesh C, Amit Kapila
Backpatch-through: 14, where it was introduced in commit 6b67d72b60
Discussion: https://postgr.es/m/CA+kvmZELXQ4ZD3U=XCXuG3KvFgkuPoN1QrEj8c-rMRodrLOnsg@mail.gmail.com
Diffstat (limited to 'src/backend')
-rw-r--r--  src/backend/replication/logical/tablesync.c  7
1 file changed, 0 insertions, 7 deletions
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 4647837b823..07eea504ba8 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1396,17 +1396,10 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 	 * Create a new permanent logical decoding slot. This slot will be used
 	 * for the catchup phase after COPY is done, so tell it to use the
 	 * snapshot to make the final data consistent.
-	 *
-	 * Prevent cancel/die interrupts while creating slot here because it is
-	 * possible that before the server finishes this command, a concurrent
-	 * drop subscription happens which would complete without removing this
-	 * slot leading to a dangling slot on the server.
 	 */
-	HOLD_INTERRUPTS();
 	walrcv_create_slot(LogRepWorkerWalRcvConn,
 					   slotname, false /* permanent */ , false /* two_phase */ ,
 					   CRS_USE_SNAPSHOT, origin_startpos);
-	RESUME_INTERRUPTS();
 
 	/*
 	 * Setup replication origin tracking. The purpose of doing this before the