Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 941188a

Browse files
committed
Fix ordering of operations in SyncRepWakeQueue to avoid assertion failure.
Commit 14e8803 removed the locking in SyncRepWaitForLSN, but that introduced a race condition, where SyncRepWaitForLSN might see syncRepState already set to SYNC_REP_WAIT_COMPLETE, but the process was not yet removed from the queue. That tripped the assertion, that the process should no longer be in the uqeue. Reorder the operations in SyncRepWakeQueue to remove the process from the queue first, and update syncRepState only after that, and add a memory barrier in between to make sure the operations are made visible to other processes in that order. Fixes bug #14721 reported by Const Zhang. Analysis and fix by Thomas Munro. Backpatch down to 9.5, where the locking was removed. Discussion: https://www.postgresql.org/message-id/20170629023623.1480.26508%40wrigleys.postgresql.org
1 parent f1edf84 commit 941188a

File tree

1 file changed

+16
-6
lines changed

1 file changed

+16
-6
lines changed

src/backend/replication/syncrep.c

Lines changed: 16 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -265,8 +265,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
265265
* WalSender has checked our LSN and has removed us from queue. Clean up
266266
* state and leave. It's OK to reset these shared memory fields without
267267
* holding SyncRepLock, because any walsenders will ignore us anyway when
268-
* we're not on the queue.
268+
* we're not on the queue. We need a read barrier to make sure we see
269+
* the changes to the queue link (this might be unnecessary without
270+
* assertions, but better safe than sorry).
269271
*/
272+
pg_read_barrier();
270273
Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
271274
MyProc->syncRepState = SYNC_REP_NOT_WAITING;
272275
MyProc->waitLSN = 0;
@@ -791,15 +794,22 @@ SyncRepWakeQueue(bool all, int mode)
791794
offsetof(PGPROC, syncRepLinks));
792795

793796
/*
794-
* Set state to complete; see SyncRepWaitForLSN() for discussion of
795-
* the various states.
797+
* Remove thisproc from queue.
796798
*/
797-
thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE;
799+
SHMQueueDelete(&(thisproc->syncRepLinks));
798800

799801
/*
800-
* Remove thisproc from queue.
802+
* SyncRepWaitForLSN() reads syncRepState without holding the lock, so
803+
* make sure that it sees the queue link being removed before the
804+
* syncRepState change.
801805
*/
802-
SHMQueueDelete(&(thisproc->syncRepLinks));
806+
pg_write_barrier();
807+
808+
/*
809+
* Set state to complete; see SyncRepWaitForLSN() for discussion of
810+
* the various states.
811+
*/
812+
thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE;
803813

804814
/*
805815
* Wake only when we have set state and removed from queue.

0 commit comments

Comments
 (0)