Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit de87a08

Browse files
committed
Avoid spurious deadlocks when upgrading a tuple lock
When two (or more) transactions are waiting for transaction T1 to release a tuple-level lock, and transaction T1 upgrades its lock to a higher level, a spurious deadlock can be reported among the waiting transactions when T1 finishes. The simplest example case seems to be: T1: select id from job where name = 'a' for key share; Y: select id from job where name = 'a' for update; -- starts waiting for X Z: select id from job where name = 'a' for key share; T1: update job set name = 'b' where id = 1; Z: update job set name = 'c' where id = 1; -- starts waiting for X T1: rollback; At this point, transaction Y is rolled back on account of a deadlock: Y holds the heavyweight tuple lock and is waiting for the Xmax to be released, while Z holds part of the multixact and tries to acquire the heavyweight lock (per protocol) and goes to sleep; once X releases its part of the multixact, Z is awakened only to be put back to sleep on the heavyweight lock that Y is holding while sleeping. Kaboom. This can be avoided by having Z skip the heavyweight lock acquisition. As far as I can see, the biggest downside is that if there are multiple Z transactions, the order in which they resume after X finishes is not guaranteed. Backpatch to 9.6. The patch applies cleanly on 9.5, but the new tests don't work there (because isolationtester is not smart enough), so I'm not going to risk it. Author: Oleksii Kliukin Discussion: https://postgr.es/m/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com
1 parent 3c8f8f6 commit de87a08

File tree

5 files changed

+281
-21
lines changed

5 files changed

+281
-21
lines changed

src/backend/access/heap/README.tuplock

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,16 @@ do LockTuple as well, if there is any conflict, to ensure that they don't
3636
starve out waiting exclusive-lockers. However, if there is not any active
3737
conflict for a tuple, we don't incur any extra overhead.
3838

39+
We make an exception to the above rule for those lockers that already hold
40+
some lock on a tuple and attempt to acquire a stronger one on it. In that
41+
case, we skip the LockTuple() call even when there are conflicts, provided
42+
that the target tuple is being locked, updated or deleted by multiple sessions
43+
concurrently. Failing to skip the lock would risk a deadlock, e.g., between a
44+
session that was first to record its weaker lock in the tuple header and would
45+
be waiting on the LockTuple() call to upgrade to the stronger lock level, and
46+
another session that has already done LockTuple() and is waiting for the first
47+
session transaction to release its tuple header-level lock.
48+
3949
We provide four levels of tuple locking strength: SELECT FOR UPDATE obtains an
4050
exclusive lock which prevents any kind of modification of the tuple. This is
4151
the lock level that is implicitly taken by DELETE operations, and also by

src/backend/access/heap/heapam.c

Lines changed: 63 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@ static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
9595
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
9696
uint16 t_infomask);
9797
static bool DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
98-
LockTupleMode lockmode);
98+
LockTupleMode lockmode, bool *current_is_member);
9999
static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 infomask,
100100
Relation rel, ItemPointer ctid, XLTW_Oper oper,
101101
int *remaining);
@@ -2547,15 +2547,20 @@ heap_delete(Relation relation, ItemPointer tid,
25472547
*/
25482548
if (infomask & HEAP_XMAX_IS_MULTI)
25492549
{
2550-
/* wait for multixact */
2550+
bool current_is_member = false;
2551+
25512552
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
2552-
LockTupleExclusive))
2553+
LockTupleExclusive, &current_is_member))
25532554
{
25542555
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
25552556

2556-
/* acquire tuple lock, if necessary */
2557-
heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
2558-
LockWaitBlock, &have_tuple_lock);
2557+
/*
2558+
* Acquire the lock, if necessary (but skip it when we're
2559+
* requesting a lock and already have one; avoids deadlock).
2560+
*/
2561+
if (!current_is_member)
2562+
heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
2563+
LockWaitBlock, &have_tuple_lock);
25592564

25602565
/* wait for multixact */
25612566
MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, infomask,
@@ -3126,15 +3131,20 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
31263131
{
31273132
TransactionId update_xact;
31283133
int remain;
3134+
bool current_is_member = false;
31293135

31303136
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
3131-
*lockmode))
3137+
*lockmode, &current_is_member))
31323138
{
31333139
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
31343140

3135-
/* acquire tuple lock, if necessary */
3136-
heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
3137-
LockWaitBlock, &have_tuple_lock);
3141+
/*
3142+
* Acquire the lock, if necessary (but skip it when we're
3143+
* requesting a lock and already have one; avoids deadlock).
3144+
*/
3145+
if (!current_is_member)
3146+
heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
3147+
LockWaitBlock, &have_tuple_lock);
31383148

31393149
/* wait for multixact */
31403150
MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
@@ -4028,6 +4038,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
40284038
uint16 infomask;
40294039
uint16 infomask2;
40304040
bool require_sleep;
4041+
bool skip_tuple_lock;
40314042
ItemPointerData t_ctid;
40324043

40334044
/* must copy state data before unlocking buffer */
@@ -4053,6 +4064,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
40534064
if (first_time)
40544065
{
40554066
first_time = false;
4067+
skip_tuple_lock = false;
40564068

40574069
if (infomask & HEAP_XMAX_IS_MULTI)
40584070
{
@@ -4081,6 +4093,21 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
40814093
result = TM_Ok;
40824094
goto out_unlocked;
40834095
}
4096+
else
4097+
{
4098+
/*
4099+
* Disable acquisition of the heavyweight tuple lock.
4100+
* Otherwise, when promoting a weaker lock, we might
4101+
* deadlock with another locker that has acquired the
4102+
* heavyweight tuple lock and is waiting for our
4103+
* transaction to finish.
4104+
*
4105+
* Note that in this case we still need to wait for
4106+
* the multixact if required, to avoid acquiring
4107+
* conflicting locks.
4108+
*/
4109+
skip_tuple_lock = true;
4110+
}
40844111
}
40854112

40864113
if (members)
@@ -4235,7 +4262,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
42354262
if (infomask & HEAP_XMAX_IS_MULTI)
42364263
{
42374264
if (!DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
4238-
mode))
4265+
mode, NULL))
42394266
{
42404267
/*
42414268
* No conflict, but if the xmax changed under us in the
@@ -4312,13 +4339,15 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
43124339
/*
43134340
* Acquire tuple lock to establish our priority for the tuple, or
43144341
* die trying. LockTuple will release us when we are next-in-line
4315-
* for the tuple. We must do this even if we are share-locking.
4342+
* for the tuple. We must do this even if we are share-locking,
4343+
* but not if we already have a weaker lock on the tuple.
43164344
*
43174345
* If we are forced to "start over" below, we keep the tuple lock;
43184346
* this arranges that we stay at the head of the line while
43194347
* rechecking tuple state.
43204348
*/
4321-
if (!heap_acquire_tuplock(relation, tid, mode, wait_policy,
4349+
if (!skip_tuple_lock &&
4350+
!heap_acquire_tuplock(relation, tid, mode, wait_policy,
43224351
&have_tuple_lock))
43234352
{
43244353
/*
@@ -6516,10 +6545,13 @@ HeapTupleGetUpdateXid(HeapTupleHeader tuple)
65166545
* tuple lock of the given strength?
65176546
*
65186547
* The passed infomask pairs up with the given multixact in the tuple header.
6548+
*
6549+
* If current_is_member is not NULL, it is set to 'true' if the current
6550+
* transaction is a member of the given multixact.
65196551
*/
65206552
static bool
65216553
DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
6522-
LockTupleMode lockmode)
6554+
LockTupleMode lockmode, bool *current_is_member)
65236555
{
65246556
int nmembers;
65256557
MultiXactMember *members;
@@ -6540,15 +6572,24 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
65406572
TransactionId memxid;
65416573
LOCKMODE memlockmode;
65426574

6543-
memlockmode = LOCKMODE_from_mxstatus(members[i].status);
6575+
if (result && (current_is_member == NULL || *current_is_member))
6576+
break;
65446577

6545-
/* ignore members that don't conflict with the lock we want */
6546-
if (!DoLockModesConflict(memlockmode, wanted))
6547-
continue;
6578+
memlockmode = LOCKMODE_from_mxstatus(members[i].status);
65486579

6549-
/* ignore members from current xact */
6580+
/* ignore members from current xact (but track their presence) */
65506581
memxid = members[i].xid;
65516582
if (TransactionIdIsCurrentTransactionId(memxid))
6583+
{
6584+
if (current_is_member != NULL)
6585+
*current_is_member = true;
6586+
continue;
6587+
}
6588+
else if (result)
6589+
continue;
6590+
6591+
/* ignore members that don't conflict with the lock we want */
6592+
if (!DoLockModesConflict(memlockmode, wanted))
65526593
continue;
65536594

65546595
if (ISUPDATE_from_mxstatus(members[i].status))
@@ -6567,10 +6608,11 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
65676608
/*
65686609
* Whatever remains are either live lockers that conflict with our
65696610
* wanted lock, and updaters that are not aborted. Those conflict
6570-
* with what we want, so return true.
6611+
* with what we want. Set up to return true, but keep going to
6612+
* look for the current transaction among the multixact members,
6613+
* if needed.
65716614
*/
65726615
result = true;
6573-
break;
65746616
}
65756617
pfree(members);
65766618
}
Lines changed: 150 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,150 @@
1+
Parsed test spec with 3 sessions
2+
3+
starting permutation: s1_share s2_for_update s3_share s3_for_update s1_rollback s3_rollback s2_rollback
4+
step s1_share: select id from tlu_job where id = 1 for share;
5+
id
6+
7+
1
8+
step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
9+
step s3_share: select id from tlu_job where id = 1 for share;
10+
id
11+
12+
1
13+
step s3_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
14+
step s1_rollback: rollback;
15+
step s3_for_update: <... completed>
16+
id
17+
18+
1
19+
step s3_rollback: rollback;
20+
step s2_for_update: <... completed>
21+
id
22+
23+
1
24+
step s2_rollback: rollback;
25+
26+
starting permutation: s1_keyshare s2_for_update s3_keyshare s1_update s3_update s1_rollback s3_rollback s2_rollback
27+
step s1_keyshare: select id from tlu_job where id = 1 for key share;
28+
id
29+
30+
1
31+
step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
32+
step s3_keyshare: select id from tlu_job where id = 1 for key share;
33+
id
34+
35+
1
36+
step s1_update: update tlu_job set name = 'b' where id = 1;
37+
step s3_update: update tlu_job set name = 'c' where id = 1; <waiting ...>
38+
step s1_rollback: rollback;
39+
step s3_update: <... completed>
40+
step s3_rollback: rollback;
41+
step s2_for_update: <... completed>
42+
id
43+
44+
1
45+
step s2_rollback: rollback;
46+
47+
starting permutation: s1_keyshare s2_for_update s3_keyshare s1_update s3_update s1_commit s3_rollback s2_rollback
48+
step s1_keyshare: select id from tlu_job where id = 1 for key share;
49+
id
50+
51+
1
52+
step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
53+
step s3_keyshare: select id from tlu_job where id = 1 for key share;
54+
id
55+
56+
1
57+
step s1_update: update tlu_job set name = 'b' where id = 1;
58+
step s3_update: update tlu_job set name = 'c' where id = 1; <waiting ...>
59+
step s1_commit: commit;
60+
step s3_update: <... completed>
61+
step s3_rollback: rollback;
62+
step s2_for_update: <... completed>
63+
id
64+
65+
1
66+
step s2_rollback: rollback;
67+
68+
starting permutation: s1_keyshare s2_for_update s3_keyshare s3_delete s1_rollback s3_rollback s2_rollback
69+
step s1_keyshare: select id from tlu_job where id = 1 for key share;
70+
id
71+
72+
1
73+
step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
74+
step s3_keyshare: select id from tlu_job where id = 1 for key share;
75+
id
76+
77+
1
78+
step s3_delete: delete from tlu_job where id = 1; <waiting ...>
79+
step s1_rollback: rollback;
80+
step s3_delete: <... completed>
81+
step s3_rollback: rollback;
82+
step s2_for_update: <... completed>
83+
id
84+
85+
1
86+
step s2_rollback: rollback;
87+
88+
starting permutation: s1_keyshare s2_for_update s3_keyshare s3_delete s1_rollback s3_commit s2_rollback
89+
step s1_keyshare: select id from tlu_job where id = 1 for key share;
90+
id
91+
92+
1
93+
step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
94+
step s3_keyshare: select id from tlu_job where id = 1 for key share;
95+
id
96+
97+
1
98+
step s3_delete: delete from tlu_job where id = 1; <waiting ...>
99+
step s1_rollback: rollback;
100+
step s3_delete: <... completed>
101+
step s3_commit: commit;
102+
step s2_for_update: <... completed>
103+
id
104+
105+
step s2_rollback: rollback;
106+
107+
starting permutation: s1_share s2_for_update s3_for_update s1_rollback s2_rollback s3_rollback
108+
step s1_share: select id from tlu_job where id = 1 for share;
109+
id
110+
111+
1
112+
step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
113+
step s3_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
114+
step s1_rollback: rollback;
115+
step s2_for_update: <... completed>
116+
id
117+
118+
1
119+
step s2_rollback: rollback;
120+
step s3_for_update: <... completed>
121+
id
122+
123+
1
124+
step s3_rollback: rollback;
125+
126+
starting permutation: s1_share s2_update s3_update s1_rollback s2_rollback s3_rollback
127+
step s1_share: select id from tlu_job where id = 1 for share;
128+
id
129+
130+
1
131+
step s2_update: update tlu_job set name = 'b' where id = 1; <waiting ...>
132+
step s3_update: update tlu_job set name = 'c' where id = 1; <waiting ...>
133+
step s1_rollback: rollback;
134+
step s2_update: <... completed>
135+
step s2_rollback: rollback;
136+
step s3_update: <... completed>
137+
step s3_rollback: rollback;
138+
139+
starting permutation: s1_share s2_delete s3_delete s1_rollback s2_rollback s3_rollback
140+
step s1_share: select id from tlu_job where id = 1 for share;
141+
id
142+
143+
1
144+
step s2_delete: delete from tlu_job where id = 1; <waiting ...>
145+
step s3_delete: delete from tlu_job where id = 1; <waiting ...>
146+
step s1_rollback: rollback;
147+
step s2_delete: <... completed>
148+
step s2_rollback: rollback;
149+
step s3_delete: <... completed>
150+
step s3_rollback: rollback;

src/test/isolation/isolation_schedule

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ test: reindex-concurrently
4949
test: propagate-lock-delete
5050
test: tuplelock-conflict
5151
test: tuplelock-update
52+
test: tuplelock-upgrade-no-deadlock
5253
test: freeze-the-dead
5354
test: nowait
5455
test: nowait-2

0 commit comments

Comments
 (0)