
Commit ff423db

committed
Adding edges for all locked objects in deadlock detection.
And more accurate description of current implementation drawbacks.
1 parent af0374f commit ff423db

2 files changed: 36 additions, 20 deletions


pg_shardman--0.0.2.sql

Lines changed: 23 additions & 8 deletions
@@ -2137,9 +2137,13 @@ create type process as (node int, pid int);
 -- View to build lock graph which can be used to detect global deadlock.
 -- Application_name is assumed pgfdw:$system_id:$coord_pid
 -- gid is assumed $pid:$count:$sys_id:$xid:$participants_count
+-- Currently we are oblivious about lock modes and report any wait -> hold edge
+-- on the same object, and therefore might produce false loops. Furthermore,
+-- we have no idea about locking queues here. Probably it is better to use
+-- pg_blocking_pids, but it seems to ignore prepared xacts.
 CREATE VIEW lock_graph(wait, hold) AS
--- If xact is already prepared, we take node and pid of the coordinator.
 -- local dependencies
+-- If xact is already prepared, we take node and pid of the coordinator.
 SELECT
 	ROW(shardman.get_my_id(),
 		wait.pid)::shardman.process,
@@ -2152,11 +2156,19 @@ CREATE VIEW lock_graph(wait, hold) AS
 FROM pg_locks wait, pg_locks hold LEFT OUTER JOIN pg_prepared_xacts twopc
 	ON twopc.transaction=hold.transactionid
 WHERE
-	NOT wait.granted AND wait.pid IS NOT NULL AND hold.granted
-	-- this select captures waitings on xid and on, hm, tuples
-	AND (wait.transactionid=hold.transactionid OR
-		(wait.page=hold.page AND wait.tuple=hold.tuple))
-	AND (hold.pid IS NOT NULL OR twopc.gid IS NOT NULL) -- ???
+	NOT wait.granted AND wait.pid IS NOT NULL AND hold.granted AND
+	-- waiter waits for the object the holder locks
+	wait.database IS NOT DISTINCT FROM hold.database AND
+	wait.relation IS NOT DISTINCT FROM hold.relation AND
+	wait.page IS NOT DISTINCT FROM hold.page AND
+	wait.tuple IS NOT DISTINCT FROM hold.tuple AND
+	wait.virtualxid IS NOT DISTINCT FROM hold.virtualxid AND
+	wait.transactionid IS NOT DISTINCT FROM hold.transactionid AND -- waiting on xid
+	wait.classid IS NOT DISTINCT FROM hold.classid AND
+	wait.objid IS NOT DISTINCT FROM hold.objid AND
+	wait.objsubid IS NOT DISTINCT FROM hold.objsubid AND
+	-- this is most probably a truism, but who knows
+	(hold.pid IS NOT NULL OR twopc.gid IS NOT NULL)
 UNION ALL
 -- if this fdw backend is busy, potentially waiting, add edge coordinator -> fdw
 SELECT ROW(shardman.get_node_by_sysid(split_part(application_name, ':', 2)::bigint),
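The new predicate matches waiter and holder NULL-safely on every lock-target column (`IS NOT DISTINCT FROM` treats NULL as equal to NULL, unlike plain `=`). As a cross-check, here is a minimal Python sketch of the same edge-building rule; the lock dicts and helper names are illustrative stand-ins for `pg_locks` rows, not pg_shardman code:

```python
# Sketch of the wait -> hold edge rule used by the lock_graph view.
# `IS NOT DISTINCT FROM` makes NULL match NULL, which in Python
# corresponds to None == None. Field names mirror pg_locks columns.
LOCK_FIELDS = ("database", "relation", "page", "tuple",
               "virtualxid", "transactionid", "classid", "objid", "objsubid")

def same_object(wait: dict, hold: dict) -> bool:
    """True when waiter and holder reference the same lock target,
    comparing every column NULL-safely (None matches None)."""
    return all(wait.get(f) == hold.get(f) for f in LOCK_FIELDS)

def lock_graph_edges(locks):
    """Emit (waiter_pid, holder_pid) edges. Like the view, this is
    oblivious to lock modes, so false loops are possible."""
    waiting = [l for l in locks if not l["granted"] and l["pid"] is not None]
    holding = [l for l in locks if l["granted"]]
    return [(w["pid"], h["pid"])
            for w in waiting for h in holding if same_object(w, h)]
```

With plain `=` semantics, two locks whose `relation` is NULL would silently fail to match; the NULL-safe comparison keeps such edges in the graph.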
@@ -2276,7 +2288,7 @@ BEGIN
 THEN
 	IF clock_timestamp() > failure_timestamp + rm_node_timeout_sec * interval '1 sec'
 	THEN
-		RAISE NOTICE 'Removing node % because of % timeout expiration', failed_node_id, rm_node_timeout_sec;
+		RAISE NOTICE 'Removing node % because of % sec timeout expiration', failed_node_id, rm_node_timeout_sec;
 		PERFORM shardman.broadcast(format('0:SELECT shardman.rm_node(%s, force=>true);', failed_node_id));
 		PERFORM shardman.broadcast('0:SELECT shardman.recover_xacts();');
 		failed_node_id := null;
@@ -2304,7 +2316,10 @@ BEGIN
 AND loop_end - loop_begin = prev_loop_end - prev_loop_begin
 AND deadlock_path[loop_begin:loop_end] = prev_deadlock_path[prev_loop_begin:prev_loop_end]
 THEN
-	-- Try to cancel random node in loop
+	-- Try to cancel random node in loop.
+	-- If the victim is not executing an active query at the moment,
+	-- pg_cancel_backend can't do anything with the xact; because of that,
+	-- we probably need to repeat it several times.
 	victim := deadlock_path[loop_begin + ((loop_end - loop_begin)*random())::integer];
 	RAISE NOTICE 'Detect deadlock: cancel process % at node %', victim.pid, victim.node;
 	PERFORM shardman.broadcast(format('%s:SELECT pg_cancel_backend(%s);',
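Since `pg_cancel_backend` only interrupts a query that is currently executing, a single cancel can be a no-op on an idle victim. The retry idea from the comment above can be sketched in Python (a hedged illustration: `cancel_fn` and `still_deadlocked` are hypothetical stand-ins, not shardman APIs):

```python
def cancel_until_resolved(victim_pid, cancel_fn, still_deadlocked, max_attempts=5):
    """Repeat the cancel until the deadlock loop disappears or we give up.
    pg_cancel_backend has no effect while the victim is between queries,
    hence the retry loop."""
    for _ in range(max_attempts):
        cancel_fn(victim_pid)       # e.g. broadcast pg_cancel_backend(pid)
        if not still_deadlocked():  # re-collect lock graphs and re-check
            return True
    return False
```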

readme.md

Lines changed: 13 additions & 12 deletions
@@ -592,15 +592,15 @@ monitor(check_timeout_sec int = 5, rm_node_timeout_sec int = 60)
 Monitor cluster for presence of distributed deadlocks and node failures. This
 function is intended to be executed at shardlord and is redirected to shardlord
 been launched at any other node. It starts infinite loop which polls all
-clusters nodes, collecting local *lock graphs* from all nodes. Period of poll
-is specified by `check_timeout_sec` parameter (default value is 5 seconds).
-Local lock graphs are combined into global lock graph which is analyzed for the
+clusters nodes, collecting local *lock graphs* from all nodes. Period of poll is
+specified by `check_timeout_sec` parameter (default value is 5 seconds). Local
+lock graphs are combined into global lock graph which is analyzed for the
 presence of loops. A loop in the lock graph means distributed deadlock. Monitor
 function tries to resolve deadlock by canceling one or more backends involved in
 the deadlock loop (using `pg_cancel_backend` function, which doesn't actually
-terminate backend but tries to cancel current query). As far as not all
-backends are blocked in active query state, it may be needed send cancel several
-times. Canceled backend is randomly chosen within deadlock loop.
+terminate backend but tries to cancel current query). Canceled backend is
+randomly chosen within deadlock loop. Since not all deadlock members are
+stuck in 'active query' state, it might be needed to send cancel several times.

 Since local graphs collected from all nodes do not form consistent global
 snapshot, false positives are possible: edges in deadlock loop correspond to
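The loop search that `monitor` performs over the merged graph can be illustrated with a short Python sketch (a simplified model, not the actual PL/pgSQL implementation; processes are `(node, pid)` tuples and `edges` maps each waiter to its holders):

```python
def find_deadlock(edges):
    """Detect a cycle in the merged wait-for graph via depth-first search.
    `edges` maps a (node, pid) process to the processes it waits for.
    Returns one cycle as a list whose first and last entries coincide,
    or None when the graph is loop-free."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
    color, stack = {}, []

    def dfs(v):
        color[v] = GRAY
        stack.append(v)
        for w in edges.get(v, ()):
            if color.get(w, WHITE) == GRAY:          # back edge: loop found
                return stack[stack.index(w):] + [w]
            if color.get(w, WHITE) == WHITE:
                cycle = dfs(w)
                if cycle:
                    return cycle
        color[v] = BLACK
        stack.pop()
        return None

    for v in list(edges):
        if color.get(v, WHITE) == WHITE:
            cycle = dfs(v)
            if cycle:
                return cycle
    return None
```

A returned path corresponds to the deadlock loop from which monitor then picks a random victim to cancel; because the per-node graphs are not a consistent snapshot, a reported loop may be a false positive, as the text above notes.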
@@ -627,12 +627,13 @@ if it is not available, performs voting among all nodes.
 ```plpgsql
 wipe_state(drop_slots_with_fire bool DEFAULT true)
 ```
-Remove unilaterally all publications, subscriptions and replication slots
-created on the worker node by `pg_shardman`. PostgreSQL forbids to drop
-replication slot with active connection; if `drop_slots_with_fire` is true, we
-will try to kill the walsenders before dropping the slots. Also, immediately
-after transaction commit set `synchronous_standby_names` GUC to empty string --
-this is a non-transactional action and there is a very small chance it won't be
+Remove unilaterally all publications, subscriptions, replication slots, foreign
+servers and user mappings created on the worker node by
+`pg_shardman`. PostgreSQL forbids to drop replication slot with active
+connection; if `drop_slots_with_fire` is true, we will try to kill the
+walsenders before dropping the slots. Also, immediately after transaction commit
+set `synchronous_standby_names` GUC to empty string -- this is a
+non-transactional action and there is a very small chance it won't be
 completed. You probably want to run it before `DROP EXTENSION pg_shardman`.
 Data is not touched by this command.
638639
