Commit 1d99399

Squashed 'contrib/pg_shardman/' changes from 4c97bed..33558f1
33558f1 Merge branch 'broadcast' of https://git.postgrespro.ru/a.sher/pg_shardman into broadcast
1f303bc Document monitor() function
8b7fc16 Merge commit 'c6d60dfeee27b81d5d6e4e0fbce7c6d9e4c205b3' into PGPROEE10_pg_shardman
ece4932 Previous commit fix.

git-subtree-dir: contrib/pg_shardman
git-subtree-split: 33558f1
1 parent c6d60df commit 1d99399

2 files changed: +28 -4 lines changed

Makefile

Lines changed: 4 additions & 3 deletions
@@ -11,12 +11,13 @@ OBJS = pg_shardman.o
 PGFILEDESC = "pg_shardman - sharding for Postgres"
 
 mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST))) # abs path to this makefile
-mkfile_dir := $(shell basename $(dir $(mkfile_path))) # parent dir of the project
+mkfile_dir := $(shell basename $(shell dirname $(dir $(mkfile_path)))) # parent dir of the project
 ifndef USE_PGXS # hmm, user didn't request to use pgxs
-ifneq ($(mkfile_dir),contrib) # a-ha, but we are not inside 'contrib' dir
-USE_PGXS := 1 # so use it anyway, most probably that's use user wants
+ifneq ($(strip $(mkfile_dir)),contrib) # a-ha, but we are not inside 'contrib' dir
+USE_PGXS := 1 # so use it anyway, most probably that's what the user wants
 endif
 endif
+$(info $$USE_PGXS is [${USE_PGXS}] (we use it automatically if not in contrib dir))
 
 ifdef USE_PGXS # use pgxs
 # You can specify path to pg_config in PG_CONFIG var
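
To see why the `mkfile_dir` line needed both changes, here is a minimal Python sketch that imitates the three path operations involved. The helper names and the example path `/src/postgres/contrib/pg_shardman/Makefile` are assumptions made for illustration; the helpers only approximate GNU make's `$(dir ...)` and the POSIX `basename` and `dirname` utilities. The reading of the `$(strip ...)` change is also an inference: in GNU make, whitespace between a variable's value and a trailing `#` comment stays part of the value, which presumably made the old comparison with `contrib` fail.

```python
import posixpath


def make_dir(path: str) -> str:
    """Approximate GNU make's $(dir ...): keep everything up to the last slash."""
    head, sep, _ = path.rpartition("/")
    return head + sep if sep else "./"


def shell_basename(path: str) -> str:
    """Approximate POSIX basename: ignore trailing slashes, return last component."""
    stripped = path.rstrip("/")
    return posixpath.basename(stripped) if stripped else "/"


def shell_dirname(path: str) -> str:
    """Approximate POSIX dirname: ignore trailing slashes, drop last component."""
    stripped = path.rstrip("/")
    if not stripped:
        return "/"
    parent = posixpath.dirname(stripped)
    return parent if parent else "."


def old_mkfile_dir(mkfile_path: str) -> str:
    # Old expression: basename of the Makefile's own directory.
    return shell_basename(make_dir(mkfile_path))


def new_mkfile_dir(mkfile_path: str) -> str:
    # New expression: basename of the directory one level above.
    return shell_basename(shell_dirname(make_dir(mkfile_path)))


# Hypothetical location of the Makefile inside a PostgreSQL source tree.
path = "/src/postgres/contrib/pg_shardman/Makefile"
print(old_mkfile_dir(path))  # name of the project directory itself
print(new_mkfile_dir(path))  # name of the project's parent directory

# The assignment is followed by " # parent dir of the project", so the
# variable value keeps a trailing space unless it is stripped.
value_with_trailing_space = new_mkfile_dir(path) + " "
print(value_with_trailing_space != "contrib")          # unstripped comparison
print(value_with_trailing_space.strip() != "contrib")  # stripped comparison
```

With the old expression the result is the project directory name, so it could never equal `contrib`; the new expression yields the parent directory name, and stripping removes the stray space before the comparison.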

readme.md

Lines changed: 24 additions & 1 deletion
@@ -470,6 +470,30 @@ recovery()
 Check consistency of cluster state against current metadata and perform recovery,
 if needed (reconfigure LR channels, repair FDW, etc).
 
+```plpgsql
+monitor(deadlock_check_timeout_sec int = 5, rm_node_timeout_sec int = 60)
+```
+Monitor the cluster for distributed deadlocks and node failures.
+This function is intended to be executed on the shardlord; if it is launched on any other node, it is redirected to the shardlord.
+It starts an infinite loop that polls all cluster nodes, collecting the local *lock graph* from each node.
+The polling period is set by the `deadlock_check_timeout_sec` parameter (5 seconds by default).
+The local lock graphs are combined into a global lock graph, which is analyzed for loops.
+A loop in the lock graph means a distributed deadlock. The monitor function tries to resolve the deadlock by canceling one or more backends
+involved in the deadlock loop (using the `pg_cancel_backend` function, which does not terminate the backend but tries to cancel its current query).
+Since not all backends are blocked in the active query state, the cancel may need to be sent several times.
+Currently, the backend to cancel is chosen randomly from the
+deadlock loop.
+
+The local lock graphs collected from all nodes do not form a consistent global snapshot, so false deadlocks are possible:
+edges in a deadlock loop may correspond to different moments in time. To prevent false deadlock detection, the monitor function
+does not react to a detected deadlock immediately. Instead, the deadlock loop found at the previous iteration is compared with the current deadlock
+loop, and only if they are equal is the deadlock reported and recovery performed.
+
+If a node is unreachable, the monitor function prints a corresponding error message and retries access until
+the `rm_node_timeout_sec` timeout expires. After that, the node is removed from the cluster using the `shardman.rm_node` function.
+If the redundancy level is non-zero, primary partitions from the disabled node are replaced with replicas.
+
+
 ## Transactions
 When using vanilla PostgreSQL, local changes are handled by PostgreSQL as usual
 -- so if your queries touch only one node, you are safe. Distributed
@@ -489,6 +513,5 @@ be made new shardlord at any moment.
 ## Some limitations:
 * You should not touch `sync_standby_names` manually while using pg_shardman.
 * The shardlord itself can't be a worker node for now.
-* ALTER TABLE for sharded is mostly not supported.
 * All [limitations of `pg_pathman`](https://github.com/postgrespro/pg_pathman/wiki/Known-limitations),
 e.g. we don't support global primary keys and foreign keys to sharded tables.
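
The deadlock check that the new `monitor()` documentation describes (merge the local lock graphs, look for a loop, and act only when the same loop shows up on two consecutive polls) can be sketched in Python as below. This is not pg_shardman's implementation: the graph representation, the `node:pid` vertex labels, the names `merge_graphs`, `find_loop`, `DeadlockMonitor` and `pick_victim`, and the choice to compare loops as sets of vertices are all assumptions made for illustration.

```python
import random
from typing import Dict, FrozenSet, List, Optional, Set

# Assumed representation: waiter -> set of backends it waits for.
Graph = Dict[str, Set[str]]


def merge_graphs(local_graphs: List[Graph]) -> Graph:
    """Combine per-node lock graphs into one global lock graph."""
    merged: Graph = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def find_loop(graph: Graph) -> Optional[FrozenSet[str]]:
    """Return the vertices of one loop found by depth-first search, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(vertex: str) -> Optional[FrozenSet[str]]:
        color[vertex] = GREY
        stack.append(vertex)
        for holder in sorted(graph.get(vertex, ())):
            state = color.get(holder, WHITE)
            if state == GREY:
                # Back edge: the loop is the stack from `holder` onwards.
                return frozenset(stack[stack.index(holder):])
            if state == WHITE:
                found = visit(holder)
                if found is not None:
                    return found
        stack.pop()
        color[vertex] = BLACK
        return None

    for vertex in sorted(graph):
        if color.get(vertex, WHITE) == WHITE:
            found = visit(vertex)
            if found is not None:
                return found
    return None


class DeadlockMonitor:
    """Report a loop only if the previous poll found the same loop."""

    def __init__(self) -> None:
        self.previous_loop: Optional[FrozenSet[str]] = None

    def check(self, local_graphs: List[Graph]) -> Optional[FrozenSet[str]]:
        loop = find_loop(merge_graphs(local_graphs))
        confirmed = loop is not None and loop == self.previous_loop
        self.previous_loop = loop
        return loop if confirmed else None


def pick_victim(loop: FrozenSet[str], rng: random.Random) -> str:
    """Choose a random backend from the loop as the one to cancel."""
    return rng.choice(sorted(loop))


# Two hypothetical nodes, each seeing one half of a cross-node wait.
node1 = {"n1:100": {"n2:200"}}
node2 = {"n2:200": {"n1:100"}}

monitor = DeadlockMonitor()
print(monitor.check([node1, node2]))  # first sighting: not reported yet
confirmed = monitor.check([node1, node2])
if confirmed is not None:
    print("cancel", pick_victim(confirmed, random.Random()))
```

Requiring two identical sightings trades detection latency (one extra polling period) for fewer false positives, which matches the reasoning given in the documentation about inconsistent snapshots.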
