@@ -586,6 +586,69 @@ The caller can then send a cancellation signal. This implements the
principle that autovacuum has a low locking priority (eg it must not block
DDL on the table).

+ Group Locking
+ -------------
+
+ As if all of that weren't already complicated enough, PostgreSQL now supports
+ parallelism (see src/backend/access/transam/README.parallel), which means that
+ we might need to resolve deadlocks that occur between gangs of related processes
+ rather than individual processes. This doesn't change the basic deadlock
+ detection algorithm very much, but it makes the bookkeeping more complicated.
+
+ We choose to regard locks held by processes in the same parallel group as
+ non-conflicting. This means that two processes in a parallel group can hold
+ a self-exclusive lock on the same relation at the same time, or one process
+ can acquire an AccessShareLock while the other already holds AccessExclusiveLock.
+ This might seem dangerous and could be in some cases (more on that below), but
+ if we didn't do this then parallel query would be extremely prone to
+ self-deadlock. For example, a parallel query against a relation on which the
+ leader already held AccessExclusiveLock would hang, because the workers would
+ try to lock the same relation and be blocked by the leader; yet the leader can't
+ finish until it receives completion indications from all workers. An undetected
+ deadlock results. This is far from the only scenario where such a problem
+ happens. The same thing will occur if the leader holds only AccessShareLock,
+ the worker seeks AccessShareLock, but between the time the leader attempts to
+ acquire the lock and the time the worker attempts to acquire it, some other
+ process queues up waiting for an AccessExclusiveLock. In this case, too, an
+ indefinite hang results.
+
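The rule that locks within a parallel group do not conflict can be sketched
as a conflict check that skips holders belonging to the requester's own group.
This is a simplified illustration only, not the code in lock.c: it models just
the two lock modes named above, it ignores the wait queue, and every name in it
(ToyLockMode, ToyHolder, toy_modes_conflict, toy_must_wait) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: only the two lock modes mentioned in the text. */
typedef enum { ACCESS_SHARE, ACCESS_EXCLUSIVE } ToyLockMode;

/* One granted lock on a single relation (hypothetical struct). */
typedef struct
{
    int         pid;            /* process holding the lock */
    int         group_leader;   /* pid of its lock group leader; equal to
                                 * pid for a process outside any group */
    ToyLockMode mode;           /* mode in which the lock is held */
} ToyHolder;

/* Ordinary rule for these two modes: ACCESS_EXCLUSIVE conflicts with both. */
bool
toy_modes_conflict(ToyLockMode a, ToyLockMode b)
{
    return a == ACCESS_EXCLUSIVE || b == ACCESS_EXCLUSIVE;
}

/*
 * Return true if a request in req_mode must wait.  Locks held by members
 * of the requester's own lock group are treated as non-conflicting.
 */
bool
toy_must_wait(const ToyHolder *holders, size_t nholders,
              int req_group_leader, ToyLockMode req_mode)
{
    assert(holders != NULL || nholders == 0);

    for (size_t i = 0; i < nholders; i++)
    {
        if (holders[i].group_leader == req_group_leader)
            continue;           /* same group: never a conflict */
        if (toy_modes_conflict(holders[i].mode, req_mode))
            return true;
    }
    return false;
}
```

Under this model, a worker whose group leader holds ACCESS_EXCLUSIVE on a
relation is granted ACCESS_SHARE on it at once, while an unrelated process
making the same request must wait.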
+ It might seem that we could predict which locks the workers will attempt to
+ acquire and ensure before going parallel that those locks would be acquired
+ successfully. But this is very difficult to make work in a general way. For
+ example, a parallel worker's portion of the query plan could involve an
+ SQL-callable function which generates a query dynamically, and that query
+ might happen to hit a table on which the leader happens to hold
+ AccessExclusiveLock. By imposing enough restrictions on what workers can do,
+ we could eventually create a situation where their behavior can be adequately
+ restricted, but these restrictions would be fairly onerous, and even then, the
+ system required to decide whether the workers will succeed at acquiring the
+ necessary locks would be complex and possibly buggy.
+
+ So, instead, we take the approach of deciding that locks within a lock group
+ do not conflict. This eliminates the possibility of an undetected deadlock,
+ but also opens up some problem cases: if the leader and worker try to do some
+ operation at the same time which would ordinarily be prevented by the heavyweight
+ lock mechanism, undefined behavior might result. In practice, the dangers are
+ modest. The leader and worker share the same transaction, snapshot, and combo
+ CID hash, and neither can perform any DDL or, indeed, write any data at all.
+ Thus, for either to read a table locked exclusively by the other is safe enough.
+ Problems would occur if the leader initiated parallelism from a point in the
+ code at which it had some backend-private state that made table access from
+ another process unsafe. For example, if it did so after calling
+ SetReindexProcessing and before calling ResetReindexProcessing, catastrophe
+ could ensue, because the worker won't have that state. Similarly, problems
+ could occur with certain kinds of non-relation locks, such as relation
+ extension locks. It's no safer for two related processes to extend the same
+ relation at the same time than for unrelated processes to do the same.
+ However, since parallel mode is strictly read-only at present, neither this
+ nor most of the similar cases can arise. To allow parallel writes, we'll
+ either need to (1) further enhance
+ the deadlock detector to handle those types of locks in a different way than
+ other types; or (2) have parallel workers use some other mutual exclusion
+ method for such cases; or (3) revise those cases so that they no longer use
+ heavyweight locking in the first place (which is not a crazy idea, given that
+ such lock acquisitions are not expected to deadlock and that heavyweight lock
+ acquisition is fairly slow anyway).
+
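One way to picture option (1) is a conflict rule in which the group exemption
depends on the type of object being locked. The sketch below is an assumption
made for illustration, not a description of how PostgreSQL handles this: the
names ToyLockTag, toy_group_exemption_applies and toy_exclusive_conflicts are
hypothetical, and it covers only the case where the held and requested modes
are mutually exclusive.

```c
#include <stdbool.h>

/* Toy lock tag types: an ordinary relation lock and a relation extension lock. */
typedef enum { TAG_RELATION, TAG_RELATION_EXTEND } ToyLockTag;

/*
 * Assumption for illustration: only relation locks may be shared within a
 * lock group; relation extension locks keep their normal behavior.
 */
bool
toy_group_exemption_applies(ToyLockTag tag)
{
    return tag == TAG_RELATION;
}

/*
 * Holder and requester want mutually exclusive modes on the same object.
 * Return true if they conflict once lock groups are taken into account.
 */
bool
toy_exclusive_conflicts(ToyLockTag tag,
                        int holder_group_leader,
                        int requester_group_leader)
{
    if (holder_group_leader == requester_group_leader &&
        toy_group_exemption_applies(tag))
        return false;           /* same group, exempt lock type */
    return true;                /* everything else conflicts as usual */
}
```

With such a rule, two members of one group could still both lock a relation,
but they would block each other when extending it, just as unrelated
processes do.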
User Locks (Advisory Locks)
---------------------------