Synchronization Primitives
Mutexes
The mutex primitive provides mutual exclusion for one or more data objects.
Two versions of the mutex primitive are provided: spin mutexes and sleep
mutexes.
Spin mutexes are a simple spin lock. If the lock is held by another thread when
a thread tries to acquire it, the second thread will spin waiting for the lock to
be released. Because of this spinning, a context switch cannot be performed
while a spin mutex is held: if a thread owning a spin lock were switched out
while other CPUs spun on that lock, those CPUs could deadlock waiting for a
thread that is no longer running. An exception to this is the scheduler lock, which must be held during a
context switch. As a special case, the ownership of the scheduler lock is passed
from the thread being switched out to the thread being switched in to satisfy
this requirement while still protecting the scheduler data structures. Since the
bottom half code that schedules threaded interrupts and runs non-threaded
interrupt handlers also uses spin mutexes, spin mutexes must disable
interrupts while they are held to prevent bottom half code from deadlocking
against the top half code it is interrupting on the current CPU. Disabling
interrupts while holding a spin lock has the unfortunate side effect of
increasing interrupt latency.
These two types of mutexes are similar to the Solaris spin and adaptive
mutexes. One difference from the Solaris API is that acquiring and releasing a
spin mutex uses different functions than acquiring and releasing a sleep mutex.
A difference from the Solaris implementation is that sleep mutexes are not
adaptive. Details of the Solaris mutex API and implementation can be found in
section 3.5 of [Mauro01].
Condition Variables
Condition variables provide a logical abstraction for blocking a thread while
waiting for a condition to become true. A condition variable does not contain
the actual condition to test; instead, one locks the appropriate mutex, tests
the condition, and then blocks on the condition variable if the condition is
not true. To prevent lost wakeups, the mutex is passed in as an interlock
when waiting on a condition variable.
Shared/Exclusive Locks
Shared/Exclusive locks, also known as sx locks, provide simple reader/writer
locks. As the name suggests, multiple threads may hold a shared lock
simultaneously, but only one thread may hold an exclusive lock. Also, if one
thread holds an exclusive lock, no threads may hold a shared lock.
FreeBSD's sx locks have some limitations not present in other reader/writer
lock implementations. First, a thread may not recursively acquire an exclusive
lock. Secondly, sx locks do not implement any sort of priority propagation.
Finally, although upgrades and downgrades of locks are implemented, they
may not block. Instead, if an upgrade cannot succeed immediately, it returns
failure, and the programmer must explicitly drop the shared lock and acquire
an exclusive lock. This design is intentional: it prevents programmers from
making false assumptions about a blocking upgrade function. Specifically, a
blocking upgrade would have to release its shared lock while waiting, so
another thread could obtain the exclusive lock first; for example, if two
threads attempt to upgrade the same lock at the same time, at least one of
them must drop its shared lock and wait.
Semaphores
A semaphore is an integer variable that is accessed only through two atomic
operations, wait and signal. In classical pseudocode:

Wait(S) {
    while (S <= 0)
        ; // no-op (busy-wait)
    S--;
}

Signal(S) {
    S++;
}

When one process modifies the semaphore value, no other process can
simultaneously modify that same semaphore value.
Atomic Primitives
Atomic primitives are arguably the most important tool in programming that
requires coordination between multiple threads and/or processors/cores.
There are four basic types of atomic primitives: swap, fetch and phi, compare
and swap, and load linked/store conditional. Usually these operations are
performed on values the same size as or smaller than a word (the size of the
processor's registers), though sometimes consecutive (single address)
multiword versions are provided.
The most primitive (pun not intended) is the swap operation. This operation
exchanges a value in memory with a value in a register, atomically setting the
value in memory and returning the original value in memory. This is not
actually very useful, in terms of multiprogramming, with only a single practical
use: the construction of test and set (TAS; or test and test and set - TATAS)
spinlocks. In these spinlocks, each thread attempting to enter the spinlock
spins, exchanging 1 into the spinlock value in each iteration. If 0 is returned,
the spinlock was free, and the thread now owns the spinlock; if 1 is returned,
the spinlock was already owned by another thread, and the thread attempting
to enter the spinlock must keep spinning. Ultimately, the owning thread leaves
the spinlock by setting the spinlock value to 0.
Next up the power scale is the fetch and phi family. All members of this family
follow the same basic process: a value is atomically read from memory,
modified by the processor, and written back, with the original value returned
to the program. The modification performed can be almost anything; one of
the most useful modifications is the add operation (in this case it's called fetch
and add). The fetch and add operation is notably more useful than the swap
operation, but is still less than ideal; in addition to test and set spinlocks, fetch
and add can be used to create thread-safe counters, and spinlocks that both
preserve order and (potentially) greatly reduce cache coherency traffic.
Finally, the wild card: the double compare and swap (DCAS). In a double
compare and swap, the compare and swap operation is performed
simultaneously on the values at two memory addresses. This provides
dramatically more power than the previous operations, which operate only on a
single address. Unfortunately, support for this primitive is extremely rare in
real-world processors, and it is typically of interest only to lock-free
algorithm designers who are unable to reduce their algorithms to single-
address compare and swap operations.
Ticket Lock
A ticket lock is a spin-based form of inter-thread synchronization that grants
entry to the serialised code in arrival (FIFO) order.
Overview
A ticket lock works as follows; there are two integer values which begin at 0.
The first value is the queue ticket, the second is the dequeue ticket.
When a thread arrives, it atomically fetches and increments the queue ticket;
the fetched value is its ticket. It then compares its ticket with the dequeue
ticket. If they are equal, the thread is permitted to enter the serialised
code. If they are not equal, another thread must already be in the serialised
code, and this thread must busy-wait or yield, re-reading the dequeue ticket
until its turn arrives. When a thread comes to leave the serialised code, it
atomically increments the dequeue ticket, thus permitting the next waiting
thread to enter the serialised code.
A drawback is that if the thread which owns the lock fails, the entire
application halts. (This type of problem applies to all blocking
synchronization entities.)
Lockless locking
Main article: Lock-free and wait-free algorithms
The first technique is the use of a special set of instructions which are
guaranteed atomic by the CPU. This generally centers on an instruction known
as compare-and-swap. This instruction compares a value in memory (the
destination) with an expected value (the comparand) and, if they are
identical, atomically replaces the value in memory with a third value (the
exchange).
The second technique is to busy-wait or yield when it is clear that the current
thread cannot be permitted to continue processing. This provides the vital
ability to defer the processing done by a thread when such processing would
violate the operations which are being serialised between threads.
Test & Test & Set Lock
In computer science, the test-and-set CPU instruction is used to
implement mutual exclusion in multiprocessor environments. Although a
correct lock can be implemented with test-and-set, it can lead to memory
contention on a busy lock, caused by the bus locking and cache invalidation
needed when a test-and-set operation accesses memory atomically.
To lower this overhead, a more elaborate locking protocol, test and
test-and-set, is used. The main idea is not to spin on test-and-set itself,
but to increase the likelihood of a successful test-and-set with the following
entry protocol: use normal memory reads to spin, waiting for the lock to
become free, and attempt the test-and-set only when a normal memory read says
the lock is free. Thus the expensive atomic memory operations happen less
often than when simply spinning on test-and-set.
Barriers
Race conditions can cause a program to execute non-deterministically,
producing inconsistent results. Synchronization routines are used to remove
race conditions from code. In certain cases all threads must have executed a
portion of code before any may continue. Barrier synchronization is a
technique to achieve this: a thread executing an episode of a barrier waits
for all other threads before proceeding to the next episode. Therefore, when
a barrier is reached, all threads are forced to wait for the last thread to
arrive. Use of barriers is common in shared memory parallel programming.
1. Centralized Barrier
2. Tree Barrier
Centralized Barrier
Tree Barrier