
Synchronization Notes

This document discusses synchronization and coordination in distributed systems. Synchronization refers to ordering events in time, while coordination allows processes to agree on global state values and actions. The document examines algorithms for synchronizing physical clocks, such as Cristian's algorithm and NTP, as well as logical clocks like Lamport clocks which define causality relationships between distributed events. Logical clocks are useful when relative ordering is more important than physical time.

Uploaded by

Sudha Patel
Copyright
© All Rights Reserved

Synchronisation & Coordination

This lecture deals with one of the fundamental issues encountered when constructing a system
made up of independent communicating processes: dealing with time and making sure that
processes do the right thing at the right time. In essence this comes down to allowing processes to
synchronise and coordinate their actions. Coordination refers to coordinating the actions of
separate processes relative to each other and allowing them to agree on global state (such as values
of a shared variable). Synchronisation is coordination with respect to time, and refers to the
ordering of events and execution of instructions in time. Examples of synchronisation include
ordering distributed events in a log file and ensuring that a process performs an action at a
particular time. Examples of coordination include ensuring that processes agree on what actions
will be performed (e.g., money will be withdrawn from the account), who will be performing actions
(e.g., which replica will process a request), and the state of the system (e.g., the elevator is
stopped).

Synchronisation and coordination play an important role in most distributed algorithms (i.e.,
algorithms intended to work in a distributed environment). In particular, some distributed
algorithms are used to achieve synchronisation and coordination, while others assume the presence
of synchronisation or coordination mechanisms. Discussions of distributed algorithms generally
assume one of two timing models for distributed systems. The first is a synchronous model, where
the time to perform all actions, the communication delay, and the clock drift on all nodes are
bounded. The second is an asynchronous model, in which there are no such bounds. Most real
distributed systems are asynchronous; however, it is easier to design distributed algorithms for
synchronous distributed systems. Algorithms for asynchronous systems are always valid on
synchronous systems, but the converse is not true.

Time & Clocks


As mentioned, time is an important concept when dealing with synchronisation and coordination.
In particular it is often important to know when events occurred and in what order they occurred.
In a nondistributed system dealing with time is trivial as there is a single shared clock. All
processes see the same time. In a distributed system, on the other hand, each computer has its
own clock. Because no clock is perfect, each of these clocks drifts at its own rate, so the clocks
on different computers gradually diverge (clock skew) and eventually become out of sync.
There are several notions of time that are relevant in a distributed system. First of all, internally
a computer clock simply keeps track of ticks that can be translated into physical time (hours,
minutes, seconds, etc.). This physical time can be global or local. Global time is a universal time
that is the same for everyone and is generally based on some form of absolute time.¹ Currently
Coordinated Universal Time (UTC), which is based on oscillations of the Cesium-133 atom, is the
most accurate global time. Besides global time, processes can also consider local time. In this case
the time is only relevant to the processes taking part in the distributed system (or algorithm).
This time may be based on physical or logical clocks (which we will discuss later).

¹ Although Einstein's special relativity theory shows that time is relative and there is, therefore, no absolute
time, for our purposes (and at the worldwide scale) we can safely assume that such an absolute time does exist.

Physical Clocks
Physical clocks keep track of physical time. In distributed systems that rely on actual time it
is necessary to keep individual computer clocks synchronised. The clocks can be synchronised
to global time (external synchronisation), or to each other (internal synchronisation). Cristian's
algorithm and the Network Time Protocol (NTP) are examples of algorithms developed to
synchronise clocks to an external global time source (usually UTC). The Berkeley Algorithm is an
example of an algorithm that allows clocks to be synchronised internally.

Cristian's algorithm requires clients to periodically synchronise with a central time server
(typically a server with a UTC receiver). One of the problems encountered when synchronising clocks
in a distributed system is that unpredictable communication latencies can affect the
synchronisation. For example, when a client requests the current time from the time server, by the
time the server's reply reaches the client the time will have changed. The client must, therefore,
determine what the communication latency was and adjust the server's response accordingly. Cristian's
algorithm deals with this problem by attempting to calculate the communication delay based on
the time elapsed between sending a request and receiving a reply.
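
The delay correction in Cristian's algorithm can be illustrated with a minimal sketch. It assumes that the latency is symmetric, so that half of the measured round-trip time is attributed to the reply; the function name `cristian_estimate` and its two callable parameters are hypothetical and chosen only for this example.

```python
import time
from typing import Callable, Tuple


def cristian_estimate(
    request_server_time: Callable[[], float],
    local_clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, float]:
    """Estimate the server's current time from one request/reply exchange.

    request_server_time: hypothetical callable that performs the request and
        returns the time reported by the server.
    local_clock: the client's own clock, used only to measure the round trip.
    Returns (estimated_server_time_now, round_trip_time).
    """
    t_send = local_clock()                # just before the request is sent
    server_time = request_server_time()   # time stamped by the server
    t_recv = local_clock()                # just after the reply arrives
    round_trip = t_recv - t_send
    # Assumption: request and reply take equally long, so the reply has been
    # travelling for about half of the round-trip time.
    return server_time + round_trip / 2, round_trip
```

With a round trip of 0.5 time units and a server reply of 500.0, the sketch estimates the current server time as 500.25. The estimate is only as good as the symmetry assumption; a real client would also have to avoid setting its clock backward.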
The Network Time Protocol is similar to Cristian's algorithm in that synchronisation is also
performed using time servers and an attempt is made to correct for communication latencies.
Unlike Cristian's algorithm, however, NTP is not centralised and is designed to work on a
wide-area scale. As such, the calculation of delay is somewhat more complicated. Furthermore, NTP
provides a hierarchy of time servers, with only the top layer containing UTC clocks. The NTP
algorithm allows client-server and peer-to-peer (mostly between time servers) synchronisation. It
also allows clients and servers to determine the most reliable servers to synchronise with. NTP
typically provides accuracies between 1 and 50 msec depending on whether communication is over
a LAN or WAN.
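
These notes do not spell out NTP's delay calculation. As a rough illustration, the sketch below uses the commonly documented four-timestamp calculation of clock offset and round-trip delay; the function name and the parameter names `t0` to `t3` are assumptions made for this example, and the server selection and filtering that NTP also performs are left out.

```python
from typing import Tuple


def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Estimate clock offset and round-trip delay from one NTP-style exchange.

    t0: client clock when the request is sent
    t1: server clock when the request is received
    t2: server clock when the reply is sent
    t3: client clock when the reply is received
    Returns (offset, delay), where offset is how far the server clock is
    ahead of the client clock.
    """
    # Time spent on the network: total elapsed time at the client minus the
    # time the server spent processing the request.
    delay = (t3 - t0) - (t2 - t1)
    # Assumption: the network delay is split evenly between both directions.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    return offset, delay
```

For example, with timestamps in milliseconds, `ntp_offset_and_delay(0, 10050, 10060, 110)` gives an offset of 10000 ms and a delay of 100 ms.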
Unlike the previous two algorithms, the Berkeley algorithm does not synchronise to a global
time. Instead, in this algorithm, a time server polls the clients to determine the average of
everyone’s time. The server then instructs all clients to set their clocks to this new average time.
Note that in all the above algorithms a clock should never be set backward. If time needs to be
adjusted backward, the clock is instead slowed down until real time 'catches up' with it.
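
The averaging step of the Berkeley algorithm can be sketched as follows. This is a simplified illustration: it ignores communication latency and faulty clocks, and the function name `berkeley_adjustments` is hypothetical. It returns per-clock adjustments rather than absolute times, so that a negative adjustment can be applied by slowing a clock down instead of setting it backward.

```python
from typing import List, Tuple


def berkeley_adjustments(server_time: float, client_times: List[float]) -> Tuple[float, List[float]]:
    """Compute the average time and the adjustment each clock needs.

    Returns (average, adjustments), where adjustments[0] belongs to the
    time server and adjustments[1:] to the clients in the order given.
    """
    all_times = [server_time, *client_times]
    average = sum(all_times) / len(all_times)
    # A positive adjustment means the clock is behind the average; a negative
    # one means it is ahead and should be slowed down rather than set back.
    return average, [average - t for t in all_times]
```

For a server at 180 and clients at 205 and 170 (e.g., minutes past midnight), the average is 185 and the adjustments are +5, -20 and +15.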

Logical Clocks
For many applications, the relative ordering of events is more important than actual physical time.
In a single process the ordering of events (e.g., state changes) is trivial. In a distributed system,
however, besides local ordering of events, all processes must also agree on ordering of causally
related events (e.g., sending and receiving of a single message). Given a system consisting of N
processes p_i, i ∈ {1, ..., N}, we define the local event ordering →_i as a binary relation, such that,
if p_i observes e before e′, we have e →_i e′. Based on this local ordering, we define a global ordering
as a happened-before relation →, as proposed by Lamport [Lam78]: The relation → is the smallest
relation, such that
1. e →_i e′ implies e → e′,
2. for every message m, send(m) → receive(m), and
3. e → e′ and e′ → e′′ implies e → e′′ (transitivity).
The relation → is almost a partial order (it lacks reflexivity); that is, it is a strict partial
order. If a → b, then we say a causally affects b. We consider two events to be concurrent, written
a ∥ b, if neither happened before the other; i.e.,

a ↛ b and b ↛ a implies a ∥ b.

As an example, consider Figure 1. We have the following causal relations:

E11 → E12, E13, E14, E23, E24, ...

E21 → E22, E23, E24, E13, E14, ...

Figure 1: Example of event ordering (timelines of P1 with events E11 to E14 and P2 with events E21 to E24, drawn against real time)

Moreover, the following events are concurrent: E11 ∥ E21, E12 ∥ E22, E13 ∥ E23, E11 ∥ E22, E13 ∥ E24,
E14 ∥ E23, and so on.

Lamport Clocks
Lamport's logical clocks can be implemented as a software counter that locally computes the
happened-before relation →. This means that each process p_i maintains a logical clock L_i. Given
such a clock, L_i(e) denotes a Lamport timestamp of event e at p_i and L(e) denotes a timestamp
of event e at the process it occurred at. Processes now proceed as follows:
1. Before time stamping a local event, a process p_i executes L_i := L_i + 1.
2. Whenever a message m is sent from p_i to p_j:
   • Process p_i executes L_i := L_i + 1 and sends the new L_i with m.
   • Process p_j receives L_i with m and executes L_j := max(L_j, L_i) + 1. receive(m) is
     annotated with the new L_j.
In this scheme, a → b implies L(a) < L(b), but L(a) < L(b) does not necessarily imply a → b.
As an example, consider Figure 2. In this figure E12 → E23 and L_1(E12) < L_2(E23) (i.e., 2 < 3);
however, we also have E13 ↛ E24 while L_1(E13) < L_2(E24) (i.e., 3 < 4).
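
The update rules above can be sketched as a small class. This is a minimal illustration rather than a complete implementation; the class and method names are chosen for this example, and message transport is left out (the sender's timestamp is simply passed to `receive`).

```python
class LamportClock:
    """Scalar logical clock following the two rules described above."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Rule 1: increment before time stamping a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Rule 2 (sender): increment and return the timestamp to attach to m."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Rule 2 (receiver): jump past both the local and the received clock."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Usage: p1 performs a local event and then sends a message that p2 receives
# as its third event.
p1, p2 = LamportClock(), LamportClock()
p1.tick()                  # first event at p1, timestamp 1
sent = p1.send()           # send event at p1, timestamp 2
p2.tick()                  # timestamp 1
p2.tick()                  # timestamp 2
received = p2.receive(sent)  # timestamp max(2, 2) + 1 = 3
```

The receive event always gets a larger timestamp than the matching send event, which is what makes a → b imply L(a) < L(b).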

Figure 2: Example of the use of Lamport's clocks (P1 events E11 to E17 carry timestamps 1 to 7; P2 events E21 to E25 carry timestamps 1, 2, 3, 4, 7)

In some situations (e.g., to implement distributed locks), a partial ordering on events is not
sufficient and a total ordering is required. In these cases, the partial ordering can be completed
to a total ordering by including process identifiers. Given local time stamps L_i(e) and L_j(e′), we
define global time stamps ⟨L_i(e), i⟩ and ⟨L_j(e′), j⟩. We then use standard lexicographical ordering,
where ⟨L_i(e), i⟩ < ⟨L_j(e′), j⟩ iff L_i(e) < L_j(e′), or L_i(e) = L_j(e′) and i < j.
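
In Python, tuples already compare lexicographically, so the global time stamps can be sketched as (timestamp, process id) pairs. The helper name `total_order` is hypothetical.

```python
from typing import List, Tuple

GlobalStamp = Tuple[int, int]  # (Lamport timestamp, process identifier)


def total_order(stamps: List[GlobalStamp]) -> List[GlobalStamp]:
    """Sort global time stamps lexicographically: by timestamp, then by process id."""
    return sorted(stamps)


# Two events with equal Lamport timestamps are ordered by process identifier.
ordered = total_order([(2, 3), (2, 1), (1, 2)])
# ordered == [(1, 2), (2, 1), (2, 3)]
```

The tie-break by process identifier is arbitrary but deterministic, which is all that a total order needs: every process arrives at the same ordering.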

Vector Clocks

Figure 3: Example of the lack of causality with Lamport's clocks (timestamps: P1: E11 = 1, E12 = 2; P2: E21 = 1, E22 = 3; P3: E31 = 1, E32 = 2, E33 = 3)

The main shortcoming of Lamport's clocks is that L(a) < L(b) does not imply a → b; hence,
we cannot deduce causal dependencies from time stamps. For example, in Figure 3, we have
L_1(E11) < L_3(E33), but E11 ↛ E33. The root of the problem is that clocks advance independently
or via messages, but there is no history as to where an advance comes from.
This problem can be solved by moving from scalar clocks to vector clocks, where each process
maintains a vector clock V_i. V_i is a vector of size N, where N is the number of processes. The
component V_i[j] contains the process p_i's knowledge about p_j's clock. Initially, we have V_i[j] := 0
for i, j ∈ {1, ..., N}. Clocks are advanced as follows:
1. Before p_i timestamps an event, it executes V_i[i] := V_i[i] + 1.
2. Whenever a message m is sent from p_i to p_j:
   • Process p_i executes V_i[i] := V_i[i] + 1 and sends V_i with m.
   • Process p_j receives V_i with m and merges the vector clocks V_i and V_j as follows:

\[
V_j[k] := \begin{cases}
\max(V_j[k], V_i[k]) + 1, & \text{if } j = k \text{ (as in scalar clocks)} \\
\max(V_j[k], V_i[k]), & \text{otherwise.}
\end{cases}
\]

     This last part ensures that everything that subsequently happens at p_j is now causally
     related to everything that previously happened at p_i.
Under this scheme, we have, for all i, j, V_i[i] ≥ V_j[i] (i.e., p_i always has the most up-to-date
version of its own clock); moreover, a → b iff V(a) < V(b), where
• V = V′ iff V[i] = V′[i] for all i ∈ {1, ..., N},
• V ≥ V′ iff V[i] ≥ V′[i] for all i ∈ {1, ..., N},
• V > V′ iff V ≥ V′ ∧ V ≠ V′ (and, symmetrically, V < V′ iff V′ > V); and
• V ∥ V′ iff V ≯ V′ ∧ V′ ≯ V.
For example, consider the annotations at the diagram in Figure 4. Each event is annotated
with both its vector clock value (the triple) and the corresponding value of a scalar Lamport
clock. For L_1(E12) and L_3(E32), we have 2 = 2 versus (2, 0, 0) ≠ (0, 0, 2). Likewise we have
L_2(E24) > L_3(E32) but (2, 4, 1) ≯ (0, 0, 2) and thus E32 ↛ E24.
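
The vector clock rules and comparisons can be sketched as follows. The class and function names are chosen for this example, processes are indexed from 0 rather than 1, and message transport is again left out.

```python
from typing import Tuple

Stamp = Tuple[int, ...]


class VectorClock:
    """Vector clock of process `index` in a system of `n` processes."""

    def __init__(self, n: int, index: int) -> None:
        self.v = [0] * n
        self.index = index

    def tick(self) -> Stamp:
        """Rule 1: increment the own component before time stamping an event."""
        self.v[self.index] += 1
        return tuple(self.v)

    def send(self) -> Stamp:
        """Rule 2 (sender): increment the own component and send the vector."""
        return self.tick()

    def receive(self, other: Stamp) -> Stamp:
        """Rule 2 (receiver): component-wise maximum, then increment own component."""
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.v[self.index] += 1
        return tuple(self.v)


def happened_before(a: Stamp, b: Stamp) -> bool:
    """V(a) < V(b): every component is <= and the vectors differ."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: Stamp, b: Stamp) -> bool:
    """Neither stamp is smaller than the other (intended for distinct events)."""
    return not happened_before(a, b) and not happened_before(b, a)
```

As a usage example, three clocks `VectorClock(3, 0)`, `VectorClock(3, 1)` and `VectorClock(3, 2)` that exchange a message from the first to the second yield stamps such as (2, 0, 0) at the sender and (2, 2, 0) at the receiver, and `concurrent((2, 0, 0), (0, 0, 2))` is true even though the scalar Lamport timestamps of those two events are equal.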

Figure 4: Example contrasting vector and scalar clock annotations
   P1: E11 = 1, (1,0,0); E12 = 2, (2,0,0); E13 = 6, (3,4,1)
   P2: E21 = 1, (0,1,0); E22 = 3, (2,2,0); E23 = 4, (2,3,1); E24 = 5, (2,4,1)
   P3: E31 = 1, (0,0,1); E32 = 2, (0,0,2)

Global State
Determining global properties in a distributed system is often difficult, but crucial for some
applications. For example, in distributed garbage collection, we need to be able to determine for some
object whether it is referenced by any other objects in the system. Deadlock detection requires
detection of cycles of processes waiting indefinitely for each other. To detect the termination of a
distributed algorithm we need to obtain simultaneous knowledge of all involved processes as well as
take account of messages that may still traverse the network. In other words, it is not sufficient
to check the activity of all processes. Even if all processes appear to be passive, there may be
messages in transit that, upon arrival, trigger further activity.
In the following, we are concerned with determining stable global states or properties that,
once they occur, will not disappear without outside intervention. For example, once an object is
no longer referenced by any other object (i.e., it may be garbage collected), no reference to the
object can appear at a later time.

Consistent Cuts
To reason about the validity of global observations—i.e., observations that combine information
from multiple nodes—the notion of consistent cuts is useful. Due to the lack of global time, we
cannot simply require that all local observations must happen at the same time. As it is clear
that using the state of the individual processes at arbitrary points in time is not generally going
to result in a consistent overall picture, we need to define a criterion for determining when we
regard a collection of local states to be globally consistent.
To formalise the notion of a consistent cut, we again refer to a system of N processes p_i,
i ∈ {1, ..., N}. Each process p_i, over time, proceeds through a series of events ⟨e_i^0, e_i^1, e_i^2, ...⟩,
which we call p_i's history, denoted by h_i. This series may be finite or infinite. In any case, we
denote by h_i^k a k-prefix of h_i (the history of p_i up to and including event e_i^k). Each event e_i^j,
as before, is either a local event or a communication event (e.g., sending or receiving of a message).
We denote the state of any process p_i, immediately before event e_i^k, as s_i^k; i.e., the state
recording all events included in the history h_i^(k−1). This makes s_i^0 refer to the initial state of p_i.
Using a total event ordering, we can merge all local histories into a global history

\[ H = \bigcup_{i=1}^{N} h_i \]

and, similarly, we can combine a set of local states s_1, ..., s_N into a global state S = (s_1, ..., s_N).
This raises the question as to which combination of local states is consistent (a global state is
consistent if for any received message in the state the corresponding send is also in the state).
To answer this question, we need one more concept, namely that of a cut. Similar to the global
history, we can define cuts based on k-prefixes:

\[ C = \bigcup_{i=1}^{N} h_i^{c_i} \]

where h_i^(c_i) is the history of p_i up to and including event e_i^(c_i). The cut C corresponds to the state
S = (s_1^(c_1+1), ..., s_N^(c_N+1)). The final events in a cut are its frontier, defined as
{e_i^(c_i) | i ∈ {1, ..., N}}.
Figure 5: A consistent and inconsistent cut (cut 1 and cut 2 drawn across the timelines of P1, P2 and P3; event labels on P3: s_3^0, r_3^0, s_3^1, r_3^1; on P2: r_2^0, r_2^1, s_2^0, s_2^1, s_2^2; on P1: s_1^0, r_1^0, r_1^1)

We call a cut consistent iff for all events e′ ∈ C, e → e′ implies e ∈ C (i.e., all events that
happened before are also in the cut). A global state is consistent if it corresponds to a consistent
cut. As a result, we can characterise the execution of a system as a sequence of consistent global
states S_0 → S_1 → S_2 → ···. Figure 5 displays both a consistent cut (labeled "cut 1") and an
inconsistent cut (labeled "cut 2"). For the inconsistent cut note that the event that happened
before r_1^1 (i.e., s_3^1) is not part of the cut.
A global history that is consistent with the happened-before relation → is also called a
linearisation or consistent run. A linearisation only passes through consistent global states. Finally,
we call a state S′ reachable from state S if there is a linearisation that passes through S and then
S′.
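
The consistency criterion can be checked mechanically. The sketch below assumes a representation chosen only for this example: an event is a (process, index) pair, a cut maps each process to the number of its events included in the cut (i.e., a prefix of its history), and messages are given as (send event, receive event) pairs. Because a cut is built from prefixes, local ordering is respected automatically, so it suffices to check that every received message in the cut was also sent in the cut.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]            # (process name, index of the event in its history)
Message = Tuple[Event, Event]      # (send event, receive event)


def in_cut(event: Event, cut: Dict[str, int]) -> bool:
    """An event is in the cut if it lies within its process's prefix."""
    process, index = event
    return index < cut.get(process, 0)


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """A cut is consistent iff every receive in the cut has its send in the cut."""
    return all(in_cut(send, cut) for send, recv in messages if in_cut(recv, cut))


# Hypothetical example: P3's first event sends a message that P1 receives
# as its second event.
messages = [(("P3", 0), ("P1", 1))]
consistent = is_consistent({"P1": 2, "P3": 1}, messages)      # True: send and receive included
inconsistent = is_consistent({"P1": 2, "P3": 0}, messages)    # False: receive without its send
```

A cut that contains a send but not the matching receive is still consistent; the message is then simply in transit.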

Snapshots
Now that we have a precise characterisation of a consistent cut, the next question is whether
such cuts can be computed effectively. Chandy & Lamport [CL85] introduced an algorithm that
yields a snapshot of a distributed system, which embodies a consistent global state and takes care
of messages that are in transit when the snapshot is being performed. The resulting snapshots are
useful for evaluating stable global properties.
Chandy & Lamport's algorithm makes strong assumptions about the underlying infrastructure.
In particular, communication must be reliable and processes must be failure-free. Furthermore,
point-to-point message delivery must be ordered and the process/channel graph must be strongly
connected (i.e., each node can communicate with every other node). Under these assumptions, and
after the algorithm completes, each process holds a copy of its local state and a set of messages that
were in transit, with that process as their destination, during the snapshot.
The algorithm proceeds as follows: One process initiates the algorithm by recording its local
state and sending a marker message over each outgoing channel. On receipt of a marker message
over incoming channel c, a process distinguishes two cases:
1. If its local state is not yet saved, it behaves like the initiating process and saves the local
state and sends marker messages over each outgoing channel.
2. Otherwise, if its local state is already saved, it saves all messages that it received via c since
it saved its local state and until the marker arrived.

A process’ local contribution is complete after it has received markers on all incoming channels.
At this time, it has accumulated (a) a local state snapshot and (b), for each incoming channel,
a set of messages received after performing the local snapshot and before the marker came down
that channel.
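
The marker rules can be tried out in a toy, single-threaded simulation. Everything in the sketch below is an assumption made for illustration: the class name `SnapshotSim`, the choice of integer balances as local state, fully connected FIFO channels modelled as queues, and explicit `deliver` calls in place of a real network.

```python
from collections import deque

MARKER = object()  # sentinel distinguishing marker messages from application messages


class SnapshotSim:
    """Toy simulation: each process holds a balance and sends amounts to others."""

    def __init__(self, balances):
        self.state = dict(balances)      # current local states
        names = list(balances)
        # One reliable FIFO channel per ordered pair of processes.
        self.channels = {(a, b): deque() for a in names for b in names if a != b}
        self.saved = {}                  # process -> recorded local state
        self.recording = {}              # process -> {source: messages recorded so far}
        self.channel_state = {}          # (src, dst) -> recorded in-transit messages

    def send(self, src, dst, amount):
        """Application message: move `amount` from src towards dst."""
        self.state[src] -= amount
        self.channels[(src, dst)].append(amount)

    def start_snapshot(self, initiator):
        self._save_state(initiator)

    def _save_state(self, p):
        # Record the local state, start recording every incoming channel,
        # and send a marker over each outgoing channel.
        self.saved[p] = self.state[p]
        self.recording[p] = {src: [] for (src, dst) in self.channels if dst == p}
        for (src, dst), queue in self.channels.items():
            if src == p:
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel src -> dst."""
        msg = self.channels[(src, dst)].popleft()
        if msg is MARKER:
            if dst not in self.saved:    # case 1: first marker seen by dst
                self._save_state(dst)
            # Case 2 (and the tail of case 1): this channel's recording is done.
            self.channel_state[(src, dst)] = self.recording[dst].pop(src)
        else:
            self.state[dst] += msg
            if dst in self.saved and src in self.recording[dst]:
                self.recording[dst][src].append(msg)

    def complete(self):
        """True once a marker has arrived on every channel."""
        return len(self.channel_state) == len(self.channels)
```

In a run with three processes that start with 100 each, the sum of the recorded local states plus the recorded in-transit amounts stays at 300 regardless of which transfers are under way when the snapshot starts, which is what makes the recorded global state consistent.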

Figure 6: Marker messages during the collection of a snapshot (processes P1, P2 and P3, each with a starred local snapshot point, and messages m1, m2 and m3)

Figure 6 outlines the marker messages (dotted arrows) and the points where local snapshots are
taken (marked by the stars) for three processes.

Distributed Concurrency Control


Some of the issues encountered when looking at concurrency in distributed systems are familiar
from the study of operating systems and multithreaded applications, in particular dealing with
race conditions that occur when concurrent processes access shared resources. In a nondistributed
system these problems are solved by implementing mutual exclusion using local primitives such
as locks, semaphores, and monitors. In distributed systems, dealing with concurrency becomes
more complicated due to the lack of directly shared resources (such as memory, CPU registers,
etc.), the lack of a global clock, the lack of a single global program state, and the presence of
communication delays.

Distributed Mutual Exclusion


When concurrent access to distributed resources is required, we need to have mechanisms to
prevent race conditions while processes are within critical sections. These mechanisms must fulfill
the following three requirements:
1. Safety: At most one process may execute the critical section at a time
2. Liveness: Requests to enter and exit the critical section eventually succeed
3. Ordering: Requests are processed in happened-before ordering

Method 1: Central Server


The simplest approach is to use a central server that controls the entering and exiting of critical
sections. Processes must send requests to enter and exit a critical section to a lock server (or
coordinator), which grants permission to enter by sending a token to the requesting process.
Upon leaving the critical section, the token is returned to the server. Processes that wish to enter
a critical section while another process is holding the token are put in a queue. When the token is
returned the process at the head of the queue is given the token and allowed to enter the critical
section.
This scheme is easy to implement, but it does not scale well due to the central authority.
Moreover, it is vulnerable to failure of the central server.
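
The coordinator's bookkeeping can be sketched as follows. The class name `LockServer` and its methods are hypothetical, and message passing is replaced by direct method calls; the return values stand in for the token being sent.

```python
from collections import deque
from typing import Deque, Optional


class LockServer:
    """Central coordinator that hands a single token to one process at a time."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.queue: Deque[str] = deque()

    def request(self, process: str) -> bool:
        """Handle a request to enter; True means the token is granted now."""
        if self.holder is None:
            self.holder = process
            return True
        self.queue.append(process)   # token is in use: wait in FIFO order
        return False

    def release(self, process: str) -> Optional[str]:
        """Handle the returned token; returns the next holder, if any."""
        if self.holder != process:
            raise ValueError(f"{process} does not hold the token")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder
```

For example, if `p1` and then `p2` request entry, `p1` is granted the token immediately, `p2` is queued, and `release("p1")` hands the token to `p2`.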

Method 2: Token Ring
More sophisticated is a setup that organises all processes in a logical ring structure, along which
a token message is continuously forwarded. Before entering the critical section, a process has to
wait until the token comes by and then retain the token until it exits the critical section.
A disadvantage of this approach is that the ring imposes an average delay of N/2 hops, which
again limits scalability. Moreover, the token messages consume bandwidth and failing nodes or
channels can break the ring. Another problem is that failures may cause the token to be lost. In
addition, if new processes join the network or wish to leave, further management logic is needed.
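
The delay imposed by the ring can be illustrated with a small sketch that forwards the token until it reaches a process that wants to enter the critical section. The function name `forward_token` and the numbering of processes from 0 to n − 1 are assumptions of this example; failures and token loss are not modelled.

```python
from typing import Optional, Set, Tuple


def forward_token(n: int, token_at: int, wanting: Set[int]) -> Tuple[Optional[int], int]:
    """Pass the token around a ring of n processes numbered 0..n-1.

    Starting at `token_at`, the token moves to the next process until it
    reaches one in `wanting`. Returns (holder, hops); if no process wants
    to enter, returns (None, n) after one full lap.
    """
    for hops in range(n):
        candidate = (token_at + hops) % n
        if candidate in wanting:
            return candidate, hops
    return None, n
```

For a ring of five processes with the token at process 1, process 4 has to wait three hops, whereas process 1 itself can enter without any message being sent.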

Method 3: Using Multicast and Logical Clocks


Ricart & Agrawala [RA81] proposed an algorithm for distributed mutual exclusion that makes use
of logical clocks. Each participating process pi maintains a Lamport clock and all processes must
be able to communicate pairwise. At any moment, each process is in one of three states:
1. Released: Outside of critical section
2. Wanted: Waiting to enter critical section
3. Held: Inside critical section
If a process wants to enter a critical section, it multicasts a message ⟨L_i, p_i⟩ and waits until it has
received a reply from every other process. The processes operate as follows:
• If a process is in Released state, it immediately replies to any request to enter the critical
  section.
• If a process is in Held state, it delays replying until it is finished with the critical section.
• If a process is in Wanted state, it replies to a request immediately only if the requesting
  timestamp is smaller than the one in its own request; otherwise it delays replying until it has
  itself left the critical section.
The only hurdle to scalability is the use of multicasts (i.e., all processes have to be contacted in
order to enter a critical section). More scalable variants of this algorithm require each individual
process to only contact subsets of its peers when wanting to enter a critical section. Unfortunately,
failure of any peer process can deny all other processes entry to the critical section.
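
The reply rule of the three states can be sketched as a small state machine. The class and method names are chosen for this example, requests are represented as (Lamport timestamp, process id) pairs so that ties are broken by process identifier, and the multicast itself is left out.

```python
from typing import List, Optional, Tuple

RELEASED, WANTED, HELD = "released", "wanted", "held"
Request = Tuple[int, int]  # (Lamport timestamp, process id)


class RAProcess:
    """Reply logic of one participant in the multicast-based algorithm."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.state = RELEASED
        self.clock = 0
        self.request: Optional[Request] = None
        self.deferred: List[int] = []     # senders whose replies are delayed

    def want(self) -> Request:
        """Move to Wanted and return the request to multicast."""
        self.clock += 1
        self.state = WANTED
        self.request = (self.clock, self.pid)
        return self.request

    def on_request(self, req: Request) -> bool:
        """Return True to reply immediately, False if the reply is deferred."""
        timestamp, sender = req
        self.clock = max(self.clock, timestamp) + 1
        if self.state == HELD or (self.state == WANTED and self.request < req):
            self.deferred.append(sender)
            return False
        return True

    def enter(self) -> None:
        """Call once replies from all other processes have arrived."""
        self.state = HELD

    def exit(self) -> List[int]:
        """Leave the critical section; returns the senders to reply to now."""
        self.state = RELEASED
        self.request = None
        deferred, self.deferred = self.deferred, []
        return deferred
```

If two processes request entry with the same timestamp, the one with the smaller identifier defers its reply and therefore enters first; the other receives the outstanding reply when the first process exits.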

Comparison of Algorithms
When comparing the three distributed mutual exclusion algorithms we focus on the number of
messages exchanged per entry/exit of the critical section, the delay that a process experiences
before being allowed to enter a critical section, and the reliability of the algorithms (that is, what
kinds of problems the algorithms face).
The centralised algorithm requires a total of 3 messages to be exchanged every time a critical
section is executed (two to enter and one to leave). After a process has requested permission to
enter a critical section it has to wait for a minimum of two messages to be exchanged (one for
the current holder to return the token to the coordinator and one for the coordinator to send the
token to the waiting process). The biggest problem this algorithm faces is that if the coordinator
crashes (or becomes otherwise unavailable) the whole algorithm fails.
For the ring algorithm, the number of messages exchanged per entry and exit of a critical
section depends on how often processes need to enter the section. The less often processes want
to enter, the longer the token will travel around the ring, and the higher the 'cost' (in terms of
messages exchanged) of entry into a critical section will be. With regard to delay, depending on
where the token is, it will take between 0 and n − 1 messages before a process can enter the critical
section. The biggest problems faced by this algorithm are loss of the token and a crashed process
breaking the ring. It is possible to overcome the latter by providing all processes with information
about the ring structure so that broken nodes can be skipped.

Finally, the decentralised algorithm effectively requires 2(n − 1) messages to be sent per entry
and exit of a critical section (i.e., n − 1 request messages and n − 1 replies). Likewise there is a
delay of 2(n − 1) messages before another process can enter the critical section (once again because
n − 1 requests and n − 1 replies have to be sent). With regard to reliability the decentralised
algorithm is worse than the others because the failure of any single node is enough to break the
algorithm.
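
The figures from this comparison can be collected in a small helper that makes it easy to see how the costs grow with n. The function name and the dictionary layout are chosen for this example; the ring's message count is left as `None` because it depends on how often processes want to enter.

```python
from typing import Dict


def mutual_exclusion_costs(n: int) -> Dict[str, dict]:
    """Messages per entry/exit and entry delay (in messages) for n processes."""
    return {
        "centralised": {"messages": 3, "delay": 2},
        # Message count depends on demand; delay ranges from 0 to n - 1.
        "token ring": {"messages": None, "delay": (0, n - 1)},
        "decentralised": {"messages": 2 * (n - 1), "delay": 2 * (n - 1)},
    }
```

For ten processes the decentralised algorithm needs 18 messages per entry, compared with a constant 3 for the centralised one.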

References
[CL85] K. Mani Chandy and Leslie Lamport. Distributed snapshots: Determining global states
of distributed systems. ACM Transactions on Computer Systems, 3:63–75, 1985.
[Lam78] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system.
Communications of the ACM, 21:558–565, 1978.
[RA81] G. Ricart and A. Agrawala. An optimal algorithm for mutual exclusion in computer
networks. Communications of the ACM, 24(1), January 1981.
