Module – II
Lesson Plan
▪ L1: Logical time – A framework for a system of logical clocks, Scalar time
▪ L2: Vector time
▪ L4: Global state and snapshot recording algorithms – System model and
definitions.
▪ Three ways to implement logical time: scalar time, vector time, and matrix time.
▪ In a system of logical clocks, every process has a logical clock that is advanced
using a set of rules.
Intuitively, the causal precedence (happened-before) relation captured by these clocks is analogous to the "earlier than" relation provided by physical time.
The logical local clock of a process pi and its local view of the global time are
squashed into one integer variable Ci.
2. Total Ordering
▪ Scalar clocks can be used to totally order events in a distributed
system
▪ Process identifiers are linearly ordered, and a tie among events with identical scalar timestamps is broken on the basis of their process identifiers (a small sketch follows below).
▪ The lower the process identifier in the ranking, the higher the priority.
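To make the tie-breaking rule concrete, here is a minimal scalar-clock sketch in Python. The update rules (increment before every local or send event, take the maximum on receive) are the standard Lamport rules and are assumed here, since the slide only says the clock is advanced by a set of rules; the class and function names are illustrative.

# Minimal scalar-clock sketch. The rules (increment before every event, take
# the max of the local and piggybacked clock on receive) are the standard
# Lamport rules, assumed here; the slide only says "a set of rules".
class ScalarClock:
    def __init__(self, pid, d=1):
        self.pid = pid      # process identifier, used to break ties
        self.c = 0          # Ci: local clock and local view of global time
        self.d = d          # increment step

    def tick(self):
        """Executed before any internal or send event."""
        self.c += self.d
        return self.c

    def on_receive(self, msg_ts):
        """Executed when a message stamped msg_ts is delivered."""
        self.c = max(self.c, msg_ts)
        self.c += self.d
        return self.c

def total_order(ts_a, pid_a, ts_b, pid_b):
    """Total order: compare timestamps, break ties with the lower process id."""
    return (ts_a, pid_a) < (ts_b, pid_b)

p1, p2 = ScalarClock(1), ScalarClock(2)
t_send = p1.tick()                  # p1 sends a message at time 1
t_recv = p2.on_receive(t_send)      # p2 receives it, its clock jumps to 2
print(t_send, t_recv, total_order(1, 1, 1, 2))   # 1 2 True: tie broken by pid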
1. Isomorphism
2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector timestamps of two events, we can determine whether the events are causally related (a sketch follows the applications list below).
3. Event counting
Vector Time
Applications
Since vector time tracks causal dependencies exactly, it finds a wide variety of
applications.
▪ distributed debugging,
▪ implementations of causal ordering communication
▪ causal distributed shared memory,
▪ establishment of global breakpoints
▪ determining the consistency of checkpoints in optimistic recovery
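Since all of these applications rest on comparing vector timestamps, a small vector-clock sketch follows. The update and comparison rules shown are the standard ones and are assumed rather than quoted from the slides; the names are illustrative.

# Small vector-clock sketch (standard rules, assumed): each process keeps one
# entry per process, increments its own entry on every event, and merges
# component-wise maxima on receive. Comparing two vectors tells us whether the
# corresponding events are causally related or concurrent (strong consistency).
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n                     # one component per process

    def tick(self):
        self.v[self.pid] += 1                # internal or send event
        return list(self.v)

    def on_receive(self, msg_v):
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1                # the receive itself is an event
        return list(self.v)

def happened_before(u, v):
    """True iff the event stamped u causally precedes the event stamped v."""
    return all(a <= b for a, b in zip(u, v)) and u != v

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
a = p0.tick()              # event a at p0: [1, 0]
m = p0.tick()              # p0 sends a message stamped [2, 0]
b = p1.on_receive(m)       # event b at p1: [2, 1]
print(happened_before(a, b), happened_before(b, a))   # True False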
Leader election algorithm
▪ An algorithm for choosing a unique process to play a particular role
(coordinator) is called an election algorithm.
▪ Afterwards, if the process that plays the role of server wishes to retire
then another election is required to choose a replacement.
▪ We say that a process calls the election if it takes an action that initiates
a particular run of the election algorithm.
● Ring-based election algorithm
● Bully algorithm
1. The ring-based election algorithm
▪ The processes are arranged in a logical ring, and each process sends messages to its neighbour in one fixed direction around the ring. The goal is to elect the process with the largest identifier as the coordinator.
▪ If the arrived identifier is greater, then it forwards the message to its neighbour.
▪ If the arrived identifier is smaller and the receiver is not a participant, then it
substitutes its own identifier in the message and forwards it; but it does not
forward the message if it is already a participant.
▪ If, however, the received identifier is that of the receiver itself, then this process’s
identifier must be the greatest, and it becomes the coordinator.
▪ The coordinator marks itself as a non-participant once more and sends an elected
message to its neighbour, announcing its election and enclosing its identity
A ring-based election in progress
[Figure: process identifiers around the ring are 3, 17, 4, 24, 9, 1, 15 and 28; the election message currently carries identifier 24.]
1. Initially, every process is marked as non-participant. Any process can begin an election.
2. The starting process marks itself as a participant and places its identifier in a message to its neighbour.
3. A process that receives a message compares the arrived identifier with its own. If the arrived identifier is larger, it passes on the message.
4. If the arrived identifier is smaller and the receiver is not a participant, it substitutes its own identifier in the message and forwards it. It does not forward the message if it is already a participant.
5. On forwarding a message in any of these cases, the process marks itself as a participant.
6. If the received identifier is that of the receiver itself, then this process's identifier must be the greatest, and it becomes the coordinator.
7. The coordinator marks itself as non-participant, sets elected_i and sends an elected message to its neighbour enclosing its ID.
8. When a process receives an elected message, it marks itself as a non-participant, sets its variable elected_i and forwards the message.
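The eight steps above translate almost directly into code. The following single-threaded simulation is an illustrative sketch (the ring layout, message queue and function name are assumptions, not part of the slides); it circulates election and elected messages until the highest identifier wins.

# Single-threaded simulation of the ring-based election described above.
def ring_election(ids, starter):
    n = len(ids)
    participant = [False] * n           # step 1: everyone starts as non-participant
    elected = [None] * n
    participant[starter] = True         # step 2: starter places its id in a message
    messages = [("election", ids[starter], (starter + 1) % n)]

    while messages:
        kind, value, at = messages.pop(0)
        nxt = (at + 1) % n
        if kind == "election":
            if value > ids[at]:                               # step 3: larger id, pass it on
                messages.append(("election", value, nxt))
                participant[at] = True                        # step 5
            elif value < ids[at] and not participant[at]:     # step 4: substitute own id
                messages.append(("election", ids[at], nxt))
                participant[at] = True                        # step 5
            elif value == ids[at]:                            # step 6: own id came back
                participant[at] = False                       # step 7: announce election
                elected[at] = ids[at]
                messages.append(("elected", ids[at], nxt))
            # a participant receiving a smaller id swallows the message (step 4)
        else:                                                 # elected message, step 8
            if ids[at] != value:
                participant[at] = False
                elected[at] = value
                messages.append(("elected", value, nxt))
    return elected

print(ring_election([3, 17, 4, 24, 9, 1, 15, 28], starter=2))   # every entry becomes 28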
2. The bully algorithm
The process with the highest identifier will be the coordinator.
There are three types of message in this algorithm: election, answer, and coordinator messages.
The process that knows it has the highest identifier can elect itself as the
coordinator simply by sending a coordinator message to all processes with
lower identifiers.
On the other hand, a process with a lower identifier can begin an election
by sending an election message to those processes that have a higher
identifier and awaiting answer messages in response.
If none arrives within time T, the process considers itself the
coordinator and sends a coordinator message to all processes with
lower identifiers announcing this.
● Eventually, all processes give up but one, and that one is the new
coordinator.
● When a process notices that the coordinator is no longer responding, it holds an election.
● The "biggest guy" always wins, and hence the name "bully" algorithm.
The bully algorithm
1. A process begins an election by sending an election message to those processes that have a higher ID and awaits answer messages in response.
2. If none arrives within time T, the process considers itself the coordinator and sends a coordinator message to all processes with lower identifiers.
3. Otherwise, it waits a further time T' for a coordinator message to arrive. If none arrives, it begins another election.
4. If a process receives a coordinator message, it sets its variable elected_i to the ID of the coordinator.
5. If a process receives an election message, it sends back an answer message and begins another election.
[Figure: the bully algorithm on processes p1 to p4 across four stages, showing election, answer, timeout and coordinator messages; eventually the highest surviving process announces itself as coordinator C.]
[Figure: example with processes P5, P9, P10, P3 and P18.]
Bully Algorithm – Work out
• Process ids are 0, 4, 2, 1, 5, 6, 3, 7. P7 was the initial coordinator and has crashed. Illustrate the bully algorithm if P4 initiates the election, and calculate the total number of election messages and coordinator messages (a simulation sketch follows).
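A small simulation of this scenario is sketched below, under the usual assumptions: the crashed P7 never responds, answer messages are not counted, and a process holds at most one election. The function and data structures are illustrative, not prescribed by the slides. Under these assumptions the run reports 6 election messages and 6 coordinator messages, with P6 becoming the coordinator.

# Sketch of the bully-algorithm workout above. Assumptions (not from the
# slides): crashed processes never respond, answer messages are not counted,
# and a process that has already started an election does not start another.
def bully(pids, crashed, initiator):
    alive = [p for p in pids if p not in crashed]
    election_msgs = 0
    coordinator_msgs = 0
    started = set()              # processes that have already held an election
    to_run = [initiator]         # processes that still have to hold an election

    while to_run:
        p = to_run.pop(0)
        if p in started:
            continue
        started.add(p)
        higher = [q for q in pids if q > p]
        election_msgs += len(higher)              # election message to every higher pid
        responders = [q for q in higher if q not in crashed]
        if responders:
            to_run.extend(responders)             # each live higher process answers, then elects
        else:
            # no answer within time T: p becomes coordinator and announces itself
            coordinator_msgs += len([q for q in alive if q < p])
            print(f"P{p} becomes coordinator")
    return election_msgs, coordinator_msgs

e, c = bully(pids=[0, 4, 2, 1, 5, 6, 3, 7], crashed={7}, initiator=4)
print(f"election messages: {e}, coordinator messages: {c}")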
Global state and snapshot recording algorithms
▪ Recording the global state of a distributed system on-the-fly is an important paradigm
when one is interested in analyzing, testing, or verifying properties associated with
distributed execution
▪ Unfortunately, the lack of both a globally shared memory and a global clock in a
distributed system, added to the fact that message transfer delays in these systems are
finite but unpredictable, makes this problem non-trivial.
▪ The state of a process is characterized by the state of its local memory and a history of
its activity.
▪ The state of a channel is characterized by the set of messages in transit in the channel, i.e., messages sent along the channel but not yet delivered.
Global state and snapshot recording algorithms
▪ The global state of a distributed system is a collection of the local states of its
components
System model
▪ The system consists of a collection of n processes, p1, p2, ..., pn, that are connected by channels.
▪ There is no globally shared memory and processes communicate solely by
passing messages.
▪ There is no physical global clock in the system. Message send and receive is
asynchronous.
▪ Messages are delivered reliably with finite but arbitrary time delay.
▪ The system can be described as a directed graph in which vertices represent
the processes and edges represent unidirectional communication channels.
Global state and snapshot recording algorithms
System model
▪ Let Cij denote the channel from process pi to process pj
▪ The actions performed by a process are modeled as three types of events, namely,
internal events, message send events, and message receive events.
▪ For a message mij that is sent by process pi to process pj, let send(mij) and rec(mij)
denote its send and receive events, respectively.
▪ Occurrence of events changes the states of respective processes and channels, thus
causing transitions in the global system state
Global state and snapshot recording algorithms
▪ For example, an internal event changes the state of the process at which it
occurs.
▪ A send event (or a receive event) changes the state of the process that sends
(or receives) the message and the state of the channel on which the message
is sent (or received).
▪ At any instant, the state of process pi, denoted by LSi, is a result of the
sequence of all the events executed by pi up to that instant
Global state and snapshot recording algorithms
A consistent global state
The global state of a distributed system is a collection of the local states of the processes and the channels. Notationally, the global state GS is defined as GS = { ∪i LSi , ∪i,j SCij }, where LSi denotes the local state of process pi and SCij denotes the state of channel Cij.
▪ A cut is a line joining an arbitrary point on each process line that slices the
space–time diagram into a PAST and a FUTURE.
▪ All the messages that cross the cut from the PAST to the FUTURE are captured
in the corresponding channel state.
Global state and snapshot recording algorithms
Interpretation in terms of cuts
▪ If a global physical clock were available, the following simple procedure could
be used to record a consistent global snapshot of a distributed system.
▪ In this, the initiator of the snapshot collection decides a future time at which
the snapshot is to be taken and broadcasts this time to every process.
▪ All processes take their local snapshots at that instant in the global time.
▪ However, a global physical clock is not available in a distributed system, and the following two issues need to be addressed in recording a consistent global snapshot of a distributed system:
I1: how to distinguish between the messages to be recorded in the snapshot and those not to be recorded;
I2: how to determine the instant at which a process takes its snapshot.
After a site has recorded its snapshot, it sends a marker along all of its outgoing
channels before sending out any more messages.
Since channels are FIFO, a marker separates the messages in the channel into those to
be included in the snapshot (i.e., channel state or process state) from those not to be
recorded in the snapshot.
The algorithm
A process initiates snapshot collection by executing the marker sending rule by which it
records its local state and sends a marker on each outgoing channel
Snapshot algorithms for FIFO channels
Chandy–Lamport algorithm
A process executes the marker receiving rule on receiving a marker.
If the process has not yet recorded its local state, it records the state of the channel on which the marker is received as empty and executes the marker sending rule to record its local state. Otherwise, it records the state of the incoming channel on which the marker is received as the set of messages received on that channel after its local state was recorded and before it received the marker on that channel.
The algorithm can be initiated by any process by executing the marker sending
rule.
The algorithm terminates after each process has received a marker on all of its
incoming channels.
The recorded local snapshots can be put together to create the global snapshot
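The marker sending and receiving rules above can be sketched in a few lines. The following single-threaded toy (the process/channel modelling and all names are assumptions made for illustration) records local states and per-channel states for two processes connected by FIFO channels.

# Toy, single-threaded sketch of the Chandy–Lamport rules with FIFO channels
# modelled as deques. Names and structure are illustrative assumptions.
from collections import deque

class Process:
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers                  # ids of the other processes
        self.state = 0                      # toy local state: messages handled so far
        self.recorded_state = None          # local snapshot, once taken
        self.channel_state = {}             # incoming channel -> recorded messages
        self.recording = {}                 # incoming channel -> still recording?

    def record_local_state(self, channels):
        """Marker sending rule: record local state, then send a marker on every
        outgoing channel before any further basic messages."""
        self.recorded_state = self.state
        self.recording = {q: True for q in self.peers}
        for q in self.peers:
            channels[(self.pid, q)].append(("MARKER", None))

    def receive(self, sender, msg, channels):
        """Marker receiving rule plus normal message handling."""
        kind, payload = msg
        if kind == "MARKER":
            if self.recorded_state is None:
                self.record_local_state(channels)    # first marker seen
                self.channel_state[sender] = []      # that channel is recorded as empty
            else:
                self.channel_state.setdefault(sender, [])
            self.recording[sender] = False           # recording of this channel ends
        else:
            self.state += 1                          # ordinary application processing
            if self.recorded_state is not None and self.recording.get(sender):
                self.channel_state.setdefault(sender, []).append(payload)

# Two processes with FIFO channels; p1 initiates the snapshot while a basic
# message m1 from p2 is still in transit, so m1 ends up in the channel state.
channels = {(1, 2): deque(), (2, 1): deque()}
p1, p2 = Process(1, [2]), Process(2, [1])
channels[(2, 1)].append(("BASIC", "m1"))
p1.record_local_state(channels)                          # p1 starts the snapshot
p2.receive(1, channels[(1, 2)].popleft(), channels)      # p2 gets p1's marker
p1.receive(2, channels[(2, 1)].popleft(), channels)      # p1 receives m1
p1.receive(2, channels[(2, 1)].popleft(), channels)      # p1 receives p2's marker
print(p1.recorded_state, p1.channel_state)               # 0 {2: ['m1']}
print(p2.recorded_state, p2.channel_state)               # 0 {1: []}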
Termination Detection
▪ In distributed processing systems, a problem is typically solved in a distributed
manner with the cooperation of a number of processes.
▪ All messages are received correctly after an arbitrary but finite delay.
▪ Messages sent over the same communication channel may not obey the FIFO
ordering.
▪ An idle process can become active only on the receipt of a message from another process.
▪ when a computation terminates, there must exist a unique process which became
idle last.
▪ When a process goes from active to idle, it issues a request to all other processes to
take a local snapshot, and also requests itself to take a local snapshot.
▪ When a process receives the request, if it agrees that the requester became idle
before itself, it grants the request by taking a local snapshot for the request.
▪ A request is said to be successful if all processes have taken a local snapshot for it.
▪ The requester or any external agent may collect all the local snapshots of a request.
Termination detection using distributed snapshots
Informal description
▪ The computation has terminated if, in the recorded snapshot, all the processes are idle and there is no message in transit to any of the processes.
Termination detection using distributed snapshots
• Formal description
• The algorithm needs logical time to order the requests
• Each process i maintains a logical clock denoted by x, which is initialized to zero at the start of the computation.
• A process increments its x by one each time it becomes idle.
• A basic message sent by a process at its logical time x is of
the form B(x).
• A control message that requests processes to take local
snapshot issued by process i at its logical time x is of the form
R(x, i)
• Each process synchronizes its logical clock x loosely with the
logical clocks x’s on other processes in such a way that it is the
maximum of clock values ever received or sent in messages.
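A minimal sketch of this per-process bookkeeping is given below: x is incremented on becoming idle, piggybacked on basic messages B(x), carried on requests R(x, i), and loosely synchronized by taking maxima. The decision of whether to grant a request is only described informally above, so it is left out of the sketch; the class and method names are assumptions.

# Per-process bookkeeping for the snapshot-based termination detection scheme.
# Only the clock maintenance is sketched; the snapshot/grant decision is not.
class TDProcess:
    def __init__(self, pid):
        self.pid = pid
        self.x = 0                 # logical clock, zero at the start
        self.idle = False

    def become_idle(self):
        self.idle = True
        self.x += 1                             # incremented each time it becomes idle
        return ("R", self.x, self.pid)          # request R(x, i) sent to all processes

    def send_basic(self):
        assert not self.idle                    # only active processes send basic messages
        return ("B", self.x)                    # basic message B(x)

    def on_message(self, msg):
        # loose synchronization: x is the maximum of clock values ever seen
        self.x = max(self.x, msg[1])
        if msg[0] == "B":
            self.idle = False                   # a basic message reactivates an idle process

p = TDProcess(1)
p.on_message(("B", 5))       # receiving B(5) pushes the clock to 5 and makes p active
req = p.become_idle()        # p goes idle at time 6 and issues R(6, 1)
print(p.x, req)              # 6 ('R', 6, 1)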
2. Termination detection by weight throwing
▪ Initially, the weight at each process is zero and the weight at the controlling agent is 1.
▪ The computation starts when the controlling agent sends a basic message to one of the processes.
▪ Whenever a process sends a basic message, it attaches a part of its weight to the message; whenever a process receives a basic message, it adds the weight carried by the message to its own weight.
▪ Thus, the sum of weights on all the processes and on all the messages in transit is always 1.
▪ When a process becomes passive, it sends its weight to the controlling agent in a control message, which the controlling agent adds to its own weight.
▪ The controlling agent concludes that the computation has terminated when its weight becomes 1 again (see the sketch below).
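The weight-throwing idea can be sketched as below. Splitting the weight in half on every send is only an illustrative policy (any positive split works), and the class names are assumptions; the agent announces termination once its weight returns to 1.

# Toy, single-threaded sketch of weight-throwing termination detection.
class Agent:
    def __init__(self):
        self.weight = 1.0
    def send_basic(self):
        self.weight /= 2
        return self.weight          # weight carried by the basic message
    def on_control(self, w):
        self.weight += w            # weight returned by a passive process
        return self.weight == 1.0   # True means the computation has terminated

class Worker:
    def __init__(self):
        self.weight = 0.0
        self.active = False
    def on_basic(self, w):
        self.weight += w            # add the weight carried by the message
        self.active = True
    def send_basic(self):
        self.weight /= 2
        return self.weight
    def become_passive(self):
        w, self.weight, self.active = self.weight, 0.0, False
        return w                    # weight sent back to the controlling agent

agent, w1, w2 = Agent(), Worker(), Worker()
w1.on_basic(agent.send_basic())                 # computation starts: agent -> w1 (weight 0.5)
w2.on_basic(w1.send_basic())                    # w1 -> w2 carries weight 0.25
print(agent.on_control(w1.become_passive()))    # False: w2 still holds weight
print(agent.on_control(w2.become_passive()))    # True: all weight is back, terminated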
▪ The edges of the graph represent the communication channels, through which a
process sends messages to neighbouring processes in the graph.
▪ The algorithm uses a fixed spanning tree of the graph with process P0 at its root
which is responsible for termination detection
▪ Process P0 communicates with other processes to determine their states and the
messages used for this purpose are called signals.
▪ A parent node will similarly report to its parent when it has completed processing
and all of its immediate children have terminated, and so on.
▪ The root concludes that termination has occurred, if it has terminated and all of its
immediate children have also terminated
3. A spanning-tree-based termination detection algorithm
▪ The termination detection algorithm generates two waves of signals moving inward and outward through the spanning tree.
▪ Initially, a contracting wave of signals, called tokens, moves inward from the leaves to the root.
▪ If this token wave reaches the root without discovering that termination has occurred, the root initiates a second outward wave of repeat signals.
▪ As this repeat wave reaches the leaves, the token wave gradually forms and starts moving inward again, and this sequence of events is repeated until termination is detected.
• Each leaf process, after it has terminated, sends its token to its parent.
• When a parent process terminates and after it has received a token from
each of its children, it sends a token to its parent.
• This way, each process indicates to its parent process that the subtree
below it has become idle.
• The root of the tree concludes that termination has occurred, after it has
become idle and has received a token from each of its children.
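A compact sketch of this simple token scheme follows (i.e., the version before the fix for reactivation discussed next). The tree encoding, function name and example tree are assumptions made for illustration; each node reports to its parent once it is done and holds a token from every child, and the root then concludes termination.

# Simple spanning-tree termination detection (the version described above,
# before the fix for reactivation). The tree layout here is an assumption.
def detect_termination(children, terminated_order, root=0):
    """children: node -> list of child nodes; terminated_order: the order in
    which nodes finish their local work. Returns True when the root has both
    terminated and received a token from every child."""
    tokens_received = {node: 0 for node in children}
    done = set()

    def maybe_send_token(node):
        # a node reports upward once it is done and all its children reported
        if node in done and tokens_received[node] == len(children[node]):
            if node == root:
                return True                      # root concludes termination
            parent = next(p for p, cs in children.items() if node in cs)
            tokens_received[parent] += 1
            return maybe_send_token(parent)
        return False

    for node in terminated_order:
        done.add(node)
        if maybe_send_token(node):
            return True
    return False

# Root 0 with children 1 and 2; node 1 has leaves 3 and 4; node 2 has 5 and 6.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
print(detect_termination(tree, terminated_order=[3, 4, 5, 6, 1, 2, 0]))   # True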
3. A spanning-tree-based termination detection algorithm
• A problem with the algorithm
• After a process has sent its token to its parent, it should remain
idle.
• However, this is not the case. The problem arises when a
process after it has sent a token to its parent, receives a
message from some other process.
• Note that this message could cause the process (that has
already sent a token to its parent) to again become active.
• Hence the simple algorithm fails since the process that
indicated to its parent that it has become idle, is now active
because of the message it received from an active process.
• Hence, the root node cannot conclude, just because it has received a token from a child, that all processes in the child's subtree have terminated.
• The algorithm has to be reworked to accommodate such
message-passing scenarios
Spanning Tree Workout
• Apply spanning tree-based termination
detection algorithm in the following scenario.
The nodes are processes 0 to 6. Leaf nodes 3,
4, 5, and 6 are each given tokens T3, T4, T5
and T6 respectively. Leaf nodes 3, 4, 5 and 6 terminate in that order, but before terminating, node 5 sends a message to node 1.