Introduction to Distributed Computing
Outline
Distributed Systems
What are they?
Why are they harder to design?
Distributed algorithms
Importance of models
Complexity measures
Some classical problems
The notion of time and ordering of events
Some interesting examples
A broad definition
A set of autonomous processes that
communicate among themselves to
perform some task
Modes of communication
Message passing
Shared memory
Also includes a single machine with multiple
communicating processes
A more specific definition
A network of autonomous computers that
communicate to perform some task
Modes of communication
Message passing
Distributed shared memory
A common shared address space built over physical
memory on different machines
Partially shared memory
Each node can read and write its own memory, and read
its neighbors’ memories
A practical distributed system may have both
Computers that communicate by messages
Processes/threads on a computer that communicate
by messages or shared memory
Characterizing Distributed Systems
Multiple Autonomous Computers
each consisting of CPUs, local memory, stable storage, and I/O
paths connecting to the environment
Geographically Distributed
Interconnections
some I/O paths interconnect computers that talk to each other
Shared State
No shared memory
systems cooperate to maintain shared state
maintaining global invariants requires correct and coordinated
operation of multiple computers.
Examples of Distributed Systems
[Figure: example applications; labels include battle planning, visualization, collaborative multimedia (telemedicine), task clients, and server farms]
P2P Communications
MSN, Skype, Social Networking Apps
Use the vast resources of machines at the edge of the Internet to build a network that
allows resource sharing without any central authority.
Why Distributed Computing?
Inherent distribution
Bridge customers, suppliers, and companies at
different sites.
Speedup - improved performance
Fault tolerance
Resource Sharing
Exploitation of special hardware
Scalability
Flexibility
Why are Distributed Systems Hard?
Scale
numeric, geographic, administrative
Loss of control over parts of the system
Unreliability of message passing
unreliable communication, insecure communication,
costly communication
Failure
Parts of the system are down or inaccessible
Independent failure is desirable
Design goals of a distributed system
Sharing
HW, SW, services, applications
Concurrency
compete vs. cooperate
Scalability
avoids centralization
Fault tolerance/availability
Transparency
location, migration, replication, failure, concurrency
Distributed Algorithms
Algorithms that run on distributed
systems to perform some desired task
Examples
Algorithms for mutual exclusion, for creating
a spanning tree of a network, for building
routing tables in the Internet, for scheduling
jobs on different machines, for
disseminating information to multiple nodes
Many many more…
Why are They Harder to Design?
Lack of global shared memory
No one place where the global system state
can be accessed at any point
Lack of global clock
Events cannot be started at the same time
Events cannot be ordered in time easily
Note that if we had a global shared memory,
we could build a global clock easily
Hard to verify and prove
Arbitrary interleaving of actions of different
processes makes the system hard to verify
Same problem is there for multi-process
programs on a single machine
Harder here due to communication delays
that introduce additional non-determinism
Example: Lack of Global Memory
Problem of Distributed Search
A set of elements distributed across multiple
machines (no duplicates)
Query for element X at any one machine A
A needs to search for X in the whole system
Sequential algorithm is very simple
Search done on a single array in a single
machine
No. of elements also known in a single
variable
A distributed algorithm has more hurdles
to solve
How to send the query to all other machines?
Do all machines even know about all the other machines?
How to get back the result of the search from
each machine?
Handling updates (both add/delete of
elements at a machine and add/remove of
machines) – adds more complexity
Main problem
No one place (global memory) that a machine
can look up to see the current system state
(what machines, what elements, how many
elements)
Example: Lack of Global Clock
Problem of Distributed Replication
3 machines A, B, C each hold a copy of a data item X,
say initialized to 1
Queries and updates can happen at any machine
Need to make the copies consistent in case
of update at any one machine
Naïve algorithm
On an update, a machine sends the updated
value to the other replicas
A replica, on receiving an update, applies it
[Figure: replicas A, B, C applying the updates X=3 and then X=2; all copies end with the same value]
But then, consider the following scenario
[Figure: updates X=2 and X=3 issued at different replicas and delivered in different orders]
What should this node do now?
Reject X=2, right?
But it has received exactly the
same messages in the same order
(same local view)
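The problem scenario can be reproduced in a few lines. This is only a sketch: the arrival orders below are assumed for illustration (they are not read off the figure), and `naive_replica` is a hypothetical name for the naive algorithm, which simply applies each update as it arrives.

```python
def naive_replica(initial, received):
    """Naive algorithm: apply every received update in arrival order."""
    value = initial
    for update in received:
        value = update
    return value

# Assumed scenario: X=2 is issued at A and X=3 at B at about the same time.
# Each machine applies its own update first; message delays decide the rest.
arrivals = {
    "A": [2, 3],  # A applies its own X=2, then receives X=3
    "B": [3, 2],  # B applies its own X=3, then receives X=2
    "C": [3, 2],  # C receives B's message before A's
}

final = {name: naive_replica(1, order) for name, order in arrivals.items()}
print(final)  # {'A': 3, 'B': 2, 'C': 2}
```

The replicas disagree. Note also that C sees `[3, 2]` both when X=2 was really issued after X=3 and when X=2 was issued first but delayed, so its local view alone cannot tell it whether to apply or reject X=2.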
Could be easily solved if all nodes had a
synchronized global clock
Just timestamp each event with the clock
value and order events according to
timestamps
But it is impossible to perfectly synchronize
clocks across multiple machines
Message delays cannot be estimated exactly
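To make the timestamp idea concrete, here is a minimal sketch that assumes the perfectly synchronized global clock described above, which a real system does not have. The names `Update`, `Replica`, and `final_value` are hypothetical, and ties between equal timestamps are simply ignored here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    timestamp: int  # reading of the assumed global clock when the update was issued
    value: int

class Replica:
    def __init__(self, initial):
        self.value = initial
        self.last_timestamp = -1  # nothing applied yet

    def receive(self, update):
        # Apply an update only if it is newer than the last one applied.
        if update.timestamp > self.last_timestamp:
            self.value = update.value
            self.last_timestamp = update.timestamp

def final_value(arrival_order):
    replica = Replica(initial=1)
    for update in arrival_order:
        replica.receive(update)
    return replica.value

# The same arrival order of values (3, then 2) in two different histories.
history_1 = [Update(10, 3), Update(20, 2)]  # X=2 really was issued later
history_2 = [Update(20, 3), Update(10, 2)]  # X=2 was issued earlier but delayed

print(final_value(history_1))  # 2
print(final_value(history_2))  # 3
```

With timestamps the node can tell the two histories apart even though the values arrive in the same order.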
Classifying Distributed Systems
Based on degree of synchrony
Synchronous
Asynchronous
Partially synchronous
Fault model
Crash failures
Byzantine failures
Distributed Systems
Models for Distributed Algorithms
Informally, guarantees that one can assume
the underlying system will give
Topology
Arbitrary, completely connected, ring, tree, …
Communication
Shared memory
Broadcast/multicast? …
Failure possible or not
What all can fail?
Knowledge of number of nodes in the
system
Exact or upper bound
Knowledge of diameter of the network
Others…
A distributed algorithm needs to specify
the model on which it is supposed to
work
[Figure: the assumed model shown over the physical system]
So Which Model to Choose?
Ideally, as close to the physical system
available as possible
The algorithm can directly run on the system
considered
But sometimes, start with a strong model
(even if somewhat impractical to implement)
Easier to design algorithms on a stronger
system
Can use this knowledge to then
Model 3: Synchronous, completely connected topology,
reliable communication
Wait for replies for time T = 2α + β, or until one node replies
Found
A node, on receiving a query for X, does local search for X
and replies Found if found, does not reply if not found
If no reply received within T, return “Not found”
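The timeout rule can be sketched as a small simulation. This section does not define α and β; the sketch assumes α is an upper bound on one-way message delay and β an upper bound on local search time. The function name `search` and the layout of the `nodes` dictionary are hypothetical.

```python
ALPHA = 3  # assumed upper bound on one-way message delay
BETA = 2   # assumed upper bound on local search time
T = 2 * ALPHA + BETA  # longest the querying node has to wait

def search(x, nodes):
    """nodes maps a node name to (elements, one_way_delay, search_time)."""
    reply_times = []
    for name, (elements, delay, search_time) in nodes.items():
        if x in elements:
            # query travels out, node searches locally, reply travels back
            reply_times.append(delay + search_time + delay)
        # a node that does not hold x stays silent
    arrived = [t for t in reply_times if t <= T]
    if arrived:
        return "Found", min(arrived)
    return "Not found", T  # no reply within T

nodes = {
    "B": ({1, 2}, 3, 2),
    "C": ({7}, 1, 1),
}
print(search(7, nodes))  # ('Found', 3)
print(search(9, nodes))  # ('Not found', 8)
```

The second element of the result is the simulated time at which the querying node can answer. A miss always costs the full timeout T, because silence is the only signal of "Not found" in this model.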
But are we done?
Suppose X is not there. A gets many Not found
messages. How does it know if all nodes have
replied? (Termination Detection)
Let's change (strengthen) the model
Suppose A knows n, the total number of nodes
A can now count the number of messages received.
Termination if at least one Found message, or n Not found
messages
Message complexity?
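The counting rule can be sketched as a small reply handler. It assumes reliable communication and that A collects exactly one reply per counted node; `ReplyCounter` is a hypothetical name, not something from the slides.

```python
class ReplyCounter:
    """Termination rule at A: stop on one Found, or on n Not found replies."""

    def __init__(self, n):
        self.n = n            # number of replies A expects (known in this model)
        self.not_found = 0
        self.result = None    # None means the search is still running

    def on_reply(self, reply):
        if self.result is not None:
            return self.result            # already terminated, ignore late replies
        if reply == "Found":
            self.result = "Found"
        else:
            self.not_found += 1
            if self.not_found == self.n:
                self.result = "Not found"
        return self.result

counter = ReplyCounter(n=3)
print(counter.on_reply("Not found"))  # None
print(counter.on_reply("Not found"))  # None
print(counter.on_reply("Not found"))  # Not found
```

Without knowledge of n the handler could never decide "Not found", which is exactly the termination detection problem raised above.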
Suppose A knows an upper bound on the network
diameter and the system is synchronous
How many messages?
Can you do it without changing the model?
Try building and using a spanning tree!
What would be the message complexity?
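One way to explore the spanning tree suggestion is the sketch below. It makes two loud assumptions: the tree is computed centrally with BFS (building it in a distributed way would itself cost messages, which are not counted here), and the network is connected. The function names and the example ring are hypothetical.

```python
from collections import deque

def bfs_spanning_tree(graph, root):
    """Return a parent map describing a BFS spanning tree rooted at root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent

def tree_search(x, graph, data, root):
    """Send the query down the tree, collect replies back up, count messages."""
    parent = bfs_spanning_tree(graph, root)
    children = {node: [] for node in parent}
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)

    messages = 0

    def visit(node):
        nonlocal messages
        found = x in data[node]
        for child in children[node]:
            messages += 1                    # query sent down a tree edge
            found = visit(child) or found
            messages += 1                    # reply sent back up the same edge
        return found

    return visit(root), messages

# Example: a ring of four machines, each holding one element.
graph = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
data = {"A": {1}, "B": {2}, "C": {3}, "D": {4}}
print(tree_search(3, graph, data, "A"))  # (True, 6)
print(tree_search(9, graph, data, "A"))  # (False, 6)
```

Because every node replies to its parent only after hearing from all its children, the root knows the search has terminated without knowing n or the diameter.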
Complexity Measures
Space complexity
Total no. of bits needed for storage at all the
nodes
Message complexity
Total no. of messages sent
Can be deceptive sometimes if message size
is non-constant
Communication complexity / bit complexity
Total no. of bits sent
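A small calculation shows why message counts alone can mislead. The two message traces below are made up for illustration, and `complexities` is a hypothetical helper.

```python
def complexities(message_sizes_in_bits):
    """Return (message complexity, bit complexity) for one run of an algorithm."""
    return len(message_sizes_in_bits), sum(message_sizes_in_bits)

# Hypothetical run P: ten small messages of 32 bits each.
run_p = [32] * 10
# Hypothetical run Q: two large messages of 1024 bits each.
run_q = [1024] * 2

print(complexities(run_p))  # (10, 320)
print(complexities(run_q))  # (2, 2048)
```

Run Q looks five times cheaper by message count, yet it sends more than six times as many bits.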
Time complexity
For synchronous systems, no. of rounds
For asynchronous systems, what is time
anyway?
Remember that there is no global clock
Different notions of time complexity measures exist
Should be careful when comparing the time
complexities of two algorithms
Check if the definitions of time are the same
Some Classical Problems
Ordering events in the absence of a global
clock
Capturing the global state
Termination detection
Mutual exclusion
Leader election
Clock synchronization
Constructing spanning trees (and other graph
structures)
Agreement protocols
Distributed Algorithms in Action
Domain Name System (DNS)
Internet routing protocols
Search engines
Cloud computing
High performance computing systems
Distributed file systems (NFS, HDFS)
Single sign-on login (Kerberos)
Many, many more…