Module 4 Distributed System
Overview
A distributed system is a collection of loosely coupled processors
interconnected by a communications network
Processors are variously called nodes, computers, machines, hosts
A site is the location of a processor
Generally, a server has a resource that a client node at a different site
wants to use
Reasons for Distributed Systems
Failure detection
Reconfiguration
Failure Detection
Detecting hardware failure is difficult
To detect a link failure, a heartbeat protocol can be used
Assume Site A and Site B have established a link
At fixed intervals, the sites send each other an I-am-up
message indicating that they are up and running
If Site A does not receive a message within the fixed interval,
it assumes either (a) the other site is not up or (b) the
message was lost
Site A can now send an Are-you-up? message to Site B
If Site A does not receive a reply, it can repeat the message or
try an alternate route to Site B
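The heartbeat steps above can be sketched from Site A's point of view. This is a minimal illustration, not a definitive implementation: the names HeartbeatMonitor, LinkState, on_i_am_up, and check are hypothetical, time is passed in explicitly instead of being read from a clock, and each "probe" is an assumed callable that stands for sending one Are-you-up? message over some route and reporting whether a reply came back.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class LinkState(Enum):
    UP = "up"            # Site B was heard from recently, or answered a probe
    SUSPECT = "suspect"  # no I-am-up message and no reply to any probe


@dataclass
class HeartbeatMonitor:
    """Site A's view of its link to Site B (hypothetical helper)."""
    interval: float          # fixed interval between I-am-up messages, in seconds
    last_heard: float = 0.0  # time at which the last message from Site B arrived

    def on_i_am_up(self, now: float) -> None:
        # Called whenever an I-am-up message from Site B arrives.
        self.last_heard = now

    def timed_out(self, now: float) -> bool:
        # True if no message arrived within the fixed interval.
        return now - self.last_heard > self.interval

    def check(self, now: float, probes: Iterable[Callable[[], bool]]) -> LinkState:
        if not self.timed_out(now):
            return LinkState.UP
        # Either Site B is not up or the message was lost, so ask directly.
        # Each probe is one Are-you-up? attempt: a repeat on the same route
        # or a try over an alternate route. It returns True on a reply.
        for send_are_you_up in probes:
            if send_are_you_up():
                self.last_heard = now
                return LinkState.UP
        return LinkState.SUSPECT
```

For example, with interval=5.0 and the last I-am-up message received at time 10.0, check(12.0, []) reports the link as up without sending anything, while check(16.0, probes) sends Are-you-up? over each route in probes until one gets a reply.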
Failure Detection (Cont.)