CS3551-Distributed-Computing 2QB2
UNIT 1
TWO MARKS:
In this model, data is shared by sending and receiving messages between co-operating
processes, using system calls. Message passing refers to services performing a simple, one-way
transfer operation between two programs.
A method of communication between processes in a distributed system where messages are sent
and received to exchange data and synchronize actions.
Synchronous execution means the first task in a program must finish processing before
moving on to executing the next task. Asynchronous execution means a second task can begin
executing in parallel, without waiting for an earlier task to finish.
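The contrast can be sketched with Python's asyncio. This is a minimal illustration only: the task names and delays are made up, and asyncio interleaves tasks on one thread rather than running them truly in parallel.

```python
import asyncio

async def task(name, delay, log):
    # Record when the task starts and ends so the ordering can be inspected.
    log.append(f"{name} start")
    await asyncio.sleep(delay)
    log.append(f"{name} end")

async def run_synchronously(log):
    # B does not start until A has finished.
    await task("A", 0.05, log)
    await task("B", 0.01, log)

async def run_asynchronously(log):
    # B starts without waiting for A to finish.
    await asyncio.gather(task("A", 0.05, log), task("B", 0.01, log))

sync_log, async_log = [], []
asyncio.run(run_synchronously(sync_log))
asyncio.run(run_asynchronously(async_log))
print(sync_log)   # A start, A end, B start, B end
print(async_log)  # begins with A start, B start
```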
Ability to use any hardware, software or data anywhere in the system. Resource manager
controls access, provides naming scheme and controls concurrency.
b. Independent failures: The programs may not be able to detect whether the network has
failed or has become unusually slow.
c. Concurrency: The capacity of the system to handle shared resources can be increased by
adding more resources to the network.
A distributed system needs to hide the fact that its processes and resources are physically
distributed across multiple computers.
Distributed system must be able to interact with services from other open systems,
irrespective of the underlying environment. Systems should conform to well-defined interfaces
and should support portability of applications.
10. List any two resources of hardware and software, which can be shared in distributed
systems with example.
Hardware resource: Memory cache servers and CPU servers perform computation for their
clients, hence their CPU is a shared resource.
Software resource: File: File servers enable multiple clients to have read/write access to
the same files. Database: The content of a database can be usefully shared. There are many
techniques that control the concurrent access to a database.
Design issues and challenges of distributed systems are heterogeneity, openness, security,
scalability, failure handling, concurrency and transparency.
Enables local and remote information objects to be accessed using identical operations.
14. What is replication transparency?
A cache is made from static RAM, which is faster than the dynamic RAM used for a
buffer. A cache transparently stores data so that future requests for that data can be served faster.
A buffer temporarily stores data while the data is in the process of moving from one place to
another, i.e. from the input device to the output device. The buffer is mostly used for input/output
processes, while the cache is used during reading and writing processes from the disk.
Open distributed system is a system that offers services according to standard rules that
describe the syntax and semantics of those services.
When mobile users can continue to use their wireless laptops while moving from place to
place without ever being disconnected.
A system is scalable with respect to either its number of components, size or number and
size of administrative domains, if it can grow in one or more of these dimensions without an
unacceptable loss of performance.
Middleware acts as an intermediary layer that facilitates communication, data management, and
service coordination between distributed applications, simplifying the development process.
22. Infer the Challenges of Distributed Systems:
a. Communication Reliability: Ensuring consistent and reliable communication between
distributed components despite network failures and latency.
b. Synchronization: Coordinating actions among distributed processes to maintain
consistency and avoid conflicts, especially in the presence of failures.
23. Name the Significant Consequences of Distributed Systems:
a. Increased Scalability: Ability to expand resources and services easily across multiple
nodes, supporting larger workloads.
b. Improved Fault Tolerance: Enhanced system resilience by allowing redundancy and
recovery mechanisms that maintain functionality despite component failures.
Message passing
Message sharing
Emulating message-passing systems on a shared memory system.
2. Describe about Design issues and challenges in Distributed Computing.
4. What is Global State? Explain about the global state of Distributed Systems.
a. Definition
b. Requirements of global state
i. Garbage collection
ii. Deadlock
iii. Termination
iv. Distributed debugging
5. Explain the primitives of distributed communication.
Send:
Receive:
Broadcast:
Synchronization Primitives:
Group Communication:
a. Applications
i. Mobile systems
ii. Pervasive computing
1. Intranet
iii. Multimedia system
1. Web casting
UNIT 2
2 MARKS:
Group communication offers a service whereby a message is sent to a group and then this
message is delivered to all members of the group. The sender is not aware of the identities of the
receivers.
Causal ordering is used for implementing distributed shared memory and fair resource
allocation. Other applications are updating replicated data, synchronizing multimedia streams
and allocating requests in a fair manner.
When all the communication between pairs of processes uses synchronous send and
receive primitives, the resulting order is synchronous order.
Scalar time is designed by Lamport to synchronize all the events in distributed systems.
Time domain is the set of non-negative integers.
Addressing following issues: Data structures local to every process to represent logical
time.
Properties of scalar time are consistency, total ordering and event counting. A system of
scalar clocks is not strongly consistent.
8. What is Rendezvous?
When the counter gets to zero, an interrupt is generated; each such interrupt is called one clock tick.
With n computers, all n crystals will run at slightly different rates, causing the software
clocks to gradually get out of sync. This difference in time values is called clock skew.
A clock drift rate is the change in the offset between the clock and a nominal perfect
reference clock per unit of time measured by the reference clock.
NTP servers synchronize with one another in one of three modes: multicast, procedure-call
and symmetric mode.
2. Internal synchronization: For a synchronization bound $D > 0$, $|C_i(t) - C_j(t)| < D$ for
$i, j = 1, 2, \ldots, N$ and for all real times $t$ in $I$.
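Reading the internal synchronization condition as "every pair of clocks differs by less than D at the same instant", it can be checked as in the sketch below. The function name and the sample readings are illustrative assumptions.

```python
from itertools import combinations

def internally_synchronized(readings, bound):
    """Return True if every pair of clock readings differs by less than bound.

    readings: clock values C_1(t) ... C_N(t) taken at the same real time t.
    bound:    the synchronization bound D (> 0).
    """
    return all(abs(a - b) < bound for a, b in combinations(readings, 2))

clocks = [100.000, 100.004, 99.998]            # hypothetical readings in seconds
print(internally_synchronized(clocks, 0.010))  # True: largest gap is about 0.006
print(internally_synchronized(clocks, 0.005))  # False: 100.004 vs 99.998 exceeds D
```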
Logical clock is a monotonically increasing software counter, whose value need bear no
particular relationship to any physical clock. Each process Pi keeps its own logical clock Li,
which it uses to apply so-called Lamport timestamps to events.
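A logical clock of this kind can be sketched as a small class. The class and method names are illustrative; the receive rule (take the larger of the local and message times, then add one) is the standard Lamport update.

```python
class LamportClock:
    """A per-process software counter that only ever increases."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter and use it as the timestamp.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past both the local time and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
a = p1.tick()       # 1
m = p1.send()       # 2
b = p2.receive(m)   # 3, later than the send even though p2 started at 0
```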
The global state of the distributed system consists of the local state of each process,
together with the messages which are in transit.
1. If a and b are events in the same process, and a occurs before b, then a -> b is true.
2. If a is the event of a message being sent by one process, and b is the event of the message being
received by another process, then a -> b is also true.
17. What is need of physical clock?
In some systems, such as real-time systems, the actual clock time is important. For these systems,
external physical clocks are required.
With Lamport's clocks, you cannot tell whether two events are causally related or
concurrent by looking at the timestamps. Just because L(a) < L(b) does not mean that a -> b. Vector
clocks allow you to compare two vector timestamps to determine whether the events are
concurrent or not.
Vector clocks are used in a distributed system to determine whether pairs of events are
causally related. Using vector clocks, timestamps are generated for each event in the system, and
their causal relationship is determined by comparing those timestamps.
a. Events: Every time an event is generated, a process increments its clock and assigns a
timestamp to the event based on its knowledge of all the clocks in the system.
b. Sending messages: When a message is sent, the timestamp of the sending event is
given to the message.
c. Receiving messages: When a message is received, the process updates its knowledge
of the system clock states by taking the maximum of each component of the message timestamp
and its current knowledge of the system clock states.
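The three rules can be sketched as follows. The class name, the helper functions and the three-process example are assumptions made for illustration; the receive step is also treated as an event, so the receiver increments its own entry after taking the component-wise maximum.

```python
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid          # index of this process
        self.v = [0] * n        # one entry per process in the system

    def event(self):
        # Rule a: increment own entry and timestamp the event.
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        # Rule b: the timestamp of the send event is attached to the message.
        return self.event()

    def receive(self, ts):
        # Rule c: component-wise maximum, then count the receive event itself.
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        return self.event()

def happened_before(x, y):
    # x -> y iff x <= y in every component and the timestamps differ.
    return x != y and all(a <= b for a, b in zip(x, y))

def concurrent(x, y):
    return x != y and not happened_before(x, y) and not happened_before(y, x)

p0, p1, p2 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
e1 = p0.send()        # (1, 0, 0)
e2 = p1.receive(e1)   # (1, 1, 0)
e3 = p2.event()       # (0, 0, 1)
print(happened_before(e1, e2), concurrent(e1, e3))  # True True
```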
16 MARKS:
Event Ordering
o Condition of happens before
o Logical clock condition
Lamport Timestamp
Vector Timestamp
2. Discuss about Physical Clock Synchronization: NTP
Scalar Time
o Basic Properties
Consistency property
Total ordering
Event counting
Vector Time
o Definition
Definition
System Model
Consistent Global State
6. Explain the total order and causal order in distributed system with neat diagram.
Total Order:
Definition
Key Characteristics:
Diagram
Process A: M1 -------> M2 -------> M3
Process B: M1 -------> M2 -------> M3
Process C: M1 -------> M2 -------> M3
Causal Order
Definition: Causal order ensures that messages that are causally related are delivered in the order
of their cause. If one message influences another, the message that caused it must be received first.
Key Characteristics:
Diagram:
Process B: M3
Process C: M4 -------> M5 (M5 is caused by M4)
Causal Relation:
M1 --> M2
M4 --> M5
Purpose: To record a global state of a distributed system comprising multiple processes and
communication channels.
Assumption: The communication channels are FIFO, ensuring that if a message is sent
from process A to process B, B will receive it in the order it was sent.
Key Concepts
Global State: The global state of a distributed system includes the local states of all
processes and the state of all channels (i.e., the messages in transit).
Consistent Snapshot: A snapshot is consistent if it reflects a state where all messages that
were sent but not yet received are accounted for.
1. Initiation:
o A designated process (say, Process P) initiates the snapshot by recording its local
state. It sends a special marker message to all other processes.
3. Message Recording:
o After recording its local state, each process continues processing incoming
messages.
o Any messages received after the local state is recorded but before the marker is
received must also be logged as part of the channel state.
4. Completion:
o The snapshot is complete when all processes have recorded their local states and the
states of their incoming channels.
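The marker-based recording described above can be sketched with two processes that transfer money over FIFO channels. This is a simplified single-threaded simulation, not a full implementation: the Process class, the balances and the delivery order are assumptions chosen so that one transfer is caught in transit.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, pid, balance, network):
        self.pid = pid
        self.balance = balance
        self.network = network       # shared dict: (src, dst) -> FIFO deque
        self.recorded_state = None   # local state saved by the snapshot
        self.channel_state = {}      # src -> messages recorded as in transit
        self.recording = set()       # incoming channels still being recorded

    def send(self, dst, amount):
        self.balance -= amount
        self.network[(self.pid, dst)].append(amount)

    def start_snapshot(self, skip=None):
        # Record local state, start recording incoming channels, send markers.
        self.recorded_state = self.balance
        for (src, dst) in self.network:
            if dst == self.pid:
                self.channel_state[src] = []
                if src != skip:      # the marker's own channel is recorded empty
                    self.recording.add(src)
        for (src, dst) in self.network:
            if src == self.pid:
                self.network[(src, dst)].append(MARKER)

    def deliver_from(self, src):
        msg = self.network[(src, self.pid)].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(skip=src)
            else:
                self.recording.discard(src)   # channel state is now final
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_state[src].append(msg)

network = {(0, 1): deque(), (1, 0): deque()}
p0, p1 = Process(0, 100, network), Process(1, 50, network)

p0.send(1, 10)        # 10 is in transit on channel 0 -> 1
p1.start_snapshot()   # P1 initiates: records 50 and sends a marker to P0
p1.deliver_from(0)    # the transfer arrives before P0's marker: logged in channel state
p0.deliver_from(1)    # marker reaches P0: records 90 and sends a marker back
p1.deliver_from(0)    # marker closes channel 0 -> 1

snapshot_total = (p0.recorded_state + p1.recorded_state
                  + sum(p0.channel_state[1]) + sum(p1.channel_state[0]))
print(snapshot_total)  # 150, the same total as before the snapshot
```

The recorded states (90 and 50) plus the in-transit message (10) add up to the original 150, which is what makes the snapshot consistent.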
Important Points
FIFO Property: The algorithm relies on the FIFO property, which guarantees that
messages are received in the order they are sent, simplifying the snapshot consistency.
Non-blocking: The algorithm is non-blocking; processes can continue their normal
operations after recording their states.
Scalability: The Chandy-Lamport algorithm is efficient and scalable for large distributed
systems.
Applications: Useful for debugging, recovery, and maintaining consistency in distributed
databases and systems.
Total Order:
Causal Order:
Definition: Messages that are causally related are delivered in the order of their
causality. If message A causally affects message B, A must be delivered before B.
Use Case: Useful in collaborative applications (e.g., shared document editing).
Implementation: Maintained using vector clocks or timestamps that reflect the
causal relationships.
Definition: Messages sent from one process to another are received in the order
they were sent.
Use Case: Suitable for scenarios where the order of messages from a single sender
matters but not necessarily between multiple senders.
Implementation: Ensured by using FIFO channels, where messages are queued and
processed in the order they are sent.
9. Explain the types of group communication used in distributed system.
1. One to many communication
2. Many to one communication
3. Many to many communication
UNIT 3
TWO MARKS:
2. What is deadlock?
The two types of messages used by Ricart-Agrawala are REQUEST and REPLY, and
communication channels are assumed to follow FIFO order. A site sends a REQUEST message to
all other sites to get their permission to enter the critical section. A site sends a REPLY message to
another site to give its permission to enter the critical section.
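How a site decides whether to answer a REQUEST can be sketched as below. The text above does not spell out the deferral rule; the sketch assumes the usual formulation, in which requests carry a (timestamp, site id) pair and the smaller pair has priority. The function and state names are illustrative.

```python
def on_request(my_state, my_request, incoming_request):
    """Decide how a site reacts to an incoming REQUEST.

    my_state:         "idle", "requesting" or "in_cs"
    my_request:       this site's own (timestamp, site_id), or None when idle
    incoming_request: the sender's (timestamp, site_id)
    """
    if my_state == "idle":
        return "reply"                        # not competing: grant permission
    if my_state == "requesting" and incoming_request < my_request:
        return "reply"                        # the other request has priority
    return "defer"                            # answer after leaving the CS

print(on_request("idle", None, (3, 2)))            # reply
print(on_request("requesting", (5, 1), (3, 2)))    # reply
print(on_request("requesting", (3, 1), (3, 2)))    # defer (tie broken by site id)
print(on_request("in_cs", (1, 1), (0, 2)))         # defer
```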
Mutual exclusion in a distributed system states that only one process is allowed to execute
the critical section (CS) at any given time. In a distributed system, shared variables or a local
kernel cannot be used to implement mutual exclusion.
6. Which are the three basic approaches for implementing distributed mutual exclusion?
There are three basic approaches for implementing distributed mutual exclusion
c. Strict fairness
d. Fault tolerance
Performance metrics are message complexity, synchronization delay, response time and
system throughput.
The time interval a request waits for its CS execution to be over after its request messages
have been sent out.
10. Which are the criteria for evaluating performance of algorithms for mutual exclusion?
10. What is the advantage if your server side processing uses threads instead of a single
process?
An important property of threads is that they can provide a convenient means of allowing
blocking system calls without blocking the entire process in which the thread is running. This
property makes threads particularly attractive to use in distributed systems as it makes it much
easier to express communication in the form of maintaining multiple logical connections at the
same time.
A deadlock that is 'detected' but is not really a deadlock is called a phantom deadlock.
A non-preemptive approach. If a younger process is using the resource, then the older
process waits. If an older process is holding the resource, the younger process kills itself. This
forces the resource utilization graph to be directed from older to younger processes, making
cycles impossible. This algorithm is known as the wait-die algorithm.
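The wait-die rule reduces to one comparison of process ages. A minimal sketch, assuming each process carries a start timestamp and that a smaller timestamp means an older process:

```python
def wait_die(requester_ts, holder_ts):
    """Return what the requesting process does when the resource is held.

    Timestamps are assigned at process start, so smaller means older.
    """
    if requester_ts < holder_ts:
        return "wait"   # older requester waits for the younger holder
    return "die"        # younger requester kills itself

print(wait_die(requester_ts=5, holder_ts=9))  # wait
print(wait_die(requester_ts=9, holder_ts=5))  # die
```

Because only older processes ever wait for younger ones, every wait edge points in the same direction and no cycle can form.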
14. List the deadlock handling strategies in distributed system.
There are three strategies for handling deadlocks, viz, deadlock prevention, deadlock
avoidance, and deadlock detection.
Deadlock avoidance depends on additional information about the long-term resource
needs of each process. The system must be able to decide whether granting a resource is safe or
not and only make the allocation when it is safe. When a process is created, it must declare its
maximum claim, i.e. the maximum number of units of each resource it may need. The resource
manager can grant the request if the resources are available.
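One common way to decide whether a state is safe is a Banker's-style check, sketched below. The technique is named here as an illustration and is not taken from the text above; the function name and the 12-unit example are assumptions.

```python
def is_safe(available, allocation, maximum):
    """Return True if every process can finish in some order.

    available:  free units per resource type
    allocation: allocation[i][r] = units of r currently held by process i
    maximum:    maximum[i][r]    = declared maximum claim of process i for r
    """
    work = list(available)
    need = [[m - a for m, a in zip(mx, al)]
            for mx, al in zip(maximum, allocation)]
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i in range(len(allocation)):
            if not finished[i] and all(n <= w for n, w in zip(need[i], work)):
                # Process i can run to completion and release what it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progressed = True
    return all(finished)

# One resource type with 12 units in total; maximum claims are 10, 4 and 9.
print(is_safe([3], [[5], [2], [2]], [[10], [4], [9]]))  # True
print(is_safe([2], [[5], [2], [3]], [[10], [4], [9]]))  # False
```

In the second call one more unit has been given to the third process; afterwards only the second process can finish, so the request that led to this state should not have been granted.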
Set of Deadlocked processes, where each process waits to receive messages from other
processes in the set.
Set of deadlocked processes, where each process waits for resource held by another
process.
The condition for deadlock in a system using the AND condition is the existence of a
cycle.
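Checking this condition amounts to cycle detection in the wait-for graph. A minimal depth-first sketch, assuming the graph is given as a dictionary that maps each process to the processes it waits for:

```python
def has_cycle(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on current path, fully explored
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in wait_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:        # back edge: nxt is already on the path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in wait_for)

print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))  # True
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []}))      # False
```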
Decision made dynamically, before allocating a resource, the resulting global system
state is checked, if it is safe state then allow for allocation.
Definition
Requesting the critical section
Conditions for entering CS
Releasing the CS
Correctness
Optimization
Lamport evalution
Definition
Algorithm
Requesting the Critical Section
Executing the Critical Section
Releasing the Critical Section
Definition
Suzuki – Kasami’s Broadcast Algorithm.
o Major Design Issues
o Important Data Structures
o Algorithm
Requesting CS
Executing CS
Releasing CS
Theorem: A requesting site enters CS in finite time
Performance
Deadlock
Necessary Condition
o Mutual exclusion
o Hold and wait
o Circular waiting
o No preemption
Deadlock Prevention
o First Method
o Second Method
o Third Method
Deadlock Avoidance
o Disadvantage
Deadlock Detection
o Principle of operation
o Resolution
o Observation
UNIT 4
2 MARKS:
Achieve fault tolerance by periodically saving the state of a process during the
failure-free execution.
Each process has an initial value and all the correct processes must agree on a single
value.
Checkpointing is most typically used to provide fault tolerance to applications.
Checkpointing techniques are useful not only for availability, but also for program debugging, process
migration, and load balancing.
The difference between the agreement problem and the consensus problem is that, in the
agreement problem, a single process has the initial value, whereas in the consensus problem, all
processes have an initial value.
6. Define recovery.
Recovery refers to restoring a system to its normal operational state. Once a failure has
occurred, it is essential that the process where the failure happened recover to a correct state.
Fundamental to fault tolerance is the recovery from an error.
4. If failure rarely occurs between successive checkpoints, then the checkpoint algorithm
places an unnecessary extra load on the system, which can significantly affect performance.
Shadow version uses a map to locate versions of the server's objects in a file called a
version store. The map associates the identifiers of the server's objects with the positions of their
current versions in the version store. The versions written by each transaction are shadows of the
previous committed versions. The transaction status entries and intentions lists are stored
separately. When a transaction commits, a new map is made by copying the old map and
entering the positions of the shadow versions. To complete the commit process, the new map
replaces the old map.
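The map-and-version-store mechanism can be sketched in memory as follows. This is a simplified illustration only: the class name is an assumption, the version store is a Python list rather than a file, and transaction status entries and intentions lists are left out.

```python
class ShadowStore:
    def __init__(self):
        self.version_store = []   # append-only sequence of object versions
        self.map = {}             # object id -> position of current version

    def read(self, obj_id):
        return self.version_store[self.map[obj_id]]

    def commit(self, writes):
        # Write shadow versions, build a new map, then swap it in.
        new_map = dict(self.map)                  # copy of the old map
        for obj_id, value in writes.items():
            self.version_store.append(value)      # shadow of the old version
            new_map[obj_id] = len(self.version_store) - 1
        self.map = new_map                        # new map replaces the old map

store = ShadowStore()
store.commit({"x": 1, "y": 2})
old_map = store.map
store.commit({"x": 10})
print(store.read("x"), store.read("y"))      # 10 2
print(store.version_store[old_map["x"]])     # 1, the previous version is still there
```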
10. Define fault and failure. What are different approaches to fault-tolerance?
Failure of a system occurs when the system does not perform its service in the manner
specified.
1. Termination
2. Agreement and
3. Integrity.
In the Byzantine agreement problem, n processors communicate with each other in order
to reach an agreement on a binary value b. There are bad processors that may collaborate with
each other in order to prevent an admissible agreement. Each processor has an initial binary
value. The agreement must reflect, to a certain extent, the majority among the initial values.
A process may take a local checkpoint at any time during the execution. The local
checkpoints of different processes are not coordinated to form a global consistent checkpoint.
A useless checkpoint of a process is one that will never be part of a global consistent
state. Useless checkpoints are not desirable because they do not contribute to the recovery of the
system from failures, but they consume resources and cause performance overhead.
Messages with receive recorded but message send not recorded are called the orphan
messages.
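Given a set of checkpoints, orphan messages can be found by comparing each message's send and receive events against what the checkpoints recorded. A minimal sketch, assuming events are numbered per process and a checkpoint stores the number of the last event it includes; the data layout is hypothetical.

```python
def orphan_messages(messages, checkpoint):
    """Return ids of messages whose receive is recorded but whose send is not.

    messages:   list of (msg_id, sender, send_event, receiver, recv_event)
    checkpoint: process id -> number of the last event saved in its checkpoint
    """
    orphans = []
    for msg_id, sender, send_event, receiver, recv_event in messages:
        send_recorded = send_event <= checkpoint[sender]
        recv_recorded = recv_event <= checkpoint[receiver]
        if recv_recorded and not send_recorded:
            orphans.append(msg_id)
    return orphans

messages = [
    ("m1", "P1", 2, "P2", 3),   # send and receive both inside the checkpoints
    ("m2", "P2", 5, "P1", 4),   # received before P1's checkpoint, sent after P2's
]
print(orphan_messages(messages, {"P1": 4, "P2": 4}))  # ['m2']
```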
20. What is the basic idea behind task assignment approach?
Basic idea:
b. The amount of computation required by each task and the are known.
Impossible Scenario
Lamport – Shostak – Pease Algorithm
o Example
Definition
o Strongly Consistent Set of Checkpoint.
o Consistent Set of Checkpoint
o Checkpoint Notation
Synchronous Checkpoint and Recovery
o Checkpointing Algorithm
Types
o Synchronous Checkpointing Disadvantages
The Rollback Recovery Algorithm
o Phase one
o Phase two
Message Types
Uncoordinated Checkpointing
o Direct dependency tracking technique
Coordinated Checkpointing
o Blocking Checkpointing
o Non – Blocking Checkpointing
Communication – induced Checkpointing
4. Describe the Issues in Failure Recovery.
Basic Concept
Recovery
o System Failure
o Erroneous System State
o Error
o Fault
Introduction
o The Problem
o Validity
Consensus Problem
o Agreement
o Validity
Interactive Consistency Problem
UNIT 5
TWO MARKS:
NIST definition of cloud: Cloud computing is a pay-per-use model for enabling available,
convenient, on-demand network access to a shared pool of configurable computing resources
(e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and
released with minimal management effort or service-provider interaction.
Cloud service is any service made available to users on demand via the Internet from a
cloud computing provider's servers as opposed to being provided from a company's own on-
premises servers.
Public cloud is built over the Internet and can be accessed by any user who has paid
for the service. Public clouds are owned by service providers and are accessible through a
subscription.
A private cloud is built within the domain of an intranet owned by a single organization.
Therefore, it is client owned and managed, and its access is limited to the owning clients and
their partners.
Characteristics of IaaS
7. List the situations where PaaS may not be the best option.
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable
compute capacity in the cloud. It is designed to make web-scale computing easier for developers
and system administrators.
EC2 functions:
Azure queue storage is a service for storing large numbers of messages that can be
accessed from anywhere in the world via authenticated calls using HTTP or HTTPS. A single
queue message can be up to 64 KB in size, and a queue can contain millions of messages, up to the total
capacity limit of a storage account.
12. How is virtualization employed in Azure?
Google cloud storage allows world-wide storage and retrieval of any amount of data at
any time. It can be used for a range of scenarios including serving website content, storing data
for archival and disaster recovery, or distributing large data objects to users via direct download.
Amazon S3 defines a bucket name as a series of one or more labels, separated by periods,
that adhere to the following rules: The bucket name can be between 3 and 63 characters long,
and can contain only lower-case characters, numbers, periods, and dashes.
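The rules quoted above can be turned into a small validator. This sketch encodes only those rules (length 3 to 63, labels separated by periods, lower-case letters, digits and dashes); Amazon S3 applies further restrictions that are not checked here, and the function name is an assumption.

```python
import re

# One label: one or more lower-case letters, digits or dashes.
_LABEL = r"[a-z0-9-]+"
# A name: one or more labels separated by single periods.
_NAME = re.compile(rf"{_LABEL}(\.{_LABEL})*")

def is_valid_bucket_name(name):
    """Check a name against the rules stated in the text only."""
    return 3 <= len(name) <= 63 and _NAME.fullmatch(name) is not None

print(is_valid_bucket_name("my-bucket.logs"))  # True
print(is_valid_bucket_name("ab"))              # False: shorter than 3 characters
print(is_valid_bucket_name("My_Bucket"))       # False: upper case and underscore
print(is_valid_bucket_name("a..b"))            # False: empty label between periods
```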
Scalability is the ability of a system or network to handle increased load or usage. At the
same time, elasticity is the ability to automatically expand and contract resources to meet
demand.
Load balancing can be defined as the process of task distribution among multiple
computers, processes, disk, or other resources in order to get optimal resource utilization and to
reduce the computation time.
Load balancing is an important means to achieve effective resource sharing and
utilization.
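Task distribution can be illustrated with a greedy least-loaded policy. This is only one of many possible strategies; the function name and the task costs are assumptions.

```python
import heapq

def assign_tasks(task_costs, n_servers):
    """Give each task, largest first, to the server with the least load so far."""
    heap = [(0, s) for s in range(n_servers)]   # (current load, server id)
    heapq.heapify(heap)
    assignment = {s: [] for s in range(n_servers)}
    for cost in sorted(task_costs, reverse=True):
        load, server = heapq.heappop(heap)      # least-loaded server
        assignment[server].append(cost)
        heapq.heappush(heap, (load + cost, server))
    return assignment

plan = assign_tasks([5, 3, 8, 2, 4], n_servers=2)
print(plan)                                     # {0: [8, 3], 1: [5, 4, 2]}
print({s: sum(c) for s, c in plan.items()})     # {0: 11, 1: 11}
```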
Pros:
1. Data center and energy-efficiency savings: As companies reduce the size of their hardware
and server footprint, they lower their energy consumption.
2. Operational expenditure savings: Once servers are virtualized, IT staff can greatly reduce
the ongoing manual administration and management work.
3. Reduced costs: Virtualization reduces the cost of IT infrastructure.
4. Data does not leak across virtual machines.
5. A virtual machine is completely isolated from the host machine and from other virtual machines.
Cons:
Public Cloud
o Benefits
o Risks
Private Cloud
o Benefits
o Risks
Community Cloud
Hybrid Cloud
o Benefits
o Risks
Difference between public and private Cloud
Software as a Service(SaaS)
o Characteristics
o Benefits
Platform as a Service(PaaS)
o Characteristics
o Benefits
Infrastructure as a Service(IaaS)
o Types
Physical Server
Dedicated Virtual Server
Shared Virtual Server
o Advantage
Hypervisor
Para – Virtualization
o Problems
Full – Virtualization
o Host Based Virtualization
Pros and Cons of Virtualization