6CS5 DS Unit-3
Department of
Computer Science & Engineering
UNIT-III
Distributed Process
Scheduling
12/04/2022
Ms.Rashi Jain
ARYA GROUP OF COLLEGES JAIPUR Distributed System CS VI Sem
OUTLINE
• Introduction
• A System Performance Model
• Static process scheduling with communication
• Dynamic load sharing and balancing
• Distributed process implementation
• Case Studies:
- Sun Network File System (NFS)
- General Parallel File System and Windows file systems
- Andrew and Coda file systems
Partitioning a task into multiple processes for execution can result in a speedup of the total
task completion time.
• RP (Relative Processing) measures how much speedup is lost by substituting the best sequential algorithm with an algorithm better adapted to concurrent implementation.
• RC (Relative Concurrency) measures how far from optimal the usage of the n processors is. It reflects how well adapted the given problem and its algorithm are to the ideal n-processor system.
• The final expression for the speedup S is:
S = (RP · RC · n) / (1 + ρ)
• The term ρ is called the efficiency loss. It is a function of both the scheduling and the system architecture. Ideally it would be decomposed into two independent terms, ρ = ρsched + ρsyst, but this is not easy to do since scheduling and the architecture are interdependent. The best possible schedule on a given system hides the communication overhead (overlapping it with other computations).
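As a quick numeric check, the speedup relation can be evaluated directly. This is a minimal Python sketch assuming the decomposition S = (RP · RC · n) / (1 + ρ) built from the definitions above; the function name is illustrative.

```python
def speedup(rp, rc, n, rho):
    """Speedup S = (RP * RC * n) / (1 + rho).

    rp:  relative processing (loss from abandoning the best sequential algorithm)
    rc:  relative concurrency (how well the n processors are utilized)
    n:   number of processors
    rho: efficiency loss due to scheduling and system architecture
    """
    return (rp * rc * n) / (1 + rho)

# Ideal case: perfect algorithm, perfect utilization, no overhead -> S = n.
assert speedup(1.0, 1.0, 8, 0.0) == 8.0

# A more realistic case: RP = 0.9, RC = 0.8, 10 processors, 25% efficiency loss.
print(speedup(0.9, 0.8, 10, 0.25))   # 5.76
```

Note how any of the three factors below 1.0 pulls the achieved speedup well under the processor count.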
Speedup therefore depends on three interdependent factors:
1. Algorithm development
2. System architecture
3. Scheduling policy
• Scheduling is performed with the objective of minimizing the total completion time (makespan) of a set of interacting processes.
• If processes are not constrained by precedence relations and are free to be redistributed or moved among the processors in the system, performance can be further improved by sharing the workload:
- Statically: load sharing
- Dynamically: load balancing
WORKLOAD DISTRIBUTION
Load sharing:
Load sharing entails sending a portion of the traffic to one server and another portion elsewhere.
Static load balancing algorithms can be divided into two subclasses:
1. Optimal static load balancing
2. Sub-optimal static load balancing
• Optimal static load balancing is possible when all the information and resources of the system are known in advance.
• An optimal load balancing algorithm can increase the throughput of a system and maximize the use of its resources.
• Sub-optimal algorithms rely on rules of thumb and heuristics.
• In dynamic load balancing (DLB), jobs are reassigned at run time depending on the situation: load is transferred from heavily loaded nodes to lightly loaded nodes.
• This introduces communication overhead, which grows as the number of processors increases.
• In dynamic load balancing, no decision is taken until the process begins execution.
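The heavy-to-light transfer idea can be sketched in a few lines. This is an illustrative Python toy (hypothetical `rebalance` helper) that assumes a global view of all queue lengths; a real DLB algorithm works from local, possibly stale load estimates and pays a migration cost per move.

```python
def rebalance(queues, threshold=2):
    """Migrate one job at a time from the most-loaded to the least-loaded
    node until all queue lengths differ by less than `threshold`.

    queues: {node_name: queue_length}; returns (new_queues, migrations)."""
    queues = dict(queues)
    migrations = []
    while True:
        hi = max(queues, key=queues.get)   # heaviest node
        lo = min(queues, key=queues.get)   # lightest node
        if queues[hi] - queues[lo] < threshold:
            break                          # balanced enough; stop migrating
        queues[hi] -= 1
        queues[lo] += 1
        migrations.append((hi, lo))
    return queues, migrations

balanced, moves = rebalance({"A": 9, "B": 1, "C": 2})
print(balanced)   # {'A': 4, 'B': 4, 'C': 4}
```

Each migration here is "free"; in practice the communication overhead of a move is what limits how aggressively DLB can rebalance as the processor count grows.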
QUEUING THEORY
Queuing theory deals with problems that involve queuing (or waiting). The performance of a system described as a queuing model can be computed using queuing theory. In Kendall's notation, an X/Y/Z queue is one where:
• X: the arrival process
• Y: the service-time distribution
• Z: the number of servers
For process migration, γ is the migration rate, which depends on the channel bandwidth, the migration protocol, and the context and state information of the process being transferred.
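For the common M/M/1 case (Poisson arrivals, exponential service times, one server), the standard closed-form results can be computed directly. A small Python sketch with illustrative parameter names:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Standard M/M/1 queue results.

    Returns (utilization, L, W) where L is the mean number of jobs in the
    system and W the mean time in the system; Little's law L = arrival_rate * W
    ties the two together."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    utilization = arrival_rate / service_rate        # fraction of time server is busy
    L = utilization / (1 - utilization)              # mean jobs in system
    W = 1 / (service_rate - arrival_rate)            # mean response time
    return utilization, L, W

# 8 jobs/s arriving at a server that completes 10 jobs/s:
print(mm1_metrics(8, 10))   # utilization 0.8, about 4 jobs in system, 0.5 s response
```

Note how steeply L grows as utilization approaches 1, which is why load balancing pays off most on heavily loaded systems.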
QUEUING CONT...
• A good heuristic distributed scheduling algorithm is one that can best balance and overlap
computation and communication.
• Computational model: the primary objective of task scheduling is to achieve maximal concurrency for task execution within a program.
• Finding the minimum makespan is NP-complete, so heuristic algorithms are used to find a good mapping of the process model onto the system model.
• For precedence process graphs, the notion of the critical path is useful: the longest execution path in the DAG, which is a lower bound on the makespan. A simple heuristic is to map all tasks on a critical path onto a single processor.
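The critical-path lower bound is just a longest-path computation over the DAG. A minimal Python sketch (the `tasks`/`deps` dictionary encoding is a hypothetical representation chosen for brevity):

```python
def critical_path_length(tasks, deps):
    """Length of the longest execution path through a precedence DAG.

    tasks: {task_name: execution_time}
    deps:  {task_name: [predecessor names]} (absent key = no predecessors)"""
    memo = {}

    def finish_time(t):
        # Earliest possible finish: own time plus latest predecessor finish.
        if t not in memo:
            memo[t] = tasks[t] + max((finish_time(p) for p in deps.get(t, [])),
                                     default=0)
        return memo[t]

    return max(finish_time(t) for t in tasks)

# A(2) precedes B(3) and C(1); both precede D(4). Critical path A-B-D = 9.
tasks = {"A": 2, "B": 3, "C": 1, "D": 4}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(critical_path_length(tasks, deps))   # 9
```

No schedule on any number of processors can finish before this bound, since the tasks on the critical path must run one after another.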
1. List Scheduling (LS): no processor remains idle if there are tasks available that it could process (communication overhead is not considered).
2. Extended List Scheduling (ELS): tasks are allocated to processors according to LS, and the communication delays and overhead are then added.
3. Earliest Task First (ETF): the earliest schedulable task is scheduled first (the calculation includes communication overhead).
• Process scheduling for many system applications has a very different perspective from the precedence model: applications may be created independently, and processes have no explicit completion times or precedence constraints.
• The primary objectives of process scheduling are to maximize resource utilization and to minimize interprocess communication.
• The communication process model is an undirected graph G with node and edge sets V and E, where nodes represent processes and the weight on an edge is the amount of interaction between the two connected processes.
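Under this model, a placement of processes on processors is judged by the total edge weight that crosses processor boundaries. A minimal Python sketch with illustrative names:

```python
def cut_cost(edges, assignment):
    """Total interprocess-communication weight between processes placed
    on different processors.

    edges:      {(process_p, process_q): interaction_weight}
    assignment: {process: processor_id}"""
    return sum(weight for (p, q), weight in edges.items()
               if assignment[p] != assignment[q])

# Processes a and b interact heavily; co-locating them avoids the costliest edge.
edges = {("a", "b"): 5, ("b", "c"): 2, ("a", "c"): 1}
print(cut_cost(edges, {"a": 0, "b": 0, "c": 1}))   # 2 + 1 = 3
print(cut_cost(edges, {"a": 0, "b": 1, "c": 1}))   # 5 + 1 = 6
```

Minimizing this cut while keeping processor loads balanced is the (NP-hard) graph-partitioning view of process allocation.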
• If we can designate a controller process that maintains information about the queue size of each processor, fairness can be achieved in two ways:
• Fairness in terms of equal workload on each processor (join the shortest queue): the migration workstation model (load sharing and load balancing, perhaps load redistribution, i.e., process migration)
• Fairness in terms of each user's share of computation resources (allocate a processor to a waiting process at the user site that has the least share of the processor pool): the processor pool model
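The join-the-shortest-queue policy can be sketched as a toy controller (hypothetical `Controller` class; a real controller would receive queue-size updates over the network rather than holding the counters itself):

```python
class Controller:
    """Central controller that tracks queue sizes and dispatches each
    arriving process to the processor with the shortest queue."""

    def __init__(self, processors):
        self.queues = {p: 0 for p in processors}

    def dispatch(self, _process):
        # Join the shortest queue; ties go to the first processor listed.
        target = min(self.queues, key=self.queues.get)
        self.queues[target] += 1
        return target

c = Controller(["p0", "p1", "p2"])
placements = [c.dispatch(job) for job in range(7)]
print(c.queues)   # {'p0': 3, 'p1': 2, 'p2': 2}
```

The controller is a single point of failure and a potential bottleneck, which is why the sender- and receiver-initiated algorithms below avoid any central state.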
CONT...
Sender-initiated algorithms:
• push model
• include a probing strategy for finding a node with the smallest queue length (perhaps via multicast)
• perform well on a lightly loaded system
Receiver-initiated algorithms:
• pull model
• a probing strategy can also be used
• more stable
• perform better on average
• Combinations of both algorithms are possible: the choice is based on estimated system-load information or on the processing node's queue reaching threshold values.
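A sender-initiated (push) probe with a threshold can be sketched as follows. This is an illustrative function only; real algorithms exchange load information via network messages rather than reading a shared dictionary.

```python
import random

def sender_initiated_probe(local_load, remote_loads, threshold=3, probe_limit=3):
    """Push model: when the local queue exceeds `threshold`, probe up to
    `probe_limit` randomly chosen nodes and return the least-loaded one,
    or None if no probed node is under the threshold (keep the job locally).

    remote_loads: {node_name: queue_length} standing in for probe replies."""
    if local_load <= threshold:
        return None                        # not overloaded; nothing to push
    candidates = random.sample(list(remote_loads),
                               min(probe_limit, len(remote_loads)))
    best = min(candidates, key=remote_loads.get)
    return best if remote_loads[best] < threshold else None
```

On a heavily loaded system every node is busy probing and every probe fails, which is exactly why the receiver-initiated (pull) variant is the more stable of the two.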
PERFORMANCE COMPARISON
• Remote execution: the messages contain a program to be executed at the remote site. Implementation issues:
– load-sharing algorithms (sender-initiated, registered hosts, broker...)
• Process migration: the messages represent a process being migrated to the remote site to continue execution (an extension of load sharing that allows a remote execution to be preempted).
A Distributed File System (DFS) is a file system that supports sharing of files and resources in the form of persistent storage over a network.
The first widely used distributed file system was Sun's Network File System (NFS), introduced in 1985.
Examples of distributed file systems: Andrew File System (CMU), Coda (CMU), Google File System (Google).
Responsibilities of the various modules can be defined as follows:
Client module:
It implements the interfaces exported by the flat file and directory services on the server side. It runs on each computer and provides the integrated service (flat file and directory) as a single API to application programs. For example, on UNIX hosts, a client module emulates the full set of UNIX file operations.
FILE OPERATIONS
File systems are responsible for the organization, storage, retrieval, naming, sharing and protection of files.
Files contain both data and attributes.
File systems are designed to store and manage large numbers of files, with facilities for creating, naming and deleting files.
• The file system is one of the two important components (process and file) in any distributed computation.
• It is a good example for illustrating the concepts of transparency and the client/server model.
• File sharing and data replication present many interesting research problems.
• Atomicity – either all tasks in a transaction are performed, or none of them are;
• Consistency – data is in a consistent state when the transaction begins, and when it ends;
• Isolation – all operations in a transaction are isolated from operations outside the transaction;
• Durability – upon successful completion, the result of the transaction will persist.
• Transaction processing system (TPS)
• Transaction manager (TM)
• Scheduler (SCH)
• Atomicity:
- All or none: TM, two-phase commit
- Indivisible (serializable): SCH, concurrency control protocols
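The all-or-none behaviour the TM enforces via two-phase commit can be sketched as a toy coordinator. This is in-memory only and uses a hypothetical `Participant` class; a real implementation logs votes and decisions durably and handles timeouts and participant failures.

```python
class Participant:
    """Toy 2PC participant with a fixed vote; real ones log state durably."""
    def __init__(self, vote):
        self.vote, self.state = vote, "active"
    def prepare(self):
        return self.vote          # phase 1: vote yes (True) or no (False)
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator side of two-phase commit: commit only if every
    participant votes yes; otherwise abort everywhere (all or none)."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: abort everywhere
        p.abort()
    return "aborted"

print(two_phase_commit([Participant(True), Participant(True)]))    # committed
print(two_phase_commit([Participant(True), Participant(False)]))   # aborted
```

A single "no" vote aborts every participant, which is precisely the atomicity property: no partial outcome is ever visible.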
Several concurrency control mechanisms are available for maintaining the consistency of data items, such as turn-taking, serialization, transactional locking, and operational transformation. Lock mechanisms, a widely used method for concurrency control in transaction models, provide enough isolation on modified data items (exclusive locks) to ensure that no other transaction can access any of these data items before the transaction that is accessing or updating them commits.
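A minimal in-memory sketch of such an exclusive-lock scheme (illustrative `LockManager` class; a real lock manager would block waiting transactions and detect deadlocks rather than simply refusing):

```python
class LockManager:
    """Exclusive (write) locks per data item: the transaction holding the
    lock shuts out all other access until it commits or aborts."""

    def __init__(self):
        self.locks = {}              # data item -> holding transaction id

    def acquire(self, txn, item):
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn   # free, or already held by this txn
            return True
        return False                 # held by another txn: caller must wait

    def release_all(self, txn):
        # Called at commit/abort: drop every lock the transaction holds.
        self.locks = {k: v for k, v in self.locks.items() if v != txn}

lm = LockManager()
assert lm.acquire("T1", "x")
assert not lm.acquire("T2", "x")     # T2 is shut out until T1 finishes
lm.release_all("T1")
assert lm.acquire("T2", "x")
```

Releasing only at commit/abort (rather than as soon as the item is no longer needed) is what gives the isolation guarantee described above.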
Write operations:
• Write-one-primary
• Write-all
Read operations:
• Read-one-primary
• Read-one
CASE STUDIES
File-level replication is asynchronous: files are first written to the primary storage and then, based on a defined schedule or time interval, changes are collected and replicated to the secondary storage unit.
Because data is copied with latency, primary storage failure can cause data loss if most
recent modifications were not transferred to the secondary storage.
Architecture of a replica manager: