Lecture-13-14 Parallel and Distributed Systems Programming Models-Jameel
• Bit-level parallelism
• The number of bits processed per clock cycle (the word size)
• Word sizes have increased from 4 bits to 8, 16, 32, and 64 bits
• Instruction-level parallelism
• Computers now use multi-stage processing pipelines to speed up execution
• Data parallelism or loop parallelism
• The iterations of a program loop can be processed in parallel (see the sketch after this list)
• Task parallelism
• The problem can be decomposed into tasks that can be carried out
concurrently, e.g., SPMD; note that data dependencies cause
different flows of control in individual tasks
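A minimal C sketch of the difference, assuming OpenMP is available (an illustration, not code from the lecture; f and g are hypothetical placeholders for independent tasks):

#include <omp.h>
#define N 8
void f(void) { /* one independent task */ }
void g(void) { /* another independent task */ }

int main(void) {
    int x[N];
    /* Data (loop) parallelism: the same operation applied to different iterations */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        x[i] = i * i;

    /* Task parallelism: different pieces of work carried out concurrently */
    #pragma omp parallel sections
    {
        #pragma omp section
        f();
        #pragma omp section
        g();
    }
    return 0;
}

Compile with OpenMP enabled, e.g. cc -fopenmp.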
Parallel Computer Architecture
• Flynn’s taxonomy of computer architectures
• Based on the number of concurrent instruction (control) streams
and data streams
• SISD (Single Instruction Single Data)
• Scalar architecture with one processor/core
• SIMD (Single Instruction, Multiple Data)
• Supports vector processing
• Operations on individual vector components are
carried out concurrently (see the sketch below)
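As an illustration (a sketch, not from the lecture), a loop that applies the same operation to every element maps naturally onto SIMD hardware; with auto-vectorization enabled (e.g., -O2/-O3 in common C compilers), several elements are processed per instruction:

void vec_add(float *a, const float *b, const float *c, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];   /* one instruction stream, many data elements */
}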
Parallel Computer Architecture
• MIMD (Multiple Instructions, Multiple Data)
• Several processors/cores function
asynchronously and independently
• At any time, different processors/cores may be
executing different instructions on different data
• Several types of systems:
• Uniform Memory Access (UMA)
• Cache Only Memory Access (COMA)
• Non-Uniform Memory Access (NUMA)
Distributed Systems
• A distributed system is a collection of:
• Autonomous computers
• Connected through a network
Characteristics of Distributed Systems
• Users perceive the system as a single, integrated computing facility
• Components are autonomous
• Scheduling, resource management and security policies are
implemented by each system
• There are multiple:
• Points of control
• Points of failure
• Resources may not be accessible at all times
• Such distributed systems can be scaled via additional resources
• They can be designed to maintain availability even at low levels of
hardware/software/network reliability
Desirable Properties of a Distributed System
• Access Transparency
• Local and remote resources are accessed using identical operations
• Location Transparency
• Information objects are accessed without knowing their location
• Concurrency Transparency
• Several processes run concurrently using shared information objects without
interference among them
• Replication Transparency
• Multiple instances of information objects increase reliability without the
knowledge of users or applications
Desirable Properties of a Distributed System
• Failure Transparency
• Concealment of failures
• Migration Transparency
• Information objects in the system are moved without affecting the operation
performed on them
• Performance Transparency
• The system can be reconfigured based on the load and quality of service
(QoS) requirements
• Scaling Transparency
• The system and applications can scale without changing the system structure
and without affecting the applications
Processes, Threads and Events
• Dispatchable units of work:
• Process – a program in execution
• Thread – a lightweight process
• State of a process/thread:
• Information required to restart a suspended process/thread, e.g. program
counter and the current values of the registers
• Event
• A change of state of a process, e.g., local or communication events
Amdahl’s Law
We parallelize our programs in order to run them faster. Amdahl’s Law
bounds the speedup that can be obtained when only a fraction of a
program can be parallelized.
Amdahl’s Law: An Example
Suppose that 80% of you program can be parallelized and that you
use 4 processors to run your parallel version of the program
Although you use 4 processors you cannot get a speedup more than
2.5 times (or 40% of the serial running time)
5
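This bound follows from the standard statement of Amdahl’s Law, where p is the fraction of the program that can be parallelized and N is the number of processors:

Speedup S(N) = 1 / ((1 - p) + p / N)

For p = 0.8 and N = 4: S(4) = 1 / (0.2 + 0.8/4) = 1 / 0.4 = 2.5. Even with infinitely many processors the speedup cannot exceed 1 / (1 - p) = 5.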
Real Vs. Actual Cases
Amdahl’s argument is too simplified to be applied to real cases
Figure: a serial run takes 20 + 80 time units, where the 20-unit portion
cannot be parallelized and the 80-unit portion can. In the parallel
version, the 80-unit portion is divided into 20-unit pieces executed
concurrently by separate processes (Process 1–3 shown), while the
20-unit serial portion remains on the critical path.
Guidelines
In order to benefit efficiently from parallelization, we ought to
follow a number of guidelines.
Parallel Computer Architectures
Parallel computer architectures fall into two broad classes: multi-chip
multiprocessors and single-chip multiprocessors.
Multi-Chip Multiprocessors
We can categorize the architecture of multi-chip multiprocessor
computers in terms of two aspects: the address space (shared or
individual per processor) and the organization of the memory.
Symmetric Multiprocessors
A system with a Symmetric Multiprocessor (SMP) architecture uses a
shared memory that can be accessed equally from all processors; the
processors also share the I/O subsystem.
Massively Parallel Processors
A system with a Massively Parallel Processors (MPP) architecture
consists of nodes, each with its own processor, memory and I/O
subsystem; the nodes communicate through an interconnection network.
Distributed Shared Memory
A Distributed Shared Memory (DSM) system is typically built on a
hardware model similar to MPP, but presents the physically distributed
memories to the programmer as a single shared address space.
Parallel Computer Architectures
Having looked at multi-chip multiprocessors, we now turn to single-chip
multiprocessors.
Chip Multiprocessors
Integrating multiple processor cores on one die yields a single-chip
multiprocessor, referred to as a Chip Multiprocessor (CMP).
Models of Parallel Programming
What is a parallel programming model? It is an abstraction of the
machine that defines how parallel tasks are expressed, how they share
data, and how they coordinate, independently of the underlying hardware.
Traditional Parallel Programming Models
Two traditional parallel programming models are covered here: the
shared memory model and the message passing model.
Shared Memory Model
In the shared memory programming model, the abstraction is that
parallel tasks can access any location of the memory
Shared Memory Model
Figure: Si denotes a serial part and Pj a parallel part. In the
single-threaded version, S1, P1–P4 and S2 execute one after another in
a single process. In the multi-threaded version, the process spawns
threads that execute P1–P4 concurrently in a shared address space and
then join before S2.
Shared Memory Example
begin parallel // spawn a child thread
private int start_iter, end_iter, i;
shared int local_iter = 4;
shared double sum = 0.0, a[], b[], c[];
shared lock_type mylock;
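The fragment above declares per-thread (private) loop bounds and shared data protected by a lock; the loop body itself is not reproduced on the slide. For comparison, a self-contained pthreads sketch of the same idea (an assumption, not the lecture's code): each thread adds its slice of b and c into a and accumulates into the shared sum under a mutex playing the role of mylock.

#include <pthread.h>
#define N 8
#define NTHREADS 2

double a[N], b[N], c[N], sum = 0.0;            /* shared data */
pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    long id = (long)arg;                        /* private, per-thread state */
    int local_iter = N / NTHREADS;
    int start_iter = id * local_iter, end_iter = start_iter + local_iter;
    double local_sum = 0.0;
    for (int i = start_iter; i < end_iter; i++) {
        a[i] = b[i] + c[i];
        local_sum += a[i];
    }
    pthread_mutex_lock(&mylock);                /* protect the shared sum */
    sum += local_sum;
    pthread_mutex_unlock(&mylock);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2 * i; }
    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (int id = 0; id < NTHREADS; id++)
        pthread_join(t[id], NULL);
    return 0;
}

Compile with cc -pthread.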
Traditional Parallel Programming Models
The second traditional model is the message passing model.
Message Passing Model
In message passing, parallel tasks have their own local memories and
exchange data explicitly by sending and receiving messages
Programs written with the Message Passing Interface (MPI) are a natural
fit for the message passing programming model
Message Passing Model
Figure: S denotes a serial part and P a parallel part. The
single-threaded version runs S1, P1–P4, S2 in one process. In the
message passing version, four processes (0–3), each on its own node
(Node 1–4), run S1, one parallel part Pi, and S2, coordinating by
exchanging messages.
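A minimal MPI sketch in C of this model (an illustration using standard MPI calls, not the lecture's code): every process runs the same serial parts in its own local memory, and rank 0 sends a value that rank 1 receives as an explicit message.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);                 /* each process has its own local memory */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* explicit message */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run with, e.g., mpirun -np 2 ./a.out, so that every process executes the same binary.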
SPMD and MPMD
When we run multiple processes with message passing, the processes can
be further categorized by how many distinct programs cooperate in the
parallel execution.
SPMD
In the SPMD (Single Program, Multiple Data) model there is only one
program: each process runs the same executable (e.g., a.out) but works
on a different set of data.
MPMD
The MPMD (Multiple Program, Multiple Data) model uses different
programs for different processes, but the processes collaborate to
solve the same problem. MPMD has two common styles: master/worker and
coupled analysis.
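As an illustration (the exact syntax varies by MPI implementation), an MPMD job can be launched by pairing different executables with different process counts, e.g. mpiexec -n 1 ./master : -n 4 ./worker, so that one master process and four worker processes cooperate on the same problem.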
8. Distributed systems
◼ Collection of autonomous computers, connected through a network and
operating under the control of a distribution software.
◼ Middleware → software enabling individual systems to coordinate
their activities and to share system resources.
◼ Main characteristics of distributed systems:
The users perceive the system as a single, integrated computing facility.
The components are autonomous.
Scheduling and other resource management and security policies are
implemented by each system.
There are multiple points of control and multiple points of failure.
The resources may not be accessible at all times.
Can be scaled by adding additional resources.
Can be designed to maintain availability even at low levels of
hardware/software/network reliability.
Messages and Communication Channels
• A message is a structured unit of information
• A communication channel provides the means for processes or
threads to:
• Communicate with one another
• Coordinate their actions by exchanging messages
• Communication is done using send(m) and receive(m) system calls, where m is
a message
Messages and Communication Channels
• State of a communication channel
• Given two processes 𝑝𝑖 and 𝑝𝑗 , the state of the channel 𝜉𝑖,𝑗 from 𝑝𝑖 to 𝑝𝑗
consists of messages sent by 𝑝𝑖 but not yet received by 𝑝𝑗
• Protocol
• A finite set of messages exchanged among processes to help them coordinate
their actions
Process Coordination – Communication Protocols
• A major challenge is to guarantee that 2 processes will reach an
agreement in case of channel failures
• Communication protocols ensure process coordination by
implementing:
• Error Control mechanisms
• Using error detection and error correction codes
• Flow Control
• Provides feedback from the receiver and forces the sender to transmit only the amount of
data the receiver can handle
• Congestion Control
• Ensures that the offered load of the network does not exceed the network capacity
Process Coordination – Time and time intervals
• Process Coordination requires:
• A global concept of time shared by cooperating entities
• The measurement of time intervals, the time elapsed between 2 events
• Two events in the global history may be unrelated
• Neither one is the cause of the other
• Such events are said to be concurrent events
• Local timers provide relative time measurements
• An isolated system can be characterized by its history, i.e., a sequence of
events
Process Coordination – Time and time intervals
• Global agreement on time is necessary to trigger actions that should
occur concurrently
• Timestamps are often used for event ordering
• Using a global time base constructed on local virtual clocks
Causality Example: Event Ordering
Logical Clocks
• Logical Clock (LC)
• An abstraction necessary to ensure the clock condition in the absence of a
global clock
• A process maps events to positive integers
• LC(e) is the local variable associated with event e.
• Each process time-stamps the message m it sends with the value of
the logical clock at the time of sending: TS(m) = LC(send(m))
• The clock is updated as follows: for a local event or a send event,
LC is incremented by one; when a message m with timestamp TS(m) is
received, LC is set to max(LC, TS(m)) + 1 (a minimal code sketch of
these rules appears after the figure below)
Figure (Logical Clocks): three processes p1, p2 and p3 exchange
messages m1–m5; the numbers along each process line are the logical
clock values of its events (p1: 1, 2, 3, 4, 5, 12; p2: 1, 2, 6, 7, 8, 9;
p3: 1, 2, 3, 10, 11), showing how receiving a message advances the
local clock past the sender's timestamp.
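A minimal C sketch of the logical clock rules above (an illustration, not the lecture's code); lc is the process-local clock and ts is the timestamp carried by an incoming message m:

typedef struct { int value; } logical_clock;

int lc_local_or_send(logical_clock *lc) {
    return ++lc->value;                   /* LC = LC + 1; the result timestamps a sent message */
}

int lc_receive(logical_clock *lc, int ts) {
    lc->value = (lc->value > ts ? lc->value : ts) + 1;   /* LC = max(LC, TS(m)) + 1 */
    return lc->value;
}

This is consistent with the figure, e.g. p2's clock jumping from 2 to 6 after receiving a message stamped 5.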
Message Delivery Rules; Causal Delivery
• A real-life network might reorder messages.
• First-In-First-Out (FIFO) delivery
• Messages are delivered in the same order they are sent.
• Causal delivery
• An extension of FIFO delivery
• Used when a process receives messages from different sources.
• A communication channel typically does not guarantee FIFO delivery
• However, FIFO delivery can be enforced by attaching a sequence number to each message sent (see the sketch after this list)
• The sequence numbers are also used to reassemble messages out of individual packets.
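A toy C sketch of FIFO delivery enforced with sequence numbers (an assumption for illustration; deliver() is a stub standing in for handing the message to the application): messages that arrive out of order are buffered until the next expected sequence number can be delivered.

#define MAX_PENDING 64

typedef struct { int seq; int payload; int valid; } message;

static message pending[MAX_PENDING];   /* received but not yet delivered */
static int next_expected = 0;          /* next sequence number to deliver */

static void deliver(int payload) { (void)payload; /* hand to application (stub) */ }

void on_receive(message m) {
    pending[m.seq % MAX_PENDING] = m;
    pending[m.seq % MAX_PENDING].valid = 1;
    /* deliver every message that is now in order */
    while (pending[next_expected % MAX_PENDING].valid &&
           pending[next_expected % MAX_PENDING].seq == next_expected) {
        deliver(pending[next_expected % MAX_PENDING].payload);
        pending[next_expected % MAX_PENDING].valid = 0;
        next_expected++;
    }
}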
Concurrency
• Required by system and application software:
• Reactive systems respond to external events
• e.g., operating system kernel, embedded systems.
• Improve performance
• Parallel applications partition workload & distribute it to multiple threads running
concurrently.
• Support variable load & shorten the response time of distributed applications, like
• Transaction management systems
• Client-server applications
Consensus Protocols
• Consensus
• Process of agreeing on one of several alternatives proposed by a number of
agents.
• Consensus Service
• Set of n processes
• Clients send requests, propose a value and wait for a response
• Goal is to get the set of processes to reach consensus on a single proposed
value.
Consensus Protocols
• Consensus protocol assumptions:
• Processes run on processors and communicate through a network
• processors and the network may experience failures (but not Byzantine, i.e., arbitrary or malicious, failures).
• Processors:
• Operate at arbitrary speeds
• Have stable storage and may rejoin the protocol after a failure
• Send messages to one another.
• Network:
• May lose, reorder, or duplicate messages
• Messages are sent asynchronously
• Messages may take an arbitrarily long time to reach the destination.
Client-Server Paradigm
• This paradigm is based on enforced modularity
• Modules are forced to interact only by sending and receiving messages (see the sketch after this list).
• A more robust design
• Clients and servers are independent modules and may fail separately.
• Servers are stateless
• May fail and then come up without the clients being affected or even noticing
the failure of the server.
• An attack is less likely
• Difficult for an intruder to guess the:
• Format of the messages
• Sequence numbers of the segments, when messages are transported by TCP
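A minimal POSIX-sockets sketch of the client side (an assumption for illustration; the port 8080 and the request text are hypothetical): the client and the server are independent modules that interact only through the messages written to and read from the connection.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* client module */
    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port = htons(8080);                 /* hypothetical server port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);
    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0)
        return 1;                               /* the server may have failed independently */

    const char *request = "GET /status\n";      /* hypothetical message format */
    write(fd, request, strlen(request));        /* send(m) */

    char response[256];
    ssize_t n = read(fd, response, sizeof response - 1);   /* receive(m) */
    if (n > 0) { response[n] = '\0'; printf("%s", response); }
    close(fd);
    return 0;
}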
WWW
• 3-way handshake
• First 3 messages exchanged between the client and the server
• Once a TCP connection is established, the HTTP server takes some time to
construct the page that answers the first request
• To satisfy the second request, the HTTP server must retrieve an image
from the disk
• Response time includes (a rough estimate follows this list):
• Round Trip Time (RTT)
• Server residence time
• Data transmission time
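Under these assumptions, a rough estimate for the first request over a new connection (a simplification, not a precise model) is:

response time ≈ 2 × RTT + server residence time + data transmission time

one RTT for the three-way handshake and one for the request/response exchange, plus the time the server spends building the page and the time to push the bytes onto the network.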
Figure: timeline between the browser and the Web server. The client's
SYN, the server's SYN, and the client's ACK carrying the first HTTP
request make up the three-way handshake (one RTT); the server's
response follows.
HTTP Communication
• A Web client (browser) can:
• Send an HTTP request directly to the Web server, which listens on TCP port 80, and
receive the response from it
• Send the request through a proxy, which forwards it to the Web server on TCP port 80
and relays the response back to the browser