Programação Paralela e Distribuída

This document introduces parallel computing concepts including Flynn's taxonomy for classifying computer architectures based on the number of instruction and data streams. It discusses parallel processing models such as SIMD, SPMD, MIMD and MPMD and how they differ in terms of instruction and data streams. It also covers parallel computer communication methods including shared memory, distributed memory and hybrid systems. Examples of parallel algorithms and how to implement them using control versus data parallelism are provided.


Programação Paralela e Distribuída

Introduction to Parallel Computing


Prof. Fábio M. Costa
Programa de Pós-Graduação em Ciência da Computação
Instituto de Informática
Universidade Federal de Goiás
2nd Semester / 2016

Parallel Computing Architectures - Overview

Classification of Computer Architectures:

Control Flow: Single and Multiple Instruction Streams
Communication: Shared and Distributed Memory

Flynn's Taxonomy

Parallel computers: Control Flow


SISD (single instruction stream, single data stream): these
computers correspond to the conventional sequential
computer.
MISD (multiple instruction stream, single data stream):
these computers are rare and can be thought of as
systolic (pipelined) array computers.
SIMD (single instruction stream, multiple data stream):
computers of this kind have a single control unit that
dispatches the same instruction to various processors
(which work on different data).
MIMD (multiple instruction stream, multiple data stream): these
computers have processors, each with its own control unit.

SIMD Processors
Some of the earliest parallel computers such as the Illiac IV,
MPP, DAP, CM-2, and MasPar MP-1 belonged to this class
of machines.
Variants of this concept have found use in co-processing units,
such as the MMX units in Intel processors, and in DSP chips,
such as the Sharc, and more recently on GPUs.
SIMD relies on the regular structure of computations (such as
those in image processing).
It is often necessary to selectively turn off operations on certain
data items. For this reason, most SIMD programming
paradigms allow for an activity mask, which determines if a
processor should participate in a computation or not.
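A minimal sketch of the activity-mask idea in C (illustrative only; the array names and data are assumptions, and a single loop stands in for the SIMD lanes):

    /* Illustrative only: emulates an activity mask on a single CPU. */
    #include <stdio.h>

    #define N 8

    int main(void) {
        int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        int mask[N] = {1, 0, 1, 0, 1, 0, 1, 0};  /* 1 = lane participates */

        /* Conceptually, all N lanes receive the same "double the value"
           instruction; the mask turns the operation off for selected lanes. */
        for (int i = 0; i < N; i++) {
            if (mask[i]) {
                data[i] *= 2;
            }
        }

        for (int i = 0; i < N; i++) {
            printf("%d ", data[i]);
        }
        printf("\n");
        return 0;
    }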

MIMD Processors
In contrast to SIMD processors, MIMD processors can
execute different programs on different processors.
A variant of this, called single program multiple data streams
(SPMD), executes the same program on different
processors.
It is easy to see that SPMD and MIMD are closely related in
terms of programming flexibility and underlying
architectural support.
Examples of such platforms include current generation Sun
Ultra Servers, SGI Origin Servers, multiprocessor PCs,
workstation clusters, and the IBM SP.

SPMD Model
(Single Program Multiple Data)
Each processor executes the same program
asynchronously.
They can execute different instructions within the same
program using instructions similar to:
if myNodeNum = 1 do this, else do that
Synchronization takes place only when processors need
to exchange data
SPMD is an extension of SIMD (relax synchronized
instruction execution) and a restriction of MIMD (use only
one source/object code)
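A minimal SPMD sketch in C with MPI (illustrative, not taken from the slides; it assumes a standard MPI installation): every process runs the same program and branches on its rank, much like the myNodeNum test above.

    /* Illustrative SPMD sketch: same program, behavior selected by rank. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            printf("I am the coordinator among %d processes\n", size);
        } else {
            printf("I am worker %d\n", rank);
        }

        /* Processes synchronize only when they need to exchange data. */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }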

SIMD-SPMD Comparison
In SPMD, multiple autonomous processors
simultaneously execute the same program, but at
independent points
In SIMD, processors execute the program in lockstep
(same instruction at the same time)
With SPMD, tasks can be executed on general
purpose CPUs
SIMD requires vector processors to manipulate data
streams.

SIMD-MIMD Comparison
SIMD computers require less hardware than MIMD
computers (single control unit).
However, since SIMD processors are specially designed,
they tend to be expensive and have long design cycles.
Not all applications are naturally suited to SIMD
processors.
In contrast, platforms supporting the SPMD paradigm can
be built from inexpensive off-the-shelf components with
relatively little effort in a short amount of time.

MPMD Model
(Multiple Program Multiple Data)
MPMD is the equivalent of having different
programs executing on different processors
(ex. Client/Server)
(This will be covered in the Distributed Programming
part of the course)

Parallel computers: Communication


There are two primary forms of data exchange
between parallel tasks - accessing a shared data
space and exchanging messages.
Platforms that provide a shared data space are
called shared-address-space machines, or
multiprocessors.
Platforms that support messaging are called
message passing machines, or multicomputers.

Shared-Address-Space Computers
Part (or all) of the memory is accessible to all processors.
Processors interact by modifying data objects stored in
this shared-address-space.
If the time taken by a processor to access any memory
word in the system (either global or local) is identical,
the platform is classified as a uniform memory access
(UMA) machine; otherwise, it is a non-uniform memory
access (NUMA) machine.

NUMA and UMA Shared-Address-Space Computers
The distinction between NUMA and UMA platforms is
important from the point of view of algorithm design.
NUMA machines require locality from underlying
algorithms for performance.
Programming these platforms is easier since reads and
writes are implicitly visible to other processors.
However, read-write to shared data must be coordinated.
Caches in such machines require coordinated access to
multiple copies. This leads to the cache coherence
problem.

Shared-Address-Space vs. Shared Memory Computers
It is important to note the difference between the
terms shared address space and shared
memory.
We refer to the former as a programming
abstraction and to the latter as a physical
machine attribute.
It is possible to provide a shared address space
using a physically distributed memory.

Shared Memory
One or more memories.
Global address space (all system memory visible to all
processors).
Transfer of data between processors is usually implicit: just
read from (or write to) a given memory address (OpenMP).
Cache-coherency protocol to maintain consistency between
processors.
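A minimal shared-memory sketch in C with OpenMP (illustrative; it assumes a compiler with OpenMP support, e.g. built with -fopenmp): threads communicate implicitly by reading and writing the shared array.

    /* Illustrative OpenMP sketch: implicit communication via shared memory. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        /* All threads see the same array 'a'; no explicit messages are sent. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = 2.0 * i;
        }

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            sum += a[i];
        }

        printf("sum = %f\n", sum);
        return 0;
    }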

Distributed Memory
Each processor has access to its own memory only.
Data transfer between processors is explicit, user calls
message passing functions.
Common Libraries for message passing: MPI, PVM
User has complete control/responsibility for data placement and
management.
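A minimal distributed-memory sketch in C with MPI (illustrative; run with at least two processes, e.g. mpirun -np 2): data moves only through explicit message-passing calls.

    /* Illustrative MPI sketch: explicit data transfer between processes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        double value = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 3.14;
            /* Process 0 explicitly sends the value to process 1. */
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Process 1 received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }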

Distributed Shared Memory (Shared-Address-Space)
Single address space with implicit communication
Hardware support for read/write to non-local memories,
cache coherency.
Latency for a memory operation is greater when
accessing non-local data than when accessing data
within a CPU's own memory.

Hybrid Systems
Distributed memory system where each node is a
multiprocessor with shared memory.
Most common architecture for current generation of
parallel machines.

Message-Passing Computers
These platforms comprise a set of processors, each with its
own (exclusive) memory.
Instances of such a view come naturally from clustered
workstations and non-shared-address-space
multicomputers.
These platforms are programmed using (variants of) send
and receive primitives.
Libraries such as MPI and PVM provide such primitives.

Message Passing vs. Shared Address Space Computers
Message passing requires little hardware
support, other than a network.
Shared address space platforms can easily
emulate message passing. The reverse is
more difficult to do (in an efficient manner).

Flynn-Johnson classification of computers

Approaches to Parallelism

Approaches to Parallelism
Dividing the processing
Discovering the maximum possible
parallelism
Approaches
Data-centered: Data parallelism
Process-centered: Control parallelism

Approaches to Parallelism
Functional Decomposition: Control Parallelism

First: divide the processing into parts


Second: determine how to associate data
to processing

Approaches to Parallelism
Domain decomposition: Data parallelism
First: divide the data into parts
Second: determine how to associate
processing with data
Focusing on the largest and/or most
accessed data structure in the program

Approaches to Parallelism
Checklist for data parallelism:
The number of primitive tasks is at least an
order of magnitude greater than the number
of processors
Redundant processing and data structure
storage is minimized
Primitive tasks are all the same size
The number of tasks increases with the size of
the problem

Example: Eratosthenes' Sieve


A classical algorithm to obtain the prime
numbers ≤ n
Use multiples of the prime numbers (2, 3, 5,
...) to remove composite numbers
Terminates when the multiples of the greatest
prime number ≤ √n have been marked

Sequential Algorithm
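A minimal sequential sieve sketch in C (an illustrative version; the value of n and the variable names are assumptions):

    /* Illustrative sequential Sieve of Eratosthenes for primes <= n. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int n = 30;
        char *composite = calloc(n + 1, 1);    /* 0 = still assumed prime */

        for (int p = 2; p * p <= n; p++) {     /* stop at the greatest prime <= sqrt(n) */
            if (!composite[p]) {
                for (int m = p * p; m <= n; m += p) {
                    composite[m] = 1;          /* mark multiples of p */
                }
            }
        }

        for (int i = 2; i <= n; i++) {
            if (!composite[i]) {
                printf("%d ", i);
            }
        }
        printf("\n");
        free(composite);
        return 0;
    }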

Sequential Algorithm (cont'd)

5 is the greatest prime number ≤ √30 ≈ 5.5 => terminates

How to solve it using Control Parallelism?


Any suggestions?

Solution using Control Parallelism


Algorithm: Each processor looks for the
next prime number and marks its multiples
Problems:
Two processors may use the same prime
number to walk the sieve => waste of time
(though no error is caused)
A processor may mark multiples of a composite
number

How to solve it using Data Parallelism?


Any suggestions?

Solution using Data Parallelism


Algorithm: First processor finds the next prime
number and sends it to the others, which in turn
work together to mark the multiples, each
processor working on a distinct data segment
Note:
All prime numbers ≤ √n must be stored on the first processor
Each processor receives no more than ⌈n/p⌉ data
items
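A rough data-parallel sketch of this scheme in C with MPI (illustrative only; it assumes n - 1 is divisible by the number of processes and that process 0's block holds all primes ≤ √n):

    /* Illustrative data-parallel sieve: process 0 broadcasts each prime,
       every process marks multiples within its own block of 2..n. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        int rank, size;
        int n = 100;                            /* find primes <= n (assumed) */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int count = (n - 1) / size;             /* block size per process */
        int low = 2 + rank * count;             /* first value in this block */
        char *marked = calloc(count, 1);        /* 1 = composite */

        int p = 2;
        while (p * p <= n) {
            /* First multiple of p inside this block, no smaller than p*p. */
            long first = (long)p * p;
            if (first < low) {
                first = low + ((p - low % p) % p);
            }
            for (long m = first; m < low + count; m += p) {
                marked[m - low] = 1;            /* mark multiples of p */
            }
            if (rank == 0) {
                /* Process 0 picks the next unmarked number as the next prime. */
                int next = p + 1;
                while (marked[next - low]) next++;
                p = next;
            }
            MPI_Bcast(&p, 1, MPI_INT, 0, MPI_COMM_WORLD);
        }

        /* Count local primes and reduce to a global total on process 0. */
        int local = 0, total = 0;
        for (int i = 0; i < count; i++) {
            if (!marked[i]) local++;
        }
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("%d primes <= %d\n", total, n);

        free(marked);
        MPI_Finalize();
        return 0;
    }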

Problem
Propose parallel solutions to calculate the
following vector-based expression:
k1*A + k2*B
where k1 and k2 are constants, and A and B
are arrays of size n.
Present two solutions: one that explores
control parallelism and another one that
explores data parallelism

Possible solution: Control Parallelism


A solution using control parallelism could
use 2 processors.
Each processor would compute one of the
terms of the expression (i.e., k1*A or k2*B).
Then one of the processors would compute
the sum of the terms.
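A possible control-parallel sketch in C with OpenMP sections (illustrative; the array contents and sizes are assumptions): each section plays the role of one of the two processors.

    /* Illustrative control parallelism: one task per term of k1*A + k2*B. */
    #include <omp.h>
    #include <stdio.h>

    #define N 4

    int main(void) {
        double k1 = 2.0, k2 = 3.0;
        double A[N] = {1, 2, 3, 4}, B[N] = {5, 6, 7, 8};
        double t1[N], t2[N], result[N];

        #pragma omp parallel sections
        {
            #pragma omp section
            for (int i = 0; i < N; i++) t1[i] = k1 * A[i];   /* "processor" 1 */

            #pragma omp section
            for (int i = 0; i < N; i++) t2[i] = k2 * B[i];   /* "processor" 2 */
        }

        /* One of the processors then adds the two terms. */
        for (int i = 0; i < N; i++) result[i] = t1[i] + t2[i];

        for (int i = 0; i < N; i++) printf("%g ", result[i]);
        printf("\n");
        return 0;
    }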

Possible solution: Data Parallelism


For a solution using data parallelism, we could use
n processors, assigning one array element to
each processor, and computing the expression in
3 steps:
In the first step, each processor multiplies
constant k1 by its element.
In the second step, the second term of the
expression would be computed similarly.
In the third step, each processor adds the two
partial results computed in steps 1 and 2.
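A possible data-parallel sketch in C with OpenMP (illustrative; one loop iteration stands in for one of the n processors, and the array contents are assumptions):

    /* Illustrative data parallelism: each "processor" handles one element i. */
    #include <omp.h>
    #include <stdio.h>

    #define N 8

    int main(void) {
        double k1 = 2.0, k2 = 3.0;
        double A[N], B[N], result[N];

        for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2 * i; }

        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            double t1 = k1 * A[i];      /* step 1: first term */
            double t2 = k2 * B[i];      /* step 2: second term */
            result[i] = t1 + t2;        /* step 3: add the partial results */
        }

        for (int i = 0; i < N; i++) printf("%g ", result[i]);
        printf("\n");
        return 0;
    }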

Parallel vs Distributed Programming


Parallel Programming: improves performance
of a single application, making it possible to
handle a huge amount of data that otherwise
wouldn't fit into a single memory. Does not deal
with issues such as security, failures etc.
Distributed Programming: allows the
cooperation of more than one application/task.
Must deal with security, fault tolerance etc.

Parallel Programming
Shared Memory:
Pthreads (task parallelism, SPMD, a few threads - SMT, Simultaneous
Multithreading)
OpenMP (data parallelism, SPMD) - higher level of abstraction
CUDA/OpenCL (data/task parallelism, SPMD, massive multithreading). Exploits
data parallelism using a SIMD-like approach, without having to resort to
vector code. Instead, it uses a SIMT (Single Instruction, Multiple Threads) model.

Distributed Memory:
MPI (data/task parallelism, SPMD)
MapReduce (data parallelism, SPMD) - (higher level abstractions). Data
parallelism with large chunks: SIMD-like, where the operation is a function
and the data item is a data partition.

Distributed Programming
Shared Memory
Pthreads (task parallelism, SPMD): different tasks (functions) of
a single application cooperate to improve performance (e.g., a
spreadsheet: user interface + calculation + backup saving, etc.;
see the sketch below)
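A minimal task-parallel sketch in C with Pthreads (illustrative; the task functions are hypothetical stand-ins for the spreadsheet tasks above):

    /* Illustrative Pthreads sketch: a few threads running different tasks. */
    #include <pthread.h>
    #include <stdio.h>

    void *ui_task(void *arg)     { printf("user interface task\n"); return NULL; }
    void *calc_task(void *arg)   { printf("calculation task\n");    return NULL; }
    void *backup_task(void *arg) { printf("backup task\n");         return NULL; }

    int main(void) {
        pthread_t t1, t2, t3;
        pthread_create(&t1, NULL, ui_task, NULL);
        pthread_create(&t2, NULL, calc_task, NULL);
        pthread_create(&t3, NULL, backup_task, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        pthread_join(t3, NULL);
        return 0;
    }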

Distributed Memory
Sockets (task parallelism, MPMD)
RPC and its higher level abstractions: Java RMI, CORBA (task
parallelism, MPMD)
Message-Oriented Middleware (JMS) (data/control driven)
Publish-Subscribe (DDS) (data-driven)
Tuple Spaces (data-driven)
