Parallel and Distributed Programming
Flynn's Taxonomy
SIMD Processors
Some of the earliest parallel computers, such as the Illiac IV,
MPP, DAP, CM-2, and MasPar MP-1, belonged to this class
of machines.
Variants of this concept have found use in co-processing units,
such as the MMX units in Intel processors, and in DSP chips,
such as the SHARC, and more recently in GPUs.
SIMD relies on the regular structure of computations (such as
those in image processing).
It is often necessary to selectively turn off operations on certain
data items. For this reason, most SIMD programming
paradigms allow for an activity mask, which determines if a
processor should participate in a computation or not.
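As a rough illustration, here is a minimal sketch in plain, sequential C of how an activity mask behaves (the array contents and the masked operation are made up for the example): every element sees the same operation, but only elements whose mask bit is set commit a result.

/* Illustrative sketch of an activity mask: the mask decides which data
 * elements participate in the current step. Every "lane" evaluates the
 * same operation, but results are committed only where the mask is set,
 * mimicking how SIMD machines disable selected processing elements. */
#include <stdio.h>

#define N 8

int main(void) {
    float a[N] = {1, -2, 3, -4, 5, -6, 7, -8};
    int   mask[N];

    /* Build the activity mask: participate only where a[i] < 0. */
    for (int i = 0; i < N; i++)
        mask[i] = (a[i] < 0);

    /* Same operation for all lanes; masked-off lanes keep their old value. */
    for (int i = 0; i < N; i++)
        a[i] = mask[i] ? -a[i] : a[i];

    for (int i = 0; i < N; i++)
        printf("%.1f ", a[i]);
    printf("\n");
    return 0;
}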
MIMD Processors
In contrast to SIMD processors, MIMD processors can
execute different programs on different processors.
A variant of this, called single program, multiple data streams
(SPMD), executes the same program on different
processors.
It is easy to see that SPMD and MIMD are closely related in
terms of programming flexibility and underlying
architectural support.
Examples of such platforms include current generation Sun
Ultra Servers, SGI Origin Servers, multiprocessor PCs,
workstation clusters, and the IBM SP.
SPMD Model
(Single Program Multiple Data)
Each processor executes the same program
asynchronously.
They can execute different instructions within the same
program using constructs similar to:
if myNodeNum = 1 do this, else do that (see the sketch below)
Synchronization takes place only when processors need
to exchange data
SPMD is an extension of SIMD (relaxing synchronized
instruction execution) and a restriction of MIMD (using only
one source/object code)
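A minimal SPMD sketch using MPI (assuming an MPI installation; identifiers follow the standard MPI C API): every process runs this same program, compiled once, and branches on its rank, much like the myNodeNum pseudocode above.

/* SPMD in practice: one program, many processes, branching on rank.
 * Compile with mpicc and run with, e.g., mpirun -np 4 ./a.out. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        printf("Process %d of %d: doing the coordinator's work\n", rank, size);
    else
        printf("Process %d of %d: doing a worker's work\n", rank, size);

    /* Processes synchronize only where they actually need to meet. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}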
SIMD-SPMD Comparison
In SPMD, multiple autonomous processors
simultaneously execute the same program, but at
independent points
In SIMD, processors execute the program in lockstep
(same instruction at the same time)
With SPMD, tasks can be executed on general
purpose CPUs
SIMD requires dedicated vector or SIMD hardware to manipulate
data streams.
SIMD-MIMD Comparison
SIMD computers require less hardware than MIMD
computers (single control unit).
However, since SIMD processors are specially designed,
they tend to be expensive and have long design cycles.
Not all applications are naturally suited to SIMD
processors.
In contrast, platforms supporting the SPMD paradigm can
be built from inexpensive off-the-shelf components with
relatively little effort in a short amount of time.
MPMD Model
(Multiple Program Multiple Data)
MPMD is the equivalent of having different
programs executing on different processors
(e.g., client/server)
(This will be covered in the Distributed Programming
part of the course)
Shared-Address-Space Computers
Part (or all) of the memory is accessible to all processors.
Processors interact by modifying data objects stored in
this shared-address-space.
If the time taken by a processor to access any memory
word in the system (either global or local) is identical,
the platform is classified as a uniform memory access
(UMA) machine; otherwise, it is a non-uniform memory
access (NUMA) machine.
Shared Memory
One or more memories.
Global address space (all system memory visible to all
processors).
Transfer of data between processors is usually implicit: a thread simply
reads from (or writes to) a given memory address (e.g., OpenMP); see the sketch below.
Cache-coherency protocol to maintain consistency between
processors.
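A minimal sketch of this implicit communication using OpenMP in C (array name and size are illustrative): threads exchange data simply by reading and writing a shared array, with no explicit transfer calls.

/* Shared memory with OpenMP: data is exchanged through ordinary loads and
 * stores on a shared array. Compile with, e.g., gcc -fopenmp. */
#include <omp.h>
#include <stdio.h>

#define N 16

int main(void) {
    int data[N];

    #pragma omp parallel for shared(data)
    for (int i = 0; i < N; i++)
        data[i] = i * i;          /* each thread writes its part of shared memory */

    int sum = 0;
    for (int i = 0; i < N; i++)   /* the main thread reads everything back */
        sum += data[i];

    printf("sum = %d\n", sum);
    return 0;
}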
Distributed Memory
Each processor has access to its own memory only.
Data transfer between processors is explicit: the user calls
message-passing functions.
Common libraries for message passing: MPI, PVM
User has complete control/responsibility for data placement and
management.
Hybrid Systems
Distributed memory system where each node is a
multiprocessor with shared memory.
Most common architecture for current generation of
parallel machines.
Message-Passing Computers
These platforms comprise a set of processors, each with its own
(exclusive) memory.
Instances of such a view come naturally from clustered
workstations and non-shared-address-space
multicomputers.
These platforms are programmed using (variants of) send
and receive primitives.
Libraries such as MPI and PVM provide such primitives.
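A minimal sketch of such send and receive primitives using MPI in C (it assumes at least two processes; the value sent is arbitrary).

/* Explicit message passing: process 0 sends an integer to process 1.
 * Run with, e.g., mpirun -np 2 ./a.out. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* explicit send */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* explicit receive */
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}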
Approaches to Parallelism
Dividing the processing
Discovering the maximum possible
parallelism
Approaches
Data-centered: Data parallelism
Process-centered: Control parallelism
Approaches to Parallelism
Functional Decomposition: Control Parallelism
Approaches to Parallelism
Domain decomposition: Data parallelism
First: divide the data into parts
Second: determine how to associate
processing with data
Focusing on the largest and/or most frequently
accessed data structure in the program
Approaches to Parallelism
Checklist for data parallelism:
The number of primitive tasks is at least an
order of magnitude greater than the number
of processors
Redundant processing and redundant data-structure
storage are minimized
Primitive tasks are all the same size
The number of tasks increases with the size of
the problem
Sequential Algorithm
Problem
Propose parallel solutions to calculate the
following vector-based expression:
k1*A + k2*B
where k1 and k2 are constants, and A and B
are arrays of size n.
Present two solutions: one that exploits
control parallelism and another that
exploits data parallelism
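For reference, a minimal sequential baseline in C, assuming the expression is evaluated element-wise over double-precision arrays (names, sizes, and initial values are illustrative); the requested solutions would parallelize either the two scalings (control parallelism) or the loop over elements (data parallelism).

/* Sequential baseline for C[i] = k1*A[i] + k2*B[i]. */
#include <stdio.h>

#define N 8

int main(void) {
    double k1 = 2.0, k2 = 3.0;
    double A[N], B[N], C[N];

    for (int i = 0; i < N; i++) {     /* initialize with dummy data */
        A[i] = i;
        B[i] = N - i;
    }

    for (int i = 0; i < N; i++)       /* the vector expression itself */
        C[i] = k1 * A[i] + k2 * B[i];

    for (int i = 0; i < N; i++)
        printf("%.1f ", C[i]);
    printf("\n");
    return 0;
}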
Parallel Programming
Shared Memory:
Pthreads (task parallelism, SPMD, a few threads - SMT, Simultaneous
Multithreading); see the sketch after this list
OpenMP (data parallelism, SPMD): higher level of abstraction
CUDA/OpenCL (data/task parallelism, SPMD, massive multithreading). Exploits
data parallelism using a SIMD-like approach, without having to resort to
vector code. Instead, it uses a SIMT (Single Instruction, Multiple Threads) model.
Distributed Memory:
MPI (data/task parallelism, SPMD)
MapReduce (data parallelism, SPMD) - higher-level abstraction. Data
parallelism with large chunks: SIMD-like, where the operation is a function
and the data is a data partition.
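As a point of comparison with the models above, a minimal Pthreads sketch of task parallelism in C: two threads of the same program run different task functions concurrently (the task functions are hypothetical placeholders; link with -lpthread).

/* Task parallelism with Pthreads: different functions of one program
 * run in parallel threads. */
#include <pthread.h>
#include <stdio.h>

void *compute_task(void *arg) {
    (void)arg;
    printf("compute task running\n");
    return NULL;
}

void *io_task(void *arg) {
    (void)arg;
    printf("I/O task running\n");
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, compute_task, NULL);  /* different tasks ... */
    pthread_create(&t2, NULL, io_task, NULL);       /* ... in the same program */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}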
Distributed Programming
Shared Memory
Pthreads (task parallelism, SPMD): different tasks (functions) of
a single application cooperate to improve performance (e.g., a
spreadsheet: user interface + calculation + backup saving, etc.)
Distributed Memory
Sockets (task parallelism, MPMD)
RPC and its higher level abstractions: Java RMI, CORBA (task
parallelism, MPMD)
Message-Oriented Middleware (JMS) (data/control driven)
Publish-Subscribe (DDS) (data-driven)
Tuple Spaces (data-driven)