Introduction To Parallel and Distributed Programming
METRICS
Metrics define a way to compare two different systems. Some examples are:
o Price.
o Power consumption.
o Bandwidth.
o Render time.
FLOPS
o FLOPS (FLOP/s) → The number of floating-point operations performed per second.
o Rmax → FLOPS determined by the benchmark.
o Rpeak → Theoretical peak FLOPS (see the worked example after this list).
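As a worked illustration (the core count, clock rate and FLOPs per cycle below are
hypothetical, not taken from these notes), Rpeak can be estimated as cores × clock
frequency × floating-point operations per cycle:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical machine: 8 cores at 3.0 GHz, each able to retire
           16 double-precision floating-point operations per cycle. */
        double cores = 8.0;
        double clock_hz = 3.0e9;
        double flops_per_cycle = 16.0;

        double rpeak = cores * clock_hz * flops_per_cycle;   /* theoretical peak */
        printf("Rpeak = %.1f GFLOPS\n", rpeak / 1e9);        /* prints 384.0 */
        return 0;
    }

Rmax, by contrast, would be the FLOPS actually measured when the benchmark is run on the machine.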
FLYNN’S TAXONOMY
Flynn’s taxonomy classifies computer architecture based on the number of
concurrent instructions and data streams.
o SISD → Single Instruction, Single Data stream.
o MISD → Multiple Instruction, Single Data stream.
o SIMD → Single Instruction, Multiple Data streams (see the sketch after this list).
o MIMD → Multiple Instruction, Multiple Data streams.
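As a rough sketch of the SIMD idea (this example is not part of the original notes), the
loop below applies the same instruction, an addition, to many data elements; with
optimization enabled a compiler may map it to SIMD vector instructions:

    #include <stddef.h>

    /* One operation (add) applied element-wise to many data items:
       a natural candidate for SIMD execution. */
    void vector_add(const float *a, const float *b, float *c, size_t n) {
        for (size_t i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }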
SPMD
Most message-passing programs use the Single-Program-Multiple-Data (SPMD) model.
All processes run their own copy of the same program and each has its own copy of the
data; to make this work, each process has a unique identifier. Processes can follow
different control paths through the program, depending on their process ID. Usually
one process runs per processor/core.
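A minimal SPMD sketch using MPI (assuming a standard MPI installation; the printed
messages are purely illustrative): every process runs the same program and branches on
its unique rank.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* every process runs this same program */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* unique identifier of this process    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes            */

        if (rank == 0) {
            /* the control path can depend on the process ID */
            printf("Process 0 of %d: coordinating\n", size);
        } else {
            printf("Process %d of %d: working\n", rank, size);
        }

        MPI_Finalize();
        return 0;
    }

With a typical MPI installation this could be built with mpicc and launched with
mpirun -np 4, giving four processes with ranks 0 to 3, usually one per core.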
5-STAGE PIPELINE
o IF → Instruction fetch (Includes PC increment).
o ID → Instruction decode + fetching values from general purpose registers.
o EXE → Arithmetic/logic operation or address computation.
o MEM → Memory access or branch completion.
o WB → Write back results to general purpose registers.
PIPELINING
Start one instruction per clock cycle.
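A small worked sketch (the instruction count is hypothetical): if the 5-stage pipeline
starts one instruction per clock cycle, N instructions take roughly 5 + (N − 1) cycles
instead of 5·N cycles when each instruction runs to completion on its own.

    #include <stdio.h>

    int main(void) {
        int stages = 5;      /* IF, ID, EXE, MEM, WB */
        long n = 1000;       /* hypothetical number of instructions */

        long unpipelined = (long)stages * n;     /* each instruction runs alone         */
        long pipelined = stages + (n - 1);       /* one new instruction per clock cycle */

        printf("without pipelining: %ld cycles\n", unpipelined);  /* 5000 */
        printf("with pipelining:    %ld cycles\n", pipelined);    /* 1004 */
        return 0;
    }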
DATA DEPENDENCY
X → Y
Y must be executed after X because Y uses data produced by X. Changing the order
modifies the functionality of the software.
The longest chain of data dependencies determines the minimum time needed to
execute a sequence of instructions.
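A minimal sketch of such a dependency (variable names are illustrative): the second
statement reads the value produced by the first, so swapping them would change the result.

    #include <stdio.h>

    int main(void) {
        int a = 3, b = 4;

        int x = a + b;   /* X: produces x                        */
        int y = x * 2;   /* Y: uses x, so Y must execute after X */

        printf("y = %d\n", y);   /* 14; reordering X and Y would change this */
        return 0;
    }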
COMPILER OPTIMISATION
Most compilers implement optimization techniques to improve the overall
performance of the code. GCC provides several optimization flags (a small example
follows this list):
o -O0 → Don't apply any optimization. Gives the expected results.
o -O1 → Apply some optimizations.
o -O2 → Apply almost all optimizations. Increases compilation time.
o -O3 → Highest optimization level. Increases compilation time further.
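A hedged example (the file name is illustrative and the exact behaviour depends on the
GCC version): the same source file can be built at different optimization levels, and at
-O2 or -O3 GCC may unroll or vectorize the summation loop below.

    /* sum.c — the same source built at different optimization levels:
     *   gcc -O0 sum.c -o sum     no optimization
     *   gcc -O2 sum.c -o sum     most optimizations
     *   gcc -O3 sum.c -o sum     highest level, may auto-vectorize the loop
     */
    #include <stdio.h>

    #define N (1 << 20)

    static double data[N];

    int main(void) {
        for (int i = 0; i < N; i++)
            data[i] = i * 0.5;

        double sum = 0.0;
        for (int i = 0; i < N; i++)    /* candidate for unrolling/vectorization */
            sum += data[i];

        printf("sum = %f\n", sum);
        return 0;
    }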
MEMORY HIERARCHY
Motivation:
o Large memories (DRAM) are slow.
o Small memories (SRAM) are fast.
Make the average access time small by:
o Servicing most accesses from a small, fast memory (a worked example follows this list).
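A worked example with hypothetical numbers (the latencies and hit rate are not from
these notes): as long as most accesses hit the small, fast memory, the average access
time stays close to its latency.

    #include <stdio.h>

    int main(void) {
        /* Hypothetical two-level hierarchy */
        double hit_time = 1.0;        /* ns, small fast memory (SRAM)         */
        double miss_penalty = 100.0;  /* ns, large slow memory (DRAM)         */
        double hit_rate = 0.95;       /* fraction of accesses served by cache */

        /* average memory access time */
        double amat = hit_time + (1.0 - hit_rate) * miss_penalty;
        printf("average access time = %.1f ns\n", amat);   /* 6.0 ns */
        return 0;
    }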
MANAGEMENT
Registers and memory:
o Compiler → Controls which data is kept in registers and which in memory.
Cache and memory:
o Cache → Hardware moves the data transparently between cache and memory.
o Memory → Demand paging and physical memory allocation are controlled by the
kernel, while memory allocation inside the program (e.g. malloc) is explicitly
controlled by the program.
Memory and disk:
o Virtual memory → Operating system and hardware.
o Files → Explicit control by the program (see the sketch below).
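A small sketch of the explicitly controlled levels (the file name is illustrative): the
program allocates memory and writes files itself, while cache movement and demand
paging happen transparently underneath.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Memory allocation: explicitly controlled by the program */
        double *buf = malloc(1024 * sizeof *buf);
        if (buf == NULL)
            return 1;
        buf[0] = 42.0;   /* caching and paging of this access are transparent */

        /* Files: explicitly controlled by the program */
        FILE *f = fopen("data.bin", "wb");   /* illustrative file name */
        if (f != NULL) {
            fwrite(buf, sizeof *buf, 1024, f);
            fclose(f);
        }

        free(buf);
        return 0;
    }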
REGISTER ASSIGNMENT
Register assignment improves the overall performance. The compiler assigns variables
to registers of the architecture. The aim is to reduce load and store operations.
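A minimal sketch: with optimization enabled, the compiler will typically keep the
accumulator below in a register for the whole loop, rather than loading and storing it
in memory on every iteration.

    /* With optimization, `sum` is normally assigned to a register for the
       entire loop, removing a load and a store per iteration. */
    double sum_array(const double *a, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }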
LOCALITY
DEFINITION
The principle of locality is that software tends to reuse data and instructions used
recently.
There are two types of locality:
o Temporal locality (Locality in time) → If an item is referenced, it will tend to be
referenced again soon.
o Spatial locality (Locality in space) → If an item is referenced, items whose
addresses are close tend to be referenced soon.
Locality is a property of software which is exploited by computers.
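A sketch showing both kinds of locality (the array sizes are illustrative): the row-by-row
traversal touches consecutive addresses (spatial locality), and the accumulator is reused
on every iteration (temporal locality).

    #define ROWS 1024
    #define COLS 1024

    static double m[ROWS][COLS];

    /* Row-major traversal: consecutive iterations access consecutive
       addresses (spatial locality); `sum` is reused every iteration
       (temporal locality). */
    double sum_matrix(void) {
        double sum = 0.0;
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                sum += m[i][j];
        return sum;
    }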
QUANTUM
Quantum computers are not general-purpose computers, but they are extremely powerful
on certain problems. They may be used as coprocessors for algorithms which can be
expressed as quantum algorithms.