
THEORY 1 - INTRODUCTION TO PARALLEL AND DISTRIBUTED PROGRAMMING
SUPERCOMPUTER
A supercomputer solves problems that would take years on a single computer: a large
number of nodes work concurrently. It combines different parallel models.

HIGH PERFORMANCE COMPUTING (HPC)

Accelerating simulations, higher resolution in the discretization and more complex
physical models are some of the benefits of HPC. Scientific codes need to be adapted
to the computer architectures.

METRICS
Metrics define a way to compare two different systems. Some examples are:
o Price.
o Power consumption.
o Bandwidth.
o Render time.

FLOPS
o FLOPS (FLOP/s) → The number of floating-point operations performed per second.
o Rmax → FLOPS achieved on the benchmark.
o Rpeak → Theoretical peak FLOPS (see the worked example below).
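As a rough worked example (the numbers are illustrative, not from these notes), Rpeak
can be estimated as cores × clock frequency × FLOPs per cycle per core. A node with
16 cores at 2.5 GHz, each completing 16 floating-point operations per cycle, gives
16 × 2.5·10^9 × 16 ≈ 640 GFLOPS.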

FLYNN’S TAXONOMY
Flynn’s taxonomy classifies computer architectures based on the number of
concurrent instruction and data streams.
o SISD → Single Instruction, Single Data stream.
o MISD → Multiple Instruction, Single Data stream.
o SIMD → Single Instruction, Multiple Data streams.
o MIMD → Multiple Instruction, Multiple Data streams.
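As an added illustrative note: a classic single-core processor is SISD, vector
extensions such as SSE/AVX follow the SIMD model, multi-core CPUs and clusters are
MIMD, and MISD is rarely found in practice.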


SPMD
Most message-passing programs use the Single-Program-Multiple-Data (SPMD) model.
All processes run their own copy of the same program and each has its own copy of
the data; to make this work, each process has a unique identifier. Processes can
follow different control paths through the program depending on their process ID.
Usually one process runs per processor/core.
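A minimal sketch of the SPMD model using MPI in C (assuming an MPI installation;
compiled with mpicc and launched with mpirun, typically one process per core):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                /* every process executes this same program */
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* unique identifier of this process */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
        if (rank == 0)
            printf("Process 0 of %d: could act as coordinator\n", size);
        else
            printf("Process %d of %d: works on its own part of the data\n", rank, size);
        MPI_Finalize();
        return 0;
    }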

VON NEUMANN ARCHITECTURE


DEFINITION
o Main memory → A collection of locations capable of storing both instructions and
data.
o Central processing unit (CPU), also called processor or core.
• The control unit decides which instructions of the program should be executed.
• The arithmetic and logic unit (ALU) executes the instructions.
• Registers are fast storage used by the CPU.
• The program counter is a special register used by the control unit to store the
address of the next instruction to be executed.
o Interconnection between memory and CPU to transfer data and instructions.

INSTRUCTION SET ARCHITECTURE (ISA)


The ISA is the interface used to talk to the CPU. Different architectures can share the
same ISA, and very similar architectures can have different ISAs. The best ISA depends
on the application.
o Complex and specific instructions: lower instruction count.
o Simple and general instructions: lower CPI (cycles per instruction).
o CISC → Complex Instruction Set Computing.


o RISC → Reduced Instruction Set Computing.

INSTRUCTION LEVEL PARALLELISM


From a processor’s perspective, a program is a sequence of instructions. Instruction-
level parallelism (ILP) attempts to improve processor performance by having multiple
processor components work on instructions concurrently, which increases the IPC
(instructions per cycle).
How can ILP speed up the following program?
o e = a + b
o f = c + d
o m = e * f
Operation 3 depends on operations 1 and 2. Operations 1 and 2 are independent
computations, therefore they can be executed at the same time.
How much ILP exists in a program is very application specific:
o In graphics and scientific computing the amount can be very large.
o Cryptography may exhibit much less parallelism.
Compiler and processor designers try to identify and take advantage of ILP. Most of
the available ILP is exploited by a processor capable of issuing four instructions per
clock; there is little performance benefit from building a processor that can issue more.

5-STAGE PIPELINE
o IF → Instruction fetch (Includes PC increment).
o ID → Instruction decode + fetching values from general purpose registers.
o EXE → Arithmetic/logic operation or address computation.
o MEM → Memory access or branch completion.
o WB → Write back results to general purpose registers.
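As an added walkthrough: an addition such as r3 = r1 + r2 passes through all five
stages, being fetched (IF), decoded while r1 and r2 are read from the register file
(ID), computed in the ALU (EXE), performing no memory access (MEM), and finally
writing r3 back to the register file (WB).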

PIPELINED VS NOT PIPELINED


Pipelining → Functional units are arranged in stages. The stages will effectively run at
the speed of the slowest functional unit.


PIPELINING
One instruction starts per clock cycle.
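As a rough worked example (added here with illustrative numbers): executing n
instructions on a k-stage pipeline takes about k + (n - 1) cycles, versus n × k cycles
without pipelining. For k = 5 and n = 1000 that is 1004 cycles instead of 5000, a
speedup approaching 5x.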

ILP AND SUPERSCALAR PROCESSOR


The term superscalar describes a computer implementation that improves
performance by the concurrent execution of scalar instructions, more than one per
cycle. A scalar processor executes one instruction at a time; a superscalar processor
allows concurrent execution of instructions in the same pipeline stage.
Two instructions start per clock cycle.

OUT OF ORDER PROCESSORS


Most modern processors are out-of-order. Out-of-order execution exploits parallelism
between instructions that would otherwise be wasted: an instruction that appears
later in the program can run earlier, provided its inputs are ready. The architecture
design becomes more complex, but this is transparent to the programmer.
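A small hedged illustration in C (the names are hypothetical):

    /* Sketch of an out-of-order opportunity. */
    double ooo_example(const double *p, double a, double b) {
        double x = *p;       /* load: may miss in cache and stall for many cycles */
        double y = x + 1.0;  /* depends on x, so it must wait for the load        */
        double z = a * b;    /* independent of x: an out-of-order core can start  */
                             /* it before y, hiding part of the load latency      */
        return y + z;
    }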

DATA DEPENDENCY
X → Y


Y must be executed after X because Y uses data produced by X. Changing the order
would modify the functionality of the software.
The dependency chain is the longest chain of data dependencies, and it determines
the minimum time needed to execute a sequence of instructions.
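Applying this to the earlier ILP example: e = a + b and f = c + d are independent of
each other, but m = e * f depends on both, so the chain a → e → m (two dependent
operations) sets the minimum execution time even with unlimited functional units.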

COMPILER OPTIMISATION
Most compilers implement optimization techniques to improve the overall
performance of the code. GCC provides several optimization flags:
o -O0 → Do not apply any optimization; the generated code follows the source
directly, which gives the expected results when debugging.
o -O1 → Apply some optimizations.
o -O2 → Apply almost all optimizations; increases compile time.
o -O3 → Highest optimization level; increases compile time further.

Higher optimization levels also include the optimizations of the previous ones.
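A hedged usage sketch (file and program names are hypothetical):

    gcc -O0 -g -o app_debug main.c   # no optimization, keep debug info
    gcc -O2 -o app main.c            # almost all optimizations, common release setting
    gcc -O3 -o app_fast main.c       # highest optimization level, longest compile time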

MEMORY HIERARCHY
Motivation:
o Large memories (DRAM) are slow.
o Small memories (SRAM) are fast.
Reduce the average access time by:
o Servicing most accesses from a small, fast memory.

This also reduces the bandwidth required of the large memory.
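A standard way to quantify the benefit (added here with illustrative numbers):
average access time ≈ hit time + miss rate × miss penalty. With a 1 ns cache, a 5%
miss rate and a 100 ns DRAM penalty, this gives 1 + 0.05 × 100 = 6 ns on average,
far below the 100 ns of going to DRAM every time.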

MANAGEMENT
Registers and memory:
o Compiler → Controls which data is kept in registers and which in memory.
Cache and memory:
o Cache → Hardware moves the data transparently between cache and memory.
o Memory → Demand paging controlled by the kernel; memory allocation explicitly
controlled by the program.
Memory and disk:
o Virtual memory → Managed by the operating system and hardware.
o Files → Explicitly controlled by the program.

REGISTER ASSIGNMENT
Register assignment improves the overall performance. The compiler assigns variables
to the registers of the architecture, aiming to reduce load and store operations.


LOCALITY
DEFINITION
The principle of locality is that software tends to reuse data and instructions used
recently.
There are two types of locality:
o Temporal locality (Locality on time) → If an item is referenced, it will tend to be
referenced again soon.
o Spatial locality (Locality in space) → If an item is referenced, items whose
addresses are close tend to be referenced soon.
Locality is a property of software which is exploited by computers.
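A small C sketch of both kinds of locality (the function and its names are illustrative,
not from these notes):

    #include <stddef.h>

    /* Sums an array sequentially.  The accumulator sum and the index i show
     * temporal locality (reused on every iteration), while reading a[i] in
     * order shows spatial locality (neighbouring addresses accessed in turn). */
    double sum_array(const double *a, size_t n) {
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i)
            sum += a[i];
        return sum;
    }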

QUANTUM
Quantum computers are not general-purpose computers, but they are extremely
powerful on certain problems. They may be used as coprocessors for problems that
can be expressed as quantum algorithms.
