Introduction To Parallel and Distributed Programming
METRICS
Metrics define a way to compare two different systems. Some examples are:
o Price.
o Power consumption.
o Bandwidth.
o Render time.
FLOPS
o FLOPS (FLOP/s) → The number of floating-point operations performed per second.
o Rmax → FLOPS determined by the benchmark.
o Rpeak → Theoretical peak FLOPS (see the worked example after this list).
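As a worked illustration (the core count, clock rate and FLOPs per cycle below are
hypothetical, not taken from these notes), Rpeak can be estimated as cores × clock
frequency × floating-point operations per cycle:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical machine: 8 cores at 3.0 GHz, each able to retire
           16 double-precision floating-point operations per cycle. */
        double cores = 8.0;
        double clock_hz = 3.0e9;
        double flops_per_cycle = 16.0;

        double rpeak = cores * clock_hz * flops_per_cycle;   /* theoretical peak */
        printf("Rpeak = %.1f GFLOPS\n", rpeak / 1e9);        /* prints 384.0 */
        return 0;
    }

Rmax, by contrast, would be the FLOPS actually measured when the benchmark is run on the machine.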
FLYNN’S TAXONOMY
Flynn’s taxonomy classifies computer architecture based on the number of
concurrent instructions and data streams.
o SISD → Single Instruction, Single Data stream.
o MISD → Multiple Instruction, Single Data stream.
o SIMD → Single Instruction, Multiple Data streams (see the sketch after this list).
o MIMD → Multiple Instruction, Multiple Data streams.
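As a rough sketch of the SIMD idea (this example is not part of the original notes), the
loop below applies the same instruction, an addition, to many data elements; with
optimization enabled a compiler may map it to SIMD vector instructions:

    #include <stddef.h>

    /* One operation (add) applied element-wise to many data items:
       a natural candidate for SIMD execution. */
    void vector_add(const float *a, const float *b, float *c, size_t n) {
        for (size_t i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }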
SPMD
Most message-passing programs use the Single-Program-Multiple-Data (SPMD) model.
All processes run their own copy of the same program and each has its own copy of the
data; to make this work, each process has a unique identifier. Processes can follow
different control paths through the program, depending on their process ID. Usually
one process runs per processor/core.
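A minimal SPMD sketch using MPI (assuming a standard MPI installation; the printed
messages are purely illustrative): every process runs the same program and branches on
its unique rank.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* every process runs this same program */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* unique identifier of this process    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes            */

        if (rank == 0) {
            /* the control path can depend on the process ID */
            printf("Process 0 of %d: coordinating\n", size);
        } else {
            printf("Process %d of %d: working\n", rank, size);
        }

        MPI_Finalize();
        return 0;
    }

With a typical MPI installation this could be built with mpicc and launched with
mpirun -np 4, giving four processes with ranks 0 to 3, usually one per core.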
5-STAGE PIPELINE
o IF → Instruction fetch (Includes PC increment).
o ID → Instruction decode + fetching values from general purpose registers.
o EXE → Arithmetic/logic operation or address computation.
o MEM → Memory access or branch completion.
o WB → Write back results to general purpose registers.
PIPELINING
Start one instruction per clock cycle.
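A small worked sketch (the instruction count is hypothetical): if the 5-stage pipeline
starts one instruction per clock cycle, N instructions take roughly 5 + (N − 1) cycles
instead of 5·N cycles when each instruction runs to completion on its own.

    #include <stdio.h>

    int main(void) {
        int stages = 5;      /* IF, ID, EXE, MEM, WB */
        long n = 1000;       /* hypothetical number of instructions */

        long unpipelined = (long)stages * n;     /* each instruction runs alone         */
        long pipelined = stages + (n - 1);       /* one new instruction per clock cycle */

        printf("without pipelining: %ld cycles\n", unpipelined);  /* 5000 */
        printf("with pipelining:    %ld cycles\n", pipelined);    /* 1004 */
        return 0;
    }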
DATA DEPENDENCY
X → Y
Y must be executed after X because Y uses data produced by X. Changing the order
modifies the functionality of the software.
The longest chain of data dependencies determines the minimum time needed to
execute a sequence of instructions.
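A minimal sketch of such a dependency (variable names are illustrative): the second
statement reads the value produced by the first, so swapping them would change the result.

    #include <stdio.h>

    int main(void) {
        int a = 3, b = 4;

        int x = a + b;   /* X: produces x                        */
        int y = x * 2;   /* Y: uses x, so Y must execute after X */

        printf("y = %d\n", y);   /* 14; reordering X and Y would change this */
        return 0;
    }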
COMPILER OPTIMISATION
Most compilers implement optimization techniques to improve the overall
performance of the code. GCC provides several optimization flags (a small example
follows this list):
o -O0 → Don't apply any optimization. Gives the expected results.
o -O1 → Apply some optimizations.
o -O2 → Apply almost all optimizations. Increases compilation time.
o -O3 → Highest optimization level. Increases compilation time further.
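A hedged example (the file name is illustrative and the exact behaviour depends on the
GCC version): the same source file can be built at different optimization levels, and at
-O2 or -O3 GCC may unroll or vectorize the summation loop below.

    /* sum.c — the same source built at different optimization levels:
     *   gcc -O0 sum.c -o sum     no optimization
     *   gcc -O2 sum.c -o sum     most optimizations
     *   gcc -O3 sum.c -o sum     highest level, may auto-vectorize the loop
     */
    #include <stdio.h>

    #define N (1 << 20)

    static double data[N];

    int main(void) {
        for (int i = 0; i < N; i++)
            data[i] = i * 0.5;

        double sum = 0.0;
        for (int i = 0; i < N; i++)    /* candidate for unrolling/vectorization */
            sum += data[i];

        printf("sum = %f\n", sum);
        return 0;
    }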
MEMORY HIERARCHY
Motivation:
o Large memories (DRAM) are slow.
o Small memories (SRAM) are fast.
Make the average access time small by:
o Servicing most accesses from a small, fast memory (a worked example follows this list).
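A worked example with hypothetical numbers (the latencies and hit rate are not from
these notes): as long as most accesses hit the small, fast memory, the average access
time stays close to its latency.

    #include <stdio.h>

    int main(void) {
        /* Hypothetical two-level hierarchy */
        double hit_time = 1.0;        /* ns, small fast memory (SRAM)         */
        double miss_penalty = 100.0;  /* ns, large slow memory (DRAM)         */
        double hit_rate = 0.95;       /* fraction of accesses served by cache */

        /* average memory access time */
        double amat = hit_time + (1.0 - hit_rate) * miss_penalty;
        printf("average access time = %.1f ns\n", amat);   /* 6.0 ns */
        return 0;
    }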
MANAGEMENT
Registers and memory:
o Compiler → Controls which data is kept in registers and which in memory.
Cache and memory:
o Cache → Hardware moves the data transparently between cache and memory.
o Memory → Demand paging and physical memory allocation are controlled by the
kernel, while memory allocation inside the program (e.g. malloc) is explicitly
controlled by the program.
Memory and disk:
o Virtual memory → Operating system and hardware.
o Files → Explicit control by the program (see the sketch below).
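A small sketch of the explicitly controlled levels (the file name is illustrative): the
program allocates memory and writes files itself, while cache movement and demand
paging happen transparently underneath.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Memory allocation: explicitly controlled by the program */
        double *buf = malloc(1024 * sizeof *buf);
        if (buf == NULL)
            return 1;
        buf[0] = 42.0;   /* caching and paging of this access are transparent */

        /* Files: explicitly controlled by the program */
        FILE *f = fopen("data.bin", "wb");   /* illustrative file name */
        if (f != NULL) {
            fwrite(buf, sizeof *buf, 1024, f);
            fclose(f);
        }

        free(buf);
        return 0;
    }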
REGISTER ASSIGNMENT
Register assignment improves the overall performance. The compiler assigns variables
to registers of the architecture. The aim is to reduce load and store operations.
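A minimal sketch: with optimization enabled, the compiler will typically keep the
accumulator below in a register for the whole loop, rather than loading and storing it
in memory on every iteration.

    /* With optimization, `sum` is normally assigned to a register for the
       entire loop, removing a load and a store per iteration. */
    double sum_array(const double *a, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }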
LOCALITY
DEFINITION
The principle of locality is that software tends to reuse data and instructions used
recently.
There are two types of locality:
o Temporal locality (Locality in time) → If an item is referenced, it will tend to be
referenced again soon.
o Spatial locality (Locality in space) → If an item is referenced, items whose
addresses are close tend to be referenced soon.
Locality is a property of software which is exploited by computers.
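A sketch showing both kinds of locality (the array sizes are illustrative): the row-by-row
traversal touches consecutive addresses (spatial locality), and the accumulator is reused
on every iteration (temporal locality).

    #define ROWS 1024
    #define COLS 1024

    static double m[ROWS][COLS];

    /* Row-major traversal: consecutive iterations access consecutive
       addresses (spatial locality); `sum` is reused every iteration
       (temporal locality). */
    double sum_matrix(void) {
        double sum = 0.0;
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                sum += m[i][j];
        return sum;
    }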
QUANTUM
Quantum computers are not general-purpose computers, but they are extremely powerful
on certain problems. They may be used as coprocessors for algorithms which can be
expressed as quantum algorithms.