
Computer Architecture II - Parallel Computing


Advanced Computer Architecture
Parallel Computing
CPE 713 Core 3 Units

Department of Computer Engineering


CPE 713
Course Lecturer
Dr. Eustace Dogo
Lecture Time: 11:30am
Location: CPE Board Room
Student Evaluation: CA=40%,
Exams=60%
Overview
Parallel Computing – what is it & its uses?
Concepts and Terminology
Parallel Computer Memory Architectures
Parallel Programming Models
Designing Parallel Programs
Parallel Algorithm Examples
Conclusion
Concepts and Terminology:
What is Parallel Computing?
Traditionally software has been written for
serial computation.
 To be run on a single computer having a single Central Processing Unit (CPU);
 A problem is broken into a discrete series of instructions.
 Instructions are executed one after another.
 Only one instruction may execute at any moment in time

Parallel computing is the simultaneous use of multiple compute resources
to solve a computational problem.
 To be run using multiple CPUs
 A problem is broken into discrete parts that can be solved concurrently (“in parallel”)
 Each part is further broken down to a series of instructions
 Instructions from each part execute simultaneously on different CPUs
Traditional/Serial
Computing

Parallel Computing
The compute resources might be:
 A single computer with multiple processors;
 An arbitrary number of computers connected by a network;
 A combination of both.
The computational problem should be able to:
 Be broken apart into discrete pieces of work that can be solved
simultaneously;
 Execute multiple program instructions at any moment in time;
 Be solved in less time with multiple compute resources than with a
single compute resource.
The Real World is Massively Parallel
The Universe is Parallel:
Parallel computing is an evolution of serial computing that attempts to emulate
what has always been the state of affairs in the natural world: many complex,
interrelated events happening at the same time, yet within a temporal sequence.
Uses for Parallel Computing:
•Science and Engineering: To model difficult/complex problems in many areas
of science and engineering:
•Atmosphere, Earth, Environment
•Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
•Bioscience, Biotechnology, Genetics
•Chemistry, Molecular Sciences
•Geology, Seismology
•Mechanical Engineering - from prosthetics to spacecraft
•Electrical Engineering, Circuit Design, Microelectronics
•Computer Science, Mathematics

Industrial and Commercial: faster computers for processing of large
amounts of data in sophisticated ways.
•Databases, data mining
•Oil exploration
•Web search engines, web based business services
•Medical imaging and diagnosis
•Pharmaceutical design
•Financial and economic modeling
•Management of national and multi-national corporations
•Advanced graphics and virtual reality, particularly in the entertainment industry
•Networked video and multi-media technologies
•Collaborative work environments
Concepts and Terminology:
Why Use Parallel Computing?
Saves time – wall clock time
Potentially saves cost
Overcomes memory constraints, making it possible to solve large/complex problems
Provides concurrency
Overcomes limits of serial computing (computer architecture is increasingly
relying on hardware-level parallelism: multiple execution units, pipelined
instructions, multi-core)
It is the future of computing
Limits to frequency scaling due to physical constraints
Concepts and Terminology:
von Neumann Architecture
Named after the Hungarian mathematician John von
Neumann, who set out the general requirements for
an electronic computer in 1945.
Virtually all computers have followed this basic
design. Parallel computers still follow this basic
design/architecture, just with multiple units.
Comprised of four main components:
o Memory
o Control Unit
o Arithmetic Logic Unit
o Input/output

o Read/write, random access memory is used to store both program instructions and data
 Program instructions are coded data which tell the computer to do something
 Data is simply information to be used by the program
o Control unit fetches instructions/data from memory, decodes the instructions and then
sequentially coordinates operations to accomplish the programmed task.
o Arithmetic Logic Unit performs basic arithmetic operations
o Input/Output is the interface to the human operator
Concepts and Terminology:
Flynn’s Classical Taxonomy (1966)
Distinguishes multi-processor computer
architectures along two independent
dimensions: Instruction and Data. Each
dimension can only be Single or Multiple,
giving 4 possible classifications:
SISD – Single Instruction, Single Data
SIMD – Single Instruction, Multiple Data
MISD – Multiple Instruction, Single Data
MIMD – Multiple Instruction, Multiple Data
Flynn’s Classical Taxonomy:
SISD
Serial Computer
Only one instruction
and one data stream
is acted on during any
one clock cycle
Oldest and the most
common type of
computer even today,
e.g. a PC
Flynn’s Classical Taxonomy:
SIMD
All processing units
execute the same
instruction at any given
clock cycle.
Each processing unit
operates on a different
data element.
A type of Parallel
Computer
Doing the same operation
repeatedly over a large
data set. This is
commonly done in
signal processing
Flynn’s Classical Taxonomy:
MISD
Different instructions operate on a single data
element.
This is a type of parallel computer, but there are
very few practical uses for this classification and
it is rarely implemented.
Example: Multiple cryptography algorithms attempting to crack a single coded
message.
multiple frequency filters operating on a single signal stream
Flynn’s Classical Taxonomy:
MIMD
Can execute different
instructions on different data
elements.
Most common type of
parallel computer.
Many MIMD architectures
also include SIMD execution
sub-components.
Examples: Supercomputers,
Multi-core PCs, Networked
parallel computer
clusters/grids
Examples of MIMD Architectures

IBM POWER5, HP/Compaq AlphaServer, Intel IA-32, AMD Opteron, Cray XT3, IBM BG/L


Concepts and Terminology:
General Terminology
Task – A logically discrete section of
computational work
Parallel Task – Task that can be executed
by multiple processors safely
Communications – Data exchange
between parallel tasks
Synchronization – The coordination of
parallel tasks in real time
Concepts and Terminology:
More Terminology
Granularity – The ratio of computation to
communication
 Coarse – High computation, low communication
 Fine – Low computation, high communication
Parallel Overhead – Amount of time required to
coordinate parallel tasks rather than doing useful work
 Synchronizations
 Data Communications
 Software overhead - Overhead imposed by
compilers, libraries, tools, operating systems, etc.
Concepts and Terminology:
More Terminology
Massively Parallel – Hardware that comprises a
given parallel system having many processors,
numbering in the hundreds of thousands

Scalability – The ability of hardware/software to
show a proportionate increase in speed with the
addition of more processors. Factors include:
hardware (memory, CPU and network
communication bandwidth); application
algorithm; parallel overhead; characteristics of
the specific application code
Concepts and Terminology:
More Terminology
Pipelining - Breaking a task into steps
performed by different processor units,
with inputs streaming through, much like
an assembly line; a type of parallel
computing.
Symmetric Multi-Processor (SMP) -
Hardware architecture where multiple
processors share a single address space
and access to all resources; shared
memory computing
Types of Parallelism
Bit-level Parallelism
Historically, speed-up in computer architecture was driven by doubling the
computer word size – the amount of information the processor can
manipulate per cycle.
Increasing the word size reduces the number of instructions the
processor must execute to perform an operation on variables whose
sizes are greater than the length of the word. For example, where
an 8-bit processor must add two 16-bit integers, the processor must
first add the 8 lower-order bits from each integer using the standard
addition instruction, then add the 8 higher-order bits using an add-
with-carry instruction and the carry bit from the lower order addition;
thus, an 8-bit processor requires two instructions to complete a
single operation, where a 16-bit processor would be able to
complete the operation with a single instruction
4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit
microprocessors.
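As an illustration of the two-instruction sequence described above, the C sketch below emulates a 16-bit addition using only 8-bit operations and an explicit carry, and compares the result with a single native 16-bit add (the function name and the test values are illustrative, not from the slides).

#include <stdint.h>
#include <stdio.h>

/* Emulate a 16-bit addition with 8-bit operations plus a carry,
 * mirroring what an 8-bit processor must do in two instructions. */
static uint16_t add16_with_8bit_ops(uint16_t a, uint16_t b) {
    uint8_t lo = (uint8_t)(a & 0xFF) + (uint8_t)(b & 0xFF);     /* low-order add            */
    uint8_t carry = (lo < (uint8_t)(a & 0xFF)) ? 1 : 0;         /* carry out of low byte    */
    uint8_t hi = (uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry; /* add-with-carry, high byte */
    return ((uint16_t)hi << 8) | lo;
}

int main(void) {
    uint16_t a = 0x1234, b = 0x0BCD;
    uint16_t direct = (uint16_t)(a + b);   /* a 16-bit processor does this in one instruction */
    printf("two 8-bit steps: 0x%04X, single 16-bit add: 0x%04X\n",
           add16_with_8bit_ops(a, b), direct);
    return 0;
}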
Instruction-level Parallelism
A computer program is, in essence, a stream of instructions executed
by a processor. These instructions can be re-ordered and combined
into groups which are then executed in parallel without changing the
result of the program. This is known as instruction-level parallelism

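A small, hypothetical C fragment can make the idea concrete: the statements in the first function are mutually independent and may be overlapped by a superscalar or out-of-order processor, while the second function forms a dependency chain with little instruction-level parallelism to exploit.

/* Illustration only (not from the slides): independent vs. dependent instructions. */
void independent(int *a, int *b, int *c, int x, int y, int z) {
    *a = x + 1;   /* no data dependence between these three statements, */
    *b = y * 2;   /* so the hardware (or compiler) may overlap them     */
    *c = z - 3;
}

int dependent(int x) {
    int a = x + 1;    /* each result feeds the next instruction,   */
    int b = a * 2;    /* so there is little parallelism to extract */
    int c = b - 3;
    return c;
}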
Data Parallelism
Data parallelism is parallelism inherent in program loops, which
focuses on distributing the data across different computing nodes to
be processed in parallel. Parallelizing loops often leads to similar
(not necessarily identical) operation sequences or functions being
performed on elements of a large data structure.
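A minimal sketch of data parallelism, assuming OpenMP is available (the slides do not prescribe a particular tool): the same element-wise operation is applied across large arrays and the loop iterations are distributed over threads. Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* Data parallelism: the same operation (element-wise add) is applied to
     * different pieces of the arrays; OpenMP splits the iterations across threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[42] = %f\n", c[42]);
    return 0;
}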
Task Parallelism
Task parallelism is the characteristic of a parallel program that entirely
different calculations can be performed on either the same or
different sets of data. This contrasts with data parallelism, where the
same calculation is performed on the same or different sets of data.
Task parallelism does not usually scale with the size of a problem.
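By contrast, a task-parallel sketch (again using OpenMP sections as one possible mechanism; the helper functions are made up for illustration) runs two entirely different calculations on the same data set at the same time.

#include <omp.h>
#include <stdio.h>

/* Two different calculations on the same data set, run as independent tasks. */
static double sum(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s;
}

static double sumsq(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * x[i];
    return s;
}

int main(void) {
    static double data[1000000];
    for (int i = 0; i < 1000000; i++) data[i] = i * 0.001;

    double s = 0.0, sq = 0.0;
    #pragma omp parallel sections
    {
        #pragma omp section
        s = sum(data, 1000000);      /* task 1 */
        #pragma omp section
        sq = sumsq(data, 1000000);   /* task 2 */
    }
    printf("sum = %f, sum of squares = %f\n", s, sq);
    return 0;
}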
Parallel Computer Memory
Architectures:
Shared Memory (SM) Architecture
All processors access
all memory as a single
global address space.
Data sharing is fast.
Lack of scalability
between memory and
CPUs.
Shared memory
machines are classified
as UMA or NUMA, based
on memory access times.
(Figures: Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA) diagrams)
Parallel Computer Memory
Architectures:
Distributed Memory
Each processor has
its own memory.
Is scalable, no
overhead for cache
coherency.
Programmer is
responsible for many
details of
communication
between processors
& synchronization
between tasks.
Parallel Computer Memory
Architectures
Uniform Memory Access (UMA) is sometimes
called CC-UMA - Cache Coherent UMA. Cache
coherent means that if one processor updates a
location in shared memory, all the other
processors know about the update. Cache
coherency is accomplished at the hardware
level.
Likewise, if cache coherency is maintained in
Non-Uniform Memory Access (NUMA), it may
also be called CC-NUMA - Cache Coherent
NUMA.
Parallel Programming Models
Exist as an abstraction above hardware
and memory architectures
Examples:
 Shared Memory
 Threads
 Message Passing – distributed memory
model (can even be implemented on a shared memory machine)
 Data Parallel
Parallel Programming Models:
Shared Memory Model
Appears to the user as a single shared
memory, regardless of the underlying
hardware implementation.
Mechanisms such as locks and
semaphores may be used to control
access to the shared memory.
Program development can be simplified
since there is no need to explicitly specify
communication between tasks.
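A minimal pthreads sketch of the shared memory model (pthreads is one common implementation; the example is not taken from the course material): all threads update the same variable in the shared address space, and a lock serializes access so that no updates are lost. Build with -pthread.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                                  /* shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* protects counter */

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* enter critical section */
        counter++;                    /* shared memory access    */
        pthread_mutex_unlock(&lock);  /* leave critical section  */
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, work, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld (expected %d)\n", counter, 4 * 100000);
    return 0;
}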
Parallel Programming Models:
Threads Model
A single process may have multiple,
concurrent execution paths.
Typically used with a shared memory
architecture.
Programmer is responsible for determining
all parallelism.
A thread's work may best be described as
a subroutine within the main program. Any
thread can execute any subroutine at the
same time as other threads.
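A possible illustration of the threads model in C with pthreads (the slice-summing subroutine is a made-up example): a single process creates several threads, each executing the same subroutine on its own slice of data held in the process's shared address space.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 8
#define LEN 800

static double data[LEN];            /* shared by all threads of the process */
static double partial[NTHREADS];    /* each thread writes its own slot      */

/* The subroutine each thread executes on its own slice of the shared data. */
static void *slice_sum(void *arg) {
    long id = (long)arg;            /* thread index passed as argument */
    int chunk = LEN / NTHREADS;
    double s = 0.0;
    for (int i = id * chunk; i < (id + 1) * chunk; i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];

    for (int i = 0; i < LEN; i++) data[i] = 1.0;

    /* One process, NTHREADS concurrent execution paths over shared memory. */
    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&threads[id], NULL, slice_sum, (void *)id);

    double total = 0.0;
    for (int id = 0; id < NTHREADS; id++) {
        pthread_join(threads[id], NULL);
        total += partial[id];
    }
    printf("total = %.1f (expected %d)\n", total, LEN);
    return 0;
}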
Parallel Programming Models:
Message Passing Model
Tasks exchange data by sending and receiving
messages.
Typically used with distributed memory architectures.
Data transfer requires cooperative operations to be
performed by each process. Example - a send operation
must have a receive operation.
MPI (Message Passing Interface) is the interface
standard for message passing.
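A minimal MPI sketch of the cooperative send/receive pairing described above (illustrative only): task 0 sends a value and task 1 posts the matching receive. Build with mpicc and run with at least two processes, e.g. mpirun -np 2 ./a.out.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double value = 3.14;
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* cooperative send */
    } else if (rank == 1) {
        double value;
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                             /* matching receive */
        printf("rank 1 received %f from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}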
Parallel Programming Models:
Data Parallel Model
Tasks perform the same operations on a set
of data, each task working on a separate piece
of the set.
Works well with either shared memory or
distributed memory architectures.
Designing Parallel Programs:
Automatic Parallelization
Automatic
 Compiler analyzes code and identifies
opportunities for parallelism
 Analysis includes attempting to compute
whether or not the parallelism actually
improves performance.
 Loops are the most frequent target for
automatic parallelism.
Designing Parallel Programs:
Manual Parallelization
Understand the problem
 A Parallelizable Problem:
Calculate the potential energy for each of several
thousand independent conformations of a
molecule. When done find the minimum energy
conformation.
 A Non-Parallelizable Problem:
The Fibonacci series
 All calculations are dependent: F(n) = F(n-1) + F(n-2),
so each term requires the previously computed terms.
Designing Parallel Programs:
Domain Decomposition
Each task handles a portion of the data set.
Designing Parallel Programs:
Functional Decomposition
Each task performs a function of the overall work
Designing Parallel Programs:
Limits & Costs of parallel programming (Algorithm)

Amdahl's Law states that potential program
speedup is defined by the fraction of code (P)
that can be parallelized:

                 1
    speedup = -------
               1 - P

If P = 0, speedup = 1 (no speedup)
If P = 1, speedup = ∞ (infinite)
If P = 50%, speedup = 2x
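The formula can be tabulated directly; the small C program below evaluates the speedup bound for a few example parallel fractions (the fractions beyond the slide's own examples are arbitrary inputs, not figures from the course).

#include <stdio.h>

/* Amdahl's law as stated above: speedup = 1 / (1 - P),
 * for an unlimited number of processors. */
static double amdahl(double P) { return 1.0 / (1.0 - P); }

int main(void) {
    double fractions[] = { 0.0, 0.25, 0.50, 0.90, 0.95, 0.99 };
    for (int i = 0; i < 6; i++)
        printf("P = %.2f  ->  max speedup = %.1fx\n",
               fractions[i], amdahl(fractions[i]));
    /* P = 0.50 prints 2.0x, matching the slide's example;
     * P = 1 is excluded because the formula diverges (infinite speedup). */
    return 0;
}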
Designing Parallel Programs:
Limits & Costs of parallel programming (Algorithm)
Introducing the number of processors (N) performing the parallel
fraction of work, the relationship becomes:

                   1
    speedup = -----------
               P
              ---  +  S
               N

where P = parallel fraction, N = number of processors and S = serial
fraction (this is Amdahl's law generalized to N processors).
It soon becomes obvious that there are limits to the scalability of
parallelism.
Both Amdahl's and Gustafson's laws assume that the running time of the
sequential portion of the program is independent of the number of
processors.
Designing Parallel Programs:
Limits & Costs of parallel programming (Algorithm)

Amdahl's law assumes that the entire
problem is of fixed size, so that the total
amount of work to be done in parallel is
also independent of the number of
processors, whereas Gustafson's law
assumes that the total amount of work to
be done in parallel varies linearly with the
number of processors.
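The contrast can be made concrete by evaluating both models for a growing processor count. The sketch below assumes a 95% parallel fraction (an arbitrary choice) and uses the fixed-size formula from the earlier slide together with the standard scaled-speedup form of Gustafson's law, speedup = S + P×N.

#include <stdio.h>

int main(void) {
    const double P = 0.95, S = 1.0 - P;
    printf("%8s %15s %18s\n", "N", "fixed (Amdahl)", "scaled (Gustafson)");
    for (int N = 2; N <= 2048; N *= 4) {
        double fixed  = 1.0 / (S + P / N);   /* fixed problem size          */
        double scaled = S + P * N;           /* work grows linearly with N  */
        printf("%8d %15.2f %18.2f\n", N, fixed, scaled);
    }
    /* The fixed-size speedup flattens out near 1/S = 20x,
     * while the scaled speedup keeps growing with N. */
    return 0;
}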
Parallel Algorithm Examples:
Array Processing
Serial Solution
 Perform a function on a 2D array.
 Single processor iterates through each
element in the array
Possible Parallel Solution
 Assign each processor a partition of the array.
 Each process iterates through its own
partition.
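One possible realization of the parallel solution, assuming OpenMP and an arbitrary element-wise function (square root, chosen only for illustration): the 2D array is partitioned by rows and each thread iterates over its own partition. Compile with -fopenmp and link with -lm.

#include <math.h>
#include <omp.h>
#include <stdio.h>

#define ROWS 1000
#define COLS 1000

int main(void) {
    static double a[ROWS][COLS];

    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = i * COLS + j;

    /* Each thread is assigned a block of rows and iterates through it. */
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = sqrt(a[i][j]);

    printf("a[%d][%d] = %f\n", ROWS - 1, COLS - 1, a[ROWS - 1][COLS - 1]);
    return 0;
}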
Parallel Algorithm Examples:
Odd-Even Transposition Sort
Basic idea is bubble sort, but concurrently
comparing odd indexed elements with an
adjacent element, then even indexed
elements.
If there are n elements in the array and n/2
processors, the algorithm is effectively O(n).
Parallel Algorithm Examples:
Odd Even Transposition Sort
Initial array: Worst case scenario.
 6, 5, 4, 3, 2, 1, 0
6, 4, 5, 2, 3, 0, 1 Phase 1
4, 6, 2, 5, 0, 3, 1 Phase 2
4, 2, 6, 0, 5, 1, 3 Phase 1
2, 4, 0, 6, 1, 5, 3 Phase 2
2, 0, 4, 1, 6, 3, 5 Phase 1
0, 2, 1, 4, 3, 6, 5 Phase 2
0, 1, 2, 3, 4, 5, 6 Phase 1
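A serial C reference implementation of the odd-even transposition sort traced above; the comments indicate where a parallel version with n/2 processors could perform each phase's compare-exchanges simultaneously.

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

static void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; phase++) {
        /* Alternate between odd-indexed pairs ("Phase 1" in the trace)
         * and even-indexed pairs ("Phase 2"). */
        int start = (phase % 2 == 0) ? 1 : 0;
        /* The compare-exchanges below are independent of one another,
         * so n/2 processors could perform them concurrently. */
        for (int i = start; i + 1 < n; i += 2)
            if (a[i] > a[i + 1])
                swap(&a[i], &a[i + 1]);
    }
}

int main(void) {
    int a[] = { 6, 5, 4, 3, 2, 1, 0 };   /* worst case from the slide */
    odd_even_sort(a, 7);
    for (int i = 0; i < 7; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}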
Other Parallelizable Problems
The n-body problem
Floyd’s Algorithm
 Serial: O(n^3), Parallel: O(n log p)
Game Trees
Divide and Conquer Algorithms
Conclusion
Parallel computing is fast.
There are many different approaches and
models of parallel computing.
Parallel computing is the future of
computing.
