
Parallel Computing Unit 2 - Parallel Computing Architecture

This document discusses parallel computing architectures and Flynn's taxonomy. It begins by stating the objectives of explaining Flynn's taxonomy, symmetric multiprocessors, and types of MIMD architectures. It then provides details on Flynn's taxonomy, which classifies computers based on whether their instructions and data are single or multiple. The main categories are SISD, SIMD, MISD, and MIMD. It proceeds to define each category, providing examples for SISD, SIMD and an overview of MIMD architectures.


Unit 2 – Parallel Computing Architecture
Parallel Computing

2022 Parallel Computing | UNITEN


Objectives
• To differentiate the categories in Flynn’s Taxonomy
• To explain the concept of symmetric multiprocessor
• To differentiate the types of MIMD architectures: shared and distributed memory



Flynn's Classical Taxonomy
• There are different ways to classify parallel computers. One of the
more widely used classifications, in use since 1966, is called Flynn's
Taxonomy.
• Flynn's taxonomy distinguishes multi-processor computer
architectures according to how they can be classified along the two
independent dimensions of Instruction and Data. Each of these
dimensions can have only one of two possible states: Single or
Multiple.



Flynn Taxonomy - Categories
• SISD (Single Instruction, Single Data): single instruction stream, single data stream
• SIMD (Single Instruction, Multiple Data): single instruction stream, multiple data streams
• MISD (Multiple Instruction, Single Data): multiple instruction streams, single data stream
• MIMD (Multiple Instruction, Multiple Data): multiple instruction streams, multiple data streams


Taxonomy of Parallel Processor Architectures

Processor Organization
• SISD → Uniprocessor
• SIMD → Vector processor, Array processor
• MISD
• MIMD → Shared memory (SMP, NUMA), Distributed memory (Clusters)
Single Instruction, Single Data (SISD)


Flynn Taxonomy - Categories: SISD
• Uniprocessor
• Single processor
• Single instruction
• Single memory


Single Instruction,
Single Data (SISD)
• A serial (non-parallel) computer
• Single instruction: only one instruction
stream is being acted on by the CPU during
any one clock cycle
• Single data: only one data stream is being
used as input during any one clock cycle
• Deterministic execution
• This is the oldest and, until recently, the most prevalent form of computer
• Examples: most PCs, single CPU
workstations and mainframes
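As a sketch, SISD execution can be pictured as a single serial loop: one instruction stream consuming one data element per step. The array and values below are illustrative, not from the slides.

```python
# Illustrative SISD-style serial execution: one instruction stream
# processes one data element per clock-cycle-like step.
data = [3, 1, 4, 1, 5, 9, 2, 6]

total = 0
for x in data:   # a single instruction stream...
    total += x   # ...acting on a single data stream, one element at a time

print(total)  # 31
```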



Single Instruction, Multiple Data (SIMD)


Flynn Taxonomy - Categories: SIMD
• A single instruction controls the simultaneous execution of Processing Elements (PEs)
• Different PEs operate on different data
• Realized in vector and array processors


Single Instruction, Multiple Data (SIMD)
• A type of parallel computer
• Single instruction: All processing units execute the same instruction at any given clock cycle
• Multiple data: Each processing unit can operate on a different data element
• This type of machine typically has an instruction dispatcher, a very high-bandwidth internal
network, and a very large array of very small-capacity instruction units.
• Best suited for specialized problems characterized by a high degree of regularity, such as
image processing.
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Examples:
• Processor Arrays: Connection Machine CM-2, Maspar MP-1, MP-2
• Vector Pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820
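A software analogue of the SIMD idea, assuming NumPy is available: one element-wise operation is applied across many data elements at once, and NumPy typically dispatches such operations to vectorized machine code.

```python
import numpy as np

# One "instruction" (an element-wise add) applied to multiple data
# elements in lockstep -- a software analogue of SIMD execution.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

c = a + b  # single operation, multiple data elements
print(c.tolist())  # [11.0, 22.0, 33.0, 44.0]
```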



SIMD: Single Instruction, Multiple Data Stream (figure)
Multiple Instruction, Single Data (MISD)


Flynn Taxonomy - Categories: MISD
• The same data stream is fed to different processors
• Each processor executes a different instruction stream


Multiple Instruction, Single Data (MISD)
• A single data stream is fed into multiple processing units.
• Each processing unit operates on the data independently via
independent instruction streams.
• Few actual examples of this class of parallel computer have ever existed.
One is the experimental Carnegie-Mellon C.mmp computer (1971).
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded
message.
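The cryptography bullet above can be sketched with hash functions standing in for the independent instruction streams; this is an analogy to illustrate the data flow, not an actual MISD machine.

```python
import hashlib

# MISD analogue: several different "instruction streams" (here,
# distinct hash algorithms) all consume the same single data stream.
message = b"attack at dawn"

digests = {name: hashlib.new(name, message).hexdigest()
           for name in ("md5", "sha1", "sha256")}

for name, d in digests.items():
    print(name, d)
```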



Multiple Instruction, Multiple Data (MIMD)


Flynn Taxonomy - Categories: MIMD
• Different instruction streams
• Different data streams
• Different processors
• Applied in:
  • Symmetric multiprocessor (SMP)
  • Nonuniform Memory Access (NUMA)
  • Clusters


Multiple Instruction,
Multiple Data (MIMD)
• Currently, the most common type of parallel
computer. Most modern computers fall into this
category.
• Multiple Instruction: every processor may be
executing a different instruction stream
• Multiple Data: every processor may be working
with a different data stream
• Execution can be synchronous or asynchronous,
deterministic or non-deterministic
• Examples: most current supercomputers, networked parallel computer "grids", and multiprocessor SMP computers - including some types of PCs.

Shared Memory

Processor Organization (shared-memory branch)
• MIMD → Shared memory → SMP, NUMA


MIMD: Shared Memory
Symmetric Multiprocessor (SMP)
Nonuniform Memory Access (NUMA)



Symmetric Multiprocessor (SMP) - Characteristics
• Two or more similar processors of comparable capability (identical processors)
• Uniform Memory Access (UMA)
• All processors
  • share a connection to memory and the I/O devices
  • share access to I/O devices
  • perform the same functions
  • have equal access, and equal access time, to memory
• The system is controlled by an integrated OS
  • manages interaction between processors at the job, task, file, and data levels
• Cache coherent
  • when one processor updates memory, all other processors are aware of the update


SMP - Advantages
• Performance
  • enables parallel processing
• Availability
  • failure of a single processor does not halt the machine
• Flexibility & Incremental Growth
  • performance can be enhanced by adding an additional processor
• Scaling
  • vendors can offer a wide range of products with different price and performance characteristics by varying the number of processors


SMP - Organization (figure summary)
• Multiple processors, each with private L1 and L2 caches
• A shared bus connects all processors to main memory and the I/O subsystem


SMP – Key design issues
• Simultaneous concurrent processes – the OS must allow multiple processes to execute in parallel on different processors
• Scheduling – avoid conflicts between processes
• Synchronization – enforce mutual exclusion and event ordering when processes can access the same memory location
• Memory management – e.g., paging
• Reliability and fault tolerance – the OS must handle graceful degradation caused by processor failure
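The synchronization issue above can be sketched with threads standing in for processors: several threads update the same shared memory location, and a lock enforces mutual exclusion so no increments are lost. The counts are illustrative.

```python
import threading

# Two or more "processors" (threads) update the same shared memory
# location; the lock enforces mutual exclusion so updates are not lost.
counter = 0
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        with lock:       # mutual exclusion around the shared location
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- deterministic only because of the lock
```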



MIMD: Shared Memory
Symmetric Multiprocessor (SMP)
Nonuniform Memory Access (NUMA)



NUMA – Non-uniform Memory Access
• Often made by physically linking two or more SMPs
• All processors can access all parts of main memory
• Access time depends on the location of the memory word


NUMA – Characteristics
• Maintains a transparent system-wide memory across multiprocessor nodes
• One SMP can directly access the memory of another SMP
• Not all processors have equal access time to all memories
• If cache coherence is maintained among the caches of the different processors, the system is called Cache-Coherent NUMA (CC-NUMA)


CC-NUMA – Organization of Fetch Operation (flowchart summary)
1. A processor initiates a fetch by issuing a memory address.
2. If the address is found in the processor's current cache, the desired value is sent to the processor and the operation ends.
3. On a cache miss, the L2 cache initiates a fetch operation: if the location is in the local main memory, the value is fetched across the local bus.
4. Otherwise, the location is in the main memory of a remote node and the value is fetched across the interconnection network.
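The fetch decision in the flowchart can be sketched as a small function. The node model here (dictionaries for cache, local memory, and remote memory) is hypothetical and purely illustrative.

```python
# Hypothetical sketch of the CC-NUMA fetch decision: check the local
# cache, then local main memory (local bus), then remote memory
# (interconnection network).
def fetch(address, cache, local_memory, remote_memory):
    if address in cache:                 # found in current cache?
        return cache[address], "cache hit"
    if address in local_memory:          # found in local main memory?
        value = local_memory[address]    # fetch across the local bus
    else:                                # must be in a remote node
        value = remote_memory[address]   # fetch across the interconnect
    cache[address] = value               # fill cache, send to processor
    return value, "cache miss"

cache = {0x10: "A"}
local_mem = {0x20: "B"}
remote_mem = {0x30: "C"}
print(fetch(0x10, cache, local_mem, remote_mem))  # ('A', 'cache hit')
print(fetch(0x30, cache, local_mem, remote_mem))  # ('C', 'cache miss')
```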



CC-NUMA Organization (figure summary)
• Nodes 1 … N, each containing processors with private L1 and L2 caches
• Within each node, processors share a local bus, main memory, and an I/O subsystem
• The nodes are connected by an interconnect network
NUMA - Design issues
• Issue: memory accesses to remote nodes can overload the interconnect; additional nodes mean additional traffic
• Solutions:
  • use L1 and L2 caches to minimize remote memory accesses
  • software with good spatial locality (virtual memory)
  • OS page migration


NUMA – Pros and Cons
• Pros:
  • Performance – no major software changes needed
  • Availability
• Cons:
  • Not as transparent as SMP – software changes may be required


MIMD: Distributed Memory
Cluster


Distributed Memory

Processor Organization (distributed-memory branch)
• MIMD → Distributed memory → Clusters


Distributed Memory: Cluster
• A group of interconnected computers working together as a unified computing resource
• An approach to provide high performance and high availability
• Node – each computer in a cluster


Distributed Memory: Cluster
• Each processor has its own local memory – memory is not shared
• Processors operate independently
• When a CPU needs data held by another processor, the communication is defined by the programmer



Distributed Memory: Advantages
• Memory is scalable with the number of processors
• Absolute scalability – better than a standalone machine
• Incremental scalability – possible to add new nodes
• High availability – failure of one node does not impact the whole system
• Superior price/performance – equal or greater computing power can be added at lower cost
• Each processor can rapidly access its own memory without interference


Distributed Memory: Disadvantages
• Increases the programmer's tasks – the programmer must arrange the data
• It may be difficult to map existing data structures, based on global memory, to this memory organization
• Non-uniform memory access times – data residing on a remote node takes longer to access than node-local data


Cluster Configurations
• Standby server with no shared disk
• Shared disk


Cluster Configuration

Standby server with no shared disk:
• Two nodes interconnected
• Message exchange over a high-speed link (LAN or WAN)

Shared disk:
• Two nodes interconnected
• Message exchange over a high-speed link (LAN or WAN)
• Disk subsystem shared between the nodes (e.g., a RAID system)


Standby server with no shared disk (figure summary)
• Two nodes, each with its own processors, main memory, and I/O subsystem, connected only by a high-speed message link

Shared disk (figure summary)
• Two nodes as above, with both I/O subsystems also connected to a shared RAID disk subsystem


Clustering Method
• Passive Standby
  • One computer handles all of the processing load; only in the event of failure does the other computer in the cluster take over
  • The primary machine sends a "heartbeat" message to the standby machine
  • Increases availability but not performance
• Active Secondary
  • All servers actively do processing; three classifications:
  • Separate servers
  • Servers connected to disks
  • Servers share disks
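The passive-standby heartbeat idea can be sketched as follows: the standby declares the primary dead once heartbeats stop arriving for too long and takes over. The class, timeout value, and timings are illustrative, not from the slides.

```python
import time

# Minimal passive-standby failover sketch: the standby takes over
# once no heartbeat has arrived within the timeout window.
HEARTBEAT_TIMEOUT = 0.5  # seconds without a heartbeat before takeover

class Standby:
    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False

    def on_heartbeat(self):
        # Called whenever the primary's "heartbeat" message arrives.
        self.last_heartbeat = time.monotonic()

    def check(self):
        # Take over the processing load if the primary seems dead.
        if time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.active = True
        return self.active

standby = Standby()
standby.on_heartbeat()
print(standby.check())   # False: primary is alive
time.sleep(0.6)          # heartbeats stop (primary "fails")
print(standby.check())   # True: standby takes over
```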


Clusters compared to SMP
• Number of processors – Clusters: multiple processors; SMP: multiple processors
• Configuration – Clusters: must deal with a range of computer types and specifications; SMP: easier to manage and configure
• Physical space – Clusters: more space needed for each node; SMP: located on the computer motherboard
• Power consumption – Clusters: power required by each node; SMP: power required by each processor
• Stability – Clusters: depends on connections and node availability; SMP: stable and established
• Scalability – Clusters: large and expandable; SMP: small scale / single computer


Hybrid Distributed-
Shared Memory



Hybrid Distributed-Shared Memory
• Employs both shared and distributed memory architectures
• Shared memory component – a shared memory machine and/or graphics processing units (GPUs)
• Distributed memory component – a network of multiple shared memory/GPU machines; each machine only knows its own memory
• Network communications are required to transfer data between machines




2022 Universiti Tenaga Nasional. All rights reserved.
