
Aca Unit 1.1


Advanced Computer Architecture

Module I
Flynn’s Classification
Jyoti Kumari
Computer Architecture
• Computer architecture is concerned with balancing the performance, efficiency, cost,
and reliability of a computer system.

• The discipline of computer architecture has three main subcategories:

• Instruction set architecture (ISA): defines the machine code that a processor reads and acts upon, as well as the word size, memory addressing modes, processor registers, and data types. The ISA is the boundary between hardware and software: programs such as Windows or Photoshop are ultimately translated into instructions defined by it.

• Microarchitecture: also known as "computer organization", this describes how a particular processor implements the ISA, covering internal details such as the datapath, pipelining, and caches.

• Systems design: includes all of the other hardware components within a computing system, such as data processing other than the CPU (e.g., direct memory access), storage devices (e.g., DVD drives), virtualization, and multiprocessing.
Difference between CA and CO

Computer Architecture:
• Describes what the computer does.
• Indicates the computer's hardware.
• Coordinates between the hardware and software of the system.
• Deals with the functional behaviour of the computer system.

Computer Organization:
• Describes how it does it.
• Indicates the computer's performance.
• Handles the segments of the network in a system.
• Deals with the structural relationship.
Advanced Computer Architecture
• Advanced computer architectures exploit parallelism via multiple processing units.
• Parallel processors are computer systems consisting of multiple processing
units connected via some interconnection network plus the software
needed to make the processing units work together.

Advantages of multiprocessors:

– Higher speed than the fastest single-processor system.
– More cost-effective than building a high-performance single processor.
– Fault tolerance.
Flynn’s Classification
Proposed by Michael J. Flynn in 1966

• This classification distinguishes multi-processor computer architectures according to two independent dimensions:
1. Instruction stream - An instruction stream is a sequence of instructions performed by the processing unit.
2. Data stream - A data stream is the data traffic exchanged between the memory and the processing unit.
• Based on the number of instruction and data streams that can be processed simultaneously, computing systems are classified into four major categories.

• Each of these dimensions can have only one of two possible states: Single or Multiple (one or many).

• Flynn’s classification depends on the behaviour of the control unit and the data-processing unit rather than on their operational and structural interconnections.
Single-instruction, single-data
streams (SISD)
• An SISD computing system is a uniprocessor
(scalar processor) machine which is
capable of executing a single instruction,
operating on a single data stream.
• In SISD, machine instructions are processed in a
sequential manner as shown in Figure 1.1.
• Computers adopting this model are popularly
called sequential computers.
• Most conventional computers have SISD
architecture.
• Single instruction: only one instruction stream
is being acted on by the CPU during any
one clock cycle.
• Single data: only one data stream is being used
as input during any one clock cycle.
• Deterministic execution.
• Examples: most PCs, single CPU workstations
and mainframes.
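
As an illustrative sketch (not taken from the slides), the Python loop below mirrors SISD execution: a single instruction stream processes a single data stream, one element per step.

# SISD analogy: one instruction stream operating on one data stream.
# Each loop iteration handles exactly one data element, in sequence.
data = [3, 1, 4, 1, 5, 9, 2, 6]
total = 0
for x in data:
    total += x        # instructions execute strictly one after another
print(total)          # 31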
Single-instruction, multiple-data
streams (SIMD)
• When there is only one control unit and all
processors execute the same instruction in
a synchronized fashion, the
parallel machine is classified as SIMD.
• An SIMD system is a multiprocessor
machine capable of executing the same
instruction on all the CPUs but operating
on different data streams as shown in
Figure 1.2.
• Single instruction: All processing units
execute the same instruction issued by the
control unit at any given clock cycle.
• Multiple data: Each processing unit can
operate on a different data element.
• Synchronous (lockstep) and deterministic execution.
• The SIMD model of parallel computing
consists of two parts: a front-end computer
of the usual von Neumann style, and a
processor array as shown in Figure 1.3.

• The processor array is a set of identical synchronized processing elements capable of simultaneously performing the same operation on different data.
• Each processor in the array has a small
amount of local memory where the
distributed data resides while it is
being processed in parallel.
• The processor array is connected to the
memory bus of the front end so that the
front end can randomly access the local
processor memories as if it were another
memory.
• Best suited for specialized problems characterized by a high degree of regularity, such as image processing and vector computation.

Figure 1.3 SIMD Architectural Model
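
As a software analogy (an illustration assuming NumPy is available, not part of the slide material), a vectorized array operation applies the same instruction to many data elements at once, much as one control unit drives the whole processor array.

# SIMD analogy: the same operation is applied to every element of the
# data set "simultaneously" from the programmer's point of view.
import numpy as np

data = np.arange(16, dtype=np.float64)   # the distributed data elements
result = data * 2.0 + 1.0                # one instruction, many data items
print(result)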
SIMD Schemes
• There are two main configurations that
have been used in SIMD machines (see
Fig.).
• In the first scheme, each processor has its own
local memory. Processors can communicate with
each other through the interconnection network.
If the interconnection network does not provide
direct connection between a given pair of
processors, then this pair can exchange data via
an intermediate processor.

• The ILLIAC IV used such an interconnection scheme.

• The interconnection network in the ILLIAC IV allowed each processor to communicate directly with four neighbouring processors in an 8 x 8 matrix pattern, such that the ith processor can communicate directly with the (i - 1)th, (i + 1)th, (i - 8)th, and (i + 8)th processors.
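
To make the neighbour pattern concrete, here is a small illustrative sketch (not actual ILLIAC IV software); indices wrap around modulo 64 here so that every processor in the 8 x 8 array has exactly four direct neighbours.

# Direct neighbours of processor i in an 8 x 8 ILLIAC-IV-style array:
# (i - 1), (i + 1), (i - 8) and (i + 8), taken modulo 64 in this sketch
# so the pattern is also defined for the edge processors.
def illiac_neighbours(i, n=64):
    return [(i - 1) % n, (i + 1) % n, (i - 8) % n, (i + 8) % n]

print(illiac_neighbours(27))   # [26, 28, 19, 35]
print(illiac_neighbours(0))    # [63, 1, 56, 8]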
SIMD Schemes

• In the second SIMD scheme, processors and memory modules communicate with each other via the interconnection network.
• Two processors can transfer
data between each other via
intermediate memory
module(s) or possibly via
intermediate processor(s).
• The BSP (Burroughs’
Scientific Processor) used the
second SIMD scheme.
Multiple-instruction, single-data
streams (MISD)
• In the MISD category, the same stream of data flows through a linear array of processors executing different instruction streams, as shown in Figure 1.4.
• In practice, there is no viable MISD machine; however, some authors have considered pipelined machines (and perhaps systolic-array computers) as examples of MISD.
• Example: the experimental Carnegie-Mellon C.mmp computer (1971).

Figure 1.4 MISD Architecture
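
Since pipelined machines are the closest practical reading of MISD, the toy sketch below (an analogy, not from the slides) pushes a single data stream through stages that each execute a different operation.

# MISD-style pipeline analogy: the same data stream flows through a
# linear array of "processors", each executing a different instruction.
stages = [
    lambda x: x + 1,      # stage 1: increment
    lambda x: x * 2,      # stage 2: double
    lambda x: x - 3,      # stage 3: subtract
]

for item in [10, 20, 30]:         # the single data stream
    for stage in stages:          # a different instruction at each stage
        item = stage(item)
    print(item)                   # 19, 39, 59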
Multiple-instruction Multiple-data
streams (MIMD)
• In a MIMD machine, each processor has its
own control unit and can execute
different instructions on different data
as shown in figure 1.5.
• Multiple Instruction: every processor may be
executing a different instruction stream
• Multiple Data: every processor may be
working with a different data stream; the multiple data streams are provided by the shared memory.
• Can be categorized as loosely coupled or tightly coupled depending on the sharing of data and control.
• Execution can be synchronous or asynchronous, and deterministic or non-deterministic.
• Examples: most current supercomputers, networked parallel computer "grids", and multi-processor SMP computers.

Figure 1.5 MIMD Architecture
MIMD machines are broadly categorized into shared-memory MIMD
and distributed-memory MIMD based on the way PEs are coupled to
the main memory.

• In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single global memory and they all have access to it.
• The communication between PEs in this model takes place through the shared memory; a modification of the data stored in the global memory by one PE is visible to all other PEs.
• E.g., Silicon Graphics machines and
Sun/IBM’s SMP (Symmetric
Multi-Processing).
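
As a minimal sketch of the tightly coupled style (assuming only Python's standard threading module; not from the slides), the threads below play the role of PEs sharing one global memory: an update made by any thread is visible to all of them.

# Shared-memory MIMD analogy: several "PEs" (threads) update a single
# shared counter; the lock coordinates access to the global memory.
import threading

shared_counter = 0
lock = threading.Lock()

def pe(updates):
    global shared_counter
    for _ in range(updates):
        with lock:
            shared_counter += 1   # the modification is visible to every PE

threads = [threading.Thread(target=pe, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_counter)             # 4000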
• In Distributed memory MIMD
(message passing) machines
(loosely coupled multiprocessor
systems) all PEs have a local
memory.
• The communication between PEs in this model takes place through the interconnection network (the inter-process communication channel, or IPC).
• The network connecting the PEs can be configured as a tree, mesh, or other topology in accordance with the requirement.
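
A minimal sketch of the loosely coupled, message-passing style (assuming Python's standard multiprocessing module; the worker name is illustrative): each PE is a separate process with private memory, and data moves only through explicit messages over the IPC channel.

# Distributed-memory MIMD analogy: each PE is a process with its own
# private memory; data is exchanged only by explicit messages (IPC).
from multiprocessing import Process, Queue

def worker_pe(rank, tasks, results):
    chunk = tasks.get()                  # receive data over the channel
    results.put((rank, sum(chunk)))      # send the partial result back

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    data = list(range(100))
    pes = [Process(target=worker_pe, args=(r, tasks, results)) for r in range(4)]
    for p in pes:
        p.start()
    for r in range(4):
        tasks.put(data[r * 25:(r + 1) * 25])   # distribute work by message
    total = sum(results.get()[1] for _ in range(4))
    for p in pes:
        p.join()
    print(total)                               # 4950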
Difference

Shared memory MIMD:
• Tightly coupled.
• Easier to program.
• Less tolerant to failure: a failure affects the entire system.
• Not scalable, because the addition of more PEs leads to memory contention.

Distributed memory MIMD:
• Loosely coupled.
• Comparatively harder to program.
• High tolerance to failure: a failure does not affect the entire system because PEs are easily isolated.
• Scalable, because each PE has its own memory; better than shared memory MIMD in this respect.
Tightly-coupled (shared memory)
multiprocessor system
• Multiprocessors are called tightly coupled systems
due to the high degree of resource sharing.
• Based on how the memory and peripheral resources
are shared or distributed, the multiprocessor systems
are divided into 3 different models:

1. Uniform memory-access (UMA) model
2. Non-uniform memory-access (NUMA) model
3. Cache-only memory architecture (COMA) model

UMA Model

• In a UMA multiprocessor model (Fig.), the physical memory is uniformly shared by all the processors.
• All processors have equal access time to all memory
words, which is why it is called uniform memory
access.
• Each processor may use a private cache.
• Peripherals are also shared in some fashion.
• Suitable for general-purpose and time sharing
applications by multiple users.
NUMA Model
• Shared-memory system in which the access
time varies with the location of the
memory word.
• The shared memory is physically distributed
to all processors, called local memories.
• The collection of all local memories forms a
global address space accessible by all
processors.
• Two NUMA machine models are depicted
in Fig. a.
• It is faster to access a local memory with a
local processor.
• The access of remote memory attached to
other processors takes longer due to the
added delay through the interconnection
network.
• Example- BBN TC-2000 Butterfly
multiprocessor
• A hierarchically structured multiprocessor is
modelled in Fig. b. The processors are
divided into several clusters.
• Each cluster is itself an UMA or a NUMA
multiprocessor.
• The clusters are connected to global
shared-memory modules.
• The entire system is considered a NUMA
multiprocessor.
• All processors belonging to the same cluster are
allowed to uniformly access the
cluster shared-memory modules.
• All clusters have equal access to the global
memory.
• However, the access time to the cluster memory
is shorter than that to the global memory.
One can specify the access rights among
inter-cluster memories in various ways.
• Example- Cedar multiprocessor
COMA Model
• The COMA model is a special case of a
NUMA machine, in which the distributed
main memories are converted to caches.
• There is no memory hierarchy at each
processor node.
• All the caches form a global address
space.
• Remote cache access is assisted by the
distributed cache directories.
• Besides the UMA, NUMA, and COMA models specified above, other variations exist for multiprocessors. For example, a cache-coherent NUMA (CC-NUMA) model can be specified with distributed shared memory and cache directories.
Distributed-memory multicomputers
• A distributed-memory multicomputer
system is modelled in Fig.
• The system consists of multiple computers,
often called nodes, interconnected by a
message-passing network.
• Each node is an autonomous computer
consisting of a processor, local memory,
and sometimes attached disks or l/O
peripherals.
• The message-passing network
provides point-to-point static
connections among the nodes.
• All local memories are private and are accessible only by local processors. For this reason, traditional multicomputers have also been called no-remote-memory-access (NORMA) machines.
• Advantages of this model include scalability and fault tolerance.
