Multiprocessor Architecture: Taxonomy of Parallel Architectures
Multiprocessor architecture
Taxonomy of Parallel Architectures
Parallel computing is a form of computing in which a job is
broken into discrete parts that can be executed concurrently.
Each part is further broken down into a series of instructions.
Instructions from each part execute simultaneously on
different CPUs.
Contact: 7008443534, 9090042626
Subject: Computer System Architecture
Created By: Asst. Prof. SK ABDUL ISRAR College: ABA, BLS
Flynn’s classification –
Example: Z = sin(x) + cos(x) + tan(x)
The system performs different operations on the same
data set. Machines built using the MISD model are not
useful for most applications; a few such machines have
been built, but none of them is available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems –
An MIMD system is a multiprocessor machine that is
capable of executing multiple instructions on multiple
data sets. Each processing element (PE) in the MIMD model
has separate instruction and data streams; therefore
machines built using this model can support any kind of
application. Unlike SIMD and MISD machines, PEs in MIMD
machines work asynchronously.
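The MISD example above, Z = sin(x) + cos(x) + tan(x), can be sketched in software. This is only an illustration: the function name `misd_style_sum` is hypothetical, and Python worker threads stand in for hardware PEs (they do not give true hardware parallelism). Each worker applies a different operation to the same input value.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def misd_style_sum(x: float) -> float:
    """Apply three different operations to the same datum, then combine."""
    # One operation per simulated PE (multiple instruction streams).
    operations = [math.sin, math.cos, math.tan]
    with ThreadPoolExecutor(max_workers=len(operations)) as pool:
        # Every worker receives the same input x (single data stream).
        partial_results = list(pool.map(lambda op: op(x), operations))
    # Combine the partial results into Z.
    return sum(partial_results)


if __name__ == "__main__":
    print(misd_style_sum(0.5))
```

An MIMD-style program would differ in that each worker would also receive its own data.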
Cache Coherence
• With multiple caches, one CPU can modify
memory at locations that other CPUs have cached,
leaving those caches with stale copies of the data.
• Consistency:
o This indicates when a modification to
to another as needed.
Cache-Coherence Protocols
• Small-scale multiprocessors use hardware
mechanisms to track the state of data blocks that
are shared.
• Directory based.
• Snooping.
o The sharing status is distributed and
kept with the block in each cache.
• Write invalidate.
o It is the most common protocol, both for
▪ If not, no broadcast is
necessary.
Processor        Bus          Contents of     Contents of     Contents of
activity         activity     CPU A's cache   CPU B's cache   memory location X
CPU A reads X    Cache miss   0                               0
CPU B reads X    Cache miss   0               0               0
CPU A writes 1   Broadcast    1               1               1
CPU B reads X                 1               1               1
▪ This example also assumes
a write-back cache.
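The trace in the table above can be replayed with a small simulation. This is a minimal sketch under stated assumptions: the class `WriteUpdateBus` and its methods are hypothetical names, only a single location X is modelled, and every write is broadcast (the "no broadcast if the block is not shared" optimization is left out). Memory is updated on a broadcast because that is what the table shows.

```python
class WriteUpdateBus:
    """Toy model of one memory location X shared by snooping caches."""

    def __init__(self, cpu_names, initial_value=0):
        self.memory = initial_value
        # None means "X is not present in this CPU's cache".
        self.caches = {name: None for name in cpu_names}

    def read(self, cpu):
        """Read X on behalf of cpu and return the resulting bus activity."""
        if self.caches[cpu] is None:
            self.caches[cpu] = self.memory  # fetch the block from memory
            return "Cache miss"
        return ""  # cache hit: no bus traffic

    def write(self, cpu, value):
        """Write X and broadcast the new value to every cached copy."""
        self.caches[cpu] = value
        for other, cached in self.caches.items():
            if other != cpu and cached is not None:
                self.caches[other] = value  # snooping cache updates its copy
        self.memory = value  # the table shows memory updated as well
        return "Broadcast"


def run_trace():
    """Replay the four steps of the table and collect one row per step."""
    bus = WriteUpdateBus(["A", "B"])
    steps = [("A", "read", None), ("B", "read", None),
             ("A", "write", 1), ("B", "read", None)]
    rows = []
    for cpu, action, value in steps:
        if action == "read":
            activity = bus.read(cpu)
        else:
            activity = bus.write(cpu, value)
        rows.append((activity, bus.caches["A"], bus.caches["B"], bus.memory))
    return rows


if __name__ == "__main__":
    for row in run_trace():
        print(row)
```

Under a write-invalidate protocol, the `write` method would instead set the other caches' copies to `None`, so the final read by CPU B would miss.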
Performance Differences between Bus Snooping Protocols
• Write invalidate is much more popular.
differences.
• Under write update, multiple writes to the same word with
no intervening reads require multiple broadcasts.
• Under write update with multiword cache blocks, each word
written requires a broadcast.
written invalidates.
o Also write invalidate works on blocks ,
miss.
address to be invalidated.
CPU access.
invalidate operation.
• Real protocols distinguish between shared and clean data
in exactly one cache.
o A "clean and private" state eliminates
Every data object is owned by a node. The initial owner is the
node that created the object. Ownership can change as the object
moves from node to node. When a process accesses data in the
shared address space, the mapping manager maps the shared memory
address to physical memory (local or remote).
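The ownership rules above can be sketched as a small lookup structure. All names here (`MappingManager`, `create`, `migrate`, `resolve`) are hypothetical, and a single in-process dictionary stands in for whatever distributed directory a real system would use.

```python
class MappingManager:
    """Toy mapping manager for one node of a shared address space."""

    def __init__(self, local_node):
        self.local_node = local_node
        self.owner = {}  # shared address -> name of the owning node

    def create(self, address, node):
        """The node that creates an object is its initial owner."""
        self.owner[address] = node

    def migrate(self, address, new_node):
        """Ownership changes when the object moves to another node."""
        if address not in self.owner:
            raise KeyError(f"unknown shared address: {address!r}")
        self.owner[address] = new_node

    def resolve(self, address):
        """Map a shared address to local or remote physical memory."""
        node = self.owner[address]
        kind = "local" if node == self.local_node else "remote"
        return kind, node


if __name__ == "__main__":
    manager = MappingManager(local_node="node0")
    manager.create(0x1000, "node0")
    print(manager.resolve(0x1000))   # owned by the local node
    manager.migrate(0x1000, "node1")
    print(manager.resolve(0x1000))   # now owned by a remote node
```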
Interconnection Network
Interconnection networks are composed of switching elements. The topology
is the pattern in which the individual switches are connected to other
elements, such as processors, memories and other switches. The network
allows data to be exchanged between processors in the parallel system.
• Direct connection networks − Direct networks have point-to-point
connections between neighboring nodes. These networks are
static, which means that the point-to-point connections are fixed.
Some examples of direct networks are rings, meshes and cubes.
• Indirect connection networks − Indirect networks have no fixed
neighbors. The communication topology can be changed
dynamically based on the application demands. Indirect networks
can be subdivided into three types: bus networks, multistage
networks and crossbar switches.
o Bus networks − A bus network is composed of a number of
bit lines onto which a number of resources are attached.
When busses use the same physical lines for data and
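For the direct networks named above (rings, meshes and cubes), the fixed point-to-point connections can be computed directly from a node's address. The sketch below uses hypothetical function names and assumes nodes are numbered from 0, a mesh without wrap-around links, and a binary hypercube for the "cube" case.

```python
def ring_neighbors(node, size):
    """Neighbors of a node in a bidirectional ring of `size` nodes."""
    return sorted({(node - 1) % size, (node + 1) % size})


def mesh_neighbors(x, y, width, height):
    """Neighbors of node (x, y) in a 2D mesh without wrap-around links."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return sorted((cx, cy) for cx, cy in candidates
                  if 0 <= cx < width and 0 <= cy < height)


def hypercube_neighbors(node, dimensions):
    """Neighbors of a node in a binary hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


if __name__ == "__main__":
    print(ring_neighbors(0, 8))
    print(mesh_neighbors(0, 0, 3, 3))
    print(hypercube_neighbors(5, 3))
```

Because the links are fixed, the neighbor sets depend only on the node address and the network size, never on the running application.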
Routing Mechanisms
Arithmetic, source-based port select, and table look-up are three
mechanisms that high-speed switches use to determine the output
channel from information in the packet header. All of these mechanisms
are simpler than the kind of general routing computations implemented
in traditional LAN and WAN routers. In parallel computer networks, the
switch needs to make the routing decision for all its inputs in every cycle,
so the mechanism needs to be simple and fast.
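The three mechanisms can be contrasted with a short sketch. The header layouts and port names below are assumptions chosen for illustration (a 2D mesh with relative offsets for the arithmetic case); they are not taken from any particular switch.

```python
def arithmetic_select(dx, dy):
    """Arithmetic: the header holds relative offsets to the destination.

    The switch picks a port with simple comparisons and adjusts the
    offset it consumed before forwarding the header.
    """
    if dx > 0:
        return "east", (dx - 1, dy)
    if dx < 0:
        return "west", (dx + 1, dy)
    if dy > 0:
        return "north", (dx, dy - 1)
    if dy < 0:
        return "south", (dx, dy + 1)
    return "local", (0, 0)  # offsets exhausted: deliver to the attached node


def source_based_select(port_list):
    """Source-based port select: the source lists one output port per hop.

    Each switch uses the first entry and strips it from the header.
    """
    return port_list[0], port_list[1:]


def table_lookup_select(routing_table, destination):
    """Table look-up: the header field indexes a table held in the switch."""
    return routing_table[destination]


if __name__ == "__main__":
    print(arithmetic_select(2, -1))
    print(source_based_select(["east", "east", "south", "local"]))
    print(table_lookup_select({7: "east", 3: "west"}, 7))
```

Each function is a handful of comparisons or a single index operation, which is the kind of per-cycle decision the paragraph above calls for.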
Deterministic Routing
A routing algorithm is deterministic if the route taken by a message is
determined exclusively by its source and destination, and not by other
traffic in the network. If a routing algorithm only selects shortest paths
toward the destination, it is minimal, otherwise it is non-minimal.
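As one concrete instance of a deterministic, minimal algorithm, the sketch below uses dimension-order (X then Y) routing on a 2D mesh. This particular algorithm is chosen here for illustration and is not named in the text; the function names are hypothetical.

```python
def xy_route(source, destination):
    """Return the list of nodes visited, correcting X first and then Y.

    The path depends only on the source and destination, so the
    algorithm is deterministic.
    """
    x, y = source
    path = [source]
    while x != destination[0]:
        x += 1 if destination[0] > x else -1
        path.append((x, y))
    while y != destination[1]:
        y += 1 if destination[1] > y else -1
        path.append((x, y))
    return path


def is_minimal(path):
    """A path is minimal if its hop count equals the mesh distance."""
    (sx, sy), (dx, dy) = path[0], path[-1]
    hops = len(path) - 1
    return hops == abs(dx - sx) + abs(dy - sy)


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route, is_minimal(route))
```

A path that detours around a congested link would still reach the destination but would fail the `is_minimal` check, which is what makes an algorithm non-minimal.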
Deadlock Freedom
Deadlock can occur in various situations. When two nodes attempt to
send data to each other and each begins sending before either receives,
a ‘head-on’ deadlock may occur. Another case of deadlock occurs when
multiple messages compete for resources within the network.
The basic technique for proving that a network is deadlock free is to
identify the dependencies that can occur between channels as a result of
messages moving through the network, and to show that there are no cycles
in the overall channel dependency graph; hence there are no traffic
patterns that can lead to a deadlock. The common way of doing this is to
number the channel resources such that all routes follow a particular
increasing or decreasing sequence, so that no dependency cycles arise.
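The proof technique can be sketched as a cycle check on the channel dependency graph. In the sketch below, each route is a list of channel numbers in the order a message acquires them; the function names and the four-channel ring example are assumptions made for illustration.

```python
def dependency_graph(routes):
    """Build the channel dependency graph from a set of routes.

    An edge held -> wanted means a message can hold channel `held`
    while waiting for channel `wanted`.
    """
    graph = {}
    for route in routes:
        for channel in route:
            graph.setdefault(channel, set())
        for held, wanted in zip(route, route[1:]):
            graph[held].add(wanted)
    return graph


def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph."""
    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}

    def visit(node):
        state[node] = in_progress
        for successor in graph[node]:
            if state[successor] == in_progress:
                return True  # back edge: a dependency cycle exists
            if state[successor] == unvisited and visit(successor):
                return True
        state[node] = done
        return False

    return any(state[node] == unvisited and visit(node) for node in graph)


if __name__ == "__main__":
    # Unidirectional ring with channels 0..3; routes may wrap around.
    ring_routes = [[0, 1], [1, 2], [2, 3], [3, 0]]
    # Same channels, but every route uses increasing channel numbers only.
    ordered_routes = [[0, 1], [1, 2], [2, 3]]
    print(has_cycle(dependency_graph(ring_routes)))
    print(has_cycle(dependency_graph(ordered_routes)))
```

When every route follows strictly increasing channel numbers, every edge points from a lower number to a higher one, so no cycle can form.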
Switch Design
The design of a network depends on the design of the switch and on how the
switches are wired together. The degree of the switch, its internal routing
mechanisms, and its internal buffering decide what topologies can be
supported and what routing algorithms can be implemented. Like any
other hardware component of a computer system, a network switch
contains a datapath, control, and storage.
Ports
The total number of pins is actually the total number of input and output
ports times the channel width. As the perimeter of the chip grows slowly
compared to the area, switches tend to be pin limited.
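A short calculation illustrates both statements. The port counts, channel width and chip sizes below are made-up example values, the function names are hypothetical, and only data pins are counted (power, clock and control pins are ignored). A square chip is assumed.

```python
def switch_pins(input_ports, output_ports, channel_width):
    """Data pins = (input ports + output ports) * channel width."""
    return (input_ports + output_ports) * channel_width


def square_chip(side):
    """Return (perimeter, area) of a square chip with the given side."""
    return 4 * side, side * side


if __name__ == "__main__":
    # Hypothetical switch: 8 inputs, 8 outputs, 16-bit channels.
    print(switch_pins(8, 8, 16))
    # Doubling the side doubles the perimeter but quadruples the area.
    print(square_chip(10), square_chip(20))
```

Pins sit along the perimeter while logic fills the area, so the logic budget grows faster than the number of pins available to serve it.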
Internal Datapath
The datapath provides the connectivity between each input port and
every output port. It is generally referred to as the internal crossbar.
A non-blocking crossbar is one in which each input port can be connected
to a distinct output port in any permutation simultaneously.
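The non-blocking property can be expressed as a simple check: a connection request is satisfiable as long as no two inputs ask for the same output. The sketch below uses hypothetical function names and models the crossbar setting as a mapping from input port to output port.

```python
from itertools import permutations


def is_conflict_free(connections):
    """True if every input in `connections` targets a distinct output."""
    outputs = list(connections.values())
    return len(outputs) == len(set(outputs))


def crossbar_forward(connections, input_data):
    """Move one data item from each input port to its selected output."""
    if not is_conflict_free(connections):
        raise ValueError("two inputs request the same output port")
    return {connections[port]: item for port, item in input_data.items()}


if __name__ == "__main__":
    setting = {0: 2, 1: 0, 2: 1}
    print(crossbar_forward(setting, {0: "a", 1: "b", 2: "c"}))
    # Every permutation of 4 ports is a valid simultaneous setting.
    print(all(is_conflict_free(dict(enumerate(p)))
              for p in permutations(range(4))))
```

A setting in which two inputs select the same output is not a permutation; resolving that contention is a matter of buffering and flow control rather than of the datapath itself.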
Channel Buffers
The organization of the buffer storage within the switch has an important
impact on switch performance. Traditional routers and switches tend
to have large SRAM or DRAM buffers external to the switch fabric, while
in VLSI switches the buffering is internal to the switch and comes out of
the same silicon budget as the datapath and the control section. As
chip size and density increase, more buffering is available and the
network designer has more options, but buffer real estate still comes
at a premium and its organization is important.
Flow Control
When multiple data flows in the network attempt to use the same shared
network resources at the same time, some action must be taken to
control these flows. If we don’t want to lose any data, some of the flows
must be blocked while others proceed.
The problem of flow control arises in all networks and at many levels, but
it is qualitatively different in parallel computer networks than in local and
wide area networks. In parallel computers, the network traffic needs to
be delivered about as reliably as traffic across a bus, and there are a
very large number of parallel flows on a very small time scale.
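The idea that some flows are blocked while others proceed, with no data lost, can be sketched as follows. The function name and the round-robin choice of which flow proceeds are assumptions made for illustration; the text does not prescribe an arbitration policy.

```python
from collections import deque


def share_output_link(flows):
    """Let several flows share one output link, one packet per cycle.

    `flows` maps a flow name to its list of packets. Returns the
    delivery order and, per cycle, (flow that proceeded, flows blocked).
    """
    queues = {name: deque(packets) for name, packets in flows.items()}
    order = list(queues)
    delivered, cycle_log = [], []
    turn = 0
    while any(queues.values()):
        # Round-robin: pick the next flow that still has a packet waiting.
        for offset in range(len(order)):
            name = order[(turn + offset) % len(order)]
            if queues[name]:
                break
        delivered.append(queues[name].popleft())
        # Every other flow with pending packets is blocked this cycle;
        # its packets stay buffered rather than being dropped.
        blocked = [n for n in order if n != name and queues[n]]
        cycle_log.append((name, blocked))
        turn = (order.index(name) + 1) % len(order)
    return delivered, cycle_log


if __name__ == "__main__":
    delivered, log = share_output_link({"A": ["a0", "a1"], "B": ["b0"]})
    print(delivered)
    print(log)
```

Blocking requires somewhere to hold the waiting packets, which is why flow control and the organization of channel buffers are closely tied.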
ii. Sequential Consistency
iii. Linearizability
iv. Causal Consistency
v. FIFO Consistency
vi. Weak Consistency
vii. Release Consistency
viii. Entry Consistency
i. Eventual Consistency
iii. Monotonic Writes
Cluster Computers
A cluster is a set of loosely or tightly connected computers working together
as a unified computing resource that can create the illusion of being one
machine. Computer clusters have each node set to perform the same task,
controlled and scheduled by software.
The components of a cluster are usually connected to each other through fast
local area networks, with each node running its own instance of an operating
system. In most circumstances, all of the nodes use the same hardware and the
same operating system, although in some setups different hardware or
different operating systems can be used.
Types of Clusters –
Computer clusters are arranged so as to support different purposes, from
general-purpose business needs such as web-service support to
computation-intensive scientific calculation. Basically, there are
three types of clusters:
• Load-Balancing Cluster – A cluster requires an effective capability
for balancing the load among the available computers. In this type,
cluster nodes share the computational workload so as to enhance
overall performance. For example, a high-performance cluster used for