
Subject: Computer System Architecture

Created By: Asst. Prof. SK ABDUL ISRAR College: ABA, BLS

Multiprocessor Architecture
Taxonomy of Parallel Architectures
Parallel computing is a form of computing in which jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions, and instructions from each part execute simultaneously on different CPUs.

Parallel systems deal with the simultaneous use of multiple computer resources, which can include a single computer with multiple processors, a number of computers connected by a network to form a parallel processing cluster, or a combination of both.

Parallel systems are more difficult to program than computers with a single processor because the architectures of parallel computers vary widely and the processes of multiple CPUs must be coordinated and synchronized.

The crux of parallel processing is the CPU. Based on the number of instruction and data streams that can be processed simultaneously, computing systems are classified into four major categories:


Flynn’s classification –

1. Single-instruction, single-data (SISD) systems –

An SISD computing system is a uniprocessor machine which is capable of executing a single instruction, operating on a single data stream. In SISD, machine instructions are processed in a sequential manner, and computers adopting this model are popularly called sequential computers. Most conventional computers have SISD architecture. All the instructions and data to be processed have to be stored in primary memory.


The speed of the processing element in the SISD model is limited by the rate at which the computer can transfer information internally. Dominant representative SISD systems are the IBM PC and workstations.
2. Single-instruction, multiple-data (SIMD) systems –

An SIMD system is a multiprocessor machine capable of executing the same instruction on all the CPUs but operating on different data streams. Machines based on the SIMD model are well suited to scientific computing, since it involves lots of vector and matrix operations. So that the information can be passed to all the processing elements (PEs), the organized data elements of a vector can be divided into multiple sets (N sets for an N-PE system), and each PE can process one data set.

A dominant representative SIMD system is Cray's vector processing machine.
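
To make the SIMD idea concrete, here is a minimal Python sketch; the data, the PE count, and the "multiply by 2" instruction are all illustrative assumptions. One instruction is applied, conceptually in lock-step, to N data sets, one per PE.

data = list(range(16))   # organized data elements of a vector
n_pes = 4                # number of processing elements (illustrative)

# Divide the vector into N sets (N = n_pes); PE i processes set i.
sets = [data[i::n_pes] for i in range(n_pes)]

# The single instruction that every PE executes on its own data set:
results = [[2 * x for x in s] for s in sets]

for pe, (inp, out) in enumerate(zip(sets, results)):
    print(f"PE {pe}: {inp} -> {out}")
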

3. Multiple-instruction, single-data (MISD) systems –

An MISD computing system is a multiprocessor machine capable of executing different instructions on different PEs, with all of them operating on the same data set.

Example: Z = sin(x) + cos(x) + tan(x)

The system performs different operations on the same data set. Machines built using the MISD model are not useful in most applications; a few machines have been built, but none of them are available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems –

An MIMD system is a multiprocessor machine which is capable of executing multiple instructions on multiple data sets. Each PE in the MIMD model has separate instruction and data streams; therefore, machines built using this model are suited to any kind of application. Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.


MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD based on the way PEs are coupled to the main memory.

In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single global memory and they all have access to it. Communication between PEs in this model takes place through the shared memory; a modification of the data stored in the global memory by one PE is visible to all other PEs. Dominant representative shared-memory MIMD systems are Silicon Graphics machines and Sun/IBM's SMP (Symmetric Multi-Processing).

In distributed-memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local memory. Communication between PEs in this model takes place through the interconnection network (the inter-process communication channel, or IPC). The network connecting the PEs can be configured as a tree, a mesh, or another topology in accordance with the requirement.
The shared-memory MIMD architecture is easier to program but is less tolerant of failures and harder to extend than the distributed-memory MIMD model. Failures in a shared-memory MIMD system affect the entire system, whereas this is not the case in the distributed model, in which each of the PEs can be easily isolated. Moreover, shared-memory MIMD architectures are less likely to scale, because the addition of more PEs leads to memory contention. This situation does not arise in the case of distributed memory, in which each PE has its own memory. As a result of practical outcomes and user requirements, the distributed-memory MIMD architecture is superior to the other existing models.

Centralized Shared Memory Architecture

• The use of large multilevel caches can substantially reduce the memory bandwidth demands of a processor.

• This has made it possible for several (micro)processors to share the same memory through a shared bus.

• Caching supports both private and shared data.
  o For private data, once cached, its treatment is identical to that of a uniprocessor.
  o For shared data, the shared value may be replicated in many caches.

Replication has several advantages:

• Reduced latency and memory bandwidth requirements.

• Reduced contention for data items that are read by multiple processors simultaneously.

• However, it also introduces a problem: cache coherence.

Cache Coherence

• With multiple caches, one CPU can modify memory at locations that other CPUs have cached.

For example:

• CPU A reads location x, getting the value N.
• Later, CPU B reads the same location, getting the value N.
• Next, CPU A writes location x with the value N - 1.
• At this point, any reads from CPU B will get the value N, while reads from CPU A will get the value N - 1.

• This problem occurs both with write-through caches and (more seriously) with write-back caches.
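
The stale-value scenario above can be reproduced with a toy model. This is a minimal Python sketch, assuming two private write-back caches in front of a single memory location x holding N = 10; all names and values are illustrative.

memory = {"x": 10}                     # location x holds the value N = 10

class Cache:
    def __init__(self):
        self.data = {}
    def read(self, addr):
        if addr not in self.data:      # cache miss: fetch from memory
            self.data[addr] = memory[addr]
        return self.data[addr]
    def write(self, addr, value):      # write-back: memory is NOT updated yet
        self.data[addr] = value

cache_a, cache_b = Cache(), Cache()
cache_a.read("x")                      # CPU A reads x, getting N
cache_b.read("x")                      # CPU B reads x, getting N
cache_a.write("x", 9)                  # CPU A writes x with N - 1

print(cache_a.read("x"))               # CPU A sees 9 (N - 1)
print(cache_b.read("x"))               # CPU B still sees the stale 10 (N)
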

• Cache coherence, informal definition:
  o A memory system is coherent if any read of a data item returns the most recently written value of that data item.

• Upon closer inspection, there are several aspects that need to be addressed.
Cache Coherence
• Coherence defines what values can be returned by a
read.


A memory system is coherent if:

• Read after write works for a single processor.
  o If CPU A writes N to location X, all future reads of location X will return N if no other processor writes location X after CPU A.

• Other processors' writes eventually propagate.
  o If CPU A writes value N to location X, CPU B will eventually be able to read value N from location X.
  o Once it does so, it will continue to read value N until location X is written again.
  o This is our intuitive notion of a coherent view of memory.
Cache Coherence

• Writes to a single location are serialized.
  o If CPUs A and B both write to location X, all processors see the same order of the writes.
  o This does not mean that all reads must return the same value.


▪ If value N1 is written "first" to


location X, followed closely by
reads of X and a write of X with
value N2, some reads may
return N1 and some N2.

▪ However, a processor that reads


N2 will return N2 for all future
reads.

• Consistency:
  o This indicates when a modification to memory is seen by other processors (i.e., will be returned by a read).
  o Clearly, this can NOT be "instantaneous", since it may be that the new value has not even left the processor when a read occurs.
Cache Coherence

• Consistency:
  o The issue of when a written value MUST be seen by a reader is defined by a memory consistency model.
  o For now, let's assume that a write is not complete until all processors have "seen" the effect of the write.

  o Also, assume that a processor may not reorder memory accesses to move reads before an outstanding write.
    ▪ Reads can be reordered, but reads and writes cannot be interchanged.

Coherent caches provide both:

• Replication of shared data items (reduces latency and contention).
  o Here, the purpose is to provide multiple copies of data so that several processors can access a single piece of memory without serialization.

• Migration of data items (reduces latency).
  o Data items are moved from one processor to another as needed.
Cache-Coherence Protocols

• Small-scale multiprocessors use hardware mechanisms to track the state of data blocks that are shared.

Two classes of protocols:

• Directory based.


  o The sharing status of a block of physical memory is kept in one location (the directory).

• Snooping.
  o The sharing status is distributed and kept with the block in each cache.
  o The caches are usually on a shared memory bus.
    ▪ The cache controllers snoop the bus to watch for transactions that occur on data blocks that they hold.
Bus Snooping Protocols

• Write invalidate.
  o It is the most common protocol, both for snooping and for directory schemes.
  o The basic idea behind this protocol is that writes to a location invalidate other caches' copies of the block.
    ▪ Reads by other processors on invalidated data cause cache misses.


    ▪ If two processors write at the same time, one wins and obtains exclusive access.

Processor activity | Bus activity | Contents of CPU A's cache | Contents of CPU B's cache | Contents of memory location X
CPU A reads X      | Cache miss   | 0 |   | 0
CPU B reads X      | Cache miss   | 0 | 0 | 0
CPU A writes 1     | Invalidate   | 1 |   | 0
CPU B reads X      | Cache miss   | 1 | 1 | 1

    ▪ This example assumes a write-back cache.
Bus Snooping Protocols

• Write broadcast (write update).
  o An alternative is to update all cached copies of the data item when it is written.
  o To reduce bandwidth requirements, this protocol keeps track of whether or not a word in the cache is shared.


    ▪ If not, no broadcast is necessary.

Processor activity | Bus activity | Contents of CPU A's cache | Contents of CPU B's cache | Contents of memory location X
CPU A reads X      | Cache miss   | 0 |   | 0
CPU B reads X      | Cache miss   | 0 | 0 | 0
CPU A writes 1     | Broadcast    | 1 | 1 | 1
CPU B reads X      |              | 1 | 1 | 1

    ▪ This example also assumes a write-back cache.
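
Both tables can be replayed with a small simulation. The following Python sketch is illustrative rather than a hardware description: it models one location X, two caches, write-back behavior (a snooped miss is serviced by the other cache's copy), and the two policies above.

def run(policy):
    memory = [0]                        # contents of memory location X
    caches = {"A": None, "B": None}     # None = no valid copy cached

    def read(cpu):
        other = "B" if cpu == "A" else "A"
        if caches[cpu] is None:                 # cache miss
            if caches[other] is not None:       # snooped: other copy supplies
                memory[0] = caches[other]       # the block, written back
            caches[cpu] = memory[0]
        return caches[cpu]

    def write(cpu, value):
        other = "B" if cpu == "A" else "A"
        caches[cpu] = value
        if policy == "invalidate":
            caches[other] = None                # invalidate the other copy
        else:                                   # write update (broadcast)
            if caches[other] is not None:
                caches[other] = value
            memory[0] = value

    read("A"); read("B"); write("A", 1); read("B")
    print(policy, caches, "memory =", memory[0])

run("invalidate")   # final: A = 1, B = 1, memory = 1 (B's miss fetched it)
run("update")       # final: A = 1, B = 1, memory = 1 (broadcast updated all)
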
Performance Differences between Bus Snooping Protocols

• Write invalidate is much more popular.
  o This is due primarily to the performance differences.
• Under write broadcast, multiple writes to the same word with no intervening reads require multiple broadcasts.
• With multiword cache blocks, write broadcast requires a broadcast for each word written.


  o For write invalidate, only the first write to a block requires an invalidation.
  o Also, write invalidate works on blocks, while write broadcast must work on individual words or bytes.

• The delay between writing by one processor and reading by another is lower in the write broadcast scheme.
  o For write invalidate, the read causes a miss.

• Since bus and memory bandwidth are more important in a bus-based multiprocessor, write invalidation performs better.

• Therefore, we focus on implementation of the write invalidate protocol.
Implementation of Write Invalidate Protocols

• Write invalidate is simple in bus-based schemes.
  o Acquire the bus and broadcast the address to be invalidated.

• Since all processors snoop the bus, they can check the address against items in their cache.

• Bus acquisition also serializes write operations to the same memory location.
  o Writes to a shared data item cannot complete until the bus is acquired.


• What about locating a data item when a cache miss occurs?
  o For write-through, it's in memory.
  o For write-back, snooping can be used.
    ▪ If a processor finds that it has a dirty copy of the requested cache block, it provides the block instead of memory.

• Note that write-back caches are greatly preferred in a multiprocessor environment, since they reduce memory bandwidth.
Implementation of Write Invalidate Protocol on Write-Back Caches

• Writes are the issue here.

• We would like to know if any other caches contain the block to be written by a processor.
  o If there are none, then the write need not be placed on the bus.
  o This reduces the time to complete the write and reduces memory bandwidth.

• This can be tracked by adding an extra state bit (in addition to the valid and dirty bits) that indicates whether the block is shared.


  o If the bit is set (the block is shared), the cache generates an invalidation on the bus and marks the block as private.
  o If another processor later requests the block, the miss is snooped and the "owner" sets the state bit to shared.
Implementation of Write Invalidate Protocol on Write-Back Caches

• Note that every bus transaction checks cache-address tags.
  o This could potentially interfere with CPU cache access.

This interference can be reduced by:

• Duplicating the tags.
  o Bus access can proceed in parallel with CPU access.
  o On misses, the processor must arbitrate for and update both sets of tags.
    ▪ The same is true for the snoop (to perform an invalidate or to update the shared bit).
  o However, a snoop may require fetching a block.
    ▪ This is the only instance that may cause a stall.



Implementation of Write Invalidate Protocol on Write-Back Caches

• Employing a multilevel cache with inclusion.
  o Every entry in L1 is also in L2.
    ▪ Therefore, snooping can be directed to L2, where there are fewer processor accesses.
  o If a snoop gets a hit, then it must arbitrate for L1 to update state and possibly retrieve data.
    ▪ This usually stalls the processor.
  o Since it is popular to use multilevel caches in multiprocessors (to reduce memory bandwidth), this solution is usually adopted.
  o It is also possible to duplicate the tags in L2 to further reduce contention.
An Example Centralized Shared-Memory Snooping Protocol

• Implemented by incorporating a finite state controller in each node.

• The controller responds to requests from the processor and the bus.
  o To simplify the controller, write hits and write misses to shared blocks are treated as write misses.


Request    | Source    | Function
Read hit   | Processor | Read data in cache.
Write hit  | Processor | Write data in cache.
Read miss  | Bus       | Request data from cache or memory.
Write miss | Bus       | Request data from cache or memory (perform any needed invalidates).

  o This causes processors with copies to invalidate them.
An Example Centralized Shared-Memory Snooping Protocol

• Write invalidation and a write-back cache are assumed.

[State-transition diagram not reproduced here.] The bus-initiated state transitions have no analog in a uniprocessor cache controller.
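
A rough feel for the controller can be given in code. This Python sketch is an illustrative three-state (invalid / shared / exclusive-dirty) approximation of the diagram referenced above, applying the stated simplification that a write to a shared block is handled as a write miss; it is not the exact protocol.

INVALID, SHARED, EXCLUSIVE = "invalid", "shared", "exclusive"

def cpu_request(state, op):
    """Transitions caused by this node's own processor."""
    if op == "read":
        return SHARED if state == INVALID else state   # read miss or hit
    if op == "write":
        return EXCLUSIVE    # treated as a write miss: put invalidate on bus
    raise ValueError(op)

def bus_snoop(state, op):
    """Transitions caused by snooped bus traffic for the same block."""
    if op == "read_miss" and state == EXCLUSIVE:
        return SHARED       # owner supplies the dirty block, then shares it
    if op in ("write_miss", "invalidate"):
        return INVALID      # another CPU is writing: drop our copy
    return state

s = INVALID
s = cpu_request(s, "read")        # read miss          -> shared
s = bus_snoop(s, "invalidate")    # another CPU writes -> invalid
s = cpu_request(s, "write")       # write miss         -> exclusive
print(s)                          # exclusive
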

An Example Centralized Shared-Memory Snooping Protocol

Complications we have ignored:

• Assumes that operations are atomic.
  o In reality, a write miss is not atomic -- there is just too much work to do.
  o Also, read misses on a split-transaction bus are not atomic.

• Nonatomic actions introduce the possibility that the protocol can deadlock.
  o See Appendix E for a fix.

Two major simplifications:

• Real protocols distinguish between write hits and write misses.


  o From the shared state, a write miss would require the action shown previously.
  o However, a write hit does not require that the data be fetched, since it is up to date.
    ▪ All that is needed is an invalidate operation.

• Real protocols distinguish between shared data and clean data in exactly one cache.
  o A "clean and private" state eliminates the need to generate a bus transaction on a write to a "clean and private" block.

Architecture of Distributed Shared Memory (DSM)

Distributed Shared Memory (DSM) implements the shared memory model in a distributed system that has no physically shared memory. The shared memory model provides a virtual address space shared between all nodes.

To overcome the high cost of communication in distributed systems, DSM systems move data to the location of access. Data moves between main memory and secondary memory (within a node) and between the main memories of different nodes.

Every data object is owned by a node. The initial owner is the node that created the object. Ownership can change as the object moves from node to node. When a process accesses data in the shared address space, the mapping manager maps the shared memory address to physical memory (local or remote).


DSM permits programs running on separate machines to share data without the programmer having to deal with sending messages; instead, the underlying technology sends the messages needed to keep the DSM consistent between computers. DSM permits programs that used to run on the same computer to be easily adapted to run on separate machines. Programs access what appears to them to be normal memory.

Hence, programs that use DSM are usually shorter and easier to understand than programs that use message passing. However, DSM is not suitable for all situations. Client-server systems are generally less suited to DSM; however, a server may be used to assist in providing DSM functionality for data shared between clients.

Architecture of Distributed Shared Memory (DSM):

Every node consists of one or more CPUs and a memory unit. A high-speed communication network is employed for connecting the nodes. A simple message-passing system permits processes on different nodes to exchange messages with one another.
Memory mapping manager unit:
The memory mapping manager routine in every node maps the local memory onto the shared virtual memory. For the mapping operation, the shared memory space is divided into blocks.

Data caching is a well-known solution for reducing memory access latency. DSM uses data caching to reduce network latency: the main memory of the individual nodes is used to cache pieces of the shared memory space.

The memory mapping manager of every node views its local memory as a big cache of the shared memory space for its associated processors. The basic unit of caching is a memory block. In systems that support DSM, data moves between secondary memory and main memory as well as between the main memories of different nodes.

Communication network unit:
When a process accesses data in the shared address space, the mapping manager maps the shared memory address to physical memory. The mapping layer of code is implemented either in the operating system kernel or as a runtime library routine. Physical memory on every node holds pages of the shared virtual address space. Local pages are present in some node's memory, while remote pages are in some other node's memory.

Interconnection Network
Interconnection networks are composed of switching elements. Topology
is the pattern to connect the individual switches to other elements, like
processors, memories and other switches. A network allows exchange of
data between processors in the parallel system.
• Direct connection networks − Direct networks have point-to-point
connections between neighboring nodes. These networks are
static, which means that the point-to-point connections are fixed.
Some examples of direct networks are rings, meshes and cubes.
• Indirect connection networks − Indirect networks have no fixed
neighbors. The communication topology can be changed
dynamically based on the application demands. Indirect networks
can be subdivided into three parts: bus networks, multistage
networks and crossbar switches.
  o Bus networks − A bus network is composed of a number of bit lines onto which a number of resources are attached. When busses use the same physical lines for data and addresses, the data and the address lines are time multiplexed. When there are multiple bus-masters attached to the bus, an arbiter is required.
  o Multistage networks − A multistage network consists of multiple stages of switches. It is composed of a×b switches which are connected using a particular interstage connection (ISC) pattern. Small 2×2 switch elements are a common choice for many multistage networks. The number of stages determines the delay of the network. By choosing different interstage connection patterns, various types of multistage networks can be created.
  o Crossbar switches − A crossbar switch contains a matrix of simple switch elements that can switch on and off to create or break a connection. By turning on a switch element in the matrix, a connection between a processor and a memory can be made. Crossbar switches are non-blocking; that is, all communication permutations can be performed without blocking.

Evaluating Design Trade-offs in Network Topology

If the main concern is the routing distance, then the dimension has to be maximized and a hypercube made (this assumes store-and-forward routing and that the degree of the switch and the number of links are not a significant cost factor). If the number of links or the switch degree is the main cost, the dimension has to be minimized and a mesh built.

In the worst-case traffic pattern for each network, it is preferable to have high-dimensional networks, where all the paths are short. In patterns where each node is communicating with only one or two nearby neighbors, it is preferable to have low-dimensional networks, since only a few of the dimensions are actually used.
Routing
The routing algorithm of a network determines which of the possible paths from source to destination are used as routes and how the route followed by each particular packet is determined. Dimension-order routing limits the set of legal paths so that there is exactly one route from each source to each destination: the one obtained by first traveling the correct distance in the high-order dimension, then in the next dimension, and so on.
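
For instance, dimension-order routing on a 2D mesh can be written out directly. This Python sketch (the coordinates are illustrative) first travels the full distance in the x dimension, then in y, yielding exactly one route per source-destination pair:

def dimension_order_route(src, dst):
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                 # correct the high-order dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then the next dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(dimension_order_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)] -- the single legal route
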

Routing Mechanisms
Arithmetic, source-based port select, and table look-up are three
mechanisms that high-speed switches use to determine the output
channel from information in the packet header. All of these mechanisms
are simpler than the kind of general routing computations implemented
in traditional LAN and WAN routers. In parallel computer networks, the
switch needs to make the routing decision for all its inputs in every cycle,
so the mechanism needs to be simple and fast.
Deterministic Routing
A routing algorithm is deterministic if the route taken by a message is
determined exclusively by its source and destination, and not by other
traffic in the network. If a routing algorithm only selects shortest paths
toward the destination, it is minimal, otherwise it is non-minimal.
Deadlock Freedom
Deadlock can occur in various situations. When two nodes attempt to send data to each other and each begins sending before either receives, a 'head-on' deadlock may occur. Another case of deadlock occurs when there are multiple messages competing for resources within the network.
The basic technique for proving that a network is deadlock-free is to identify the dependencies that can occur between channels as a result of messages moving through the network, and to show that there are no cycles in the overall channel dependency graph; hence there is no traffic pattern that can lead to deadlock. The common way of doing this is to number the channel resources such that all routes follow a particular increasing or decreasing sequence, so that no dependency cycles arise.
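
The cycle check itself is ordinary graph work. Here is a minimal Python sketch, assuming a hand-built channel dependency graph; the numbered channels below are illustrative, not a real network:

def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {c: WHITE for c in graph}

    def dfs(c):
        color[c] = GRAY
        for nxt in graph[c]:
            if color[nxt] == GRAY:        # back edge: a dependency cycle
                return True
            if color[nxt] == WHITE and dfs(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and dfs(c) for c in graph)

# Channels numbered so every route follows an increasing sequence: acyclic.
print(has_cycle({1: [2], 2: [3], 3: []}))    # False -> deadlock-free
# A route that loops back to a lower-numbered channel creates a cycle.
print(has_cycle({1: [2], 2: [3], 3: [1]}))   # True  -> deadlock is possible
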
Switch Design
The design of a network depends on the design of the switch and on how the switches are wired together. The degree of the switch, its internal routing mechanisms, and its internal buffering decide what topologies can be supported and what routing algorithms can be implemented. Like any other hardware component of a computer system, a network switch contains a datapath, control, and storage.
Ports
The total number of pins is actually the total number of input and output
ports times the channel width. As the perimeter of the chip grows slowly
compared to the area, switches tend to be pin limited.


Internal Datapath
The datapath is the connectivity between each of the set of input ports
and every output port. It is generally referred to as the internal cross-bar.
A non-blocking cross-bar is one where each input port can be connected
to a distinct output in any permutation simultaneously.
Channel Buffers
The organization of the buffer storage within the switch has an important
impact on the switch performance. Traditional routers and switches tend
to have large SRAM or DRAM buffers external to the switch fabric, while
in VLSI switches the buffering is internal to the switch and comes out of
the same silicon budget as the datapath and the control section. As the chip size and density increase, more buffering is available and the network designer has more options, but the buffer real estate still comes at a premium and its organization is important.
Flow Control
When multiple data flows in the network attempt to use the same shared
network resources at the same time, some action must be taken to
control these flows. If we don’t want to lose any data, some of the flows
must be blocked while others proceed.
The problem of flow control arises in all networks and at many levels, but it is qualitatively different in parallel computer networks than in local and wide area networks. In parallel computers, the network traffic needs to be delivered about as accurately as traffic across a bus, and there are a very large number of parallel flows on a very small time scale.

Memory Consistency Model

• A consistency model is a contract between a distributed data store and its processes: the processes agree to obey certain rules, and in return the store promises to work correctly.
• A consistency model basically refers to the degree of consistency that must be maintained for the shared memory data.
• If a system supports a stronger consistency model, then any weaker consistency model is automatically supported, but the converse is not true.


• The two types of consistency models are data-centric and client-centric consistency models.

1.Data-Centric Consistency Models

A data store may be physically distributed across multiple machines. Each process that can access data from the store is assumed to have a local or nearby copy of the entire store available.
i.Strict Consistency Model

• Any read on a data item X returns a value corresponding to the result of the most recent write on X.
• This is the strongest form of memory coherence which has
the most stringent consistency requirement.
• Strict consistency is the ideal model but it is impossible to
implement in a distributed system. It is based on absolute
global time or a global agreement on commitment of
changes.

ii.Sequential Consistency

• Sequential consistency is an important data-centric consistency model; it is a slightly weaker consistency model than strict consistency.
• A data store is said to be sequentially consistent if the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
• Example: Assume three operations read (R1), write (W1), and read (R2) are performed, in some order, on a memory address. Then (R1,W1,R2), (R1,R2,W1), (W1,R1,R2), and (R2,W1,R1) are all acceptable, provided all processes see the same ordering.
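
The example can be enumerated mechanically. This Python sketch simply lists the 3! candidate global orders of R1, W1, and R2 (assuming the three operations come from different processes); a sequentially consistent store may expose any one of them, as long as every process observes the same one and each process's own operations stay in program order.

from itertools import permutations

for order in permutations(["R1", "W1", "R2"]):
    print(order)    # one candidate global order per line (6 in total)
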

iii.Linearizability

• Linearizability is weaker than strict consistency but stronger than sequential consistency.


• A data store is said to be linearizable when each operation is timestamped and the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order.
• The operations of each individual process appear in this sequence in the order specified by its program.
• If tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence.

iv.Causal Consistency

• It is a weaker model than sequential consistency.
• In causal consistency, all processes see only those memory reference operations that are potentially causally related in the same (correct) order.
• Memory reference operations which are not related may be seen by different processes in different orders.
• A memory reference operation is said to be causally related to another memory reference operation if the first operation is influenced by the second operation.
• If a write (w2) operation is causally related to another write (w1), the acceptable order is (w1, w2).

v.FIFO Consistency

• It is weaker than causal consistency.
• This model ensures that all write operations performed by a single process are seen by all other processes in the order in which they were performed, as if the writes of a single process move through a pipeline.
• This model is simple and easy to implement, and it has good performance because writes simply propagate in pipeline order.
• Implementation is done by sequencing the write operations performed at each node independently of the operations performed on other nodes.
• Example: If (w11) and (w12) are write operations performed by p1 in that order, and (w21) and (w22) by p2, then a process p3 can see them as [(w11,w12),(w21,w22)] while p4 can view them as [(w21,w22),(w11,w12)].


vi.Weak Consistency

• The basic idea behind the weak consistency model is enforcing consistency on a group of memory reference operations rather than on individual operations.
• A Distributed Shared Memory system that supports the weak consistency model uses a special variable, called a synchronization variable, which is used to synchronize memory.
• When a process accesses a synchronization variable, the entire memory is synchronized by making the changes made to the memory visible to all other processes.

vii.Release Consistency

• The release consistency model tells whether a process is entering or exiting a critical section, so that the system performs the appropriate operation when a synchronization variable is accessed by a process.
• Two synchronization variables, acquire and release, are used instead of a single synchronization variable. Acquire is used when a process enters its critical section and release when it exits.
• Release consistency can also be viewed as a synchronization mechanism based on barriers instead of critical sections.
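
A loose analogy can be drawn with lock-based critical sections in ordinary threaded code. In this Python sketch (the counter workload is an illustrative assumption), acquire marks entry to the critical section, release marks exit, and the updates are published to the other workers at the release:

import threading

lock = threading.Lock()   # plays the role of the synchronization variable
counter = 0

def worker():
    global counter
    for _ in range(10_000):
        lock.acquire()    # "acquire": entering the critical section
        counter += 1      # updates made inside the critical section
        lock.release()    # "release": publish the updates to other workers

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)            # 40000: all updates are visible after the releases
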

viii.Entry Consistency

• In entry consistency, every shared data item is associated with a synchronization variable.
• In order to access consistent data, each synchronization variable must be explicitly acquired.
• Release consistency affects all shared data, but entry consistency affects only those shared data associated with a synchronization variable.

2.Client-Centric Consistency Models

• Client-centric consistency models, unlike data-centric models, do not aim at providing a system-wide consistent view of a data store.
• This model concentrates on consistency from the perspective of a single (possibly mobile) client.
• Client-centric consistency models are generally used for applications that lack simultaneous updates, where most operations involve reading data.

i.Eventual Consistency

• In systems that tolerate a high degree of inconsistency, if no updates take place for a long time, all replicas will gradually and eventually become consistent. This form of consistency is called eventual consistency.
• Eventual consistency only requires that updates are guaranteed to propagate to all replicas.
• Eventually consistent data stores work fine as long as clients always access the same replica.
• Write conflicts are often relatively easy to solve when it is assumed that only a small group of processes can perform updates. Eventual consistency is therefore often cheap to implement.

ii.Monotonic Reads Consistency

• A data store is said to provide monotonic-read consistency if, when a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
• If a process has seen a value of x at time t, it will never see an older version of x at a later time.
• Example: A user can read incoming mail while moving. Each time the user connects to a different e-mail server, that server fetches all the updates from the server that the user previously visited. Monotonic reads guarantees that the user sees all updates, no matter from which server the automatic reading takes place.

iii.Monotonic Writes

• A data store is said to be monotonic-write consistent if a write operation by a process on a data item x is completed before any successive write operation on x by the same process.
• A write operation on a copy of data item x is performed only if that copy has been brought up to date by means of any preceding write operations, which may have taken place on other copies of x.
• Example: Monotonic-write consistency guarantees that if an update is performed on a copy held by server S, all preceding updates will be performed first. The resulting copy will then indeed be the most recent version and will include all updates that led to previous versions.

iv.Read Your Writes

• A data store is said to provide read-your-writes consistency if the effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
• A write operation is always completed before a successive read operation by the same process, no matter where that read operation takes place.
• Example: Updating a Web page and guaranteeing that the Web browser shows the newest version instead of its cached copy.

v.Writes Follow Reads

• A data store is said to provide writes-follow-reads consistency if, when a process performs a write operation on a data item x following a previous read operation on x, the write is guaranteed to take place on the same or a more recent value of x than was read.
• Any successive write operation by a process on a data item x will be performed on a copy of x that is up to date with the value most recently read by that process.
• Example: Suppose a user first reads an article A and then posts a response B. By requiring writes-follow-reads consistency, B will be written to any copy only after A has been written.

Cluster Computers
A cluster is a set of loosely or tightly connected computers working together as a unified computing resource that can create the illusion of being one machine. Computer clusters have each node set to perform the same task, controlled and scheduled by software.
The components of a cluster are usually connected to each other using fast local area networks, with each node running its own instance of an operating system. In most circumstances, all the nodes use the same hardware and the same operating system, although in some setups different hardware or different operating systems can be used.

Types of Clusters –
Computer clusters are arranged in such a way as to support different purposes, from general-purpose business needs such as web-service support to computation-intensive scientific calculation. Basically, there are three types of clusters:
• Load-Balancing Cluster – A cluster requires an effective capability for balancing the load among the available computers. In this type, the cluster nodes share the computational workload so as to enhance the overall performance. For example, a high-performance cluster used for scientific calculation would balance the load using different algorithms from a web-server cluster, which may just use a simple round-robin method, assigning each new request to a different node (a minimal round-robin sketch follows this list). This type of cluster is used on farms of Web servers (web farms).
• Fail-Over Clusters – The function of switching applications and data resources over from a failed system to an alternative system in the cluster is referred to as fail-over. These types are used to cluster mission-critical databases, and mail, file, and application servers.
• High-Availability Clusters – These are also known as "HA clusters". They offer a high probability that all the resources will be in service. If a failure does occur, such as a system going down or a disk volume being lost, then the queries in progress are lost. Any lost query, if retried, will be serviced by a different computer in the cluster. This type of cluster is widely used in web, email, news, or FTP servers.
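
As promised above, here is a minimal Python sketch of the round-robin dispatch policy a simple web farm might use; the node names and requests are illustrative assumptions:

from itertools import cycle

nodes = cycle(["node1", "node2", "node3"])   # cluster nodes, in rotation

def dispatch(request):
    node = next(nodes)                       # round-robin: next node in turn
    print(f"request {request!r} -> {node}")

for r in ["GET /a", "GET /b", "GET /c", "GET /d"]:
    dispatch(r)
# node1, node2, node3, node1 -- the load spreads evenly across the nodes
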
Benefits –
• Absolute scalability – It is possible to create large clusters that beat the power of even the largest standalone machines. A cluster can have dozens of multiprocessor machines.
• Incremental scalability – A cluster is configured in such a way that it is possible to add new systems to the cluster in small increments. Clusters have the ability to add systems horizontally: more computers may be added to the cluster to improve its performance, redundancy, and fault tolerance (the ability of the system to continue working with a malfunctioning node).
• High availability – Since each node in a cluster is a standalone computer, the failure of one node does not mean loss of service. A single node can be taken down for maintenance while the rest of the cluster takes on the load of that individual node.
• Preferable price/performance – Clusters are usually set up to improve performance and availability over single computers, while typically being much more cost-effective than single computers of comparable speed or availability.
