
Subject: Computer System Architecture

Created By: Asst. Prof. SK ABDUL ISRAR College: ABA, BLS

Multiprocessor Architecture
Taxonomy of Parallel Architectures
Parallel computing is a form of computing in which jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions, and instructions from each part execute simultaneously on different CPUs.

Parallel systems deal with the simultaneous use of multiple computer resources, which can include a single computer with multiple processors, a number of computers connected by a network to form a parallel processing cluster, or a combination of both.

Parallel systems are more difficult to program than computers with a single processor because the architectures of parallel computers vary widely and the processes of multiple CPUs must be coordinated and synchronized.

The crux of parallel processing is the CPU. Based on the number of instruction and data streams that can be processed simultaneously, computing systems are classified into four major categories:


Flynn’s classification –

1. Single-instruction, single-data (SISD) systems –

An SISD computing system is a uniprocessor machine which is capable of executing a single instruction, operating on a single data stream. In SISD, machine instructions are processed in a sequential manner, and computers adopting this model are popularly called sequential computers. Most conventional computers have SISD architecture. All the instructions and data to be processed have to be stored in primary memory.


The speed of the processing element in the SISD model is limited by the rate at which the computer can transfer information internally. Dominant representative SISD systems are the IBM PC and workstations.
2. Single-instruction, multiple-data (SIMD) systems –

An SIMD system is a multiprocessor machine capable of executing the same instruction on all the CPUs but operating on different data streams. Machines based on the SIMD model are well suited to scientific computing, since it involves lots of vector and matrix operations. So that the information can be passed to all the processing elements (PEs), the organized data elements of a vector can be divided into multiple sets (N sets for an N-PE system), and each PE can process one data set.

A dominant representative SIMD system is Cray's vector processing machine.
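
To make the SIMD idea concrete, here is a minimal Python sketch; the data, the PE count, and the "multiply by 2" instruction are all illustrative assumptions. One instruction is applied, conceptually in lock-step, to N data sets, one per PE.

data = list(range(16))   # organized data elements of a vector
n_pes = 4                # number of processing elements (illustrative)

# Divide the vector into N sets (N = n_pes); PE i processes set i.
sets = [data[i::n_pes] for i in range(n_pes)]

# The single instruction that every PE executes on its own data set:
results = [[2 * x for x in s] for s in sets]

for pe, (inp, out) in enumerate(zip(sets, results)):
    print(f"PE {pe}: {inp} -> {out}")
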

3. Multiple-instruction, single-data (MISD) systems –

An MISD computing system is a multiprocessor machine capable of executing different instructions on different PEs, with all of them operating on the same data set.

Example: Z = sin(x) + cos(x) + tan(x)

The system performs different operations on the same data set. Machines built using the MISD model are not useful in most applications; a few machines have been built, but none of them are available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems –

An MIMD system is a multiprocessor machine which is capable of executing multiple instructions on multiple data sets. Each PE in the MIMD model has separate instruction and data streams; therefore, machines built using this model are suited to any kind of application. Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.


MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD based on the way PEs are coupled to the main memory.

In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single global memory and they all have access to it. Communication between PEs in this model takes place through the shared memory; a modification of the data stored in the global memory by one PE is visible to all other PEs. Dominant representative shared-memory MIMD systems are Silicon Graphics machines and Sun/IBM's SMP (Symmetric Multi-Processing).

In distributed-memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local memory. Communication between PEs in this model takes place through the interconnection network (the inter-process communication channel, or IPC). The network connecting the PEs can be configured as a tree, a mesh, or another topology in accordance with the requirement.
The shared-memory MIMD architecture is easier to program but is less tolerant of failures and harder to extend than the distributed-memory MIMD model. Failures in a shared-memory MIMD system affect the entire system, whereas this is not the case in the distributed model, in which each of the PEs can be easily isolated. Moreover, shared-memory MIMD architectures are less likely to scale, because the addition of more PEs leads to memory contention. This situation does not arise in the case of distributed memory, in which each PE has its own memory. As a result of practical outcomes and user requirements, the distributed-memory MIMD architecture is superior to the other existing models.

Centralized Shared Memory Architecture

• The use of large multilevel caches can substantially reduce the memory bandwidth demands of a processor.

• This has made it possible for several (micro)processors to share the same memory through a shared bus.

• Caching supports both private and shared data.
  o For private data, once cached, its treatment is identical to that of a uniprocessor.
  o For shared data, the shared value may be replicated in many caches.

Replication has several advantages:

• Reduced latency and memory bandwidth requirements.

• Reduced contention for data items that are read by multiple processors simultaneously.

• However, it also introduces a problem: cache coherence.

Cache Coherence

• With multiple caches, one CPU can modify memory at locations that other CPUs have cached.

For example:

• CPU A reads location x, getting the value N.
• Later, CPU B reads the same location, getting the value N.
• Next, CPU A writes location x with the value N - 1.
• At this point, any reads from CPU B will get the value N, while reads from CPU A will get the value N - 1.

• This problem occurs both with write-through caches and (more seriously) with write-back caches.
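
The stale-value scenario above can be reproduced with a toy model. This is a minimal Python sketch, assuming two private write-back caches in front of a single memory location x holding N = 10; all names and values are illustrative.

memory = {"x": 10}                     # location x holds the value N = 10

class Cache:
    def __init__(self):
        self.data = {}
    def read(self, addr):
        if addr not in self.data:      # cache miss: fetch from memory
            self.data[addr] = memory[addr]
        return self.data[addr]
    def write(self, addr, value):      # write-back: memory is NOT updated yet
        self.data[addr] = value

cache_a, cache_b = Cache(), Cache()
cache_a.read("x")                      # CPU A reads x, getting N
cache_b.read("x")                      # CPU B reads x, getting N
cache_a.write("x", 9)                  # CPU A writes x with N - 1

print(cache_a.read("x"))               # CPU A sees 9 (N - 1)
print(cache_b.read("x"))               # CPU B still sees the stale 10 (N)
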

• Cache coherence, informal definition:
  o A memory system is coherent if any read of a data item returns the most recently written value of that data item.

• Upon closer inspection, there are several aspects that need to be addressed.
Cache Coherence
• Coherence defines what values can be returned by a
read.


A memory system is coherent if:

• Read after write works for a single processor.
  o If CPU A writes N to location X, all future reads of location X will return N if no other processor writes location X after CPU A.

• Other processors' writes eventually propagate.
  o If CPU A writes value N to location X, CPU B will eventually be able to read value N from location X.
  o Once it does so, it will continue to read value N until location X is written again.
  o This is our intuitive notion of a coherent view of memory.
Cache Coherence

• Writes to a single location are serialized.
  o If CPUs A and B both write to location X, all processors see the same order of the writes.
  o This does not mean that all reads must return the same value.


▪ If value N1 is written "first" to


location X, followed closely by
reads of X and a write of X with
value N2, some reads may
return N1 and some N2.

▪ However, a processor that reads


N2 will return N2 for all future
reads.

• Consistency:
  o This indicates when a modification to memory is seen by other processors (i.e., will be returned by a read).
  o Clearly, this can NOT be "instantaneous", since it may be that the new value has not even left the processor when a read occurs.
Cache Coherence

• Consistency:
  o The issue of when a written value MUST be seen by a reader is defined by a memory consistency model.
  o For now, let's assume that a write is not complete until all processors have "seen" the effect of the write.

  o Also, assume that a processor may not reorder memory accesses to move reads before an outstanding write.
    ▪ Reads can be reordered, but reads and writes cannot be interchanged.

Coherent caches provide both:

• Replication of shared data items (reduces latency and contention).
  o Here, the purpose is to provide multiple copies of data so that several processors can access a single piece of memory without serialization.

• Migration of data items (reduces latency).
  o Data items are moved from one processor to another as needed.
Cache-Coherence Protocols

• Small-scale multiprocessors use hardware mechanisms to track the state of data blocks that are shared.

Two classes of protocols:

• Directory based.


  o The sharing status of a block of physical memory is kept in one location (the directory).

• Snooping.
  o The sharing status is distributed and kept with the block in each cache.
  o The caches are usually on a shared memory bus.
    ▪ The cache controllers snoop the bus to watch for transactions that occur on data blocks that they hold.
Bus Snooping Protocols

• Write invalidate.
  o It is the most common protocol, both for snooping and for directory schemes.
  o The basic idea behind this protocol is that writes to a location invalidate other caches' copies of the block.
    ▪ Reads by other processors on invalidated data cause cache misses.


    ▪ If two processors write at the same time, one wins and obtains exclusive access.

Processor activity | Bus activity | Contents of CPU A's cache | Contents of CPU B's cache | Contents of memory location X
CPU A reads X      | Cache miss   | 0 |   | 0
CPU B reads X      | Cache miss   | 0 | 0 | 0
CPU A writes 1     | Invalidate   | 1 |   | 0
CPU B reads X      | Cache miss   | 1 | 1 | 1

    ▪ This example assumes a write-back cache.
Bus Snooping Protocols

• Write broadcast (write update).
  o An alternative is to update all cached copies of the data item when it is written.
  o To reduce bandwidth requirements, this protocol keeps track of whether or not a word in the cache is shared.


    ▪ If not, no broadcast is necessary.

Processor activity | Bus activity | Contents of CPU A's cache | Contents of CPU B's cache | Contents of memory location X
CPU A reads X      | Cache miss   | 0 |   | 0
CPU B reads X      | Cache miss   | 0 | 0 | 0
CPU A writes 1     | Broadcast    | 1 | 1 | 1
CPU B reads X      |              | 1 | 1 | 1

    ▪ This example also assumes a write-back cache.
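
Both tables can be replayed with a small simulation. The following Python sketch is illustrative rather than a hardware description: it models one location X, two caches, write-back behavior (a snooped miss is serviced by the other cache's copy), and the two policies above.

def run(policy):
    memory = [0]                        # contents of memory location X
    caches = {"A": None, "B": None}     # None = no valid copy cached

    def read(cpu):
        other = "B" if cpu == "A" else "A"
        if caches[cpu] is None:                 # cache miss
            if caches[other] is not None:       # snooped: other copy supplies
                memory[0] = caches[other]       # the block, written back
            caches[cpu] = memory[0]
        return caches[cpu]

    def write(cpu, value):
        other = "B" if cpu == "A" else "A"
        caches[cpu] = value
        if policy == "invalidate":
            caches[other] = None                # invalidate the other copy
        else:                                   # write update (broadcast)
            if caches[other] is not None:
                caches[other] = value
            memory[0] = value

    read("A"); read("B"); write("A", 1); read("B")
    print(policy, caches, "memory =", memory[0])

run("invalidate")   # final: A = 1, B = 1, memory = 1 (B's miss fetched it)
run("update")       # final: A = 1, B = 1, memory = 1 (broadcast updated all)
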
Performance Differences between Bus Snooping Protocols

• Write invalidate is much more popular.
  o This is due primarily to the performance differences.
• Under write broadcast, multiple writes to the same word with no intervening reads require multiple broadcasts.
• With multiword cache blocks, write broadcast requires a broadcast for each word written.


  o For write invalidate, only the first write to a block requires an invalidation.
  o Also, write invalidate works on blocks, while write broadcast must work on individual words or bytes.

• The delay between writing by one processor and reading by another is lower in the write broadcast scheme.
  o For write invalidate, the read causes a miss.

• Since bus and memory bandwidth are more important in a bus-based multiprocessor, write invalidation performs better.

• Therefore, we focus on implementation of the write invalidate protocol.
Implementation of Write Invalidate Protocols

• Write invalidate is simple in bus-based schemes.
  o Acquire the bus and broadcast the address to be invalidated.

• Since all processors snoop the bus, they can check the address against items in their cache.

• Bus acquisition also serializes write operations to the same memory location.
  o Writes to a shared data item cannot complete until the bus is acquired.


• What about locating a data item when a cache miss occurs?
  o For write-through, it's in memory.
  o For write-back, snooping can be used.
    ▪ If a processor finds that it has a dirty copy of the requested cache block, it provides the block instead of memory.

• Note that write-back caches are greatly preferred in a multiprocessor environment, since they reduce memory bandwidth.
Implementation of Write Invalidate Protocol on Write-Back Caches

• Writes are the issue here.

• We would like to know if any other caches contain the block to be written by a processor.
  o If there are none, then the write need not be placed on the bus.
  o This reduces the time to complete the write and reduces memory bandwidth.

• This can be tracked by adding an extra state bit (in addition to the valid and dirty bits) that indicates whether the block is shared.


  o If the bit is set (the block is shared), the cache generates an invalidation on the bus and marks the block as private.
  o If another processor later requests the block, the miss is snooped and the "owner" sets the state bit to shared.
Implementation of Write Invalidate Protocol on Write-Back Caches

• Note that every bus transaction checks cache-address tags.
  o This could potentially interfere with CPU cache access.

This interference can be reduced by:

• Duplicating the tags.
  o Bus access can proceed in parallel with CPU access.
  o On misses, the processor must arbitrate for and update both sets of tags.
    ▪ The same is true for the snoop (to perform an invalidate or to update the shared bit).
  o However, a snoop may require fetching a block.
    ▪ This is the only instance that may cause a stall.



Implementation of Write Invalidate Protocol on Write-Back Caches

• Employing a multilevel cache with inclusion.
  o Every entry in L1 is also in L2.
    ▪ Therefore, snooping can be directed to L2, where there are fewer processor accesses.
  o If a snoop gets a hit, then it must arbitrate for L1 to update state and possibly retrieve data.
    ▪ This usually stalls the processor.
  o Since it is popular to use multilevel caches in multiprocessors (to reduce memory bandwidth), this solution is usually adopted.
  o It is also possible to duplicate the tags in L2 to further reduce contention.
An Example Centralized Shared-Memory Snooping Protocol

• Implemented by incorporating a finite state controller in each node.

• The controller responds to requests from the processor and the bus.
  o To simplify the controller, write hits and write misses to shared blocks are treated as write misses.


Request    | Source    | Function
Read hit   | Processor | Read data in cache.
Write hit  | Processor | Write data in cache.
Read miss  | Bus       | Request data from cache or memory.
Write miss | Bus       | Request data from cache or memory (perform any needed invalidates).

  o This causes processors with copies to invalidate them.
An Example Centralized Shared-Memory Snooping Protocol

• Write invalidation and a write-back cache are assumed.

[State-transition diagram not reproduced here.] The bus-initiated state transitions have no analog in a uniprocessor cache controller.
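
A rough feel for the controller can be given in code. This Python sketch is an illustrative three-state (invalid / shared / exclusive-dirty) approximation of the diagram referenced above, applying the stated simplification that a write to a shared block is handled as a write miss; it is not the exact protocol.

INVALID, SHARED, EXCLUSIVE = "invalid", "shared", "exclusive"

def cpu_request(state, op):
    """Transitions caused by this node's own processor."""
    if op == "read":
        return SHARED if state == INVALID else state   # read miss or hit
    if op == "write":
        return EXCLUSIVE    # treated as a write miss: put invalidate on bus
    raise ValueError(op)

def bus_snoop(state, op):
    """Transitions caused by snooped bus traffic for the same block."""
    if op == "read_miss" and state == EXCLUSIVE:
        return SHARED       # owner supplies the dirty block, then shares it
    if op in ("write_miss", "invalidate"):
        return INVALID      # another CPU is writing: drop our copy
    return state

s = INVALID
s = cpu_request(s, "read")        # read miss          -> shared
s = bus_snoop(s, "invalidate")    # another CPU writes -> invalid
s = cpu_request(s, "write")       # write miss         -> exclusive
print(s)                          # exclusive
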

An Example Centralized Shared-Memory Snooping Protocol

Complications we have ignored:

• Assumes that operations are atomic.
  o In reality, a write miss is not atomic -- there is just too much work to do.
  o Also, read misses on a split-transaction bus are not atomic.

• Nonatomic actions introduce the possibility that the protocol can deadlock.
  o See Appendix E for a fix.

Two major simplifications:

• Real protocols distinguish between write hits and write misses.


  o From the shared state, a write miss would require the action shown previously.
  o However, a write hit does not require that the data be fetched, since it is up to date.
    ▪ All that is needed is an invalidate operation.

• Real protocols distinguish between shared data and clean data in exactly one cache.
  o A "clean and private" state eliminates the need to generate a bus transaction on a write to a "clean and private" block.

Architecture of Distributed Shared Memory (DSM)

Distributed Shared Memory (DSM) implements the shared memory model in a distributed system that has no physically shared memory. The shared memory model provides a virtual address space shared between all nodes.

To overcome the high cost of communication in distributed systems, DSM systems move data to the location of access. Data moves between main memory and secondary memory (within a node) and between the main memories of different nodes.

Every data object is owned by a node. The initial owner is the node that created the object. Ownership can change as the object moves from node to node. When a process accesses data in the shared address space, the mapping manager maps the shared memory address to physical memory (local or remote).


DSM permits programs running on separate machines to share data without the programmer having to deal with sending messages; instead, the underlying technology sends the messages needed to keep the DSM consistent between computers. DSM permits programs that used to run on the same computer to be easily adapted to run on separate machines. Programs access what appears to them to be normal memory.

Hence, programs that use DSM are usually shorter and easier to understand than programs that use message passing. However, DSM is not suitable for all situations. Client-server systems are generally less suited to DSM; however, a server may be used to assist in providing DSM functionality for data shared between clients.

Architecture of Distributed Shared Memory (DSM):

Every node consists of one or more CPUs and a memory unit. A high-speed communication network is employed for connecting the nodes. A simple message-passing system permits processes on different nodes to exchange messages with one another.
Memory mapping manager unit:
The memory mapping manager routine in every node maps the local memory onto the shared virtual memory. For the mapping operation, the shared memory space is divided into blocks.

Data caching is a well-known solution for reducing memory access latency. DSM uses data caching to reduce network latency: the main memory of the individual nodes is used to cache pieces of the shared memory space.

The memory mapping manager of every node views its local memory as a big cache of the shared memory space for its associated processors. The basic unit of caching is a memory block. In systems that support DSM, data moves between secondary memory and main memory as well as between the main memories of different nodes.

Communication network unit:
When a process accesses data in the shared address space, the mapping manager maps the shared memory address to physical memory. The mapping layer of code is implemented either in the operating system kernel or as a runtime library routine. Physical memory on every node holds pages of the shared virtual address space. Local pages are present in some node's memory, while remote pages are in some other node's memory.

Interconnection Network
Interconnection networks are composed of switching elements. Topology
is the pattern to connect the individual switches to other elements, like
processors, memories and other switches. A network allows exchange of
data between processors in the parallel system.
• Direct connection networks − Direct networks have point-to-point
connections between neighboring nodes. These networks are
static, which means that the point-to-point connections are fixed.
Some examples of direct networks are rings, meshes and cubes.
• Indirect connection networks − Indirect networks have no fixed
neighbors. The communication topology can be changed
dynamically based on the application demands. Indirect networks
can be subdivided into three parts: bus networks, multistage
networks and crossbar switches.
  o Bus networks − A bus network is composed of a number of bit lines onto which a number of resources are attached. When busses use the same physical lines for data and addresses, the data and the address lines are time multiplexed. When there are multiple bus-masters attached to the bus, an arbiter is required.
  o Multistage networks − A multistage network consists of multiple stages of switches. It is composed of a×b switches which are connected using a particular interstage connection (ISC) pattern. Small 2×2 switch elements are a common choice for many multistage networks. The number of stages determines the delay of the network. By choosing different interstage connection patterns, various types of multistage networks can be created.
  o Crossbar switches − A crossbar switch contains a matrix of simple switch elements that can switch on and off to create or break a connection. By turning on a switch element in the matrix, a connection between a processor and a memory can be made. Crossbar switches are non-blocking; that is, all communication permutations can be performed without blocking.

Evaluating Design Trade-offs in Network Topology

If the main concern is the routing distance, then the dimension has to be maximized and a hypercube made (this assumes store-and-forward routing and that the degree of the switch and the number of links are not a significant cost factor). If the number of links or the switch degree is the main cost, the dimension has to be minimized and a mesh built.

In the worst-case traffic pattern for each network, it is preferable to have high-dimensional networks, where all the paths are short. In patterns where each node is communicating with only one or two nearby neighbors, it is preferable to have low-dimensional networks, since only a few of the dimensions are actually used.
Routing
The routing algorithm of a network determines which of the possible paths from source to destination are used as routes and how the route followed by each particular packet is determined. Dimension-order routing limits the set of legal paths so that there is exactly one route from each source to each destination: the one obtained by first traveling the correct distance in the high-order dimension, then in the next dimension, and so on.
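
For instance, dimension-order routing on a 2D mesh can be written out directly. This Python sketch (the coordinates are illustrative) first travels the full distance in the x dimension, then in y, yielding exactly one route per source-destination pair:

def dimension_order_route(src, dst):
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                 # correct the high-order dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then the next dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(dimension_order_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)] -- the single legal route
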

Routing Mechanisms
Arithmetic, source-based port select, and table look-up are three
mechanisms that high-speed switches use to determine the output
channel from information in the packet header. All of these mechanisms
are simpler than the kind of general routing computations implemented
in traditional LAN and WAN routers. In parallel computer networks, the
switch needs to make the routing decision for all its inputs in every cycle,
so the mechanism needs to be simple and fast.
Deterministic Routing
A routing algorithm is deterministic if the route taken by a message is
determined exclusively by its source and destination, and not by other
traffic in the network. If a routing algorithm only selects shortest paths
toward the destination, it is minimal, otherwise it is non-minimal.
Deadlock Freedom
Deadlock can occur in various situations. When two nodes attempt to send data to each other and each begins sending before either receives, a 'head-on' deadlock may occur. Another case of deadlock occurs when there are multiple messages competing for resources within the network.
The basic technique for proving that a network is deadlock-free is to identify the dependencies that can occur between channels as a result of messages moving through the network, and to show that there are no cycles in the overall channel dependency graph; hence there is no traffic pattern that can lead to deadlock. The common way of doing this is to number the channel resources such that all routes follow a particular increasing or decreasing sequence, so that no dependency cycles arise.
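
The cycle check itself is ordinary graph work. Here is a minimal Python sketch, assuming a hand-built channel dependency graph; the numbered channels below are illustrative, not a real network:

def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {c: WHITE for c in graph}

    def dfs(c):
        color[c] = GRAY
        for nxt in graph[c]:
            if color[nxt] == GRAY:        # back edge: a dependency cycle
                return True
            if color[nxt] == WHITE and dfs(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and dfs(c) for c in graph)

# Channels numbered so every route follows an increasing sequence: acyclic.
print(has_cycle({1: [2], 2: [3], 3: []}))    # False -> deadlock-free
# A route that loops back to a lower-numbered channel creates a cycle.
print(has_cycle({1: [2], 2: [3], 3: [1]}))   # True  -> deadlock is possible
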
Switch Design
The design of a network depends on the design of the switch and on how the switches are wired together. The degree of the switch, its internal routing mechanisms, and its internal buffering decide what topologies can be supported and what routing algorithms can be implemented. Like any other hardware component of a computer system, a network switch contains a datapath, control, and storage.
Ports
The total number of pins is actually the total number of input and output
ports times the channel width. As the perimeter of the chip grows slowly
compared to the area, switches tend to be pin limited.


Internal Datapath
The datapath is the connectivity between each of the set of input ports
and every output port. It is generally referred to as the internal cross-bar.
A non-blocking cross-bar is one where each input port can be connected
to a distinct output in any permutation simultaneously.
Channel Buffers
The organization of the buffer storage within the switch has an important
impact on the switch performance. Traditional routers and switches tend
to have large SRAM or DRAM buffers external to the switch fabric, while
in VLSI switches the buffering is internal to the switch and comes out of
the same silicon budget as the datapath and the control section. As the chip size and density increase, more buffering is available and the network designer has more options, but the buffer real estate still comes at a premium and its organization is important.
Flow Control
When multiple data flows in the network attempt to use the same shared
network resources at the same time, some action must be taken to
control these flows. If we don’t want to lose any data, some of the flows
must be blocked while others proceed.
The problem of flow control arises in all networks and at many levels, but it is qualitatively different in parallel computer networks than in local and wide area networks. In parallel computers, the network traffic needs to be delivered about as accurately as traffic across a bus, and there are a very large number of parallel flows on a very small time scale.

Memory Consistency Model

• A consistency model is a contract between a distributed data store and its processes: the processes agree to obey certain rules, and in return the store promises to work correctly.
• A consistency model basically refers to the degree of consistency that must be maintained for the shared memory data.
• If a system supports a stronger consistency model, then any weaker consistency model is automatically supported, but the converse is not true.


• The two types of consistency models are data-centric and client-centric consistency models.

1.Data-Centric Consistency Models

A data store may be physically distributed across multiple machines. Each process that can access data from the store is assumed to have a local or nearby copy of the entire store available.
i.Strict Consistency Model

• Any read on a data item X returns a value corresponding to the result of the most recent write on X.
• This is the strongest form of memory coherence which has
the most stringent consistency requirement.
• Strict consistency is the ideal model but it is impossible to
implement in a distributed system. It is based on absolute
global time or a global agreement on commitment of
changes.

ii.Sequential Consistency

• Sequential consistency is an important data-centric consistency model; it is a slightly weaker consistency model than strict consistency.
• A data store is said to be sequentially consistent if the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
• Example: Assume three operations read (R1), write (W1), and read (R2) are performed, in some order, on a memory address. Then (R1,W1,R2), (R1,R2,W1), (W1,R1,R2), and (R2,W1,R1) are all acceptable, provided all processes see the same ordering.
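
The example can be enumerated mechanically. This Python sketch simply lists the 3! candidate global orders of R1, W1, and R2 (assuming the three operations come from different processes); a sequentially consistent store may expose any one of them, as long as every process observes the same one and each process's own operations stay in program order.

from itertools import permutations

for order in permutations(["R1", "W1", "R2"]):
    print(order)    # one candidate global order per line (6 in total)
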

iii.Linearizability

• Linearizability is weaker than strict consistency but stronger than sequential consistency.


• A data store is said to be linearizable when each operation is timestamped and the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order.
• The operations of each individual process appear in this sequence in the order specified by its program.
• If tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence.

iv.Causal Consistency

• It is a weaker model than sequential consistency.
• In causal consistency, all processes see only those memory reference operations that are potentially causally related in the same (correct) order.
• Memory reference operations which are not related may be seen by different processes in different orders.
• A memory reference operation is said to be causally related to another memory reference operation if the first operation is influenced by the second operation.
• If a write (w2) operation is causally related to another write (w1), the acceptable order is (w1, w2).

v.FIFO Consistency

• It is weaker than causal consistency.
• This model ensures that all write operations performed by a single process are seen by all other processes in the order in which they were performed, as if the writes of a single process move through a pipeline.
• This model is simple and easy to implement, and it has good performance because writes simply propagate in pipeline order.
• Implementation is done by sequencing the write operations performed at each node independently of the operations performed on other nodes.
• Example: If (w11) and (w12) are write operations performed by p1 in that order, and (w21) and (w22) by p2, then a process p3 can see them as [(w11,w12),(w21,w22)] while p4 can view them as [(w21,w22),(w11,w12)].


vi.Weak Consistency

• The basic idea behind the weak consistency model is enforcing consistency on a group of memory reference operations rather than on individual operations.
• A Distributed Shared Memory system that supports the weak consistency model uses a special variable, called a synchronization variable, which is used to synchronize memory.
• When a process accesses a synchronization variable, the entire memory is synchronized by making the changes made to the memory visible to all other processes.

vii.Release Consistency

• The release consistency model tells whether a process is entering or exiting a critical section, so that the system performs the appropriate operation when a synchronization variable is accessed by a process.
• Two synchronization variables, acquire and release, are used instead of a single synchronization variable. Acquire is used when a process enters its critical section and release when it exits.
• Release consistency can also be viewed as a synchronization mechanism based on barriers instead of critical sections.
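
A loose analogy can be drawn with lock-based critical sections in ordinary threaded code. In this Python sketch (the counter workload is an illustrative assumption), acquire marks entry to the critical section, release marks exit, and the updates are published to the other workers at the release:

import threading

lock = threading.Lock()   # plays the role of the synchronization variable
counter = 0

def worker():
    global counter
    for _ in range(10_000):
        lock.acquire()    # "acquire": entering the critical section
        counter += 1      # updates made inside the critical section
        lock.release()    # "release": publish the updates to other workers

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)            # 40000: all updates are visible after the releases
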

viii.Entry Consistency

• In entry consistency, every shared data item is associated with a synchronization variable.
• In order to access consistent data, each synchronization variable must be explicitly acquired.
• Release consistency affects all shared data, but entry consistency affects only those shared data associated with a synchronization variable.

2.Client-Centric Consistency Models

• Client-centric consistency models, unlike data-centric models, do not aim at providing a system-wide consistent view of a data store.
• This model concentrates on consistency from the perspective of a single (possibly mobile) client.
• Client-centric consistency models are generally used for applications that lack simultaneous updates, where most operations involve reading data.

i.Eventual Consistency

• In systems that tolerate a high degree of inconsistency, if no updates take place for a long time, all replicas will gradually and eventually become consistent. This form of consistency is called eventual consistency.
• Eventual consistency only requires that updates are guaranteed to propagate to all replicas.
• Eventually consistent data stores work fine as long as clients always access the same replica.
• Write conflicts are often relatively easy to solve when it is assumed that only a small group of processes can perform updates. Eventual consistency is therefore often cheap to implement.

ii.Monotonic Reads Consistency

• A data store is said to provide monotonic-read consistency if, when a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
• If a process has seen a value of x at time t, it will never see an older version of x at a later time.
• Example: A user can read incoming mail while moving. Each time the user connects to a different e-mail server, that server fetches all the updates from the server that the user previously visited. Monotonic reads guarantees that the user sees all updates, no matter from which server the automatic reading takes place.

iii.Monotonic Writes

• A data store is said to be monotonic-write consistent if a write operation by a process on a data item x is completed before any successive write operation on x by the same process.
• A write operation on a copy of data item x is performed only if that copy has been brought up to date by means of any preceding write operations, which may have taken place on other copies of x.
• Example: Monotonic-write consistency guarantees that if an update is performed on a copy held by server S, all preceding updates will be performed first. The resulting copy will then indeed be the most recent version and will include all updates that led to previous versions.

iv.Read Your Writes

• A data store is said to provide read-your-writes consistency if the effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
• A write operation is always completed before a successive read operation by the same process, no matter where that read operation takes place.
• Example: Updating a Web page and guaranteeing that the Web browser shows the newest version instead of its cached copy.

v.Writes Follow Reads

• A data store is said to provide writes-follow-reads consistency if, when a process performs a write operation on a data item x following a previous read operation on x, the write is guaranteed to take place on the same or a more recent value of x than was read.
• Any successive write operation by a process on a data item x will be performed on a copy of x that is up to date with the value most recently read by that process.
• Example: Suppose a user first reads an article A and then posts a response B. By requiring writes-follow-reads consistency, B will be written to any copy only after A has been written.

Cluster Computers
A cluster is a set of loosely or tightly connected computers working together as a unified computing resource that can create the illusion of being one machine. Computer clusters have each node set to perform the same task, controlled and scheduled by software.
The components of a cluster are usually connected to each other using fast local area networks, with each node running its own instance of an operating system. In most circumstances, all the nodes use the same hardware and the same operating system, although in some setups different hardware or different operating systems can be used.

Types of Clusters –
Computer clusters are arranged in such a way as to support different purposes, from general-purpose business needs such as web-service support to computation-intensive scientific calculation. Basically, there are three types of clusters:
• Load-Balancing Cluster – A cluster requires an effective capability for balancing the load among the available computers. In this type, the cluster nodes share the computational workload so as to enhance the overall performance. For example, a high-performance cluster used for scientific calculation would balance the load using different algorithms from a web-server cluster, which may just use a simple round-robin method, assigning each new request to a different node (a minimal round-robin sketch follows this list). This type of cluster is used on farms of Web servers (web farms).
• Fail-Over Clusters – The function of switching applications and data resources over from a failed system to an alternative system in the cluster is referred to as fail-over. These types are used to cluster mission-critical databases, and mail, file, and application servers.
• High-Availability Clusters – These are also known as "HA clusters". They offer a high probability that all the resources will be in service. If a failure does occur, such as a system going down or a disk volume being lost, then the queries in progress are lost. Any lost query, if retried, will be serviced by a different computer in the cluster. This type of cluster is widely used in web, email, news, or FTP servers.
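
As promised above, here is a minimal Python sketch of the round-robin dispatch policy a simple web farm might use; the node names and requests are illustrative assumptions:

from itertools import cycle

nodes = cycle(["node1", "node2", "node3"])   # cluster nodes, in rotation

def dispatch(request):
    node = next(nodes)                       # round-robin: next node in turn
    print(f"request {request!r} -> {node}")

for r in ["GET /a", "GET /b", "GET /c", "GET /d"]:
    dispatch(r)
# node1, node2, node3, node1 -- the load spreads evenly across the nodes
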
Benefits –
• Absolute scalability – It is possible to create large clusters that beat the power of even the largest standalone machines. A cluster can have dozens of multiprocessor machines.
• Incremental scalability – A cluster is configured in such a way that it is possible to add new systems to the cluster in small increments. Clusters have the ability to add systems horizontally: more computers may be added to the cluster to improve its performance, redundancy, and fault tolerance (the ability of the system to continue working with a malfunctioning node).
• High availability – Since each node in a cluster is a standalone computer, the failure of one node does not mean loss of service. A single node can be taken down for maintenance while the rest of the cluster takes on the load of that individual node.
• Preferable price/performance – Clusters are usually set up to improve performance and availability over single computers, while typically being much more cost-effective than single computers of comparable speed or availability.
