
INSTITUTE OF ENGINEERING,

JIWAJI UNIVERSITY

ADVANCED COMPUTER ARCHITECTURE


ASSIGNMENT-02

PRIYANKA PAWAR
3RD YEAR(CSE)

171489954
Ques1: What is a Pipeline?
Answer:
Pipelining is the technique of feeding a stream of instructions through a series of processor stages so that they can be stored and executed in an orderly, overlapped fashion. It is also known as pipeline processing.
In pipelining, the execution of multiple instructions is overlapped. The pipeline is divided into stages, and these stages are connected with one another to form a pipe-like structure. Instructions enter at one end and exit at the other. Pipelining increases the overall instruction throughput.
In a pipeline system, each segment consists of an input register followed by a combinational circuit. The register holds the data, and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment.
A pipeline system is like a modern-day assembly line in a factory. For example, in a car manufacturing plant, huge assembly lines are set up with robotic arms performing a certain task at each point, after which the car moves on to the next arm.
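To make the register-plus-combinational-circuit structure concrete, here is a minimal Python sketch. The stage functions, input values, and helper name are illustrative assumptions, not from the original text; each slot of `registers` plays the role of the register between two stages.

# A minimal sketch of a linear pipeline: each stage is a function, and a
# register (one slot per stage) latches each stage's output between clocks.
def clock(stages, registers, new_input):
    """Advance the pipeline by one clock: every stage consumes the value
    latched by the previous stage and latches its own result."""
    # Compute right-to-left so each stage reads last cycle's register value.
    for i in reversed(range(len(stages))):
        src = registers[i - 1] if i > 0 else new_input
        registers[i] = None if src is None else stages[i](src)
    return registers[-1]          # output of the final stage, if any

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
registers = [None] * len(stages)

for cycle_no, value in enumerate([10, 20, 30, None, None, None]):
    out = clock(stages, registers, value)
    print(f"cycle {cycle_no}: output = {out}")
# Outputs 19, 39, 59 on cycles 2-4: three results emerge one per clock
# once the pipe is full, even though each single result takes three clocks.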
Types of Pipeline:
Pipelines are divided into two categories:

1. Arithmetic Pipeline
2. Instruction Pipeline

Arithmetic Pipeline:
Arithmetic pipelines are found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, etc. For example, the inputs to a floating-point adder pipeline are:
X = A * 2^a
Y = B * 2^b
Here A and B are mantissas (the significant digits of the floating-point numbers), while a and b are the exponents.
Floating-point addition and subtraction are done in four steps:

1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Produce the result.

Registers are used for storing the intermediate results between the
above operations.
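A hedged Python sketch of these four steps follows. The stage boundaries track the list above; the mantissa representation (plain floats) and the normalization rule are simplifications of my own, not the original text's specification.

# Sketch of the four floating-point adder stages for X = A*2^a, Y = B*2^b.
def fp_add(A, a, B, b):
    # Stage 1: compare the exponents.
    shift = abs(a - b)
    # Stage 2: align the mantissas (shift the one with the smaller exponent).
    if a >= b:
        B, exp = B / (2 ** shift), a
    else:
        A, exp = A / (2 ** shift), b
    # Stage 3: add (or subtract) the mantissas.
    mant = A + B
    # Stage 4: produce the result, normalizing so that 1 <= |mant| < 2.
    while mant != 0 and abs(mant) >= 2:
        mant, exp = mant / 2, exp + 1
    while mant != 0 and abs(mant) < 1:
        mant, exp = mant * 2, exp - 1
    return mant, exp

m, e = fp_add(1.5, 3, 1.25, 1)   # 1.5*2^3 + 1.25*2^1 = 12 + 2.5 = 14.5
print(m, e, m * 2 ** e)          # -> 1.8125 3 14.5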
Instruction Pipeline:
In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system.
An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus multiple instructions can be executed simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
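As a hedged illustration of this overlap, a small Python script can print the space-time diagram of an ideal three-segment pipeline; the segment names and instruction count are assumptions for the example, and the key fact it shows is that instruction i occupies segment s during clock cycle i + s.

# Print the space-time chart of an ideal instruction pipeline.
SEGMENTS = ["Fetch", "Decode", "Execute"]
N_INSTR = 4

k = len(SEGMENTS)
for s, name in enumerate(SEGMENTS):
    row = ["  .  "] * (N_INSTR + k - 1)
    for i in range(N_INSTR):
        row[i + s] = f"  I{i+1} "          # instruction i+1 in segment s
    print(f"{name:>8}: " + "".join(row))
# Four instructions finish in 4 + 3 - 1 = 6 cycles instead of 4 * 3 = 12.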
Pipeline Conflicts:
Some factors cause the pipeline to deviate from its normal performance. These factors are given below:
1. Timing Variations: Not all stages take the same amount of time. This problem generally occurs in instruction processing, where different instructions have different operand requirements and thus different processing times.
2. Data Hazards: When several instructions are in partial execution and they reference the same data, a problem arises. We must ensure that the next instruction does not attempt to access data before the current instruction has produced it, because that would lead to incorrect results.
3. Branching: In order to fetch and execute the next instruction, we must know what that instruction is. If the present instruction is a conditional branch, whose result determines the next instruction, then the next instruction may not be known until the current one is processed.
4. Interrupts: Interrupts inject unwanted instructions into the instruction stream and so disturb the execution of instructions.
5. Data Dependency: This arises when an instruction depends upon the result of a previous instruction that is not yet available.
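To make the data-hazard and data-dependency points concrete, here is a hedged sketch; the instruction encoding, the register names, and the two-cycle hazard window are simplifying assumptions of my own (a short pipeline with no forwarding), not a model from the original text.

# Detect read-after-write (RAW) dependencies in a toy instruction list.
# Each instruction is (dest, src1, src2).
program = [
    ("r1", "r2", "r3"),   # I1: r1 <- r2 op r3
    ("r4", "r1", "r5"),   # I2: reads r1 right after I1 writes it -> hazard
    ("r6", "r2", "r4"),   # I3: reads r4 one cycle after I2 writes it
]

WINDOW = 2  # assume a result is unavailable to the next WINDOW instructions
for j, (dest_j, *srcs_j) in enumerate(program):
    for i in range(max(0, j - WINDOW), j):
        dest_i = program[i][0]
        if dest_i in srcs_j:
            print(f"RAW hazard: I{j+1} reads {dest_i} written by I{i+1}; "
                  f"stall or forward needed")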
Advantages of Pipelining:

1. The cycle time of the processor is reduced.
2. It increases the throughput of the system.
3. It makes the system more reliable.

Disadvantages of Pipelining:

1. The design of a pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction increases.
Ques2: Explain Pipeline Hazards?
Answer:
 Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
 The pipeline rate is limited by the slowest pipeline stage.
   o Multiple tasks operate simultaneously.
 Potential speedup = number of pipe stages.
 Unbalanced lengths of pipe stages reduce the speedup.
 Time to "fill" the pipeline and time to "drain" it also reduces the speedup (see the sketch after this list).
 We execute billions of instructions, so throughput is what matters.
   o Data path design: ideally we expect a CPI value of 1.
 What is desirable in instruction sets for pipelining?
   o Variable-length instructions vs. all instructions the same length?
   o Memory operands as part of any operation vs. memory operands only in loads or stores?
   o Register operands in many places in the instruction format vs. registers located in the same place?
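Here is a hedged worked example of the fill/drain effect noted above: assuming k equal-length stages at one cycle each and n tasks, the pipelined time is (k + n - 1) cycles against n*k cycles unpipelined, so the speedup approaches the stage count k only for large n.

# Pipeline speedup with fill/drain overhead (idealized, equal stages).
def speedup(k, n):
    return (n * k) / (k + n - 1)

for n in (4, 100, 1_000_000):
    print(f"k=5, n={n}: speedup = {speedup(5, n):.3f}")
# -> 2.5 for n=4, ~4.8 for n=100, ~5.0 for n=1,000,000 (the stage count)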

 There are three classes of hazards:
 Structural Hazards: They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
 Data Hazards: They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
 Control Hazards: They arise from the pipelining of branches and other instructions that change the PC.

How do we deal with hazards?

• Often, the pipeline must be stalled.
• Stalling the pipeline usually lets some instruction(s) in the pipeline proceed while another/others wait for data, a resource, etc.
• A note on terminology:
– If we say an instruction was "issued later than instruction x", we mean that it was issued after instruction x and is not as far along in the pipeline.
– If we say an instruction was "issued earlier than instruction x", we mean that it was issued before instruction x and is further along in the pipeline.
Ques3: Explain linear and non-linear pipeline
processor?
Answer:
A dynamic pipeline can be reconfigured to perform variable functions at different times; it allows feed-forward and feedback connections in addition to the streamline connection. A static (linear) pipeline, by contrast, performs a fixed function.
 Linear Pipeline:
A linear pipeline is a pipeline in which a series of processing stages are connected together in a serial manner. In a linear pipeline the data flows from the first stage to the last: the processing is done in a linear, sequential manner, the input is supplied to the first stage, and the output is taken from the last stage once processing completes. Linear pipelines can be further divided into synchronous and asynchronous models.

Non-Linear Pipeline:
A non-linear pipeline is a pipeline whose stages are interconnected so that they can perform multiple functions. It has feedback and feed-forward connections in addition to streamline connections, and it is built so that it can perform different functions at different time intervals. In a non-linear pipeline the functions are dynamically assigned.

 Difference Between Linear and Non-Linear Pipelines:

1. Linear pipelines are static pipelines, because they are used to perform fixed functions. Non-linear pipelines are dynamic pipelines, because they can be reconfigured to perform variable functions at different times.
2. A linear pipeline allows only streamline connections. A non-linear pipeline allows feed-forward and feedback connections in addition to streamline connections.
3. For a linear pipeline it is relatively easy to partition a given function into a sequence of linearly ordered subfunctions. For a non-linear pipeline, function partitioning is relatively difficult because the stages are interconnected with loops in addition to streamline connections.
4. The output of a linear pipeline is produced from the last stage. The output of a non-linear pipeline is not necessarily produced from the last stage.
5. The reservation table of a linear pipeline is trivial, in the sense that data flows in a linear streamline. The reservation table of a non-linear pipeline is non-trivial, in the sense that there is no linear streamline for data flow.
6. Static pipelining is specified by a single reservation table. Dynamic pipelining is specified by more than one reservation table.
7. All initiations to a static pipeline use the same reservation table. A dynamic pipeline may allow different initiations to follow a mix of reservation tables.

A Three-Stage Pipeline (figure: stages S1, S2 and S3 with streamline, feed-forward and feedback connections)

 Reservation Table: Displays the time-space flow of data through the pipeline for one function evaluation. The reservation table for a function X on this three-stage pipeline (rows are the stages, columns are clock cycles 1-8; an X marks each cycle in which a stage is used):

Time:        1    2    3    4    5    6    7    8
Stage S1:    X    .    .    .    .    X    .    X
Stage S2:    .    X    .    X    .    .    .    .
Stage S3:    .    .    X    .    X    .    X    .
 Latency: The number of time units (clock cycles) between two
initiations of a pipeline is the latency between them. Latency values
must be non-negative integers.
 
 Collision: When two or more initiations attempt to use the same pipeline stage at the same time, a collision occurs. A collision implies a resource conflict between two initiations in the pipeline, so it must be avoided.
 Forbidden and Permissible Latency: Latencies that cause collisions are called forbidden latencies. (E.g. in the above reservation table, 2, 4, 5 and 7 are forbidden latencies.)
Latencies that do not cause any collision are called permissible latencies. (E.g. in the above reservation table, 1, 3 and 6 are permissible latencies.)
 Latency Sequence and Latency Cycle: A latency sequence is a sequence of permissible (non-forbidden) latencies between successive task initiations.
A latency cycle is a latency sequence that repeats the same subsequence (cycle) indefinitely.
The average latency of a latency cycle is obtained by dividing the sum of all latencies by the number of latencies along the cycle. For example, the latency cycle (1, 8) has an average latency of (1+8)/2 = 4.5.
A constant cycle is a latency cycle that contains only one latency value. (E.g. the cycles (3) and (6) are both constant cycles.)

 Collision Vector: The combined set of permissible and forbidden latencies can be displayed as a collision vector, an m-bit binary vector C = (Cm Cm-1 ... C2 C1), where m <= n-1 for an n-column reservation table. The value of Ci is 1 if latency i causes a collision, and 0 if latency i is permissible. (E.g. Cx = (1011010).)
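Both the forbidden latencies and the collision vector can be computed mechanically from the reservation table. A hedged Python sketch, using the function-X table shown above (the data layout and variable names are my own):

# Compute forbidden latencies and the collision vector from a reservation
# table: rows = stages, entries = clock cycles in which the stage is used.
table = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 5, 7],
}

n = max(t for times in table.values() for t in times)  # number of columns
forbidden = set()
for times in table.values():
    for i, t1 in enumerate(times):
        for t2 in times[i + 1:]:
            forbidden.add(abs(t2 - t1))   # two marks in one row collide

# Collision vector C = (Cm ... C2 C1), m = n - 1; Ci = 1 iff latency i
# is forbidden.
m = n - 1
C = "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))
print(sorted(forbidden))  # -> [2, 4, 5, 7]
print(C)                  # -> 1011010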
 State Diagram: Specifies the permissible state transitions among successive initiations, based on the collision vector. The minimum-latency edges in the state diagram are marked with an asterisk.
 Simple Cycle, Greedy Cycle and MAL: A simple cycle is a latency cycle in which each state appears only once. In the above state diagram, only (3), (6), (8), (1, 8), (3, 8) and (6, 8) are simple cycles. The cycle (1, 8, 6, 8) is not simple, as it travels through the state (1011010) twice.
A greedy cycle is a simple cycle whose edges are all made with the minimum latencies from their respective starting states. The cycles (1, 8) and (3) are greedy cycles.
MAL (Minimum Average Latency) is the minimum average latency obtained among the greedy cycles. Of the greedy cycles (1, 8) and (3), the cycle (3) leads to the MAL value of 3.
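A hedged way to check a latency cycle without drawing the state diagram is to simulate repeated initiations directly on the reservation table and verify that no two initiations ever claim the same stage in the same clock cycle; the helper below and its fixed number of trial initiations are my own simplifications.

# Verify that a latency cycle is collision-free by simulation, then
# report its average latency (None means the cycle is forbidden).
from itertools import islice, cycle

table = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}

def avg_latency_if_permissible(latency_cycle, initiations=20):
    used = set()                  # (stage, absolute clock) pairs claimed
    start = 0
    for lat in [0] + list(islice(cycle(latency_cycle), initiations)):
        start += lat              # clock at which this initiation starts
        for stage, times in table.items():
            for t in times:
                if (stage, start + t) in used:
                    return None   # collision: two initiations overlap
                used.add((stage, start + t))
    return sum(latency_cycle) / len(latency_cycle)

for c in [(1, 8), (3,), (6,), (2,)]:
    print(c, "->", avg_latency_if_permissible(c))
# (1, 8) -> 4.5   (3,) -> 3.0 (the MAL)   (6,) -> 6.0   (2,) -> None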
 
Ques4: What is cache coherence?
Answer:
In computer architecture, cache coherence is the uniformity of
shared resource data that ends up stored in multiple local caches.
When clients in a system maintain caches of a common memory
resource, problems may arise with incoherent data, which is
particularly the case with CPUs in a multiprocessing system.
Consider two clients that both hold a cached copy of a particular memory block from a previous read. Suppose one client updates/changes that memory block; the other client could then be left with an invalid cache of memory, without any notification of the change. Cache coherence is intended to manage such conflicts by maintaining a coherent view of the data values in the multiple caches.

Coherence defines the behavior of reads and writes to a single address location.
When the same data item is held simultaneously in different cache memories, keeping those copies uniform is the cache coherence problem (in some systems, discussed in terms of a coherent global memory).
In a multiprocessor system, consider that more than one processor has cached a copy of the memory location X. The following conditions are necessary to achieve cache coherence:

1. A read made by a processor P to a location X, following a write by the same processor P to X, with no writes to X by another processor occurring between the write and the read made by P, must always return the value written by P.
2. A read made by a processor P1 to location X, following a write by another processor P2 to X, with no other writes to X made by any processor occurring between the two accesses and with the read and write sufficiently separated in time, must always return the value written by P2. This condition defines the concept of a coherent view of memory: propagating the writes to the shared memory location ensures that all the caches have a coherent view of the memory. If processor P1 reads the old value of X even after the write by P2, the memory is incoherent.
The above conditions satisfy the Write Propagation criteria required
for cache coherence. However, they are not sufficient as they do not
satisfy the Transaction Serialization condition. To illustrate this
better, consider the following example:
A multi-processor system consists of four processors - P1, P2, P3
and P4, all containing cached copies of a shared variable S whose
initial value is 0. Processor P1 changes the value of S (in its cached
copy) to 10 following which processor P2 changes the value of S in
its own cached copy to 20. If we ensure only write propagation, then
P3 and P4 will certainly see the changes made to S by P1 and P2.
However, P3 may see the change made by P1 after seeing the change
made by P2 and hence return 10 on a read to S. P4 on the other hand
may see changes made by P1 and P2 in the order in which they are
made and hence return 20 on a read to S. The processors P3 and P4
now have an incoherent view of the memory.
Therefore, in order to satisfy Transaction Serialization, and hence
achieve Cache Coherence, the following condition along with the
previous two mentioned in this section must be met:

 Writes to the same location must be sequenced. In other


words, if location X received two different values A and B, in this
order, from any two processors, the processors can never read
location X as B and then read it as A. The location X must be seen
with values A and B in that order.
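A hedged sketch of the four-processor example above: if each cache applies remote writes in its own order, P3 and P4 can disagree about S, while forcing every cache to apply writes in a single serialized (bus) order restores a coherent view. The model below is deliberately simplified and its names are my own.

# Toy model: two writes to S (10, then 20) reach every cache, but each
# cache may apply them in a different order unless serialization is
# enforced.
writes = [("P1", 10), ("P2", 20)]     # the two writes to S, in order

def final_value(apply_order):
    value = 0                         # initial value of S
    for idx in apply_order:           # indices into `writes`
        value = writes[idx][1]
    return value

# Without transaction serialization, caches may pick different orders:
print("P3 sees:", final_value([1, 0]))   # P2's write, then P1's -> 10
print("P4 sees:", final_value([0, 1]))   # P1's write, then P2's -> 20
# With serialization, every cache applies the same bus order:
bus_order = [0, 1]
print("all see:", [final_value(bus_order) for _ in ("P3", "P4")])  # [20, 20]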
The alternative definition of a coherent system is via the definition
of sequential consistency memory model: "the cache coherent
system must appear to execute all threads’ loads and stores to
a single memory location in a total order that respects the program
order of each thread". Thus, the only difference between the cache
coherent system and sequentially consistent system is in the
number of address locations the definition talks about (single
memory location for a cache coherent system, and all memory
locations for a sequentially consistent system).
Another definition is: "a multiprocessor is cache consistent if all
writes to the same memory location are performed in some
sequential order".
Rarely, but especially in algorithms, coherence can instead refer to the locality of reference. Multiple copies of the same data can exist in different caches simultaneously, and if processors are allowed to update their own copies freely, an inconsistent view of memory can result.

Coherence mechanisms
The two most common mechanisms for ensuring coherency are snooping and directory-based, each having its own benefits and drawbacks. Snooping-based protocols tend to be faster, if enough bandwidth is available, since all transactions are request/responses seen by all processors. The drawback is that snooping isn't scalable: every request must be broadcast to all nodes in a system, meaning that as the system gets larger, the size of the (logical or physical) bus and the bandwidth it provides must grow. Directories, on the other hand, tend to have longer latencies (with a 3-hop request/forward/respond path) but use much less bandwidth, since messages are point-to-point rather than broadcast. For this reason, many of the larger systems (>64 processors) use this type of cache coherence.
 Snooping: First introduced in 1983,[7] snooping is a process
where the individual caches monitor address lines for accesses
to memory locations that they have cached.[4] The write-
invalidate protocols and write-update protocols make use of this
mechanism.
For the snooping mechanism, a snoop filter reduces the snooping
traffic by maintaining a plurality of entries, each representing a
cache line that may be owned by one or more nodes. When
replacement of one of the entries is required, the snoop filter selects
for the replacement the entry representing the cache line or lines
owned by the fewest nodes, as determined from a presence vector in
each of the entries. A temporal or other type of algorithm is used to
refine the selection if more than one cache line is owned by the
fewest nodes.
 Directory-based: In a directory-based system, the data
being shared is placed in a common directory that maintains
the coherence between caches. The directory acts as a filter
through which the processor must ask permission to load an
entry from the primary memory to its cache. When an entry is
changed, the directory either updates or invalidates the other
caches with that entry.
Distributed shared memory systems mimic these mechanisms in
an attempt to maintain consistency between blocks of memory in
loosely coupled systems.[9]

Coherence protocols

Coherence protocols apply cache coherence in multiprocessor systems. The intention is that two clients must never see different values for the same shared data.
The protocol must implement the basic requirements for coherence.
It can be tailor-made for the target system or application.
Protocols can also be classified as snoopy or directory-based.
Typically, early systems used directory-based protocols where a
directory would keep a track of the data being shared and the
sharers. In snoopy protocols, the transaction requests (to read,
write, or upgrade) are sent out to all processors. All processors
snoop the request and respond appropriately.

Write propagation in snoopy protocols can be implemented by either of the following methods:
 Write-invalidate:
When a write operation is observed to a location that a cache
has a copy of, the cache controller invalidates its own copy of
the snooped memory location, which forces a read from main
memory of the new value on its next access.

 Write-update:
When a write operation is observed to a location that a cache
has a copy of, the cache controller updates its own copy of the
snooped memory location with the new data.
If the protocol design states that whenever any copy of the
shared data is changed, all the other copies must be "updated"
to reflect the change, then it is a write-update protocol. If the
design states that a write to a cached copy by any processor
requires other processors to discard or invalidate their cached
copies, then it is a write-invalidate protocol.
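Here is a hedged Python sketch of the write-invalidate policy: a simplified valid/invalid scheme, not a full MSI/MESI implementation, with class and method names of my own and a write-through memory assumed for brevity.

# Minimal write-invalidate snooping: every write is broadcast on a shared
# "bus", and the other caches invalidate their copy of that address.
class Cache:
    def __init__(self, name, bus):
        self.name, self.lines, self.bus = name, {}, bus
        bus.append(self)

    def read(self, addr, memory):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value, memory):
        self.lines[addr] = value
        memory[addr] = value                  # write-through, for brevity
        for other in self.bus:                # snoop: invalidate others
            if other is not self:
                other.lines.pop(addr, None)

bus, memory = [], {"X": 0}
c1, c2 = Cache("C1", bus), Cache("C2", bus)
c1.read("X", memory); c2.read("X", memory)   # both caches hold X = 0
c1.write("X", 42, memory)                    # C2's copy is invalidated
print(c2.read("X", memory))                  # miss -> refetch -> 42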
However, scalability is one shortcoming of broadcast
protocols.
Various models and protocols have been devised for maintaining coherence, such as MSI, MESI (aka Illinois), MOSI, MOESI, MERSI, MESIF, write-once, Synapse, Berkeley, Firefly and Dragon. In 2011, ARM Ltd proposed the AMBA 4 ACE for handling coherency in SoCs.
