Module 5
The first 2 steps are referred to as the fetch phase; step 3 is referred to as the execution phase.
The operations specified by an instruction can be carried out by performing one or more
of the following actions:
1. Read the contents of a given memory-location and load them into a register.
3. Perform ALU operations and place the result into the register.
Figure 5.1 shows the single bus organization. The ALU and all the registers are interconnected
via a single common bus. The data and address lines of the external memory bus are connected to the
internal processor bus via MDR and MAR respectively (MDR: Memory Data Register, MAR:
Memory Address Register).
→ issuing the signals that control the operation of all the units inside the processor (and
for interacting with memory bus).
Registers R0 through R(n-1) are provided for general purpose use by the programmer.
Three registers Y, Z & TEMP are used by the processor for temporary storage during
execution of some instructions.
These are transparent to the programmer, i.e. the programmer need not be concerned with
them because they are never referenced explicitly by any instruction.
→ output of Y or
As instruction execution progresses, data are transferred from one register to another,
often passing through ALU to perform arithmetic or logic operation.
3) Fetch the contents of a given memory-location and load them into a processor-
register.
Disadvantage: Only one data word can be transferred over the bus in a clock cycle.
Solution: Providing multiple data-paths allows several data transfers to take place in parallel.
The input & output of register Ri are connected to the bus via switches controlled by 2
control-signals: Riin & Riout. These are called gating signals.
When Riin=1, the data on the bus is loaded into Ri. Similarly, when Riout=1, the content of Ri is
placed on the bus. When Riout=0, the bus can be used for transferring data from other registers.
This transfers the content of register R1 to R2. This can be accomplished as follows.
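A minimal Python sketch of such a transfer over the single bus, assuming that R1out and R2in are both set to 1 in the same clock cycle. The function name clock_cycle and the register values are hypothetical and chosen only for illustration.

```python
def clock_cycle(registers, out_signals, in_signals):
    """Apply one clock cycle given the sets of active Riout and Riin signals.

    registers   : dict mapping a register name to its content
    out_signals : names of registers whose Riout is 1 (at most one allowed)
    in_signals  : names of registers whose Riin is 1
    Returns the value on the bus during this cycle (None if nothing drives it).
    """
    if len(out_signals) > 1:
        raise ValueError("bus conflict: more than one Riout is 1")
    if in_signals and not out_signals:
        raise ValueError("Riin is 1 but no register is driving the bus")
    # Riout = 1: the selected register places its content on the bus.
    bus = registers[next(iter(out_signals))] if out_signals else None
    # Riin = 1: each selected register loads the data currently on the bus.
    for name in in_signals:
        registers[name] = bus
    return bus


registers = {"R1": 25, "R2": 0, "R3": 7}
# One cycle with R1out = 1 and R2in = 1 copies R1 into R2.
clock_cycle(registers, out_signals={"R1"}, in_signals={"R2"})
print(registers)
```

Only R2 changes; R1 keeps its content because placing a value on the bus does not clear the source register.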
All operations and data transfers within the processor take place within time-periods
defined by the processor clock. When edge-triggered flip-flops are not used, 2 or more clock-
signals may be needed to guarantee proper transfer of data. This is known as multiphase clocking.
A 2-input multiplexer is used to select the data applied to the input of an edge-triggered D
flip-flop. When Riin=1, the mux selects the data on the bus. This data will be loaded into the
flip-flop at the rising-edge of the clock. When Riin=0, the mux feeds back the value currently
stored in the flip-flop. The Q output of the flip-flop is connected to the bus via a tri-state gate.
When Riout=0, the gate's output is in the high-impedance state. (This corresponds to the open
circuit state of a switch). When Riout=1, the gate drives the bus to 0 or 1, depending on the
value of Q.
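The behaviour of one register bit can be sketched in Python as follows. This is only a behavioural model under simplifying assumptions: the class name RegisterBit is hypothetical, and the high-impedance state is represented by None.

```python
class RegisterBit:
    """One bit of register Ri: 2-input mux + edge-triggered D flip-flop + tri-state gate."""

    def __init__(self, q=0):
        self.q = q  # value currently stored in the flip-flop

    def rising_edge(self, bus_bit, r_in):
        """Model the rising edge of the clock."""
        # Mux: Riin = 1 selects the bus, Riin = 0 feeds back the stored value.
        d = bus_bit if r_in == 1 else self.q
        self.q = d

    def drive(self, r_out):
        """Tri-state gate: Q when Riout = 1, high impedance (None) when Riout = 0."""
        return self.q if r_out == 1 else None


bit = RegisterBit(q=0)
bit.rising_edge(bus_bit=1, r_in=0)   # Riin = 0: the stored value is kept
held = bit.q
bit.rising_edge(bus_bit=1, r_in=1)   # Riin = 1: the bus value is loaded
loaded = bit.q
print(held, loaded, bit.drive(0), bit.drive(1))
```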
2) R2out, Select Y, Add, Zin //R2 contents are transferred directly to B input of ALU.
The signals are activated for the duration of the clock cycle corresponding to that step. All other
signals are inactive.
(Note: In this Figure 5.4 , replace Register Ri with Registers R1, R2, R3)
The response time of each memory access varies. To accommodate this, a signal called MFC
(Memory Function Completed) is used. It is sent from the addressed device to the processor.
MFC informs the processor that the requested operation has been completed by the addressed device.
Thus MFC is set to 1 to indicate that the contents of the specified location
1) R1out, MARin, Read ;desired address is loaded into MAR & Read command is issued
2) MDRinE, WMFC ;load MDR from memory bus & wait for MFC response from memory
3) MDRout, R2in ;load R2 from MDR
where WMFC = control signal that causes the processor's control circuitry to wait for the
arrival of the MFC signal.
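The effect of WMFC on timing can be sketched in Python. This is an illustrative model only: the function read_into_r2 and the latency parameter (the number of clock cycles the memory needs before it raises MFC) are assumptions, not part of the processor described here.

```python
def read_into_r2(registers, memory, latency):
    """Simulate the three-step read sequence; return the clock cycles used.

    latency: hypothetical number of cycles (>= 1) the memory needs before
             it sets MFC to 1.
    """
    cycles = 0

    # Step 1: R1out, MARin, Read
    mar = registers["R1"]
    cycles += 1

    # Step 2: MDRinE, WMFC -> stay in this step until MFC becomes 1
    mfc = 0
    waited = 0
    mdr = None
    while mfc == 0:
        cycles += 1
        waited += 1
        if waited >= latency:
            mdr = memory[mar]   # data arrives on the memory bus
            mfc = 1             # memory signals completion

    # Step 3: MDRout, R2in
    registers["R2"] = mdr
    cycles += 1
    return cycles


memory = {100: 42}
registers = {"R1": 100, "R2": 0}
fast = read_into_r2(registers, memory, latency=1)
slow = read_into_r2(registers, memory, latency=4)
print(registers["R2"], fast, slow)
```

The same control sequence takes a different number of clock cycles depending on how quickly the addressed device responds, which is why the processor waits for MFC instead of assuming a fixed delay.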
2) R2out, MDRin, Write ;data to be written are loaded into MDR & Write command is issued
3) MDRoutE, WMFC ;load data into memory location pointed by R1 from MDR
3) MDRout, IRin
Step1 The instruction-fetch operation is initiated by loading the contents of PC into MAR & sending
a Read request to memory. The Select signal is set to Select4, which causes the Mux to select
constant 4. This value is added to the operand at input B (PC's content), and the result is stored in Z.
Step4 Contents of R3 are loaded into MAR & a memory read signal is issued.
Step6 When Read operation is completed, memory-operand is available in MDR, and the
addition is performed.
BRANCHING INSTRUCTIONS
3) MDRout, IRin
In step 5, the result, which is the branch-address, is loaded into the PC. The offset X used
in a branch instruction is usually the difference between the branch target-address and the address
immediately following the branch instruction. (For example, if the branch instruction is at location
1000 and the branch target-address is 1200, then the value of X must be 196, since the PC will
contain the address 1004 after fetching the instruction at location 1000).
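The arithmetic in this example can be checked with a short Python sketch. The 4-byte instruction size is an assumption taken from the PC being incremented by the constant 4; the function name branch_offset is hypothetical.

```python
INSTRUCTION_SIZE = 4  # assumed: the PC is incremented by 4 during the fetch


def branch_offset(branch_address, target_address):
    """Offset X = target address - address immediately following the branch."""
    updated_pc = branch_address + INSTRUCTION_SIZE
    return target_address - updated_pc


x = branch_offset(1000, 1200)
print(x)  # 196
```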
In case of conditional branch, we need to check the status of the condition-codes before
loading a new value into the PC.
"If N=0 then End" means that if N=0, the processor returns to step 1 immediately after step 4.
5.3 Pipelining:
The speed of execution of programs is influenced by many factors.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program
consists of a sequence of fetch and execute steps, as shown in Figure 5.7.
Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them, as shown in Figure 5.8.
Operation of the computer proceeds as in Figure 5.9. In the first clock cycle, the fetch
unit fetches an instruction I1 (step F1) and stores it in buffer B1 at the end of the clock cycle. In
the second clock cycle, the instruction fetch unit proceeds with the fetch operation for instruction
I2 (step F2). Meanwhile, the execution unit performs the operation specified by instruction I1,
which is available to it in buffer B1 (step E1).
By the end of the second clock cycle, the execution of instruction I1 is completed and
instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is no longer needed.
Step E2 is performed by the execution unit during the third clock cycle, while instruction I3 is being
fetched by the fetch unit.
In this manner, both the fetch and execute units are kept busy all the time.
In summary, the fetch and execute units in Figure 5.8 constitute a two-stage pipeline in
which each stage performs one step in processing an instruction. An inter-stage storage buffer, B1,
is needed to hold the information being passed from one stage to the next. New information is
loaded into this buffer at the end of each clock cycle.
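The timing of this two-stage pipeline can be sketched in Python, assuming every fetch and every execute step takes exactly one clock cycle and no stalls occur. The function names are hypothetical.

```python
def two_stage_schedule(n):
    """Return {instruction: (fetch cycle, execute cycle)} for n instructions."""
    # Ii is fetched in cycle i and executed in cycle i + 1.
    return {f"I{i}": (i, i + 1) for i in range(1, n + 1)}


def total_cycles(n, pipelined):
    """Clock cycles needed to complete n instructions."""
    # Sequential: F and E of each instruction run back to back (2 cycles each).
    # Pipelined: one instruction completes per cycle after the first fetch.
    return n + 1 if pipelined else 2 * n


schedule = two_stage_schedule(3)
print(schedule)
print(total_cycles(3, pipelined=False), total_cycles(3, pipelined=True))
```

For three instructions the sequential organization needs 6 cycles, while the two-stage pipeline needs 4, because E1 overlaps F2 and E2 overlaps F3.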
The processing of an instruction need not be divided into only two steps. For example, a
pipelined processor may process each instruction in four steps, as follows:
Four instructions are in progress at any given time. This means that four distinct hardware
units are needed, as shown in Figure 5.11.
These units must be capable of performing their tasks simultaneously and without
interfering with one another. Information is passed from one unit to the next through a storage
buffer. As an instruction progresses through the pipeline, all the information needed by the stages
downstream must be passed along. For example, during clock cycle 4, the information in the
buffers is as follows:
• Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the
instruction-decoding unit.
• Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed. This is the information produced by the decoding hardware in
cycle 3. The buffer also holds the information needed for the write step of instruction I2 (step W2).
Even though it is not needed by stage E, this information must be passed on to stage W in the
following clock cycle to enable that stage to perform the required Write operation.
• Buffer B3 holds the results produced by the execution unit and the destination information
for instruction I1.
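The occupancy of the four stages in any clock cycle can be worked out with a small Python sketch. It assumes an ideal pipeline with no stalls and uses the stage letters F, D, E and W for fetch, decode, execute and write; the function name stage_contents is hypothetical.

```python
STAGES = ["F", "D", "E", "W"]  # fetch, decode, execute, write


def stage_contents(cycle, n_instructions):
    """Return which instruction each stage works on in the given cycle."""
    contents = {}
    for s, stage in enumerate(STAGES):
        # Instruction Ii is fetched in cycle i, so it is in stage s in cycle i + s.
        i = cycle - s
        contents[stage] = f"I{i}" if 1 <= i <= n_instructions else None
    return contents


cycle4 = stage_contents(4, n_instructions=4)
print(cycle4)  # {'F': 'I4', 'D': 'I3', 'E': 'I2', 'W': 'I1'}
```

This agrees with the buffer contents listed above: in cycle 4, I3 is being decoded, I2 is being executed and the results of I1 are waiting to be written.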
If different units require different amounts of time, the clock period must allow the longest
task to be completed. A unit that completes its task early is idle for the remainder of the clock
period. Hence, pipelining is most effective in improving performance if the tasks being performed
in different stages require about the same amount of time.
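A short Python sketch of this point, using made-up stage delays (the numbers are purely hypothetical and are not taken from any real processor):

```python
def clock_period(stage_delays):
    """The clock period must allow the longest stage to complete."""
    return max(stage_delays.values())


def idle_time(stage_delays):
    """Time each stage sits idle in every clock period."""
    period = clock_period(stage_delays)
    return {stage: period - delay for stage, delay in stage_delays.items()}


# Hypothetical delays in nanoseconds.
unbalanced = {"F": 8, "D": 2, "E": 3, "W": 2}
balanced = {"F": 3, "D": 3, "E": 3, "W": 3}

print(clock_period(unbalanced), idle_time(unbalanced))
print(clock_period(balanced), idle_time(balanced))
```

With the unbalanced delays the slow fetch stage forces an 8 ns clock and the other stages are idle most of the time; with balanced delays no stage waits.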
In Figure 5.12, the clock cycle has to be equal to or greater than the time needed to
complete a fetch operation. However, the access time of the main memory may be as much as ten
times greater than the time needed to perform basic pipeline stage operations inside the processor,
such as adding two numbers. Thus, if each instruction fetch required access to the main memory,
pipelining would be of little value.
The use of cache memories solves the memory access problem. In particular, when a cache
is included on the same chip as the processor, access time to the cache is usually the same as the
time needed to perform other basic operations inside the processor.
This makes it possible to divide instruction fetching and processing into steps that are more
or less equal in duration. Each of these steps is performed by a different pipeline stage, and the
clock period is chosen to correspond to the longest one.
Consider an example in which one of the pipeline stages is not able to complete its
processing task for a given instruction in the time allotted, as in Figure 5.13.
Figure 5.13: Execution unit takes more than one cycle for execution
Here instruction I2 requires three cycles to complete, from cycle 4 through cycle 6. Thus,
in cycles 5 and 6, the Write stage must be told to do nothing, because it has no data to work with.
Meanwhile, the information in buffer B2 must remain intact until the Execute stage has completed
its operation. This means that stage 2 and, in turn, stage 1 are blocked from accepting new
instructions because the information in B1 cannot be overwritten. Thus, steps D4 and F5 must be
postponed.
Pipelined operation in Figure 5.13 is said to have been stalled for two clock cycles. Normal
pipelined operation resumes in cycle 7. Any condition that causes the pipeline to stall is called a
hazard.
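The stall described above can be reproduced with a simplified Python model of an in-order four-stage pipeline. The model is an assumption made for illustration: each stage holds one instruction, and an instruction can enter a stage only after the previous instruction has moved out of it. The function name schedule and the durations table are hypothetical.

```python
STAGES = ["F", "D", "E", "W"]


def schedule(durations):
    """durations[i][s] = cycles instruction i needs in stage s.

    Returns start[i][s], the 1-based clock cycle in which instruction i
    begins stage s.
    """
    n, k = len(durations), len(STAGES)
    start = [[0] * k for _ in range(n)]
    finish = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            # Earliest cycle in which the instruction is ready for stage s.
            ready = finish[i][s - 1] + 1 if s > 0 else 1
            # Earliest cycle in which stage s has been vacated by instruction i-1.
            if i == 0:
                free = 1
            elif s < k - 1:
                free = start[i - 1][s + 1]
            else:
                free = finish[i - 1][s] + 1
            start[i][s] = max(ready, free)
            finish[i][s] = start[i][s] + durations[i][s] - 1
    return start


durations = [[1, 1, 1, 1] for _ in range(5)]
durations[1][2] = 3  # the Execute step of I2 takes three cycles (4 through 6)

start = schedule(durations)
for i, row in enumerate(start, start=1):
    print(f"I{i}", dict(zip(STAGES, row)))
```

In this model D4 and F5 both move from cycle 5 to cycle 7, and W2 takes place in cycle 7, so the pipeline is stalled for two clock cycles as stated above.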
1. Data hazard
3. Structural hazard
A data hazard is any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. As a result some operation has to
be delayed, and the pipeline stalls.
The pipeline may also be stalled because of a delay in the availability of an instruction. For
example, this may be a result of a miss in the cache, requiring the instruction to be fetched from
the main memory. Such hazards are often called control hazards or instruction hazards. Figure
5.14 illustrates an instruction hazard.
Instruction I1 is fetched from the cache in cycle 1, and its execution proceeds normally.
However, the fetch operation for instruction I2, which is started in cycle 2, results in a cache miss.
The instruction fetch unit must now suspend any further fetch requests and wait for I2 to arrive.
We assume that instruction I2 is received and loaded into buffer B1 at the end of cycle 5. The
pipeline resumes its normal operation at that point.
The memory address, X+[R1], is computed in step E2 in cycle 4, then memory access takes
place in cycle 5. The operand read from memory is written into register R2 in cycle 6. This means
that the execution step of this instruction takes two clock cycles (cycles 4 and 5). It causes the
pipeline to stall for one cycle, because both instructions I2 and I3 require access to the register file
in cycle 6 which is shown in Figure 5.15.
Even though the instructions and their data are all available, the pipeline stalled because
one hardware resource, the register file, cannot handle two operations at once. If the register file
had two input ports, that is, if it allowed two simultaneous write operations, the pipeline would not
be stalled. In general, structural hazards are avoided by providing sufficient hardware resources on
the processor chip.
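The effect of the number of register-file write ports can be sketched in Python. This is a simplified model written for illustration: the function schedule_writes is hypothetical, and it only arbitrates write requests in program order.

```python
def schedule_writes(requests, write_ports):
    """Assign each write to the earliest cycle with a free register-file port.

    requests    : list of (instruction, requested cycle) in program order
    write_ports : number of simultaneous writes the register file allows
    Returns {instruction: cycle in which the write actually happens}.
    """
    used = {}      # cycle -> number of writes already scheduled in it
    actual = {}
    earliest = 0   # in-order pipeline: no write before an earlier instruction's
    for instruction, cycle in requests:
        cycle = max(cycle, earliest)
        while used.get(cycle, 0) >= write_ports:
            cycle += 1  # structural hazard: wait one more cycle
        used[cycle] = used.get(cycle, 0) + 1
        actual[instruction] = cycle
        earliest = cycle
    return actual


# Both I2 and I3 need the register file in cycle 6.
requests = [("I2", 6), ("I3", 6)]
one_port = schedule_writes(requests, write_ports=1)
two_ports = schedule_writes(requests, write_ports=2)
print(one_port)   # {'I2': 6, 'I3': 7}
print(two_ports)  # {'I2': 6, 'I3': 6}
```

With a single write port I3 is delayed by one cycle; with two ports both writes complete in cycle 6 and the stall disappears.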
The most common case in which this hazard may arise is in access to memory. One
instruction may need to access memory as part of the Execute or Write stage while another
instruction is being fetched. If instructions and data reside in the same cache unit, only one
instruction can proceed and the other instruction is delayed.
Many processors use separate instruction and data caches to avoid this delay.
An important goal in designing processors is to identify all hazards that may cause the
pipeline to stall and to find ways to minimize their impact.
Contents are taken from the text book given below and all the rights go to them.
This document is prepared for the benefit of 3rd Sem students under VTU 2022 scheme.
Only the topics required for the scheme are taken from the text book.