DDCO Notes-162-171
Executing an Instruction
Fetch the contents of the memory location pointed to by the PC. The contents of this
location are loaded into the IR (fetch phase).
IR ← [[PC]]
Assuming that the memory is byte addressable, increment the contents of the PC by 4
(fetch phase).
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR (execution phase).
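The three steps above can be sketched as a small loop. This is a toy model only: the memory contents, the instruction strings, and the Halt convention are hypothetical and not part of the notes.

```python
WORD_SIZE = 4  # bytes per instruction, as assumed in the notes


def fetch(memory, pc):
    """Perform the fetch phase and return (IR, updated PC)."""
    ir = memory[pc]        # IR <- [[PC]]
    pc = pc + WORD_SIZE    # PC <- [PC] + 4
    return ir, pc


# Hypothetical program stored at byte addresses 0, 4 and 8.
memory = {0: "Move R1,R4", 4: "Add R2,R3", 8: "Halt"}
pc = 0
trace = []
while True:
    ir, pc = fetch(memory, pc)
    trace.append(ir)       # the execution phase would act on IR here
    if ir == "Halt":
        break
print(trace, pc)
```

After the loop, the PC holds 12, the address just past the last instruction fetched.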
Processor Organization
The instruction decoder and control logic block is responsible for issuing the signals that control the
operation of all the units inside the processor and for interacting with the memory bus.
This unit is responsible for implementing the actions specified by the instruction loaded in the
IR register.
The decoder generates the control signals needed to select the registers involved and direct the
transfer of data.
Registers Y, Z, and TEMP: Programmers need not be concerned with them as they are never
referenced explicitly by any instruction.
They are used by the processor for temporary storage during execution of some instructions.
MUX: Selects either the output of register Y or the constant value 4, under the control signals SelectY and Select4.
Constant 4 is used to increment the contents of the program counter.
Output of MUX is provided as input A of the ALU
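A minimal sketch of the MUX feeding ALU input A, assuming that exactly one of Select4 and SelectY is active at a time and that the ALU performs an addition. The function names are illustrative only.

```python
def mux(select4, select_y, y):
    """Return the value placed on ALU input A."""
    if select4 == select_y:
        raise ValueError("exactly one of Select4 and SelectY must be active")
    return 4 if select4 else y


def alu_add(a, b):
    """ALU configured for addition; b is the value on the processor bus."""
    return a + b


pc = 100
new_pc = alu_add(mux(True, False, y=0), pc)   # Select4: PC + 4
total = alu_add(mux(False, True, y=7), 5)     # SelectY: [Y] + bus operand
print(new_pc, total)
```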
The registers, the ALU, and the interconnecting bus are collectively referred to as the datapath.
Executing an Instruction
Transfer a word of data from one processor register to another or to the ALU.
Perform arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.
Register Transfers
Instruction execution – data are transferred from one register to another.
Each register Ri has two control signals, Riin and Riout, which control the input and the output
of the register respectively.
Ex: Move R1, R4
1. Enable the output of R1 by setting R1out to 1.
• This places the contents of R1 on the processor bus.
2. Enable the input of R4 by setting R4in to 1.
• This loads data from the processor bus into R4.
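The Move R1, R4 example can be modelled with a single shared bus. This is a sketch under the assumption that both signals are active in the same clock cycle; the dictionary of registers and the function name are illustrative.

```python
def clock_cycle(regs, out_reg, in_regs):
    """One clock cycle: one register drives the bus, enabled registers load it."""
    bus = regs[out_reg]        # Riout = 1 places the contents of Ri on the bus
    for name in in_regs:       # Rjin = 1 loads the bus value into Rj
        regs[name] = bus
    return bus


regs = {"R1": 42, "R4": 0}
clock_cycle(regs, out_reg="R1", in_regs=["R4"])   # R1out = 1, R4in = 1
print(regs)
```

The source register is unchanged by the transfer; only R4 takes on a new value.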
All operations and data transfers are controlled by the processor clock.
The response time of a memory access can vary from one access to the next. To accommodate this,
the processor waits until it receives an indication that the requested
operation has been completed (Memory-Function-Completed, MFC).
Eg: Consider the instruction Move (R1), R2. The actions needed to execute this instruction are:
MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]
• The memory Read operation takes three steps, with the control signals activated as shown.
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
Timing: Assume MAR is always available on the address lines of the memory bus.
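The read sequence for Move (R1), R2 can be sketched as follows. The Memory class, its fixed latency, and the register dictionary are hypothetical stand-ins; real memory timing depends on the hardware.

```python
class Memory:
    """Hypothetical memory that raises MFC a fixed number of cycles after a Read."""

    def __init__(self, cells, latency=3):
        self.cells = cells
        self.latency = latency
        self.remaining = 0

    def start_read(self):
        self.remaining = self.latency

    def tick(self):
        """Advance one clock cycle; return True once MFC is asserted."""
        self.remaining -= 1
        return self.remaining <= 0


def move_indirect(regs, mem):
    """Execute Move (R1), R2; return the cycles spent waiting for MFC."""
    regs["MAR"] = regs["R1"]               # step 1: R1out, MARin, Read
    mem.start_read()
    waited = 1
    while not mem.tick():                  # step 2: MDRinE, WMFC
        waited += 1
    regs["MDR"] = mem.cells[regs["MAR"]]   # MDR loaded from the memory bus
    regs["R2"] = regs["MDR"]               # step 3: MDRout, R2in
    return waited


regs = {"R1": 200, "R2": 0, "MAR": 0, "MDR": 0}
mem = Memory({200: 99}, latency=3)
waited = move_indirect(regs, mem)
print(regs["R2"], waited)
```

With a latency of 3, step 2 occupies three clock cycles, which is the waiting that the MFC signal exists to handle.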
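A memory Write operation follows a similar sequence; here the word in R2 is stored at the address held in R1:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC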
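The three steps above appear to describe a memory write that stores the contents of R2 at the address held in R1. That reading is an assumption drawn from the signal names. A minimal sketch, with the wait for MFC left out for brevity:

```python
def store_indirect(regs, memory):
    """Store [R2] at the address in R1 (assumed meaning of the steps above)."""
    regs["MAR"] = regs["R1"]             # step 1: R1out, MARin
    regs["MDR"] = regs["R2"]             # step 2: R2out, MDRin, Write
    memory[regs["MAR"]] = regs["MDR"]    # step 3: MDRoutE, WMFC (wait omitted)


regs = {"R1": 200, "R2": 7, "MAR": 0, "MDR": 0}
memory = {200: 0}
store_indirect(regs, memory)
print(memory)
```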
• Conditional branch
• Unconditional branch
• A branch instruction replaces the contents of PC with the branch target address, which is
usually obtained by adding an offset X given in the branch instruction.
• The offset X is usually the difference between the branch target address and the address
immediately following the branch instruction.
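The offset arithmetic can be checked with a short example. It assumes 4-byte instructions, so the address immediately following the branch is the branch address plus 4; the addresses are made up for illustration.

```python
WORD_SIZE = 4  # assumed instruction length in bytes


def branch_offset(branch_addr, target_addr):
    """Offset X = target address - address immediately following the branch."""
    return target_addr - (branch_addr + WORD_SIZE)


def next_pc(pc_after_fetch, offset_x, condition=True):
    """Add X to the PC if the branch is taken; otherwise fall through."""
    return pc_after_fetch + offset_x if condition else pc_after_fetch


x = branch_offset(branch_addr=1000, target_addr=1060)
taken = next_pc(1004, x)                       # unconditional, or condition true
not_taken = next_pc(1004, x, condition=False)  # conditional branch, condition false
print(x, taken, not_taken)
```

Here X is 56, a taken branch sets the PC to 1060, and a branch that is not taken leaves it at 1004.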
Pipelining
• Pipelining is widely used in modern processors.
• Pipelining improves system performance in terms of throughput.
• Pipelined organization requires sophisticated compilation techniques.
Basic Concepts
The execution speed of programs can be increased in two ways:
• Use faster circuit technology to build the processor and the main memory.
• Arrange the hardware so that more than one operation can be performed at the same time.
• In the latter way, the number of operations performed per second is increased even though
the elapsed time needed to perform any one operation is not changed.
• Example of two-stage pipelining, with the stages Fetch and Execute.
• Separate hardware units are used for Fetch and Execute.
• Fetched instruction is deposited in intermediate buffer B1.
• This buffer is needed to enable the executing unit to execute the instruction while fetch unit
is fetching the next instruction.
• Each fetch step and each execute step is completed in one clock cycle.
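The overlap in the two-stage pipeline can be tabulated with a short sketch. It assumes that every step takes exactly one cycle and that there are no stalls; the function name is illustrative.

```python
def two_stage_schedule(n):
    """Return {cycle: (instruction being fetched, instruction being executed)}."""
    schedule = {}
    for cycle in range(1, n + 2):
        fetching = f"I{cycle}" if cycle <= n else None
        # The Execute unit works on the instruction left in buffer B1
        # by the Fetch unit in the previous cycle.
        executing = f"I{cycle - 1}" if cycle >= 2 else None
        schedule[cycle] = (fetching, executing)
    return schedule


schedule = two_stage_schedule(3)
for cycle, (fetching, executing) in schedule.items():
    print(cycle, fetching, executing)
```

Three instructions finish in 4 cycles rather than the 6 cycles needed when fetch and execute never overlap.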
4-Stage Pipelining:
What is the status of 4-stage pipeline during clock 4 and what are the contents of buffers?
• Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the decoding
unit.
• Buffer B2 holds the source operands for instruction I2 and specification of the operation to be
performed.
• Buffer B3 holds the results produced by the execution unit and the destination information for
instruction I1.
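The status described above can be reproduced with a small sketch, assuming the four stages are Fetch, Decode, Execute and Write, that each takes one cycle, and that no stalls occur.

```python
STAGES = ["Fetch", "Decode", "Execute", "Write"]


def pipeline_status(cycle, n_instructions):
    """Map each stage to the instruction it handles in the given cycle."""
    status = {}
    for depth, stage in enumerate(STAGES):
        i = cycle - depth   # instruction Ii enters stage number `depth` in cycle i + depth
        status[stage] = f"I{i}" if 1 <= i <= n_instructions else None
    return status


status = pipeline_status(4, n_instructions=4)
print(status)
```

In cycle 4 this gives I4 in Fetch, I3 in Decode, I2 in Execute and I1 in Write, which agrees with the buffer contents listed above.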
Role of Cache Memory
• Each pipeline stage is expected to complete in one clock cycle.
• The clock period should be long enough for the slowest pipeline stage to complete.
• Faster stages can only wait for the slowest one to complete.
• Since main memory is very slow compared to instruction execution, the pipeline would be almost useless
if every instruction had to be fetched from main memory.
• Fortunately, we have cache.
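A rough illustration of why the slowest stage sets the clock. The stage delays below are made-up numbers in nanoseconds, chosen only to show the effect of a slow fetch.

```python
# Made-up stage delays in nanoseconds, for illustration only.
stage_delays = {"Fetch": 1.0, "Decode": 1.0, "Execute": 2.0, "Write": 1.0}


def clock_period(delays):
    """The clock period must cover the slowest pipeline stage."""
    return max(delays.values())


with_cache = clock_period(stage_delays)
# Assume every fetch goes to a main memory that needs 20 ns.
without_cache = clock_period({**stage_delays, "Fetch": 20.0})
print(with_cache, without_cache)
```

With these numbers the clock period grows from 2 ns to 20 ns, so every stage waits on the fetch.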
Pipeline Performance
• The potential increase in performance resulting from pipelining is proportional to the number
of pipeline stages.
• However, this increase would be achieved only if all pipeline stages require the same time
to complete, and there is no interruption throughout program execution.
• Unfortunately, this is not true.
• For many reasons, one of the pipeline stages may not be able to complete the task in the
time allotted.
• For example, division may take more time.
• In the following diagram, it is assumed that I2 takes three cycles to complete.
• In cycles 5 and 6, Write stage must be told to do nothing.
• Stage 2 and in turn stage 1 are blocked from accepting new instructions, because information in
B1 cannot be overwritten.
• This pipeline is said to have been stalled for two clock cycles.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard – any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. So some operation has to be
delayed, and the pipeline stalls.
• Instruction (control) hazard – a delay in the availability of an instruction causes the pipeline
to stall.
• Structural hazard – the situation when two instructions require the use of a given hardware
resource at the same time.
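The stall caused by a long Execute step can be sketched as follows. This is a simplified model: Fetch, Decode and Write are assumed to take one cycle each, instruction Ii is fetched in cycle i at the earliest, and only the Execute stage can hold an instruction for more than one cycle.

```python
def write_cycles(execute_cycles):
    """Return the cycle in which each instruction completes its Write step."""
    finished = []
    execute_free = 1                  # first cycle in which Execute is idle
    for i, cycles in enumerate(execute_cycles, start=1):
        earliest = i + 2              # after Fetch in cycle i and Decode in cycle i + 1
        start = max(earliest, execute_free)
        end = start + cycles - 1
        execute_free = end + 1
        finished.append(end + 1)      # Write takes the cycle after Execute ends
    return finished


ideal = write_cycles([1, 1, 1, 1])
stalled = write_cycles([1, 3, 1, 1])  # I2 needs three cycles in Execute
print(ideal, stalled)
```

In the stalled case nothing is written in cycles 5 and 6, and every instruction from I2 onward finishes two cycles later than in the ideal case.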
Instruction Hazard: Effect of a cache miss: pipeline stall caused by a cache miss in F2.
• In the following diagram, it is assumed that instruction fetch for I2 results in a cache miss.
• Again, pipelining does not result in individual instructions being executed faster; rather, it is
the throughput that increases.
• Throughput is measured by the rate at which instruction execution is completed.
• Pipeline stall causes degradation in pipeline performance.
• We need to identify all hazards that may cause the pipeline to stall and to find ways to
minimize their impact.
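The effect of stalls on throughput can be estimated with a simple count of cycles. This sketch assumes a k-stage pipeline with one-cycle stages, and an unpipelined processor that needs k cycles per instruction; the stall counts are illustrative inputs, not measurements.

```python
def total_cycles(n_instructions, n_stages, stall_cycles=0):
    """Cycles to finish n instructions: fill the pipeline, then one per cycle."""
    return n_stages + (n_instructions - 1) + stall_cycles


def throughput(n_instructions, n_stages, stall_cycles=0):
    """Instructions completed per clock cycle."""
    return n_instructions / total_cycles(n_instructions, n_stages, stall_cycles)


def speedup(n_instructions, n_stages, stall_cycles=0):
    """Unpipelined time (n * k cycles) divided by pipelined time."""
    return (n_instructions * n_stages) / total_cycles(n_instructions, n_stages, stall_cycles)


print(total_cycles(4, 4), total_cycles(4, 4, stall_cycles=2))
print(speedup(1000, 4), speedup(1000, 4, stall_cycles=200))
```

For a long program with no stalls the speedup approaches the number of stages, and each stall cycle pulls it further below that limit.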