
BCS302 – Digital Design and Computer Organization

Module 5 – Basic Processing Unit


Overview
 Instruction Set Processor (ISP) – executes machine instructions and coordinates the activities
of all other units.
 Also called the Central Processing Unit (CPU).
 A typical computing task consists of a series of steps specified by a sequence of machine
instructions that constitute a program.
 An instruction is executed by carrying out a sequence of more rudimentary operations.

Some Fundamental Concepts


 Processor fetches one instruction at a time and performs the operation specified.
 Instructions are fetched from successive memory locations until a branch or a jump instruction
is encountered.
 Processor keeps track of the address of the memory location containing the next instruction to
be fetched using Program Counter (PC).
 The Instruction Register (IR) holds the instruction currently being executed.

Executing an Instruction
 Fetch the contents of the memory location pointed to by the PC. The contents of this
location are loaded into the IR (fetch phase).
IR ← [[PC]]
 Assuming that the memory is byte addressable, increment the contents of the PC by 4
(fetch phase).
PC ← [PC] + 4
 Carry out the actions specified by the instruction in the IR (execution phase).
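A minimal software sketch of the fetch phase (illustrative only: the memory contents, addresses, and variable names below are invented, and a real processor performs these transfers with control signals rather than Python statements):

    # Sketch of the fetch phase: IR <- [[PC]], then PC <- [PC] + 4.
    # Memory is modelled as a dict of 4-byte words keyed by byte address.
    memory = {0x1000: "Add (R3),R1",   # instruction at address 0x1000
              0x1004: "Move R1,R4"}    # next sequential instruction

    PC = 0x1000                        # points to the next instruction

    IR = memory[PC]                    # fetch: IR <- [[PC]]
    PC = PC + 4                        # byte-addressable memory, 4-byte words

    print(IR, hex(PC))                 # Add (R3),R1 0x1004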

Processor Organization

 Fig: Single-bus organization of the datapath inside a processor


 This bus is internal to the processor; it is not the external bus that connects the processor to the memory and I/O devices.
 The data and address lines of the external memory bus are connected to the internal
processor bus via MDR and MAR.
 MDR has two inputs and two outputs.
o Data may be loaded into MDR either from the memory bus or from the internal processor bus.
o The data stored in MDR may be placed on either bus.
 The input of MAR is connected to the internal bus, and its output is connected to the external bus.

 The instruction decoder and control logic block is responsible for issuing the signals that control the operation of all the units inside the processor and for interacting with the memory bus.
 This unit is responsible for implementing the actions specified by the instruction loaded in the
IR register.
 The decoder generates the control signals needed to select the registers involved and direct the
transfer of data
 Registers Y, Z, and TEMP: Programmers need not be concerned with them as they are never
referenced explicitly by any instruction.
 They are used by the processor for temporary storage during execution of some instructions.
 MUX: selects either the output of register Y or the constant value 4, under control of the signals SelectY and Select4.
 Constant 4 is used to increment the contents of the program counter.
 Output of MUX is provided as input A of the ALU
 The registers, the ALU, and the interconnecting bus are collectively referred to as the datapath.

Executing an Instruction
 Transfer a word of data from one processor register to another or to the ALU.
 Perform arithmetic or a logic operation and store the result in a processor register.
 Fetch the contents of a given memory location and load them into a processor register.
 Store a word of data from a processor register into a given memory location.

Register Transfers
 Instruction execution – data are transferred from one register to another.
 For each register Ri, two control signals, Riin and Riout, control loading data into the register from the bus and placing its contents on the bus.
Ex: Move R1, R4
1. Enable the output of R1 by setting R1out to 1.
   • This places the contents of R1 on the processor bus.
2. Enable the input of R4 by setting R4in to 1.
   • This loads the data from the processor bus into R4.

 All operations and data transfers are controlled by the processor clock.
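As a rough software analogy for Move R1, R4 (not the hardware itself; the register contents and helper names below are invented), the single processor bus and the Riout/Riin control signals can be modelled like this:

    # Sketch: Move R1, R4 over the single internal processor bus.
    registers = {"R1": 25, "R4": 0}
    bus = None

    def set_out(reg):
        # Riout = 1: gate the register's contents onto the processor bus.
        global bus
        bus = registers[reg]

    def set_in(reg):
        # Riin = 1: load the register from the processor bus on the clock edge.
        registers[reg] = bus

    set_out("R1")    # R1out = 1: contents of R1 placed on the bus
    set_in("R4")     # R4in = 1: bus value loaded into R4

    print(registers) # {'R1': 25, 'R4': 25}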

Performing an Arithmetic or Logic Operation


 The ALU is a combinational circuit that has no internal storage.
 The ALU gets its two operands from the MUX and the bus. The result is temporarily stored in register Z.
 What is the sequence of operations needed to add the contents of register R1 to those of R2 and store the result in R3?

1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
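The three control steps above can be traced with a small sketch (illustrative only; the register values are invented, and the Python variables merely stand in for the Y and Z registers and the bus):

    # Sketch of R3 <- [R1] + [R2] using the control sequence above.
    R = {"R1": 10, "R2": 32, "R3": 0}

    bus = R["R1"]; Y = bus          # Step 1: R1out, Yin
    bus = R["R2"]; Z = Y + bus      # Step 2: R2out, SelectY, Add, Zin
    bus = Z; R["R3"] = bus          # Step 3: Zout, R3in

    print(R["R3"])                  # 42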

Fetching a Word from Memory


 Address into MAR; issue Read operation; data into MDR.
 The response time of each memory access varies (cache miss, memory-mapped I/O,…).

 To accommodate this, the processor waits until it receives an indication that the requested
operation has been completed (Memory-Function-Completed, MFC).

Eg: Consider the instruction Move (R1), R2. The actions needed to execute this instruction are:
 MAR ← [R1]
 Start a Read operation on the memory bus
 Wait for the MFC response from the memory
 Load MDR from the memory bus
 R2 ← [MDR]
• The memory Read operation requires the three steps below; the control signals activated in each step are shown.
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
Timing: it is assumed that the output of MAR is always available on the address lines of the memory bus.
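A functional sketch of the Read sequence for Move (R1), R2 (illustrative only; the memory contents are invented and the wait_for_MFC helper merely stands in for the WMFC control step):

    # Sketch of Move (R1), R2: read the word whose address is in R1 into R2.
    memory = {0x2000: 99}
    R = {"R1": 0x2000, "R2": 0}

    def wait_for_MFC(address):
        # Stand-in for WMFC: wait until the memory raises MFC, then
        # return the data it has placed on the memory bus.
        return memory[address]

    MAR = R["R1"]                 # Step 1: R1out, MARin, Read
    MDR = wait_for_MFC(MAR)       # Step 2: MDRinE, WMFC
    R["R2"] = MDR                 # Step 3: MDRout, R2in

    print(R["R2"])                # 99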

Storing a Word in Memory


• Storing a word in memory follows a similar procedure.
• The desired address is loaded into MAR, and the data to be written are loaded into MDR.
• The Write command is then issued. Example: Move R2, (R1)
• Sequence:

1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC

Execution of a Complete Instruction


• Add (R3), R1
• Fetch the instruction
• Fetch the first operand (the contents of the memory location pointed to by R3)
• Perform the addition
• Load the result into R1
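At a functional level, these four actions can be traced as follows (a sketch only; the addresses, register values and memory contents are invented, and the per-cycle control signals are not shown):

    # Sketch of Add (R3), R1:  R1 <- [R1] + [[R3]]
    memory = {0x1000: "Add (R3),R1",   # the instruction itself
              0x2000: 5}               # memory operand
    R = {"R1": 7, "R3": 0x2000}
    PC = 0x1000

    IR = memory[PC]; PC += 4           # fetch the instruction
    operand = memory[R["R3"]]          # fetch the operand pointed to by R3
    R["R1"] = R["R1"] + operand        # add and load the result into R1

    print(R["R1"])                     # 12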

Execution of Branch Instructions

• A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X, given in the branch instruction, to the updated contents of the PC.
• The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.

Conditional branch
• The branch is taken only if a specified condition (for example, a condition-code flag set by an earlier instruction) is satisfied; otherwise the PC keeps its incremented value and execution continues with the next sequential instruction.

Unconditional branch
• The PC is always loaded with the branch target address computed as above, so the instruction at that address is fetched next.
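A small numeric example of the offset computation (the addresses are invented; 4-byte instructions are assumed):

    # Sketch: offset X for a branch at address 1000 whose target is 1200.
    branch_address = 1000
    target_address = 1200

    updated_PC = branch_address + 4          # PC after the normal increment
    X = target_address - updated_PC          # offset stored in the instruction

    assert updated_PC + X == target_address  # PC <- [PC] + X reaches the target
    print(X)                                 # 196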

Pipelining
• Pipelining is widely used in modern processors.
• Pipelining improves system performance in terms of throughput.
• Pipelined organization requires sophisticated compilation techniques.

Basic Concepts

Making the Execution of Programs Faster

• Use faster circuit technology to build the processor and the main memory.
• Arrange the hardware so that more than one operation can be performed at the same time.
• With the second approach, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.

• Example of a two-stage pipeline, with Fetch and Execute stages.
• Separate hardware units are used for the Fetch and Execute stages.
• The fetched instruction is deposited in an intermediate buffer, B1.
• This buffer is needed to enable the Execute unit to execute the instruction while the Fetch unit is fetching the next instruction.
• The fetch and execute steps are each completed in one clock cycle.

4-Stage Pipelining:

F – Fetch: read the instruction from the memory.
D – Decode: decode the instruction and fetch the source operand(s).
E – Execute: perform the operation specified by the instruction.
W – Write: store the result in the destination location.
• 4 distinct hardware units are used, one for each stage.
• Information is passed from one unit to the next through a storage buffer.

7
What is the status of the 4-stage pipeline during clock cycle 4, and what are the contents of the buffers?
• Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the decoding unit.
• Buffer B2 holds the source operands for instruction I2 and the specification of the operation to be performed.
• Buffer B3 holds the results produced by the execution unit and the destination information for instruction I1.
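The ideal flow of instructions through the four stages can be reproduced with a short sketch (instruction names I1–I5 and the printed layout are invented); in cycle 4 it prints F:I4 D:I3 E:I2 W:I1, matching the buffer contents described above:

    # Sketch: which instruction occupies each stage of an ideal 4-stage pipeline.
    instructions = ["I1", "I2", "I3", "I4", "I5"]
    stages = ["F", "D", "E", "W"]

    for cycle in range(1, len(instructions) + len(stages)):
        row = []
        for s, stage in enumerate(stages):
            i = cycle - 1 - s          # index of the instruction in this stage
            occupant = instructions[i] if 0 <= i < len(instructions) else "--"
            row.append(stage + ":" + occupant)
        print("cycle", cycle, " ".join(row))
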
Role of Cache Memory
• Each pipeline stage is expected to complete in one clock cycle.
• The clock period should be long enough to let the slowest pipeline stage complete.
• Faster stages must simply wait for the slowest one to complete.
• Main memory is very slow compared with instruction execution, so if each instruction had to be fetched from main memory the pipeline would be almost useless.
• Fortunately, instructions can usually be fetched from the cache, which is fast enough to keep the pipeline supplied.
Pipeline Performance
• The potential increase in performance resulting from pipelining is proportional to the number
of pipeline stages.
• However, this increase would be achieved only if all pipeline stages require the same time
to complete, and there is no interruption throughout program execution.
• Unfortunately, this is not true.
• For many reasons, one of the pipeline stages may not be able to complete the task in the
time allotted.
• For example, division may take more time.
• In the following diagram, it is assumed that I2 takes three cycles to complete.
• In cycles 5 and 6, the Write stage must be told to do nothing.
• Stage 2 and, in turn, stage 1 are blocked from accepting new instructions, because the information in B1 cannot be overwritten.

• This pipeline is said to have been stalled for two clock cycles.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard – any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. So some operation has to be
delayed, and the pipeline stalls.
• Instruction (control) hazard – a delay in the availability of an instruction causes the pipeline
to stall.
• Structural hazard – the situation when two instructions require the use of a given hardware
resource at the same time.
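A rough way to quantify the cost of stalls (an illustrative calculation, not taken from the notes; the instruction count and stall count are invented): on an ideal k-stage pipeline, N instructions finish in k + N - 1 cycles, and every stall cycle adds one cycle to that total, lowering the throughput.

    # Sketch: throughput of a k-stage pipeline with and without stall cycles.
    def total_cycles(n_instructions, k_stages, stall_cycles=0):
        # k cycles to fill the pipeline, then one completion per cycle,
        # plus any cycles lost to stalls.
        return k_stages + n_instructions - 1 + stall_cycles

    N, k = 100, 4
    ideal = total_cycles(N, k)                    # 103 cycles
    stalled = total_cycles(N, k, stall_cycles=2)  # two-cycle stall as above

    print(N / ideal, N / stalled)                 # instructions per cycle
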
Instruction Hazard: Effect of a Cache Miss
• In the following diagram, it is assumed that instruction fetch for I2 results in a cache miss.

Fig: Pipeline stall caused by a cache miss in F2



Structural Hazard:
• One instruction may need to access memory as a part of the Execute or Write stage
while another instruction is being fetched.
• If instructions and data reside in the same cache unit, only one instruction can proceed and the other instruction is delayed.
Example: Load X(R1), R2
• The memory address X+[R1] is computed in step E2 in cycle 4.
• Memory access takes place in cycle 5.
• The operand read from memory is written into R2 in cycle 6, so the execution step of this instruction takes 2 cycles (cycles 4 and 5).
• This causes the pipeline to stall for 1 cycle, because both I2 and I3 require access to
the register file in cycle 6.

• Again, pipelining does not result in individual instructions being executed faster; rather, it is
the throughput that increases.
• Throughput is measured by the rate at which instruction execution is completed.
• Pipeline stall causes degradation in pipeline performance.
• We need to identify all hazards that may cause the pipeline to stall and to find ways to
minimize their impact.

Veena O S, Asst. Professor, Dept. of CSE

