Chapter 7 - Basic Processing Unit


Basic Processing Unit

Overview
 Instruction Set Processor (ISP)
 Central Processing Unit (CPU)
 A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a program.
 An instruction is executed by carrying out a sequence of more rudimentary operations.


Some Fundamental Concepts
 The processor fetches one instruction at a time and performs the operation specified.
 Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
 The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
 The current instruction is held in the Instruction Register (IR).
Executing an Instruction
 Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase).
IR ← [[PC]]
 Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
 Carry out the actions specified by the instruction in the IR (execution phase).
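The two fetch-phase transfers above can be sketched in a few lines of code. This is a minimal illustration, not the processor hardware: the memory contents and the use of a Python dict for byte-addressable memory are assumptions for the example.

```python
# Hypothetical memory: word address -> instruction (4-byte instructions)
memory = {0: "Add (R3),R1", 4: "Move (R1),R2"}

PC = 0
IR = None

# Fetch phase: IR <- [[PC]], then PC <- [PC] + 4
IR = memory[PC]   # load the instruction pointed to by PC into the IR
PC = PC + 4       # byte-addressable memory, so advance by 4

print(IR)  # Add (R3),R1
print(PC)  # 4
```

The execution phase would then dispatch on the contents of IR.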
Processor Organization

[Figure 7.1 shows the single-bus organization of the datapath inside a processor: an internal processor bus connects the PC, MAR, MDR, IR, the general-purpose registers R0 through R(n-1), TEMP, Y, and Z. A MUX selects between Y and the constant 4 as one ALU input; the ALU control lines (Add, Sub, XOR, Carry-in) come from the instruction decoder and control logic, which also drives the control signals and the address and data lines of the memory bus. Note that MDR has two inputs and two outputs, since it connects to both the internal bus and the memory bus.]

Figure 7.1. Single-bus organization of the datapath inside a processor. (Textbook page 413.)

Executing an Instruction
 Transfer a word of data from one processor register to another or to the ALU.
 Perform an arithmetic or a logic operation and store the result in a processor register.
 Fetch the contents of a given memory location and load them into a processor register.
 Store a word of data from a processor register into a given memory location.
Register Transfers

[Figure 7.2 shows the input and output gating for a register Ri on the internal processor bus: control signals Riin and Riout gate the register's input and output, and the signals Yin, Zin, and Zout similarly gate registers Y and Z around the MUX (Select Y / constant 4) and the ALU.]

Figure 7.2. Input and output gating for the registers in Figure 7.1.
Register Transfers
 All operations and data transfers are controlled by the processor clock.

[Figure 7.3 shows one register bit implemented as a D flip-flop: the bus drives the flip-flop's D input, the clock and Riin determine when a new value is loaded, and Riout gates the Q output back onto the bus.]

Figure 7.3. Input and output gating for one register bit.
Performing an Arithmetic or Logic Operation
 The ALU is a combinational circuit that has no internal storage.
 The ALU gets its two operands from the MUX and the bus. The result is temporarily stored in register Z.
 What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
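The three control steps above can be traced as plain assignments. This is a sketch of the data movement only, with hypothetical initial register values; the real transfers happen over the single bus under clock control.

```python
# Registers of Figure 7.1, with illustrative initial values
regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin -- R1 is placed on the bus and loaded into Y
regs["Y"] = regs["R1"]
# Step 2: R2out, SelectY, Add, Zin -- ALU adds Y and the bus (R2); result to Z
regs["Z"] = regs["Y"] + regs["R2"]
# Step 3: Zout, R3in -- the result moves from Z to R3
regs["R3"] = regs["Z"]

print(regs["R3"])  # 12
```

Register Z is needed because the ALU has no storage of its own, and the single bus cannot carry the result and an operand at the same time.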
Fetching a Word from Memory
 Address into MAR; issue Read operation; data into MDR.

Figure: Connection and control signals for register MDR.


Fetching a Word from Memory
 The response time of each memory access varies
(cache miss, memory-mapped I/O,…).
 To accommodate this, the processor waits until it
receives an indication that the requested operation
has been completed (Memory-Function-Completed,
MFC).
 Move (R1), R2
 MAR ← [R1]
 Start a Read operation on the memory bus
 Wait for the MFC response from the memory
 Load MDR from the memory bus
 R2 ← [MDR]
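A sketch of the Move (R1), R2 sequence above, with the MFC wait modeled as a memory read that takes a variable number of cycles. The memory contents, the address in R1, and the latency are all assumptions for illustration.

```python
def read_memory(address, latency_cycles=3):
    """Simulate a memory read that completes only after some cycles."""
    for _ in range(latency_cycles):
        pass  # the processor waits (WMFC) until MFC is asserted
    return {100: 42}[address]  # MFC: requested word is on the memory bus

R1, R2 = 100, 0
MAR = R1                 # MAR <- [R1]
MDR = read_memory(MAR)   # start Read, wait for MFC, load MDR
R2 = MDR                 # R2 <- [MDR]
print(R2)  # 42
```

The point of MFC is that the processor never assumes a fixed memory latency; it simply waits for the completion signal.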
Timing

[Figure 7.5 shows the timing of a memory Read operation across three clock steps, with waveforms for the Clock, MARin, Address, Read, MR, MDRinE, Data, MFC, and MDRout signals. Assume MAR is always available on the address lines of the memory bus. Step 1: MARin is asserted (MAR ← [R1]) and a Read operation is started on the memory bus. Step 2: the processor waits for the MFC response from the memory. Step 3: MDR is loaded from the memory bus and MDRout places the word on the internal bus (R2 ← [MDR]).]

Figure 7.5. Timing of a memory Read operation.


Execution of a Complete Instruction
 Add (R3), R1
 Fetch the instruction
 Fetch the first operand (the contents of the memory location pointed to by R3)
 Perform the addition
 Load the result into R1

Architecture

[This slide repeats Figure 7.2 for reference: the input and output gating for the registers, with control signals Riin, Riout, Yin, Zin, and Zout on the internal processor bus, the MUX selecting Y or the constant 4, and the ALU.]

Figure: Input and output gating for the registers.


Execution of a Complete Instruction

Add (R3), R1

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End

Figure: Control sequence for execution of the instruction Add (R3), R1.

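The seven-step control sequence for Add (R3), R1 can be traced end to end as assignments. This is an illustrative sketch only: the memory contents and initial register values are made up, and the overlap of memory access with WMFC is collapsed into a single read.

```python
# Hypothetical memory: the instruction at address 0, an operand at 200
memory = {0: "Add (R3),R1", 200: 8}
regs = {"PC": 0, "R1": 5, "R3": 200, "Y": 0, "Z": 0,
        "MAR": 0, "MDR": 0, "IR": None}

regs["MAR"] = regs["PC"]              # 1: PCout, MARin, Read,
regs["Z"] = regs["PC"] + 4            #    Select4, Add, Zin
regs["PC"] = regs["Z"]                # 2: Zout, PCin, Yin, WMFC
regs["Y"] = regs["Z"]
regs["MDR"] = memory[regs["MAR"]]     #    (read completes during WMFC)
regs["IR"] = regs["MDR"]              # 3: MDRout, IRin
regs["MAR"] = regs["R3"]              # 4: R3out, MARin, Read
regs["Y"] = regs["R1"]                # 5: R1out, Yin, WMFC
regs["MDR"] = memory[regs["MAR"]]     #    (operand read completes)
regs["Z"] = regs["Y"] + regs["MDR"]   # 6: MDRout, SelectY, Add, Zin
regs["R1"] = regs["Z"]                # 7: Zout, R1in, End

print(regs["R1"])  # 13
print(regs["PC"])  # 4
```

Steps 1-3 are the common fetch phase; only steps 4-7 are specific to this instruction.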

Execution of Branch Instructions
 A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X given in the branch instruction.
 The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
 Conditional branch

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End

Figure: Control sequence for an unconditional branch instruction.
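The offset arithmetic above can be checked with a small calculation. The addresses here are hypothetical; the key point is that by step 4 the PC already holds the address following the branch, so the stored offset is target minus (branch address + 4).

```python
branch_address = 1000   # illustrative address of the branch instruction
target_address = 1100   # illustrative branch target

updated_pc = branch_address + 4           # PC after the fetch phase
offset_x = target_address - updated_pc    # the offset encoded in the branch

# Step 4-5 of the control sequence: PC <- [PC] + X
new_pc = updated_pc + offset_x
print(offset_x)  # 96
print(new_pc)    # 1100
```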


Pipelining
Overview
 Pipelining is widely used in modern processors.
 Pipelining improves system performance in terms of throughput.
 A pipelined organization requires sophisticated compilation techniques.
Basic Concepts
Making the Execution of Programs Faster
 Use faster circuit technology to build the processor and the main memory.
 Arrange the hardware so that more than one operation can be performed at the same time.
 With the latter approach, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
Traditional Pipeline Concept

Laundry Example
 Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
 Washer takes 30 minutes
 Dryer takes 40 minutes
 Folder takes 20 minutes

[Chart: sequential laundry runs the four loads A, B, C, D back to back from 6 PM to midnight, each load taking 30 + 40 + 20 minutes.]
 Sequential laundry takes 6 hours for 4 loads.
 If they learned pipelining, how long would laundry take?

[Chart: pipelined laundry overlaps the four loads; after the first 30-minute wash, a load leaves the dryer every 40 minutes, and the last fold takes 20 minutes.]
 Pipelined laundry takes 3.5 hours for 4 loads.
Traditional Pipeline Concept
 Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload.
 Pipeline rate is limited by the slowest pipeline stage.
 Multiple tasks operate simultaneously using different resources.
 Potential speedup = number of pipe stages.
 Unbalanced lengths of pipe stages reduce speedup.
 Time to “fill” the pipeline and time to “drain” it reduce speedup.
 Stall for dependences.
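The laundry numbers above can be reproduced with a short calculation. This sketch assumes the staging described in the example: one load at a time sequentially, versus overlapped loads limited by the slowest stage (the 40-minute dryer).

```python
stages = [30, 40, 20]   # washer, dryer, folder, in minutes
loads = 4

# Sequential: every load runs all three stages before the next starts
sequential = loads * sum(stages)

# Pipelined: the first load takes the full sum; each later load
# finishes one slowest-stage time after the previous one
pipelined = sum(stages) + (loads - 1) * max(stages)

print(sequential / 60)  # 6.0 hours
print(pipelined / 60)   # 3.5 hours
```

Note how the dryer, not the average stage time, sets the steady-state rate, which is the "pipeline rate limited by slowest stage" point above.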
Use the Idea of Pipelining in a Computer

Fetch + Execution

[Figure: (a) Sequential execution: instructions I1, I2, I3 each complete a fetch step F and an execute step E before the next instruction begins. (b) Hardware organization: an instruction fetch unit and an execution unit separated by an interstage buffer B1. (c) Pipelined execution: while I1 executes (E1), I2 is fetched (F2), so after the first cycle one instruction completes per clock cycle.]

Basic idea of instruction pipelining.
Fetch + Decode + Execution + Write

[Figure: (a) Instruction execution divided into four steps: F (fetch instruction), D (decode instruction and fetch operands), E (execute operation), W (write results). Over clock cycles 1-7, instructions I1 through I4 proceed through the pipeline one stage apart. (b) Hardware organization: the four units are separated by interstage buffers B1, B2, and B3.]

A 4-stage pipeline.
Role of Cache Memory
 Each pipeline stage is expected to complete in one clock cycle.
 The clock period should be long enough to let the slowest pipeline stage complete.
 Faster stages can only wait for the slowest one to complete.
 Since main memory is very slow compared to execution, if each instruction had to be fetched from main memory, the pipeline would be almost useless.
 Fortunately, we have caches.
Pipeline Performance
 The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
 However, this increase would be achieved only if all pipeline stages require the same time to complete, and there is no interruption throughout program execution.
 Unfortunately, this is not true.
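The "proportional to the number of stages" claim can be made concrete with a quick calculation. This is an idealized sketch assuming equal-length stages and no stalls; the instruction count and stage count are illustrative.

```python
def cycles(n_instructions, n_stages, pipelined):
    """Clock cycles to run a stream of instructions, idealized."""
    if pipelined:
        # fill the pipe, then complete one instruction per cycle
        return n_stages + (n_instructions - 1)
    # unpipelined: each instruction takes all stages back to back
    return n_instructions * n_stages

N, k = 1000, 4
speedup = cycles(N, k, False) / cycles(N, k, True)
print(round(speedup, 2))  # 3.99
```

For large N the speedup approaches k, the number of stages; the fill time and any stalls keep it below that ideal in practice.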
[Figure: timing over clock cycles 1-9 for instructions I1 through I5 in the four-stage pipeline, where one execution step takes more than one clock cycle and delays the stages of the instructions behind it.]

Figure: Effect of an execution operation taking more than one clock cycle.
