Computer Architecture

This course is for use in the HEDSPI Project

Nguyen Thanh Kien

Department of Computer Engineering
Faculty of Information Technology
Hanoi University of Technology
Acknowledgements

These materials were used as references for these slides:

– “Computer Architecture, Background and Motivation” slides, UCSB, B. Parhami
– Computer Architecture - Behrooz Parhami
– Computer Organization and Design - John L. Hennessy & David A. Patterson

Page  2 © NTK 2009

References

Text books:
– Computer Architecture - Behrooz Parhami
– Computer Organization and Design - John L. Hennessy & David A. Patterson

Reference books:
– Computer Architecture and Organization - John P. Hayes
– Computer Organization and Architecture: Designing for Performance - William Stallings
– Computer Architecture: Single and Parallel Systems - Mehdi R. Zargham
– Assembly Language Programming and Organization of the IBM PC - Ytha Yu, Charles Marut

About

Lecturer:
– Nguyen Thanh Kien
– Department of Computer Engineering, Faculty of Information Technology, Hanoi University of Technology
– Mobile: +84 983588135
– Email: kiennt-fit@mail.hut.edu.vn, thanhkien84@yahoo.com
– Address:
• Room 322, C1, Hanoi University of Technology
• No.1, Dai Co Viet, Hai Ba Trung, Hanoi.

Content

1. Introduction - Computer system technology and Computer Performance
2. Instruction Set Architecture
3. Arithmetic for Computer
4. CPU Organization


2. Instruction Set Architecture

 2.1. Instructions and addressing

 2.2. Procedures and Data


2.1. Instructions and addressing

2.1.1 Abstract View of Hardware

2.1.2 Instruction Formats
2.1.3 Simple Arithmetic / Logic Instructions
2.1.4 Load and Store Instructions
2.1.5 Jump and Branch Instructions
2.1.6 Addressing Modes


2.1.1 Abstract View of Hardware

[Figure: memory and processing subsystems for MiniMIPS]
– Memory: up to 2^30 words, 4 B per location (Loc 0, Loc 4, Loc 8, ..., Loc m − 8, Loc m − 4, with m ≤ 2^32)
– EIU (execution & integer unit, main processor): registers $0-$31, ALU, integer mul/div unit with the Hi and Lo registers
– FPU (floating-point unit, coprocessor 1): registers $0-$31, FP arithmetic
– TMU (trap & memory unit, coprocessor 0): BadVaddr, Status, Cause, and EPC registers

Memory and processing subsystems for MiniMIPS.

Data Types

Byte = 8 bits
Halfword = 2 bytes
Word = 4 bytes
Doubleword = 8 bytes

MiniMIPS registers hold 32-bit (4-byte) words. Other common data sizes include byte, halfword, and doubleword.


Register Conventions

Register    Name       Usage
$0          $zero      Constant 0
$1          $at        Reserved for assembler use
$2-$3       $v0-$v1    Procedure results
$4-$7       $a0-$a3    Procedure arguments (saved)
$8-$15      $t0-$t7    Temporary values
$16-$23     $s0-$s7    Operands (saved across procedure calls)
$24-$25     $t8-$t9    More temporaries
$26-$27     $k0-$k1    Reserved for OS (kernel)
$28         $gp        Global pointer (saved)
$29         $sp        Stack pointer (saved)
$30         $fp        Frame pointer (saved)
$31         $ra        Return address (saved)

– A 4-byte word sits in consecutive memory addresses according to the big-endian order (most significant byte has the lowest address). Byte numbering within a word: 3 2 1 0.
– When loading a byte into a register, it goes in the low end.
– A doubleword sits in consecutive registers or memory locations according to the big-endian order (most significant word comes first).

Registers and data sizes in MiniMIPS.


Registers Used in This Chapter

– 10 temporary registers: $t0-$t9 ($8-$15 and $24-$25)
– 8 operand registers: $s0-$s7 ($16-$23)

2.1.2 Instruction Formats

High-level language statement: a = b + c

Assembly language instruction: add $t8, $s2, $s1

Machine language instruction: 000000 10010 10001 11000 00000 100000

Bits      Meaning
000000    ALU-type instruction
10010     Register 18
10001     Register 17
11000     Register 24
00000     Unused
100000    Addition opcode

Execution steps: instruction fetch (PC) → register readout ($17, $18) → operation (ALU) → data read/store → register writeback ($24)

A typical instruction for MiniMIPS and steps in its execution.

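The field packing shown for add $t8,$s2,$s1 can be checked with a few lines of Python. This is only a sketch: the helper name encode_r and the small REG lookup table are introduced here for illustration and are not part of the course material.

```python
# Register numbers taken from the MiniMIPS register conventions.
REG = {"$s1": 17, "$s2": 18, "$t8": 24}

def encode_r(op, rs, rt, rd, sh, fn):
    """Pack the six R-format fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sh << 6) | fn

# add $t8,$s2,$s1: op = 0, rs = $s2, rt = $s1, rd = $t8, sh = 0, fn = 32
word = encode_r(0, REG["$s2"], REG["$s1"], REG["$t8"], 0, 32)
bits = format(word, "032b")
fields = " ".join([bits[0:6], bits[6:11], bits[11:16],
                   bits[16:21], bits[21:26], bits[26:32]])
print(fields)  # 000000 10010 10001 11000 00000 100000
```

The printed pattern matches the machine language instruction on the slide.
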
Add, Subtract, and Specification of Constants

MiniMIPS add & subtract instructions; e.g., compute:

g = (b + c) − (e + f)
add $t8,$s2,$s3 # put the sum b + c in $t8
add $t9,$s5,$s6 # put the sum e + f in $t9
sub $s7,$t8,$t9 # set g to ($t8) − ($t9)

Decimal and hex constants

Decimal 25, 123456, 2873
Hexadecimal 0x59, 0x12b4c6, 0xffff0000

Machine instruction typically contains

an opcode
one or more source operands
possibly a destination operand


MiniMIPS Instruction Formats

R format (register):
Field    Bits     Width     Meaning
op       31-26    6 bits    Opcode
rs       25-21    5 bits    Source register 1
rt       20-16    5 bits    Source register 2
rd       15-11    5 bits    Destination register
sh       10-6     5 bits    Shift amount
fn       5-0      6 bits    Opcode extension

I format (immediate):
Field               Bits     Width      Meaning
op                  31-26    6 bits     Opcode
rs                  25-21    5 bits     Source or base
rt                  20-16    5 bits     Destination or data
operand / offset    15-0     16 bits    Immediate operand or address offset

J format (jump):
Field                  Bits     Width      Meaning
op                     31-26    6 bits     Opcode
jump target address    25-0     26 bits    Memory word address (byte address divided by 4)

MiniMIPS instructions come in only three formats: register (R), immediate (I), and jump (J).


2.1.3 Simple Arithmetic/Logic Instructions

Add and subtract already discussed; logical instructions are similar

add $t0,$s0,$s1 # set $t0 to ($s0)+($s1)
sub $t0,$s0,$s1 # set $t0 to ($s0)-($s1)
and $t0,$s0,$s1 # set $t0 to ($s0)∧($s1)
or $t0,$s0,$s1 # set $t0 to ($s0)∨($s1)
xor $t0,$s0,$s1 # set $t0 to ($s0)⊕($s1)
nor $t0,$s0,$s1 # set $t0 to ¬(($s0)∨($s1))

R format:
Field    Bits      Meaning
op       000000    ALU instruction
rs       10000     Source register 1
rt       10001     Source register 2
rd       01000     Destination register
sh       00000     Unused
fn       1000x0    add = 32, sub = 34

The arithmetic instructions add and sub have a format that is common
to all two-operand ALU instructions. For these, the fn field specifies the
arithmetic/logic operation to be performed.


Arithmetic/Logic with One Immediate Operand

An operand in the range [−32 768, 32 767], or [0x0000, 0xffff], can be specified in the immediate field.
addi $t0,$s0,61 # set $t0 to ($s0)+61
andi $t0,$s0,61 # set $t0 to ($s0)∧61
ori $t0,$s0,61 # set $t0 to ($s0)∨61
xori $t0,$s0,0x00ff # set $t0 to ($s0)⊕0x00ff

For arithmetic instructions, the immediate operand is sign-extended.

I format:
Field               Bits                   Meaning
op                  001000                 addi = 8
rs                  10000                  Source
rt                  01000                  Destination
operand / offset    0000 0000 0011 1101    Immediate operand

Instructions such as addi allow us to perform an arithmetic or logic operation for which one operand is a small constant.


2.1.4 Load and Store Instructions

I format:
Field               Bits                   Meaning
op                  10x011                 lw = 35, sw = 43
rs                  10011                  Base register
rt                  01000                  Data register
operand / offset    0000 0000 0010 1000    Offset relative to base

lw $t0,40($s3)
lw $t0,A($s3)

[Figure: array A in memory. The base register holds the address of A[0]; element i of array A, A[i], is at offset 4i from that address.]

Note on base and offset: The memory address is the sum of (rs) and an immediate value. Calling one of these the base and the other the offset is quite arbitrary. It would make perfect sense to interpret the address A($s3) as having the base A and the offset ($s3). However, a 16-bit base confines us to a small portion of memory space.

MiniMIPS lw and sw instructions and their memory addressing convention that allows for simple access to array elements via a base address and an offset (offset = 4i leads us to the i th word).

lw, sw, and lui Instructions

lw $t0,40($s3) # load mem[40+($s3)] in $t0

sw $t0,A($s3) # store ($t0) in mem[A+($s3)]
# “($s3)” means “content of $s3”
lui $s0,61 # The immediate value 61 is
# loaded in upper half of $s0
# with lower 16b set to 0s
op rs rt operand / offset
31 25 20 15 0
I 0 0 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
lui = 15 Unused Destination
Immediate operand

0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Content of $s0 after the instruction is executed

The lui instruction allows us to load an arbitrary 16-bit value into the
upper half of a register while setting its lower half to 0s.


Initializing a Register

Example

Show how each of these bit patterns can be loaded into $s0:
0010 0001 0001 0000 0000 0000 0011 1101
1111 1111 1111 1111 1111 1111 1111 1111
Solution
The first bit pattern has the hex representation: 0x2110003d
lui $s0,0x2110 # put the upper half in $s0
ori $s0,$s0,0x003d # put the lower half in $s0
The same can be done, with the immediate values changed to 0xffff, for the second bit pattern. But the following is simpler and faster:
nor $s0,$zero,$zero # because ¬(0 ∨ 0) = 1

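The register initialization example can be mimicked in Python to check the resulting bit patterns. This is a minimal sketch in which registers are plain integers masked to 32 bits; the function names lui, ori, and nor are illustrative stand-ins for the instructions, not a real simulator API.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits

def lui(imm16):
    """Place a 16-bit immediate in the upper half; the lower half is set to 0s."""
    return (imm16 & 0xFFFF) << 16

def ori(reg, imm16):
    """OR a zero-extended 16-bit immediate into a register value."""
    return (reg | (imm16 & 0xFFFF)) & MASK32

def nor(a, b):
    """Bitwise NOR of two 32-bit values."""
    return ~(a | b) & MASK32

s0 = ori(lui(0x2110), 0x003D)   # lui $s0,0x2110 ; ori $s0,$s0,0x003d
print(hex(s0))                  # 0x2110003d
print(hex(nor(0, 0)))           # 0xffffffff
```
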

2.1.5 Jump and Branch Instructions

Unconditional jump and jump through register instructions

j verify # go to mem loc named “verify”
jr $ra # go to address that is in $ra;
# $ra may hold a return address
op jump target address
31 25 0
J 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
j=2

x x x x 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
From PC
Effective target address (32 bits)

op rs rt rd sh fn
31 25 20 15 10 5 0
R 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
ALU Source Unused Unused Unused jr = 8
instruction register

The jump instruction j of MiniMIPS is a J-type instruction which is shown

along with how its effective target address is obtained. The jump register
(jr) instruction is R-type, with its specified register often being $ra.

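How the effective target address of j is formed can be sketched in Python: the 26-bit field is a word address, so it is multiplied by 4 (shifted left by 2), and the top 4 bits come from the PC. The function name jump_target and the sample PC value below are assumptions made only for this illustration.

```python
def jump_target(pc, field26):
    """Combine the upper 4 bits of the PC with the 26-bit word address times 4."""
    return (pc & 0xF0000000) | ((field26 & 0x03FFFFFF) << 2)

# Hypothetical PC value; the jump field 61 matches the bit pattern on the slide.
print(hex(jump_target(0x40000018, 61)))  # 0x400000f4
```
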

Conditional Branch Instructions

Conditional branches use PC-relative addressing

bltz $s1,L # branch on ($s1)< 0
beq $s1,$s2,L # branch on ($s1)=($s2)
bne $s1,$s2,L # branch on ($s1)≠($s2)

op rs rt operand / offset
31 25 20 15 0
I 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
bltz = 1 Source Zero Relative branch distance in words

op rs rt operand / offset
31 25 20 15 0
I 0 0 0 1 0 x 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
beq = 4 Source 1 Source 2 Relative branch distance in words
bne = 5

Conditional branch instructions of MiniMIPS.


Comparison Instructions for Conditional Branching

slt $s1,$s2,$s3 # if ($s2)<($s3), set $s1 to 1

# else set $s1 to 0;
# often followed by beq/bne
slti $s1,$s2,61 # if ($s2)<61, set $s1 to 1
# else set $s1 to 0

op rs rt rd sh fn
31 25 20 15 10 5 0
R 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0
ALU Source 1 Source 2 Destination Unused slt = 42
instruction register register

op rs rt operand / offset
31 25 20 15 0
I 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
slti = 10 Source Destination Immediate operand

Comparison instructions of MiniMIPS.


Examples for Conditional Branching

If the branch target is too far to be reachable with a 16-bit offset (a rare occurrence), the assembler automatically replaces the branch instruction beq $s1,$s2,L1 with:
bne $s1,$s2,L2 # skip jump if (s1)≠(s2)
j L1 # goto L1 if (s1)=(s2)
L2: ...
Forming if-then constructs; e.g., if (i == j) x = x + y
bne $s1,$s2,endif # branch on i≠j
add $t1,$t1,$t2 # execute the “then” part
endif: ...
If the condition were (i < j), we would change the first line to:
slt $t0,$s1,$s2 # set $t0 to 1 if i<j
beq $t0,$0,endif # branch if ($t0)=0;
# i.e., i not< j or i≥j


if-then-else Statements

Example
Show a sequence of MiniMIPS instructions corresponding to:
if (i<=j) {x = x+1; z = 1;} else {y = y–1; z = 2*z;}
Solution

slt $t0,$s2,$s1 # j<i? (inverse condition)

bne $t0,$zero,else # if j<i goto else part
addi $t1,$t1,1 # begin then part: x = x+1
addi $t3,$zero,1 # z = 1
j endif # skip the else part
else: addi $t2,$t2,-1 # begin else part: y = y–1
add $t3,$t3,$t3 # z = z+z
endif:...


while Statements

Example
The simple while loop: while (A[i]==k) i=i+1;
Assuming that: i, A, k are stored in $s1,$s2,$s3

Solution

loop: add $t1,$s1,$s1 # t1 = 2*i
add $t1,$t1,$t1 # t1 = 4*i
add $t1,$t1,$s2 # t1 = A + 4*i
lw $t0,0($t1) # t0 = A[i]
bne $t0,$s3,endwhl # exit the loop if A[i] != k
addi $s1,$s1,1 # i = i + 1
j loop # test the condition again
endwhl: ... # continuation of the program

switch Statements

Example
The simple switch:
switch(test) {
case 0:
a=a+1; break;
case 1:
a=a-1; break;
case 2:
b=2*b; break;
default:
}

Assuming that: test,a,b are stored in $s1,$s2,$s3
(the code below also appears to assume that t0, t1, t2 hold the constants 0, 1, 2)

beq s1,t0,case_0
beq s1,t1,case_1
beq s1,t2,case_2
b default
case_0:
addi s2,s2,1 #a=a+1
b continue
case_1:
sub s2,s2,t1 #a=a-1
b continue
case_2:
add s3,s3,s3 #b=2*b
b continue
default:
continue:

2.1.6 Addressing Modes
Addressing mode is the method by which the location of an operand is specified within an instruction. MiniMIPS uses six addressing modes:

Addressing      Example instructions    How the operand is found
Implied         jal                     Operand is in some place in the machine
Immediate       addi, ori, ...          Constant in the instruction, extended if required
Register        add, sub, ...           Reg spec in the instruction selects reg data from the reg file
Base            lw, sw, ...             Constant offset is added to the reg base (from the reg file) to give the mem addr of the memory data
PC-relative     branch instr            Constant offset is added to the PC to give the mem addr of the memory data
Pseudodirect    j                       Address field in the instruction is combined with part of the PC to give the mem addr of the memory data

Schematic representation of addressing modes in MiniMIPS.

Finding the Maximum Value in a List of Integers

Example
List A is stored in memory beginning at the address given in $s1.
List length is given in $s2.
Find the largest integer in the list and copy it into $t0.
Solution
lw $t0,0($s1) # initialize maximum to A[0]
addi $t1,$zero,0 # initialize index i to 0
loop: addi $t1,$t1,1 # increment index i by 1
beq $t1,$s2,done # if all elements examined, quit
add $t2,$t1,$t1 # compute 2i in $t2
add $t2,$t2,$t2 # compute 4i in $t2
add $t2,$t2,$s1 # form address of A[i] in $t2
lw $t3,0($t2) # load value of A[i] into $t3
slt $t4,$t0,$t3 # maximum < A[i]?
beq $t4,$zero,loop # if not, repeat with no change
addi $t0,$t3,0 # if so, A[i] is the new maximum
j loop # change completed; now repeat
done: ... # continuation of the program

The 20 MiniMIPS Instructions Covered So Far (Table 5.1)

Class               Instruction                Usage             op    fn
Copy                Load upper immediate       lui rt,imm        15
Arithmetic          Add                        add rd,rs,rt      0     32
                    Subtract                   sub rd,rs,rt      0     34
                    Set less than              slt rd,rs,rt      0     42
                    Add immediate              addi rt,rs,imm    8
                    Set less than immediate    slti rt,rs,imm    10
Logic               AND                        and rd,rs,rt      0     36
                    OR                         or rd,rs,rt       0     37
                    XOR                        xor rd,rs,rt      0     38
                    NOR                        nor rd,rs,rt      0     39
                    AND immediate              andi rt,rs,imm    12
                    OR immediate               ori rt,rs,imm     13
                    XOR immediate              xori rt,rs,imm    14
Memory access       Load word                  lw rt,imm(rs)     35
                    Store word                 sw rt,imm(rs)     43
Control transfer    Jump                       j L               2
                    Jump register              jr rs             0     8
                    Branch less than 0         bltz rs,L         1
                    Branch equal               beq rs,rt,L       4
                    Branch not equal           bne rs,rt,L       5


2.2. Procedures and Data

 2.2.1 Simple Procedure Calls

 2.2.2 Using the Stack for Data Storage
 2.2.3 Parameters and Results
 2.2.4 Data Types
 2.2.5 Arrays and Pointers
 2.2.6 Additional Instructions


2.2.1 Simple Procedure Calls

#Laboratory Exercise 4, Home Assignment 1

#include <iregdef.h>
.text
.set noreorder
.globl start
.ent start
start:
li a0,-45 #load input parameter
jal abs #jump and link to abs procedure
nop
.end start
.ent abs
abs:
sub v0,zero,a0 #put -(a0) in v0; in case (a0)<0

bltz a0,done #if (a0)<0 then done

nop
add v0,a0,zero #else put (a0) in v0
done:
jr ra
.end abs
2.2.1 Simple Procedure Calls

A procedure is a subprogram that, when called (initiated, invoked), performs a specific task, perhaps leading to one or more results, based on the input parameters (arguments) with which it is provided, and returns to the point of call, having perturbed nothing else.

In assembly language, a procedure is associated with a symbolic name that denotes its starting address. The jal instruction in MIPS is intended specifically for procedure calls:
– it performs the control transfer (unconditional jump) to the starting address of the procedure,
– while also saving the return address in register $ra.

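Illustrating a Procedure Call

[Figure: main prepares to call, executes jal proc (the PC points at this instruction), and afterwards prepares to continue. The procedure proc saves registers, performs its task, restores the registers, and returns with jr $ra.]

Relationship between the main program and a procedure.
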

2.2.1 Simple Procedure Calls

Using a procedure involves the following sequence of actions:

1. Put arguments in places known to procedure (reg’s $a0-$a3)
2. Transfer control to procedure, saving the return address (jal)
3. Acquire storage space, if required, for use by the procedure
4. Perform the desired task
5. Put results in places known to calling program (reg’s $v0-$v1)
6. Return control to calling point (jr)

MiniMIPS instructions for procedure call and return from procedure:

jal proc # jump to loc “proc” and link;
# “link” means “save the return
# address” (PC)+4 in $ra ($31)
jr rs # go to loc addressed by rs


Recalling Register Conventions

Register    Name       Usage
$0          $zero      Constant 0
$1          $at        Reserved for assembler use
$2-$3       $v0-$v1    Procedure results
$4-$7       $a0-$a3    Procedure arguments
$8-$15      $t0-$t7    Temporary values
$16-$23     $s0-$s7    Operands (saved across procedure calls)
$24-$25     $t8-$t9    More temporaries
$26-$27     $k0-$k1    Reserved for OS (kernel)
$28         $gp        Global pointer
$29         $sp        Stack pointer
$30         $fp        Frame pointer
$31         $ra        Return address


A Simple MiniMIPS Procedure

Example
Procedure to find the absolute value of an integer.
$v0 ← |($a0)|

Solution
The absolute value of x is –x if x < 0 and x otherwise.
abs: sub $v0,$zero,$a0 # put -($a0) in $v0;
# in case ($a0) < 0
bltz $a0,done # if ($a0)<0 then done
add $v0,$a0,$zero # else put ($a0) in $v0
done: jr $ra # return to calling program

In practice, we seldom use such short procedures because of the overhead that they entail. In this example, we have 3-4 instructions of overhead for 3 instructions of useful computation.


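Nested Procedure Calls

[Figure: main prepares to call and executes jal abc (the PC points at this instruction), then prepares to continue. Procedure abc saves registers, executes jal xyz, restores the registers, and returns with jr $ra. Procedure xyz returns with jr $ra. Slide annotation: "Text version is incorrect".]

Example of nested procedure calls.
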

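2.2.2 Using the Stack for Data Storage

[Figure: a stack holding a and b, with sp pointing at b. After "push c", c sits on top of b and sp points at c. After "pop x" on the original stack, x holds b and sp points at a.]

Push c: sp = sp – 4; mem[sp] = c
Pop x: x = mem[sp]; sp = sp + 4

Effects of push and pop operations on a stack.

push: addi $sp,$sp,-4
sw $t4,0($sp)

pop: lw $t5,0($sp)
addi $sp,$sp,4
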
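The push and pop sequences can be modelled in a few lines of Python. This is only a sketch under stated assumptions: memory is a dictionary keyed by byte address, the class name Stack is made up for the illustration, and the initial value of sp is an arbitrary choice.

```python
class Stack:
    """Toy model of the MiniMIPS stack; it grows toward lower addresses."""

    def __init__(self, sp=0x7FFFFFFC):   # assumed starting value of $sp
        self.mem = {}                    # byte address -> stored word
        self.sp = sp

    def push(self, value):
        self.sp -= 4                     # addi $sp,$sp,-4
        self.mem[self.sp] = value        # sw   $t4,0($sp)

    def pop(self):
        value = self.mem[self.sp]        # lw   $t5,0($sp)
        self.sp += 4                     # addi $sp,$sp,4
        return value

s = Stack()
for item in ("a", "b", "c"):
    s.push(item)
x = s.pop()
print(x, hex(s.sp))  # c 0x7ffffff4
```

Note that pop only moves sp; the popped word stays in memory until a later push overwrites it, just as on the real machine.
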

Memory Map in MiniMIPS

Hex address            Region                                               Size
00000000               Reserved                                             1 M words
00400000               Text segment (program)                               63 M words
10000000 - 7ffffffc    Data segment (static data, then dynamic data)        448 M words
                       and stack segment (stack at the top, ending at
                       7ffffffc)
80000000               Second half of address space reserved for
                       memory-mapped I/O

– $gp ($28) points to 10008000, so the static data in 10000000 - 1000ffff is addressable with a 16-bit signed offset.
– $sp ($29) and $fp ($30) point into the stack segment.

Overview of the memory address space in MiniMIPS.
2.2.3 Parameters and Results

Stack allows us to pass/return an arbitrary number of values.

[Figure: before calling, the frame for the current procedure (..., a, b, c) lies between $fp and $sp, with $sp at c. After calling, that frame becomes the frame for the previous procedure, and a new frame for the current procedure sits on top of it, holding the old ($fp), saved registers, and local variables (..., y, z), with $fp at its base and $sp at z.]

Use of the stack by a procedure.


Example of Using the Stack

Saving $fp, $ra, and $s0 onto the stack and restoring
them at the end of the procedure

proc: sw $fp,-4($sp) # save the old frame pointer
addi $fp,$sp,0 # save ($sp) into $fp
addi $sp,$sp,-12 # create 3 spaces on top of stack
sw $ra,-8($fp) # save ($ra) in 2nd stack element
sw $s0,-12($fp) # save ($s0) in top stack element
.
.
.
lw $s0,-12($fp) # put top stack element in $s0
lw $ra,-8($fp) # put 2nd stack element in $ra
addi $sp,$fp,0 # restore $sp to original state
lw $fp,-4($sp) # restore $fp to original state
jr $ra # return from procedure

[Figure: after the save sequence, the three new stack elements hold, from the top down, ($s0), ($ra), and ($fp); $sp points at the top element and $fp at the old top of the stack.]


2.2.4 Data Types

Data size (number of bits), data type (meaning assigned to bits)

Signed integer: byte word
Unsigned integer: byte word
Floating-point number: word doubleword
Bit string: byte word doubleword

Converting from one size to another

Type 8-bit number Value 32-bit version of the number

Unsigned 0010 1011 43 0000 0000 0000 0000 0000 0000 0010 1011
Unsigned 1010 1011 171 0000 0000 0000 0000 0000 0000 1010 1011

Signed 0010 1011 +43 0000 0000 0000 0000 0000 0000 0010 1011
Signed 1010 1011 –85 1111 1111 1111 1111 1111 1111 1010 1011

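The size conversions in the table can be reproduced with a short Python sketch. The helper names zero_extend8, sign_extend8, and as_signed32 are made up for this illustration.

```python
def zero_extend8(byte):
    """Unsigned 8-bit value widened to 32 bits: the upper 24 bits are 0s."""
    return byte & 0xFF

def sign_extend8(byte):
    """Signed 8-bit value widened to 32 bits: bit 7 is copied into the upper 24 bits."""
    byte &= 0xFF
    return (byte | 0xFFFFFF00) if (byte & 0x80) else byte

def as_signed32(word):
    """Read a 32-bit pattern as a 2's-complement integer."""
    return word - (1 << 32) if (word & 0x80000000) else word

pattern = 0b10101011
print(zero_extend8(pattern))                        # 171
print(format(sign_extend8(pattern), "032b"))        # 11111111111111111111111110101011
print(as_signed32(sign_extend8(pattern)))           # -85
```
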

ASCII Characters

ASCII (American standard code for information interchange)

Columns give the high hex digit, rows give the low hex digit.

       0     1     2    3    4    5    6    7      8-9         a-f
0      NUL   DLE   SP   0    @    P    `    p      More        More
1      SOH   DC1   !    1    A    Q    a    q      controls    symbols
2      STX   DC2   "    2    B    R    b    r
3      ETX   DC3   #    3    C    S    c    s
4      EOT   DC4   $    4    D    T    d    t
5      ENQ   NAK   %    5    E    U    e    u
6      ACK   SYN   &    6    F    V    f    v
7      BEL   ETB   '    7    G    W    g    w
8      BS    CAN   (    8    H    X    h    x
9      HT    EM    )    9    I    Y    i    y
a      LF    SUB   *    :    J    Z    j    z
b      VT    ESC   +    ;    K    [    k    {
c      FF    FS    ,    <    L    \    l    |
d      CR    GS    -    =    M    ]    m    }
e      SO    RS    .    >    N    ^    n    ~
f      SI    US    /    ?    O    _    o    DEL

8-bit ASCII code: (col #, row #)hex; e.g., the code for + is (2b)hex or (0010 1011)two.


Loading and Storing Bytes

Bytes can be used to store ASCII characters or small integers.

MiniMIPS addresses refer to bytes, but registers hold words.
lb $t0,8($s3) # load rt with mem[8+($s3)]
# sign-extend to fill reg
lbu $t0,8($s3) # load rt with mem[8+($s3)]
# zero-extend to fill reg
sb $t0,A($s3) # LSB of rt to mem[A+($s3)]

op rs rt immediate / offset
31 25 20 15 0
I 1 0 x x 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
lb = 32 Base Data Address offset
lbu = 36 register register
sb = 40

Load and store instructions for byte-size data elements.


Meaning of a Word in Memory

Bit pattern 0000 0010 0001 0001 0100 0000 0010 0000
(02114020) hex

00000010000100010100000000100000
Add instruction

00000010000100010100000000100000
Positive integer
00000010000100010100000000100000
Four-character string

A 32-bit word has no inherent meaning and can be interpreted in a number of equally valid ways in the absence of other cues (e.g., context) for the intended meaning.

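The three readings of the bit pattern (02114020)hex can be made concrete in Python. This is a sketch only; the variable names are chosen for the illustration, and the field boundaries follow the R format described earlier.

```python
import struct

word = 0x02114020

# Reading 1: an R-format instruction, split into op, rs, rt, rd, sh, fn.
fields = (word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F,
          (word >> 11) & 0x1F, (word >> 6) & 0x1F, word & 0x3F)
print(fields)        # (0, 16, 17, 8, 0, 32), i.e. add $t0,$s0,$s1

# Reading 2: a positive integer.
print(word)          # 34684960

# Reading 3: four 8-bit characters, most significant byte first (big-endian).
chars = struct.pack(">I", word)
print(chars)         # b'\x02\x11@ ', i.e. STX, DC1, '@', space
```
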

2.2.5 Arrays and Pointers

Index: Use a register that holds the index i and increment the register in
each step to effect moving from element i of the list to element i + 1
Pointer: Use a register that points to (holds the address of) the list element
being examined and update it in each step to point to the next element

[Figure: two ways of moving from A[i] to A[i + 1] in array A. Indexing method: add 1 to the array index i, compute 4i, and add 4i to the base. Pointer method: add 4 to the pointer to A[i] to get the address of A[i + 1].]

Stepping through the elements of an array using the indexing method and the pointer updating method.


Selection Sort

To sort a list of numbers, repeatedly perform the following: find the max element, swap it with the last item, and move up the “last” pointer.

[Figure: list A between the pointers first and last, shown at the start of an iteration, after the maximum x has been identified, and at the end of the iteration, when x has been swapped with the last element y and the last pointer has moved up.]

One iteration of selection sort.

Selection Sort Using the Procedure max

[Figure: the same three snapshots of list A (start of iteration, maximum identified, end of iteration), annotated with the interface of the procedure max.]
– Inputs to proc max: first in $a0, last in $a1
– Outputs from proc max: location of the max element in $v0, max value in $v1

sort: beq $a0,$a1,done # single-element list is sorted
jal max # call the max procedure
lw $t0,0($a1) # load last element into $t0
sw $t0,0($v0) # copy the last element to max loc
sw $v1,0($a1) # copy max value to last element
addi $a1,$a1,-4 # decrement pointer to last element
j sort # repeat sort for smaller list
done: ... # continue with rest of program

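For comparison, the same algorithm can be written in Python. This is a sketch and not a translation of the assembly: it uses list indices in place of the address pointers in $a0, $a1, and $v0, and the function names find_max and selection_sort are chosen for this illustration.

```python
def find_max(a, first, last):
    """Return (index of the max, max value) in a[first..last], like $v0 and $v1."""
    idx = first
    for i in range(first + 1, last + 1):
        if a[i] > a[idx]:
            idx = i
    return idx, a[idx]

def selection_sort(a):
    """Sort the list in place by repeatedly moving the maximum to the end."""
    first, last = 0, len(a) - 1
    while first < last:                 # a single-element list is sorted
        idx, value = find_max(a, first, last)
        a[idx] = a[last]                # copy the last element to the max location
        a[last] = value                 # copy the max value to the last element
        last -= 1                       # move up the "last" pointer
    return a

print(selection_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```
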

2.2.6 Additional Instructions

MiniMIPS instructions for multiplication and division:

mult $s0, $s1 # set Hi,Lo to ($s0)×($s1)
div $s0, $s1 # set Hi to ($s0)mod($s1)
# and Lo to ($s0)/($s1)
mfhi $t0 # set $t0 to (Hi)
mflo $t0 # set $t0 to (Lo)
op rs rt rd sh fn
31 25 20 15 10 5 0
R 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 x 0
ALU Source Source Unused Unused mult = 24
instruction register 1 register 2 div = 26

The multiply (mult) and divide (div) instructions of MiniMIPS.

op rs rt rd sh fn
31 25 20 15 10 5 0
R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 x 0
ALU Unused Unused Destination Unused mfhi = 16
instruction register mflo = 18

MiniMIPS instructions for copying the contents of Hi and Lo registers into general registers.

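The role of the Hi and Lo registers can be sketched in Python. To keep the example short it models unsigned operands only, which sidesteps sign handling; the function names multu and divu are used here as plain Python helpers, not as a simulator API.

```python
MASK32 = 0xFFFFFFFF

def multu(rs, rt):
    """Unsigned multiply: the 64-bit product is split into (Hi, Lo)."""
    product = (rs & MASK32) * (rt & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def divu(rs, rt):
    """Unsigned divide: Hi receives the remainder and Lo the quotient."""
    rs, rt = rs & MASK32, rt & MASK32
    return rs % rt, rs // rt

hi, lo = multu(0xFFFFFFFF, 2)
print(hex(hi), hex(lo))   # 0x1 0xfffffffe
print(divu(17, 5))        # (2, 3)
```

A product that does not fit in 32 bits leaves its upper part in Hi, which is why mfhi is needed in addition to mflo.
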
Logical Shifts

MiniMIPS instructions for left and right shifting:

sll $t0,$s1,2 # $t0=($s1) left-shifted by 2
srl $t0,$s1,2 # $t0=($s1) right-shifted by 2
sllv $t0,$s1,$s0 # $t0=($s1) left-shifted by ($s0)
srlv $t0,$s1,$s0 # $t0=($s1) right-shifted by ($s0)
op rs rt rd sh fn
31 25 20 15 10 5 0
R 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 x 0
ALU Unused Source Destination Shift sll = 0
instruction register register amount srl = 2

op rs rt rd sh fn
31 25 20 15 10 5 0
R 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 x 0
ALU Amount Source Destination Unused sllv = 4
instruction register register register srlv = 6

The four logical shift instructions of MiniMIPS.


Unsigned Arithmetic and Miscellaneous Instructions

MiniMIPS instructions for unsigned arithmetic (no overflow exception):

addu $t0,$s0,$s1 # set $t0 to ($s0)+($s1)
subu $t0,$s0,$s1 # set $t0 to ($s0)–($s1)
multu $s0,$s1 # set Hi,Lo to ($s0)×($s1)
divu $s0,$s1 # set Hi to ($s0)mod($s1)
# and Lo to ($s0)/($s1)
addiu $t0,$s0,61 # set $t0 to ($s0)+61;
# the immediate operand is
# sign extended

To make MiniMIPS more powerful and complete, we introduce later:

sra $t0,$s1,2 # sh. right arith (Sec. 10.5)
srav $t0,$s1,$s0 # shift right arith variable
syscall # system call (Sec. 7.6)


The 20 MiniMIPS Instructions from Chapter 6 (40 in all so far)
Table 6.2 (partial)

Class               Instruction                    Usage              op    fn
Copy                Move from Hi                   mfhi rd            0     16
                    Move from Lo                   mflo rd            0     18
Arithmetic          Add unsigned                   addu rd,rs,rt      0     33
                    Subtract unsigned              subu rd,rs,rt      0     35
                    Multiply                       mult rs,rt         0     24
                    Multiply unsigned              multu rs,rt        0     25
                    Divide                         div rs,rt          0     26
                    Divide unsigned                divu rs,rt         0     27
                    Add immediate unsigned         addiu rt,rs,imm    9
Shift               Shift left logical             sll rd,rt,sh       0     0
                    Shift right logical            srl rd,rt,sh       0     2
                    Shift right arithmetic         sra rd,rt,sh       0     3
                    Shift left logical variable    sllv rd,rt,rs      0     4
                    Shift right logical variable   srlv rd,rt,rs      0     6
                    Shift right arith variable     srav rd,rt,rs      0     7
Memory access       Load byte                      lb rt,imm(rs)      32
                    Load byte unsigned             lbu rt,imm(rs)     36
                    Store byte                     sb rt,imm(rs)      40
Control transfer    Jump and link                  jal L              3
                    System call                    syscall            0     12


The 37 + 3 MiniMIPS Instructions Covered So Far

Instruction                Usage             Instruction                     Usage
Load upper immediate       lui rt,imm        Move from Hi                    mfhi rd
Add                        add rd,rs,rt      Move from Lo                    mflo rd
Subtract                   sub rd,rs,rt      Add unsigned                    addu rd,rs,rt
Set less than              slt rd,rs,rt      Subtract unsigned               subu rd,rs,rt
Add immediate              addi rt,rs,imm    Multiply                        mult rs,rt
Set less than immediate    slti rt,rs,imm    Multiply unsigned               multu rs,rt
AND                        and rd,rs,rt      Divide                          div rs,rt
OR                         or rd,rs,rt       Divide unsigned                 divu rs,rt
XOR                        xor rd,rs,rt      Add immediate unsigned          addiu rt,rs,imm
NOR                        nor rd,rs,rt      Shift left logical              sll rd,rt,sh
AND immediate              andi rt,rs,imm    Shift right logical             srl rd,rt,sh
OR immediate               ori rt,rs,imm     Shift right arithmetic          sra rd,rt,sh
XOR immediate              xori rt,rs,imm    Shift left logical variable     sllv rd,rt,rs
Load word                  lw rt,imm(rs)     Shift right logical variable    srlv rd,rt,rs
Store word                 sw rt,imm(rs)     Shift right arith variable      srav rd,rt,rs
Jump                       j L               Load byte                       lb rt,imm(rs)
Jump register              jr rs             Load byte unsigned              lbu rt,imm(rs)
Branch less than 0         bltz rs,L         Store byte                      sb rt,imm(rs)
Branch equal               beq rs,rt,L       Jump and link                   jal L
Branch not equal           bne rs,rt,L       System call                     syscall


3. Arithmetic for Computer

1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point


3.1. Introduction


Number Representation

Numbers are normally represented using a positional number system:

\[ N_{(b)} = a_n a_{n-1} a_{n-2} \dots a_1 a_0 \,.\, a_{-1} a_{-2} \dots a_{-m} \]

– Base/radix: b (the number of digits)
– Digits: 0..(b-1)
• 0 ≤ a_i ≤ (b-1)
– Binary: b=2, digits: 0,1
– Decimal: b=10, digits: 0,1,2,3,4,5,6,7,8,9
– Octal: b=8, digits: 0,1,2,3,4,5,6,7
– Hexadecimal: b=16, digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

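Number Representation

\[ N_{(b)} = a_n a_{n-1} a_{n-2} \dots a_1 a_0 \,.\, a_{-1} a_{-2} \dots a_{-m} \]

\[ N_{(10)} = a_n b^{n} + a_{n-1} b^{n-1} + \dots + a_1 b^{1} + a_0 b^{0} + a_{-1} b^{-1} + \dots + a_{-m} b^{-m} \]

\[ N_{(10)} = \sum_{i=-m}^{n} a_i b^{i} \]

\[ 11101.11_{(2)} = 1\times2^{4} + 1\times2^{3} + 1\times2^{2} + 0\times2^{1} + 1\times2^{0} + 1\times2^{-1} + 1\times2^{-2} = 29.75_{(10)} \]
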
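The positional sum can be evaluated directly in Python. This is a minimal sketch; the function name to_decimal and the DIGITS string are assumptions made for the illustration, and the input is taken as a digit string in base b with an optional "." separator.

```python
DIGITS = "0123456789ABCDEF"

def to_decimal(text, b):
    """Evaluate sum(a_i * b**i) for a digit string in base b."""
    int_part, _, frac_part = text.upper().partition(".")
    value = 0.0
    # Integer digits carry weights b^0, b^1, ... counted from the radix point leftward.
    for i, digit in enumerate(reversed(int_part)):
        value += DIGITS.index(digit) * b ** i
    # Fractional digits carry weights b^-1, b^-2, ... counted rightward.
    for i, digit in enumerate(frac_part, start=1):
        value += DIGITS.index(digit) * b ** -i
    return value

print(to_decimal("11101.11", 2))   # 29.75
print(to_decimal("539.45", 10))    # approximately 539.45 (floating-point rounding applies)
```
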

Number Representation

Decimal:
– b=10
– Digits: 0,1,2,3,4,5,6,7,8,9

\[ N_{(10)} = a_n a_{n-1} a_{n-2} \dots a_1 a_0 \,.\, a_{-1} a_{-2} \dots a_{-m}, \qquad a_i = 0..9 \]

\[ N_{(10)} = \sum_{i=-m}^{n} a_i \, 10^{i} \]

– Eg:
\[ 539.45_{(10)} = 5\times10^{2} + 3\times10^{1} + 9\times10^{0} + 4\times10^{-1} + 5\times10^{-2} \]


Number Representation

Binary:
– b=2
– Digits: 0,1
– bit: binary digit

\[ N_{(2)} = a_n a_{n-1} a_{n-2} \dots a_1 a_0 \,.\, a_{-1} a_{-2} \dots a_{-m}, \qquad a_i = 0,1 \]

\[ N_{(10)} = \sum_{i=-m}^{n} a_i \, 2^{i} \]

– Eg:
\[ 1011.011_{(2)} = 1\times2^{3} + 0\times2^{2} + 1\times2^{1} + 1\times2^{0} + 0\times2^{-1} + 1\times2^{-2} + 1\times2^{-3} \]

Number Representation

Binary (cont'd)
– Which range can an n-bit binary number represent?
• \( a_{n-1} \dots a_1 a_0 \): from 0 to \( 2^{n} - 1 \)

\[ N_{(2)} = a_{n-1} a_{n-2} \dots a_1 a_0 \]

– MSB (Most Significant Bit): \( a_{n-1} \)
– LSB (Least Significant Bit): \( a_0 \)

0001 = 1    0110 = 6     1011 = 11
0010 = 2    0111 = 7     1100 = 12
0011 = 3    1000 = 8     1101 = 13
0100 = 4    1001 = 9     1110 = 14
0101 = 5    1010 = 10    1111 = 15


Number Representation

Octal:
– b=8
– Digits: 0,1,2,3,4,5,6,7

$N_{(8)} = a_n a_{n-1} \ldots a_1 a_0 . a_{-1} a_{-2} \ldots a_{-m}$, with $a_i = 0..7$
– Eg:
$503.071_{(8)} = 5 \times 8^2 + 0 \times 8^1 + 3 \times 8^0 + 0 \times 8^{-1} + 7 \times 8^{-2} + 1 \times 8^{-3}$

 Hexadecimal:
– b=16
– Digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

$N_{(16)} = a_n a_{n-1} \ldots a_1 a_0 . a_{-1} a_{-2} \ldots a_{-m}$, with $a_i = 0..F$
– Eg:
$503.071_{(16)} = 5 \times 16^2 + 0 \times 16^1 + 3 \times 16^0 + 0 \times 16^{-1} + 7 \times 16^{-2} + 1 \times 16^{-3}$

Convert from base b to base 10

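Base b to base 10 conversion

$N_{(b)} = a_n a_{n-1} a_{n-2} \ldots a_1 a_0 . a_{-1} a_{-2} \ldots a_{-m}$

$N_{(10)} = a_n b^n + a_{n-1} b^{n-1} + \ldots + a_1 b^1 + a_0 b^0 + a_{-1} b^{-1} + \ldots + a_{-m} b^{-m}$

Eg:
– $1010.11_{(2)} = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} = 10.75_{(10)}$
– 1010.11(8) = ?
– A12(16) = ?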
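
The weighted-sum formula above can be sketched in Python. This is only an illustrative sketch: the function name `to_decimal` and the digit table `DIGITS` are assumptions made for this example, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # hypothetical digit table, bases 2..16

def digit_value(ch: str, base: int) -> int:
    """Return the value of one digit, rejecting digits that are >= base."""
    d = DIGITS.index(ch)
    if d >= base:
        raise ValueError(f"digit {ch!r} is not valid in base {base}")
    return d

def to_decimal(text: str, base: int) -> Fraction:
    """Evaluate a positional numeral such as '1010.11' in the given base."""
    int_part, _, frac_part = text.upper().partition(".")
    value = Fraction(0)
    # Integer digits: weights b^n ... b^0, accumulated left to right.
    for ch in int_part:
        value = value * base + digit_value(ch, base)
    # Fractional digits: weights b^-1, b^-2, ...
    weight = Fraction(1, base)
    for ch in frac_part:
        value += digit_value(ch, base) * weight
        weight /= base
    return value

print(float(to_decimal("1010.11", 2)))   # 10.75
print(float(to_decimal("11101.11", 2)))  # 29.75
```

The same function can be used to check the two open exercises above for bases 8 and 16.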
Convert from base 10 to base b

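Base 10 to base b conversion

– For the integer part:
• Divide the integer part by b repeatedly until the quotient is 0.
• Write the remainders in reverse order to get the converted result.
– For the fractional part (after the "."):
• Multiply the fractional part by b; the integer part of the result is the next digit.
• Repeat with the remaining fractional part until it becomes 0 (or until enough digits have been produced).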

Convert from base 10 to base 2

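 Eg1: 6.625(10) = ?(2)

– The integer part:
• 6 / 2 = 3, remainder 0
• 3 / 2 = 1, remainder 1
• 1 / 2 = 0, remainder 1
• Remainders in reverse order: 110
– The fractional part (after the "."):
• 0.625 x 2 = 1.25 → digit 1
• 0.25 x 2 = 0.5 → digit 0
• 0.5 x 2 = 1.0 → digit 1
• Digits in order: .101

6.625(10) = 110.101(2)

 Eg2: 120.5625(10) = 1111000.1001(2)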
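
The divide/multiply procedure can be sketched as follows. The function name `from_decimal` and the `max_frac_digits` cut-off are assumptions for this example; the cut-off is needed because some fractions never terminate in the target base. Only non-negative inputs are handled.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # hypothetical digit table, bases 2..16

def from_decimal(text: str, base: int, max_frac_digits: int = 12) -> str:
    """Convert a non-negative decimal string such as '6.625' to the given base."""
    value = Fraction(text)                       # exact, no binary rounding
    integer = value.numerator // value.denominator
    frac = value - integer

    # Integer part: divide by the base, collect remainders, reverse them.
    int_digits = []
    while integer > 0:
        integer, remainder = divmod(integer, base)
        int_digits.append(DIGITS[remainder])
    int_str = "".join(reversed(int_digits)) or "0"

    # Fractional part: multiply by the base, peel off the integer part.
    frac_digits = []
    while frac != 0 and len(frac_digits) < max_frac_digits:
        frac *= base
        digit = frac.numerator // frac.denominator
        frac_digits.append(DIGITS[digit])
        frac -= digit

    return int_str + ("." + "".join(frac_digits) if frac_digits else "")

print(from_decimal("6.625", 2))     # 110.101
print(from_decimal("120.5625", 2))  # 1111000.1001
```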

Convert from base 2 to base 2^n

 Starting at the radix point, group the bits into n-bit groups (leftwards for the integer part, rightwards for the fractional part, padding with 0s where needed) and replace each group with the equivalent digit in base 2^n.
 Eg:
– 101011(2) = ?(8); 1010.110(2) = 12.6(8)
– 101011(2) = ?(16); 1010.110(2) = A.C(16)

Convert from base 2^n to base 2

Each digit in base 2^n is replaced by its n-bit equivalent in base 2.

Eg:

37A.B(16)=?(2)


Convert from base i to base j

If both i and j are powers of 2, use base 2 as an intermediate base:
– Eg: base 8 → base 2 → base 16
– 735.37(8) = ?(16)

Else, use base 10 as an intermediate base:
– Eg: base 5 → base 10 → base 2

3. Arithmetic for Computer

1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point


3.2. Unsigned Numbers

 The general form of an unsigned number A:

$a_{n-1} a_{n-2} \ldots a_2 a_1 a_0$

 Value of A: $A = a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \ldots + a_1 2^1 + a_0 2^0 = \sum_{i=0}^{n-1} a_i 2^i$
 Range of representation:
– Use n bits to represent unsigned numbers
– Range: 0 to $2^n - 1$

3. Arithmetic for Computer

1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point


3.3. Signed Numbers

 1’s complement and 2’s complement number

– A binary integer A is represented by n bit:
• 1’s complement number of A is (2n - 1) – A
• 2’s complement number of A is 2n - A
• Notes: 2’s complement number of A = 1’s complement number + 1
– Eg:
• n=8, A = 00110101
• 1’s complement number of A is (28 - 1) - 00110101=
• 2’s complement number of A is 28 - 00110101=
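
A minimal Python sketch of the two definitions, using the 8-bit example above. The function names are illustrative only; the final `% 2**n` is an assumption added so that the 2’s complement of 0 still fits in n bits.

```python
def ones_complement(a: int, n: int) -> int:
    """(2^n - 1) - A: equivalent to inverting every one of the n bits."""
    return (2**n - 1) - a

def twos_complement(a: int, n: int) -> int:
    """2^n - A, reduced modulo 2^n so the result always fits in n bits."""
    return (2**n - a) % 2**n

A = 0b00110101
print(format(ones_complement(A, 8), "08b"))  # 11001010
print(format(twos_complement(A, 8), "08b"))  # 11001011
```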


2’s complement representation of signed numbers

 The leftmost bit is the sign bit

 Zero and positive numbers are expressed in the usual binary format.
– The largest number that can be represented is $2^{n-1} - 1$
– n=8 => largest signed number: $2^{8-1} - 1 = 127$
 A negative number -A is stored as the binary equivalent of $2^n - A$ in an n-bit system.
– -3 is stored as $2^8 - 3 = 253$ = 11111101 in an 8-bit system
– The most negative number that can be stored is $-2^{n-1}$

2’s complement representation of signed numbers

 The general form of signed number A:

$a_{n-1} a_{n-2} \ldots a_2 a_1 a_0$

 Value of A: $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$

 Range of representation:
– Use n bits to represent 2’s complement numbers
– Range: $-2^{n-1}$ to $2^{n-1} - 1$
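
The value formula can be checked with a short Python sketch. The function name `twos_complement_value` is an assumption for this example; it takes the bit string with the sign bit first.

```python
def twos_complement_value(bits: str) -> int:
    """Value of an n-bit 2's complement pattern given as a string of 0s and 1s."""
    n = len(bits)
    # The sign bit a_(n-1) carries the negative weight -2^(n-1).
    value = -int(bits[0]) * 2 ** (n - 1)
    # The remaining bits a_(n-2)..a_0 carry the usual positive weights.
    for i, bit in enumerate(reversed(bits[1:])):
        value += int(bit) * 2 ** i
    return value

print(twos_complement_value("11111101"))  # -3
print(twos_complement_value("01111111"))  # 127
print(twos_complement_value("10000000"))  # -128
```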


2’s complement representation of signed numbers

 +10 = 0000 1010

 - 10 = 28-10 = 1 0000 0000
– 0000 1010
1111 0110
- 10 = 1111 0110

 +10 + (-10) = ?

Page  79 © NTK 2009

MIPS signed number representation

 32 bit signed numbers:

0000 0000 0000 0000 0000 0000 0000 0000two = 0(10)
0000 0000 0000 0000 0000 0000 0000 0001two = + 1(10)
0000 0000 0000 0000 0000 0000 0000 0010two = + 2(10)
...
0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646(10)
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647(10)
1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648(10)
1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647(10)
1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646(10)
...
1111 1111 1111 1111 1111 1111 1111 1101two = – 3(10)
1111 1111 1111 1111 1111 1111 1111 1110two = – 2(10)
1111 1111 1111 1111 1111 1111 1111 1111two = – 1(10)


2’s complement representation of signed numbers

 Procedure to find the binary representation of a negative number in 2’s complement:
– Find the binary equivalent of the magnitude
– Complement each bit (0=>1, 1=>0)
– Add 1

 Eg: find the representation of -13 in an 8-bit signed number system using 2’s complement:
• Magnitude: 13 = 0000 1101
• 1’s complement: 1111 0010
• Add 1: 1111 0010 + 1 = 1111 0011
• -13 = 1111 0011

2’s complement representation of signed numbers

 To find the magnitude of a negative number:
– Complement each bit
– Add 1

 Eg:
– -5 = 11111011: complement → 00000100, add 1 → 00000101 = 5
– -1 = 11111111: complement → 00000000, add 1 → 00000001 = 1

3. Arithmetic for Computer

1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point


3.4. Addition & Subtraction

 3.4.1. Addition of unsigned numbers

 3.4.2. Addition of signed numbers
 3.4.3. Subtraction of signed numbers


3.4.1. Addition of Unsigned Numbers

 Unsigned binary addition is similar to decimal addition.

decimal binary
carry 1100 11110
A 2565 10110
B 6754 11011
sum 9319 110001

Eg: 10101(2) + 11011(2) = ? (2)


Addition of Unsigned Numbers

 Overflow
– Occurs when the result of an addition is out of the range of representation (the result cannot be stored in the predefined number of bits)

– Eg (8-bit):
• X = 1001 0110 = 150, Y = 0001 0011 = 19: S = 1010 1001 = 169, Cout = 0
• X = 1100 0101 = 197, Y = 0100 0110 = 70: S = 0000 1011 = 11 ≠ 267, Cout = 1 (carry-out)

 Overflow occurs when Cout = 1
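
The carry-out rule can be sketched in Python for n-bit unsigned operands. The function name `add_unsigned` is an assumption for this example; it returns the n-bit sum together with the carry-out bit.

```python
def add_unsigned(x: int, y: int, n: int = 8):
    """Add two n-bit unsigned values; return (n-bit sum, carry-out)."""
    total = x + y
    s = total & ((1 << n) - 1)   # keep only the low n bits
    cout = total >> n            # whatever does not fit is the carry-out
    return s, cout

print(add_unsigned(150, 19))  # (169, 0): no overflow
print(add_unsigned(197, 70))  # (11, 1): overflow, the true sum 267 needs 9 bits
```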


3.4.2. Addition of Signed Numbers

 2’s complement is so popular because of the simplicity of addition.
 To add any two numbers, no matter what the sign of each is, we just do binary addition on their representations (a carry out of the leftmost bit is discarded).

– (-5) + (+7): 1011 + 0111 = 0010 = +2
– (-5) + (+5): 1011 + 0101 = 0000 = 0
– (-5) + (+3): 1011 + 0011 = 1110 = -2

Addition of Signed Numbers

 Overflow
– Occurs when the result of an addition is out of the range of representation (the result cannot be stored in the predefined number of bits)
– When does it occur?
• Adding two numbers of opposite sign?
• Adding two positive numbers?
• Adding two negative numbers?

 Overflow occurs when two numbers with the same sign are added and the result has a different sign
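
The sign rule can be sketched in Python on raw n-bit patterns. The function name `add_signed` is an assumption for this example; operands and result are bit patterns in the range 0 to 2^n - 1.

```python
def add_signed(x: int, y: int, n: int = 4):
    """Add two n-bit 2's complement patterns; return (sum pattern, overflow flag)."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    s = (x + y) & mask           # plain binary addition, carry-out discarded
    # Overflow: operands share a sign, but the result has the other sign.
    overflow = (x & sign) == (y & sign) and (s & sign) != (x & sign)
    return s, overflow

print(add_signed(0b1011, 0b0111))  # (2, False):  -5 + 7 = +2
print(add_signed(0b0101, 0b0100))  # (9, True):   +5 + 4 does not fit in 4 bits
print(add_signed(0b1011, 0b1010))  # (5, True):   -5 + -6 does not fit in 4 bits
```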


3.4.3. Subtraction of Signed Numbers

 Principle:
– Subtraction is the addition of a negative number.
a – b = a + (-b)

 Eg: 7 – 5 = ?
– 5 = 0101: complement → 1010, add 1 → 1011 = -5
– 7 + (-5): 0111 + 1011 = 0010 = 2 (the carry out of the leftmost bit is discarded)

 Overflow when adding or subtracting signed numbers:


Addition & Subtraction in MIPS

 Unsigned integers are commonly used for memory addresses, where overflow is ignored, so MIPS provides two kinds of addition and subtraction:
– add, addi, sub cause an exception on overflow
– addu, addiu, subu do not cause an exception on overflow

– MIPS has a register called the exception program counter (EPC) that stores the address of the instruction that caused the exception.
– The instruction mfc0 is used to copy EPC into a general-purpose register:
mfc0 $s1, $epc

3. Arithmetic for Computer

1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point


3.5. Multiplication

 3.5.1. Multiplication of unsigned numbers

 3.5.2. Multiplication of signed numbers


3.5.1. Multiplication of unsigned numbers

      1000   Multiplicand (+8)
x     1001   Multiplier (+9)
      1000
     0000
    0000
   1000
   1001000   Product (+72)

 Multiplying two n-bit unsigned numbers gives one 2n-bit unsigned product.

Multiplication implementation - 1st version

 Multiplicand, ALU, product are 64 bit

 Multiplier is 32 bit


Multiplication algorithm

 If each step takes one clock cycle, this multiplication algorithm requires almost 100 clock cycles to multiply two 32-bit numbers.
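
The slide's flowchart is not reproduced here. As a hedged sketch, the usual shift-and-add scheme for the first hardware version (test the low multiplier bit, add, shift the multiplicand left, shift the multiplier right) could look like this in Python; the function name is an assumption for this example.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Shift-and-add multiplication of two n-bit unsigned values (2n-bit product)."""
    product = 0
    for _ in range(n):               # one iteration per multiplier bit
        if multiplier & 1:           # test the least significant multiplier bit
            product += multiplicand  # add the (shifted) multiplicand
        multiplicand <<= 1           # shift the multiplicand left by 1
        multiplier >>= 1             # shift the multiplier right by 1
    return product

print(multiply_unsigned(0b1000, 0b1001, n=4))  # 72
```

Each iteration performs up to three steps (test/add, shift left, shift right), which is where the figure of roughly 100 cycles for 32 bits comes from.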


Multiplication implementation – 2nd version

 Multiplicand, ALU, multiplier are 32 bit

 Product is 64 bit


3.5.2. Multiplication of signed numbers

 Use unsigned multiplication:

– Step 1: Convert the multiplicand and multiplier to positive numbers
– Step 2: Multiply using the unsigned multiplication algorithm
– Step 3: Set the sign of the product:
• If the signs of the multiplicand and multiplier are the same, the product is the result of step 2.
• If the signs differ, the product is the two’s complement of the result of step 2.

Faster multiplication


Multiply in MIPS

 The 64-bit product is stored in a pair of 32-bit special registers, called Hi & Lo
 To produce properly signed and unsigned products, MIPS has two multiplication instructions:
– mult – multiply signed
– multu – multiply unsigned
 To fetch the two 32-bit halves of the product, MIPS provides two instructions:
– mfhi – move from Hi (upper 32 bits)
– mflo – move from Lo (lower 32 bits)

3. Arithmetic for Computer

1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point


3.6. Division

 3.6.1. Division of unsigned numbers

 3.6.2. Division of signed numbers


3.6.1. Division of unsigned numbers

 Divide 1001010(10) by 1000(10)


Division implementation – 1st version

 Divisor, ALU, remainder are 64 bit

 Quotient is 32 bit


Division algorithm


Division implementation – 2nd version


3.6.2. Division of signed numbers

 Use the unsigned division algorithm:

– Convert the dividend and divisor to positive numbers
– Divide using the unsigned division algorithm
– Set the signs of the quotient and remainder:

Dividend  Divisor  Quotient  Remainder
+         +        Keep      Keep
+         -        Negate    Keep
-         +        Negate    Negate
-         -        Keep      Negate
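
The sign table can be sketched in Python. The function name `divide_signed` is an assumption for this example; Python's built-in `divmod` on the magnitudes stands in for the unsigned hardware divider, and a zero divisor raises `ZeroDivisionError`.

```python
def divide_signed(dividend: int, divisor: int):
    """Signed division built on unsigned division; returns (quotient, remainder)."""
    # Steps 1 and 2: divide the magnitudes.
    quotient, remainder = divmod(abs(dividend), abs(divisor))
    # Step 3: the quotient is negated when the operand signs differ ...
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    # ... and the remainder takes the sign of the dividend.
    if dividend < 0:
        remainder = -remainder
    return quotient, remainder

for a, b in [(7, 2), (7, -2), (-7, 2), (-7, -2)]:
    print(a, b, divide_signed(a, b))
```

With this rule the identity dividend = quotient x divisor + remainder holds in all four cases.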


Faster division


Divide in MIPS

 The remainder is stored in Hi, the quotient is stored in Lo
 To produce properly signed and unsigned results, MIPS has two division instructions:
– div – divide signed
– divu – divide unsigned
 To fetch the 32-bit results, MIPS provides two instructions:
– mfhi – move the remainder from Hi
– mflo – move the quotient from Lo

3. Arithmetic for Computer

1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point


3.7. Floating Point

Floating-point numbers can be represented in the form:
$(-1)^s \times 1.xxxxx_{(2)} \times 2^{yyyy}$
s: sign (s=0 => positive, s=1 => negative)
x,y={0,1}

$15.75_{(10)} = 1111.11_{(2)} = 1.11111_{(2)} \times 2^3$

Floating-point number representation


IEEE 754 standard

 Used to represent floating-point numbers
 Base b=2
 Three basic forms:
– Single-precision number representation, 32 bits
– Double-precision number representation, 64 bits
– Extended-precision number representation, 80 bits
 Formats (bit positions of each field):

Format   S    e        m
32 bit   31   30..23   22..0
64 bit   63   62..52   51..0
80 bit   79   78..64   63..0

IEEE 754 standard

$X = (-1)^S \times 1.m \times 2^{e-b}$

 S: sign bit (S=0 => positive, S=1 => negative)
 e: biased (excess) exponent
 m: mantissa (fraction bits)
 b: bias
– Single-precision 32-bit: b = 127
– Double-precision 64-bit: b = 1023
– Extended-precision 80-bit: b = 16383

IEEE 754 standard

 Example: A real number X is represented in the IEEE 754 standard by the following bit pattern:
1100 0001 0101 0110 0000 0000 0000 0000
Find the decimal value of X.

 Solution: 1 | 1000 0010 | 101 0110 0000 0000 0000 0000
s = 1, e = 1000 0010(2) = 130, m = 1010110 0...0

$X = (-1)^S \times 1.m \times 2^{e-b}$
$= (-1)^1 \times 1.1010110_{(2)} \times 2^{130-127}$
$= -1.101011_{(2)} \times 2^3 = -1101.011_{(2)} = -13.375_{(10)}$

IEEE 754 standard

 Example: A real number X is represented in the IEEE 754 standard by the following bit pattern:
0011 1111 1000 0000 0000 0000 0000 0000
Find the decimal value of X.

Solution: s = 0, e = 0111 1111(2) = 127, m = 0
$X = (-1)^S \times 1.m \times 2^{e-127}$
$= (-1)^0 \times 1.0 \times 2^{127-127}$
$= 1$
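
The two decoding examples can be reproduced with a small Python sketch. The function name `decode_single` is an assumption for this example, and only normalized numbers are handled (exponent fields of all 0s or all 1s, which have special meanings, are rejected).

```python
import struct

def decode_single(word: int) -> float:
    """Decode a 32-bit IEEE 754 single-precision pattern (normalized numbers only)."""
    s = (word >> 31) & 0x1        # bit 31: sign
    e = (word >> 23) & 0xFF       # bits 30..23: biased exponent
    m = word & 0x7FFFFF           # bits 22..0: mantissa
    if e in (0, 255):
        raise ValueError("special value (zero, denormal, infinity or NaN)")
    significand = 1 + m / 2**23   # the implicit leading 1 plus the fraction
    return (-1) ** s * significand * 2.0 ** (e - 127)

print(decode_single(0xC1560000))  # -13.375
print(decode_single(0x3F800000))  # 1.0

# Cross-check against the machine's own interpretation of the same bits.
print(struct.unpack(">f", (0xC1560000).to_bytes(4, "big"))[0])  # -13.375
```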


IEEE 754 standard

 Example: Represent the real number -19.7890625(10) using the 32-bit IEEE 754 standard

 $19.7890625_{(10)} = 10011.1100101_{(2)} = 1.00111100101_{(2)} \times 2^4$

 $-19.7890625 = (-1)^1 \times 1.00111100101_{(2)} \times 2^{131-127}$
= 1 1000 0011 001111001010...0 (s = 1, e = 131 = 1000 0011, m = 00111100101 followed by 0s)
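
The hand-derived pattern can be checked with Python's standard `struct` module, which packs a float into its IEEE 754 single-precision bytes. The helper name `encode_single` is an assumption for this example.

```python
import struct

def encode_single(x: float) -> str:
    """Return the single-precision pattern of x as 'sign exponent mantissa'."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # float -> 32-bit integer
    bits = format(word, "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(encode_single(-19.7890625))
# sign = 1, exponent = 10000011 (131), mantissa = 00111100101 followed by twelve 0s
```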


Special convention


Range of Representation

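[Figure: number line from -∞ to +∞ with marks -b, -a, -0, +0, +a, +b. Overflow regions lie beyond -b and +b; the underflow region lies between -a and +a.]

 32 bit: $a = 2^{-127} \approx 10^{-38}$, $b = 2^{+127} \approx 10^{+38}$
 64 bit: $a = 2^{-1023} \approx 10^{-308}$, $b = 2^{+1023} \approx 10^{+308}$
 80 bit: $a = 2^{-16383} \approx 10^{-4932}$, $b = 2^{+16383} \approx 10^{+4932}$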

Floating-point addition


[Figure: block diagram of an arithmetic unit dedicated to floating-point addition]

Floating-point multiplication


Floating-point instructions in MIPS

Operation        Single   Double
Addition         add.s    add.d
Subtraction      sub.s    sub.d
Multiplication   mul.s    mul.d
Division         div.s    div.d
Comparison       c.x.s    c.x.d    (x = eq, neq, lt, le, gt, ge)

bc1t   Branch if the floating-point comparison is true
bc1f   Branch if the floating-point comparison is false

Floating-point instructions in MIPS

 MIPS has dedicated registers and instructions for floating-point operations:
– 32 floating-point registers: $f0,$f1,$f2...,$f31
– Separate load and store instructions for floating-point registers: lwc1, swc1
lwc1 $f4,0($sp) # Load 32-bit F.P. number into $f4
lwc1 $f6,4($sp) # Load 32-bit F.P. number into $f6
add.s $f2,$f4,$f6 # $f2 = $f4 + $f6
swc1 $f2,8($sp) # Store F.P. number from $f2

Floating-point instructions in MIPS

 A double-precision register is the combination of an even-odd pair of single-precision registers, using the even register as its name

lwc1 $f4,0($sp) # Load upper part of 64-bit F.P. number into $f4
lwc1 $f5,4($sp) # Load lower part of 64-bit F.P. number into $f5
lwc1 $f6,8($sp) # Load upper part of 64-bit F.P. number into $f6
lwc1 $f7,12($sp) # Load lower part of 64-bit F.P. number into $f7
add.d $f2,$f4,$f6 # ($f2,$f3) = ($f4,$f5) + ($f6,$f7)
swc1 $f2,16($sp) # Store upper part of 64-bit F.P. number from $f2
swc1 $f3,20($sp) # Store lower part of 64-bit F.P. number from $f3

MIPS floating-point assembly language


Content

1. Introduction - Computer system technology and Computer

Performance
2. Instruction Set Architecture
3. Arithmetic for Computer
4. CPU Organization


4. CPU Organization

 4.1. Building a datapath

 4.2. Single cycle implementation
 4.3. Multi cycle implementation
 4.4. Exceptions
 4.5. Pipelining


4.1. Building a datapath

 A basic MIPS implementation

 Implement a subset of the core MIPS instruction set:
– Memory-reference instructions: lw, sw
– Arithmetic-logical instructions: add, sub, and, or, slt
– Branch instructions: beq, j


Review of instruction classes

R format:
– op (bits 31..26, 6 bits): opcode
– rs (bits 25..21, 5 bits): source register 1
– rt (bits 20..16, 5 bits): source register 2
– rd (bits 15..11, 5 bits): destination register
– sh (bits 10..6, 5 bits): shift amount
– fn (bits 5..0, 6 bits): opcode extension

I format:
– op (bits 31..26, 6 bits): opcode
– rs (bits 25..21, 5 bits): source or base
– rt (bits 20..16, 5 bits): destination or data
– operand / offset (bits 15..0, 16 bits): immediate operand or address offset

J format:
– op (bits 31..26, 6 bits): opcode
– jump target address (bits 25..0, 26 bits): memory word address (byte address divided by 4)

A basic MIPS implementation

 For every instruction, the first two steps are identical:
– Send the program counter (PC) to the instruction memory and fetch the instruction from that memory.
– Read one or two registers, using fields of the instruction to select the registers to read.
 After these two steps, every instruction class except jump uses the ALU:
– Memory-reference instructions use the ALU for address calculation
– Arithmetic-logical instructions use the ALU for operation execution
– Branch instructions use the ALU for comparison
 The remaining actions differ according to the instruction class:
– Memory-reference instructions: access memory to read or write
– Arithmetic-logical instructions: write data from the ALU back to a register
– Branch instructions: change the next instruction address based on the comparison

Abstract view of a basic MIPS implementation – v1.0

Abstract view of a basic MIPS implementation – v1.0

 Drawbacks of version 1.0 of the basic MIPS implementation:

– In several places, data going into a particular unit comes from two different sources.
• The value written into the PC can come from one of two adders.
• The data written into the registers can come from either the ALU or the data memory.
– Several of the units must be controlled depending on the type of instruction.
• The data memory must read on a load and write on a store.
• The registers must be written on a load and on an arithmetic-logical instruction.

Abstract view of a basic MIPS implementation – v2.0


Abstract view of a basic MIPS implementation – ver 2

 Benefits:
– Simple, easy to understand
– Single-cycle datapath

 Requirement:
– Instruction memory and data memory are separate because:
• Format of data and instruction is different
• Having separate memories is less expensive
• The processor operates in one cycle and cannot use a single-ported memory for two different accesses within that cycle.

Logic design conventions

 To design the machine, we need to decide how the logic implementing the machine will operate and how the machine is clocked.

 The MIPS design consists of two types of logic elements:
– Combinational elements
– Sequential elements:
• Flip-flops, memories, registers
Please review Digital Logic Design

Building a datapath

 Fetching instructions and incrementing PC

 Implement R-format ALU operations
 Loading and storing data
 Branching with beq


Fetching instructions and incrementing PC

 Read data from instruction memory to output

 PC = PC + 4


Implement R-format ALU operations

 R-format ALU instructions: add, sub, and, or, slt

add $s1,$s2,$s3 # $s1 = $s2 + $s3
1. Read $s2 and $s3 from the register file, selected by the two register numbers, onto its outputs
2. The ALU executes the add operation
3. Write the result back to register $s1; this needs the write control signal

Loading and storing data

lw $t1, offset_value($t2)
sw $t1, offset_value($t2)
1. Compute the memory address by adding the base register $t2 to the 16-bit sign-extended offset_value field.
2. For sw: read the data from register $t1 and write it to the calculated address in data memory.
For lw: read the data from the calculated address in data memory and write it to register $t1.

Branching with beq

beq $t1,$t2,offset
 Compute the branch target address by adding the sign-extended offset to the PC
– Two notes on the definition of the branch instruction:
• The instruction set architecture specifies that the base for the branch address calculation is the address of the instruction following the branch (PC+4). PC+4 is always computed during instruction fetch, so it can be used as the base.
• The offset field is shifted left 2 bits so that it is a word offset.
 Use the ALU to evaluate the branch condition:
– Read the two registers $t1, $t2 from the register file to the inputs of the ALU
– Subtract the two ALU inputs; the ALU's Zero output signals whether they are equal
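
The branch target computation can be sketched in Python. The function names are assumptions for this example; `offset16` is the raw 16-bit offset field of the instruction.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed 2's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

print(hex(branch_target(0x00400000, 3)))       # 0x400010: three words past PC+4
print(hex(branch_target(0x00400010, 0xFFFF)))  # 0x400010: offset -1, back to the branch itself
```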


Branching with beq


Creating a single datapath

 Single datapath:
– Execute every instruction in one clock cycle.
– No resource used more than once per instruction, so any elements
needed more than once must be duplicated.
• Separate memories for instruction and data.


A single datapath for memory and R-type instructions


A single datapath for a basic MIPS

(Control signals are not connected)

4. CPU Organization

 4.1. Building a datapath

 4.2. Single cycle implementation
 4.3. Multi cycle implementation
 4.4. Exceptions
 4.5. Pipelining


4.2. Single cycle implementation

 ALU Control
 Main Control Unit


ALU Control

 The ALU has four control inputs.

 Depending on the instruction class, the ALU performs one of its first five functions (NOR is needed for other parts of the MIPS instruction set):
– Load and store instructions: the ALU computes the memory address by addition
– R-type instructions: the ALU performs one of five actions (add, sub, and, or, slt) based on the 6-bit function field of the instruction
– Branch beq: the ALU performs subtraction

ALU control inputs


Truth table which uses don’t care values to have compact minimization form
Designing the Main Control Unit


Three instruction classes

 The opcode is always contained in bits 31:26. We refer to this field as Opcode[5:0]
 Two registers to be read (R-type, beq, sw) are always at 25:21 and 20:16
 The base register for load and store instruction is always in bit 25:21
 16 bit offset for load/store/beq is always in 15:0
 The destination register is in one of two places:
– For load, it’s in 20:16
– For R-type, it’s in 15:11
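
The fixed field positions listed above can be sketched as a Python decoder. The function name `decode_fields` and the dictionary keys are assumptions for this example; the two sample words are built from the field values of `add $s1,$s2,$s3` and `lw $t1,8($t2)`.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21, first read register / base
        "rt": (instr >> 16) & 0x1F,      # bits 20:16, second read register / load destination
        "rd": (instr >> 11) & 0x1F,      # bits 15:11, R-type destination
        "funct": instr & 0x3F,           # bits 5:0, R-type function field
        "offset": instr & 0xFFFF,        # bits 15:0, load/store/beq offset
    }

# add $s1,$s2,$s3: opcode 0, rs = 18 ($s2), rt = 19 ($s3), rd = 17 ($s1), funct 0x20
add_word = (0 << 26) | (18 << 21) | (19 << 16) | (17 << 11) | 0x20
# lw $t1,8($t2): opcode 0x23, rs = 10 ($t2), rt = 9 ($t1), offset 8
lw_word = (0x23 << 26) | (10 << 21) | (9 << 16) | 8

print(hex(add_word), decode_fields(add_word))
print(hex(lw_word), decode_fields(lw_word))
```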
A simple datapath with all control lines identified

Single datapath with control unit
Single control & datapath extended to handle jump
Why is a single-cycle implementation not used today?

 Although the single-cycle design works correctly, it is not used in modern designs because it is inefficient:
– The clock cycle must have the same length for every instruction in this single-cycle design => the clock cycle is determined by the longest possible path in the machine.
• Longest instruction: the load instruction, which uses 5 functional units in series: instruction memory, register file, ALU, data memory, register file.
=> Not good, since several instructions could fit in a shorter clock cycle
– Some functional units (ALU) must be duplicated
=> Higher hardware cost

4. CPU Organization

 4.1. Building a datapath

 4.2. Single cycle implementation
 4.3. Multicycle implementation
 4.4. Exceptions
 4.5. Pipelining

4.3. Multicycle implementation

 Multicycle implementation:
– Break each instruction into a series of steps corresponding to the functional unit operations needed.
• Each step in the execution takes one clock cycle, so different instructions take different numbers of clock cycles.
• A functional unit can be used more than once per instruction => hardware sharing.

High-level view of multicycle datapath

 Key elements:
– A shared memory unit for both instructions & data
– A single ALU
– Additional registers are required: IR, Memory data register, A, B, ALUOut
– Additional multiplexers are required

Multicycle implementation

 At the end of a clock cycle, all data used in subsequent clock cycles must be stored in a state element:
– Data used by subsequent instructions in a later clock cycle is stored in one of the programmer-visible state elements: register file, PC, memory
– Data used by the same instruction in a later clock cycle is stored in one of the additional registers.

 One clock cycle can accommodate at most one of the following operations: a memory access, a register file access (two reads and one write), or an ALU operation
=> Any data produced by one of these three functional units must be saved in a temporary register for use in a later cycle. If it is not saved, a timing race could occur.

Additional registers

 The Instruction Register (IR) and the Memory data register (MDR) are added to save the output of the memory for an instruction read and a data read. Two separate registers are used since both values are needed during the same clock cycle.
 The A and B registers are used to hold the values read from the register file.
 The ALUOut register holds the output of the ALU.

Additional multiplexers

 Replacing three ALUs of single-cycle datapath by a single ALU requires:

– An additional multiplexer: {A,PC} => the first ALU input.
– An additional multiplexer: {B,4,Sign extend, shift left 2} => the second ALU input

Multicycle datapath with control lines

Branch and jump instructions

 With jump and branch instructions, there are three possible sources for the value written into the PC:
– The output of the ALU, PC + 4, which is written to the PC directly
– Register ALUOut, which holds the address of the branch target after it is computed
– The jump target address = the lower 26 bits of IR shifted left 2 bits, concatenated with the 4 upper bits of the incremented PC
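
The three-way choice can be sketched as a Python function. The function name `next_pc` is an assumption for this example, and so is the numbering of the selector values 1 and 2 (the slides only state later that PCSource = 00 selects the ALU output).

```python
def next_pc(pc_plus_4: int, alu_out: int, ir: int, pc_source: int) -> int:
    """Select the value written into the PC (selector encoding partly assumed)."""
    if pc_source == 0:   # ALU output: PC + 4
        return pc_plus_4
    if pc_source == 1:   # ALUOut register: branch target (assumed encoding)
        return alu_out
    if pc_source == 2:   # jump target (assumed encoding)
        # Upper 4 bits of the incremented PC, then IR[25:0], then two 0 bits.
        return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    raise ValueError("unknown PC source")

jump_instr = (0x02 << 26) | 0x0100000   # j to word address 0x0100000
print(hex(next_pc(0x00400004, 0, jump_instr, 2)))  # 0x400000
```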

Complete datapath for multicycle implementation


ftp address to download materials

ftp://dce.hut.edu.vn/kiennt/

Action of the 1-bit control signals

Action of the 2-bit control signals

Breaking instruction execution into clock cycles

 Each MIPS instruction needs from three to five of these steps:

– 1. Instruction fetch step (same for all instructions).
– 2. Instruction decode and register fetch step (same for all instructions).
– 3. Execution, memory address computation or branch completion.
– 4. Memory access or R-type instruction completion step.
– 5. Memory read completion step.

1. Instruction fetch step

 Fetch the instruction from memory and compute the address of the next sequential instruction:
– IR <= Memory[PC];
• Assert the control signals MemRead and IRWrite and set IorD = 0 to
select PC as the source of address.
– PC <= PC + 4;
• ALUSrcA = 0 (sending PC to ALU input)
• ALUSrcB = 01 (sending 4 to ALU input)
• ALUOp = 00 (to make ALU add)
• PCSource = 00 (to select output of ALU addition as source)
• assert PCWrite to write back to PC

2. Instruction decode and register fetch step

 Read the two registers named by the rs and rt instruction fields and compute the branch target address with the ALU
– A <= Reg[IR[25:21]];
– B <= Reg[IR[20:16]];
– ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
• ALUSrcA = 0 to select PC as ALU input
• ALUSrcB = 11 to select the sign-extended and shifted offset field as ALU input
• ALUOp = 00 to make the ALU add.
 Question: are these operations necessary for all instructions?

3. Execution, memory address computation or branch
completion step

 This is the first cycle during which the datapath operation is

determined by the instruction class
– Memory reference
• ALUOut <= A + sign-extend (IR[15:0])
– Arithmetic-logical instruction (R-type)
• ALUOut <= A op B
– Branch
• if (A==B) PC <= ALUOut
– Jump
• PC <= {PC[31:28], (IR[25:0],2’b00)};

4. Memory access or R-type instruction completion

 During this step, a load or store instruction accesses memory

and an arithmetic-logical instruction writes its result.
– Memory reference:
• MDR <= Memory [ALUOut];
• or Memory [ALUOut] <= B;
– Arithmetic-logical instruction:
• Reg[ IR[15:11] ] <= ALUOut;

5. Memory read completion step

 During this step, loads complete by writing back the value

from memory
– Reg [IR[20:16]] <= MDR;
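
The five steps can be traced with a small Python simulation. This is only a sketch under stated assumptions: the function name `run_instruction`, the register list `reg`, the dictionary `mem` (keyed by byte address, one word per entry) and the small program are all hypothetical, only lw, sw, beq, j and four R-type functions are modelled, and overflow detection is omitted.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

ALU_FUNCTS = {                        # R-type function field -> operation
    0x20: lambda a, b: (a + b) & MASK32,   # add
    0x22: lambda a, b: (a - b) & MASK32,   # sub
    0x24: lambda a, b: a & b,              # and
    0x25: lambda a, b: a | b,              # or
}

def run_instruction(pc, reg, mem):
    """Execute the instruction at mem[pc]; return (new PC, clock cycles used)."""
    # Step 1: IR <= Memory[PC]; PC <= PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    # Step 2: A, B <= registers; ALUOut <= branch target (computed speculatively)
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK32
    op = (ir >> 26) & 0x3F
    # Step 3: depends on the instruction class
    if op == 0x04:                    # beq completes here
        return (alu_out if a == b else pc), 3
    if op == 0x02:                    # j completes here
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if op in (0x23, 0x2B):            # lw / sw: ALUOut <= A + sign-extend(IR[15:0])
        alu_out = (a + sign_extend16(ir)) & MASK32
        if op == 0x2B:                # Step 4 (sw): Memory[ALUOut] <= B
            mem[alu_out] = b
            return pc, 4
        mdr = mem[alu_out]            # Step 4 (lw): MDR <= Memory[ALUOut]
        reg[(ir >> 16) & 0x1F] = mdr  # Step 5: Reg[IR[20:16]] <= MDR
        return pc, 5
    if op == 0x00:                    # R-type: ALUOut <= A op B
        alu_out = ALU_FUNCTS[ir & 0x3F](a, b)
        reg[(ir >> 11) & 0x1F] = alu_out   # Step 4: Reg[IR[15:11]] <= ALUOut
        return pc, 4
    raise ValueError("undefined instruction")

reg = [0] * 32
reg[10] = 0x1000                                          # $t2 = base address
mem = {
    0x0: (0x23 << 26) | (10 << 21) | (9 << 16) | 8,       # lw  $t1, 8($t2)
    0x4: (9 << 21) | (9 << 16) | (8 << 11) | 0x20,        # add $t0, $t1, $t1
    0x8: (0x04 << 26) | (8 << 21) | (8 << 16) | 0xFFFD,   # beq $t0, $t0, -3
    0x1008: 21,                                           # data word
}
pc, total = 0, 0
for _ in range(3):
    pc, cycles = run_instruction(pc, reg, mem)
    total += cycles
print(reg[8], pc, total)  # 42 0 12  (5 + 4 + 3 cycles)
```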

Summary of steps taken to execute any instruction class

Defining the control

 We use a finite-state machine (FSM) to describe the operation of the MIPS control

Figure 5.32. FSM for instruction fetch and decode

Figure 5.33. FSM for controlling memory-reference instructions

Figure 5.34. FSM for R-type instruction

Figure 5.35. FSM for branch instruction

Figure 5.36. FSM for jump instruction

Complete FSM

4. CPU Organization

 4.1. Building a datapath

 4.2. Single cycle implementation
 4.3. Multicycle implementation
 4.4. Exceptions

4.4. Exceptions

 Exception:
– Is an unexpected event from within the processor
– Ex: arithmetic overflow, using an undefined instruction...

 Interrupt:
– Is an event that causes an unexpected change in control flow but
comes from outside of the processor.
– Interrupts are used by IO devices to communicate with the processor.
– Ex: timer interrupt...

How are exceptions handled?

 Two types of exceptions that our current implementation can generate:

– Execution of an undefined instruction
– An arithmetic overflow

 Basic actions of the machine when an exception occurs:

– Save the address of the offending instruction in the Exception Program Counter (EPC).
– Transfer control to the OS at some specific address.
• The OS can then take appropriate action, which may involve providing some service to the user program, taking a predefined action in response to an overflow, or stopping the execution of the program and reporting an error.
• The OS can then terminate the program or continue its execution, using EPC to determine the address at which to restart the program.

How are exceptions handled?

 We can implement the processing required for exceptions by adding a few extra registers and control signals to our basic implementation and by slightly extending the FSM.
– Two additional registers:
• EPC: 32-bit register used to hold the address of the affected instruction
• Cause: 32-bit register used to record the cause of the exception
– Cause register = 0: undefined instruction
– Cause register = 1: arithmetic overflow
– Two control signals that cause the EPC and Cause registers to be written: EPCWrite, CauseWrite.
– A one-bit signal to set the Cause register to 0 or 1.
– The address of the exception handling code (8000 0180(16)) is written to the PC
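
The register updates can be sketched in Python for the overflow case. The function name `execute_add` and the `state` dictionary are assumptions for this example; the address of the offending instruction is passed in explicitly rather than derived from the PC.

```python
HANDLER_ADDRESS = 0x80000180          # exception handler entry point
CAUSE_UNDEFINED_INSTRUCTION = 0
CAUSE_ARITHMETIC_OVERFLOW = 1

def execute_add(state: dict, instr_addr: int, a: int, b: int):
    """Signed 32-bit add; on overflow, record the exception instead of a result."""
    sign = 0x80000000
    result = (a + b) & 0xFFFFFFFF
    overflow = ((a & sign) == (b & sign)) and ((result & sign) != (a & sign))
    if overflow:
        state["EPC"] = instr_addr                    # EPCWrite
        state["Cause"] = CAUSE_ARITHMETIC_OVERFLOW   # CauseWrite
        state["PC"] = HANDLER_ADDRESS                # transfer control to the OS
        return None                                  # no register is written
    state["PC"] = instr_addr + 4
    return result

state = {}
print(execute_add(state, 0x00400020, 0x7FFFFFFF, 1))  # None
print({name: hex(value) for name, value in state.items()})
```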
The multicycle datapath with exception handling

How does control check for exceptions?

 Undefined instruction exception:

– This is detected when no next state is defined from state 1 for the op value.
– We handle this exception by defining the next state for all op values other than lw, sw, 0 (R-type), j and beq as state 10.
 Arithmetic overflow exception:
– The ALU includes logic to detect overflow, and a signal called Overflow is provided as an output of the ALU. This signal is used to specify an additional possible next state (state 11) for state 7.

FSM with exception detection

Midterm exam (70’)

 Using multicycle datapath implementation of MIPS, explain

the operation of the following instructions:
– a. load/store t1,15(s0)
– b. add/sub/and/or/slt s0,t1,t2
– c. beq t1,t2,Label
– d. jump Label

4. CPU Organization

 4.1. Building a datapath

 4.2. Single cycle implementation
 4.3. Multicycle implementation
 4.4. Exceptions
 4.5. Pipelining

4.5. Pipelining

 Introduction
 Pipelined Datapath
 Pipelined Control
 Pipelined Hazards

Introduction

 Pipelining is an implementation technique used to enhance performance in which the execution of multiple instructions is overlapped.
 The idea of pipelining: divide each instruction into smaller steps, so that different steps of different instructions can be executed concurrently.

Introduction

 MIPS instructions classically take five steps:

– 1. Fetch instruction from memory.
– 2. Read registers while decoding the instruction. The format of MIPS
instructions allows reading and decoding to occur simultaneously.
– 3. Execute the operation or calculate the address.
– 4. Access an operand in memory.
– 5. Write the result into a register.

Pipelining

[Figure: pipeline timing diagram. Clock cycles 1 to 12 run across the top, instructions Inst 1 to Inst 7 down the side; each instruction passes through the stages FI, DI, EX, FO, WR.]
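
A timing chart of this kind can be generated with a short Python sketch. It assumes an ideal pipeline in which each instruction starts one clock cycle after the previous one and never stalls; the function name `pipeline_chart` is an assumption for this example.

```python
STAGES = ["FI", "DI", "EX", "FO", "WR"]

def pipeline_chart(n_instructions: int) -> list:
    """Return one text row per instruction showing its stage in each clock cycle."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * total_cycles
        for k, stage in enumerate(STAGES):
            cells[i + k] = stage      # instruction i enters stage k in cycle i + k
        rows.append(f"Inst {i + 1}: " + " ".join(cells))
    return rows

for row in pipeline_chart(7):
    print(row)
```

Under these assumptions, 7 instructions finish after 7 + 5 - 1 = 11 clock cycles, instead of 7 x 5 = 35 cycles without overlap.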

Single-cycle versus Pipelined performance

 In this example, we limit our attention to 8 instructions: lw,

sw, add, sub, and, or, slt, beq.

 Total time for each instruction:

Single-cycle versus Pipelined performance

Pipelined performance

 If the stages are perfectly balanced, then the time between instructions in a pipelined processor, when the number of instructions is large, is equal to:
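
The relation referred to here, written in the standard textbook form (an assumption that this is the form the slide uses; it holds only for perfectly balanced stages and ideal conditions):

```latex
\text{Time between instructions}_{\text{pipelined}}
  = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipeline stages}}
```

For a five-stage pipeline, this gives an ideal speedup of at most 5.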

4.5. Pipelining

 Introduction
 Pipelined Datapath
 Pipelined Control
 Pipelined Hazards

A pipelined datapath

 The division of an instruction into five stages means a five-stage
pipeline, which in turn means up to five instructions will be in
execution during any single clock cycle:

Pipelined execution in single-cycle datapath of MIPS

 Each resource is used during only one of the five stages of an
instruction, which allows it to be shared by other instructions during
the other four stages.
=> To retain the value of an individual instruction for its other four
stages, the value must be saved in a register.
Pipelined datapath of MIPS

Registers
Pipeline registers must be wide enough to store all data corresponding to
the lines that go through them: IF/ID: 64 bits, ID/EX: 128 bits,
EX/MEM: 97 bits, MEM/WB: 64 bits.
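
The four widths can be checked by adding up the lines that cross each pipeline register. The field breakdown below is an assumption based on the textbook datapath (Patterson & Hennessy) before control signals and the write register number are added; only the totals come from the slide.

```python
# Assumed contents of each pipeline register, in bits.
PIPELINE_REGISTER_FIELDS = {
    "IF/ID":  {"PC+4": 32, "instruction": 32},
    "ID/EX":  {"PC+4": 32, "read data 1": 32, "read data 2": 32,
               "sign-extended immediate": 32},
    "EX/MEM": {"branch target": 32, "Zero flag": 1, "ALU result": 32,
               "read data 2": 32},
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}

def width(register_name):
    """Total width of one pipeline register in bits."""
    return sum(PIPELINE_REGISTER_FIELDS[register_name].values())

for name in PIPELINE_REGISTER_FIELDS:
    print(name, width(name))
```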
How are portions of datapath used during an
instruction?
– Example of a load instruction

IF: first stage of an instruction

ID: second stage of an instruction

EX: third stage of an instruction

MEM: fourth stage of an instruction

WB: fifth stage of an instruction

The write register number needs to be saved for the WB stage
Corrected datapath to handle a load instruction

4.5. Pipelining

 Introduction
 Pipelined Datapath
 Pipelined Control
 Pipelined Hazards

Pipelined datapath of MIPS with control

 For the pipelined datapath, we can divide the control lines into five
groups according to the pipeline stage:
– IF: The control signals to read instruction memory and to write the PC
are always asserted, so there is nothing special to control in this
pipeline stage.
– ID: As in the previous stage, the same thing happens at every clock
cycle, so there are no optional control lines to set.
– EX: The signals to be set are RegDst, ALUOp, ALUSrc. They select the
destination (result) register, the ALU operation, and either read data 2
or a sign-extended immediate as the second ALU input.
– MEM: The control lines set in this stage are Branch, MemRead,
MemWrite. These signals are set by the branch-equal, load, and store
instructions, respectively.
– WB: two control lines are MemtoReg, which decides between sending
the ALU result or the memory value to the register file, and RegWrite,
which writes the chosen value.
Pipelined datapath of MIPS with control

9 control lines for final three stages

 Four control lines for EX stage

 Three control lines for MEM stage
 Two control lines for WB stage
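
The grouping into four, three, and two lines can be written out as a table. The values below follow the control table in Patterson & Hennessy for the four instruction classes; the slide shows them only as a figure, so treat the table as a sketch rather than a transcript of the slide. `None` marks a don't-care value, and the dictionary layout is illustrative.

```python
# Control lines per instruction class, grouped by the stage that uses them.
CONTROL = {
    "R-format": {"EX":  {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB":  {"RegWrite": 1, "MemtoReg": 0}},
    "lw":       {"EX":  {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB":  {"RegWrite": 1, "MemtoReg": 1}},
    "sw":       {"EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB":  {"RegWrite": 0, "MemtoReg": None}},
    "beq":      {"EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
                 "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB":  {"RegWrite": 0, "MemtoReg": None}},
}

def lines_per_stage(table=CONTROL):
    """Count the control lines each of the last three stages needs."""
    any_row = next(iter(table.values()))
    return {stage: len(signals) for stage, signals in any_row.items()}

print(lines_per_stage())                 # control lines per stage
print(sum(lines_per_stage().values()))   # total carried through the pipeline
```

ALUOp is a 2-bit field, which is why the EX group counts four lines.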
The pipelined datapath with control signals connected

Designing Instruction sets for pipelining

 The MIPS instruction set was designed for pipelined execution:

– 1. All MIPS instructions are the same length.
» Easy to fetch instruction in 1st stage and decode it in 2nd stage.
– 2. MIPS only has a few instruction formats, with the source register fields being
located in the same place in each instruction.
» In 2nd stage, register file can be read at the same time that hardware is
deciding what type of instruction was fetched.
– 3. Memory operands only appear in load and store in MIPS.

– 4. Operands are aligned in memory.

– RISC – Reduced Instruction Set Computer

– CISC – Complex Instruction Set Computer

4.5. Pipelining

 Introduction
 Pipelined Datapath
 Pipelined Control
 Pipelined Hazards

Pipeline Hazards

 There are situations in pipelining when the next instruction cannot
execute in the following clock cycle => hazards.
 Three types of hazards:
– Structural Hazards
– Data Hazards
• Forwarding
• Stall
– Control Hazards

Structural Hazards

 Structural hazards occur when the hardware cannot support the
combination of instructions that we want to execute in the same clock
cycle.
 The MIPS instruction set was designed to make structural hazards easy
to avoid when designing a pipeline.
– Ex: suppose we have a single memory instead of two memories
(instruction memory + data memory)

Conflict when both read memory
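
The single-memory example can be checked mechanically. This is a rough sketch, assuming the five-stage pipeline of this section with no stalls, so that the instruction at position i is in MEM during the same cycle in which the instruction at position i + 3 is in IF. The function name and the program encoding (a list of mnemonics) are illustrative.

```python
def memory_conflicts(program):
    """Return (cycle, data_access_index, fetch_index) for each cycle in which
    a load/store in MEM and an instruction fetch would hit a single memory."""
    conflicts = []
    for i, op in enumerate(program):
        j = i + 3  # instruction fetched while instruction i is in MEM
        if op in ("lw", "sw") and j < len(program):
            # Instruction i is in IF in cycle i + 1, hence in MEM in cycle i + 4.
            conflicts.append((i + 4, i, j))
    return conflicts

print(memory_conflicts(["lw", "add", "sub", "or"]))
```

With separate instruction and data memories the two accesses go to different units, so the conflict disappears.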

Hazards

 Structural Hazards
 Data Hazards
– Forwarding
– Stall
 Control Hazards

Data Hazards

 Data hazards occur when the pipeline must be stalled because one step
must wait for another to complete.
– In a computer pipeline, data hazards arise from the dependence of one
instruction on an earlier one that is still in the pipeline.
add s0, t0, t1
sub t2, s0, t3
# the add instruction doesn't write its result to s0 until the fifth stage
# the sub instruction reads the s0 register in the second stage
 Solution:
– Observation: we don't need to wait for the instruction to complete before
trying to resolve the data hazard. For the code above, as soon as the ALU
creates the sum for the addition, we can supply it as an input for the
subtraction.
– Adding extra hardware to retrieve the missing item early from the internal
resources is called forwarding or bypassing.
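
The dependence in the add/sub pair can be found automatically. The sketch below is a simplified detector that handles only three-register instructions written as on the slide. It assumes a five-stage pipeline without forwarding in which a consumer one or two instructions behind the producer reads a stale register; `parse` and `raw_hazards` are illustrative names.

```python
def parse(instr):
    """Split 'op rd, rs, rt' into (op, destination, [sources])."""
    op, rest = instr.split(None, 1)
    regs = [r.strip() for r in rest.split(",")]
    return op, regs[0], regs[1:]

def raw_hazards(program, max_distance=2):
    """Return (producer_index, consumer_index, register) triples.
    Simplification: a redefinition of the register in between is ignored."""
    parsed = [parse(line) for line in program]
    hazards = []
    for i, (_, dest, _) in enumerate(parsed):
        for j in range(i + 1, min(i + 1 + max_distance, len(parsed))):
            if dest in parsed[j][2]:
                hazards.append((i, j, dest))
    return hazards

print(raw_hazards(["add s0, t0, t1", "sub t2, s0, t3"]))
```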
Representation of instruction pipeline

 IF: Instruction fetch

 ID: Instruction decode + register file read
 EX: Execution
 MEM: Memory access
 WB: Write back

 The shading indicates the element is used by the instruction

– MEM: white => add doesn't access memory
– Shading on the right half: Read
– Shading on the left half: Write
Hazards

 Structural Hazards
 Data Hazards
– Forwarding
– Stall
 Control Hazards

Data hazards and forwarding

 Instead of waiting until the fifth stage of the add instruction for the
result in the s0 register, forwarding uses extra hardware (an extra
connection) to route the output of the EX stage of the add instruction
directly to the input of the EX stage of the sub instruction.

Data hazards and forwarding

 Forwarding paths are valid only if the destination stage is later in
time than the source stage.
– Ex: lw s0,20(t1)
sub t2,s0,t3
There cannot be a valid forwarding path from the output of the memory
access stage of the first instruction to the input of the execution stage of
the following one, since that would mean going backward in time.

pipeline stall

Data hazards and forwarding

pipeline stall

 A pipeline stall (bubble) is added to resolve the data hazard when
an R-format instruction following a load tries to use the loaded data.
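
The "later in time" rule can be turned into a small stall calculator. This sketch assumes the five-stage pipeline of this section: an ALU result exists at the end of EX, a loaded value at the end of MEM, and a consumer needs its operand at the start of its own EX stage. For the case without forwarding it further assumes that the register file is written in the first half of a cycle and read in the second half. All names are illustrative.

```python
# Stage offset (IF = 0) at the end of which the producer's value exists.
RESULT_READY_AFTER = {"alu": 2, "load": 3}

def stalls(producer_kind, distance, forwarding=True):
    """Bubbles needed when the consumer is `distance` instructions
    behind the producer (distance = 1 means immediately after)."""
    if forwarding:
        ready = RESULT_READY_AFTER[producer_kind] + 1  # first cycle the value can be used
        needed = distance + 2                          # consumer's EX stage
    else:
        ready = 4                                      # producer's WB stage
        needed = distance + 1                          # consumer's ID stage (register read)
    return max(0, ready - needed)

print(stalls("alu", 1))                     # add followed by sub, with forwarding
print(stalls("load", 1))                    # lw followed by sub, with forwarding
print(stalls("alu", 1, forwarding=False))   # add followed by sub, no forwarding
```

Under these assumptions forwarding removes every stall for the add/sub pair, while the load-use pair still needs one bubble.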

Reordering code to avoid pipeline stalls
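
The figure for this slide is not available, so the following is a hypothetical illustration rather than the slide's own example. It counts load-use stalls (one bubble per load whose result is used by the very next instruction, assuming full forwarding) before and after moving an independent load earlier. The program encoding as (op, destination, sources) tuples is an assumption made for the sketch.

```python
def load_use_stalls(program):
    """One stall for each lw immediately followed by a user of its result."""
    count = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            count += 1
    return count

original = [
    ("lw",  "t1", ["t0"]),
    ("lw",  "t2", ["t0"]),
    ("add", "t3", ["t1", "t2"]),   # uses t2 right after its load
    ("sw",  None, ["t3", "t0"]),
    ("lw",  "t4", ["t0"]),
    ("add", "t5", ["t1", "t4"]),   # uses t4 right after its load
    ("sw",  None, ["t5", "t0"]),
]

# Move the third load up so that no load is followed by its consumer.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(load_use_stalls(original), load_use_stalls(reordered))
```

The reordering keeps every instruction after the instructions it depends on, so the result of the program is unchanged.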

Forwarding

 Forwarding yields another insight into the MIPS architecture. Each MIPS
instruction writes at most one result and does so near the end of the
pipeline. Forwarding is harder if there are multiple results to forward
per instruction, or if an instruction needs to write a result early in its
execution.

 Notes:
– The name "forwarding" comes from the idea that the result is passed
forward from an earlier instruction to a later instruction. "Bypassing"
comes from passing the result around the register file to the desired unit.

ALU and pipeline registers without forwarding

ALU and pipeline registers with forwarding

Datapath modified to resolve hazards via forwarding
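
The slides show the forwarding hardware only as figures. As a sketch of the decision the forwarding unit makes for the first ALU operand, the conditions below follow the ones given in Patterson & Hennessy; the function name, parameter names, and the string encoding of the multiplexer select are illustrative.

```python
def forward_a(ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd, id_ex_rs):
    """Select the source of the first ALU operand.
    '10': forward from EX/MEM, '01': forward from MEM/WB, '00': register file."""
    # EX hazard: the previous instruction writes the register we need.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "10"
    # MEM hazard: the instruction two ahead writes it (and there is no EX hazard,
    # which would supply the more recent value).
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "01"
    return "00"

# add s0,t0,t1 followed by sub t2,s0,t3: s0 is register 16.
print(forward_a(True, 16, False, 0, 16))
```

Register 0 is excluded because it always reads as zero in MIPS and must never be replaced by a forwarded value.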

Hazards

 Structural Hazards
 Data Hazards
– Forwarding
– Stall
 Control Hazards

Data hazards and Stalls

Forwarding cannot resolve the hazard in this program because the
destination stage is earlier in time than the source stage.

Hazard detection unit used for stalls

 The hazard detection unit is used to resolve the hazard that occurs when
an instruction tries to read a register immediately after a load instruction
that writes the same register.
 Checking for a load instruction:
– if (ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt)))
stall the pipeline
– ID/EX.MemRead: is the instruction in the EX stage a load instruction or not?
– The two comparisons: does the destination register field of the load
instruction in the EX stage match either source register of the instruction
in the ID stage?
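
The stall condition translates directly into a predicate. This is a minimal sketch in which the pipeline register fields are passed as plain arguments; the function name is illustrative, and the register numbers in the usage line use the standard MIPS numbering (s0 = 16, t3 = 11).

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in EX is a load whose destination (rt)
    is a source register of the instruction currently in ID."""
    return bool(id_ex_mem_read and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))

# lw s0,20(t1) in EX (rt = 16); sub t2,s0,t3 in ID (rs = 16, rt = 11).
print(must_stall(True, 16, 16, 11))
```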

Hazards

 Structural Hazards
 Data Hazards
– Forwarding
– Stall
 Control Hazards

Control Hazards (Branch Hazards)

 Control hazards (branch hazards) occur when the proper instruction
cannot execute in the proper clock cycle because the instruction that was
fetched is not the one that is needed; that is, the flow of instruction
addresses is not what the pipeline expected.

 For a branch instruction:
– We need to fetch the instruction following the branch on the next clock
cycle to keep the pipeline full.
– But the pipeline cannot possibly know what the next instruction should
be, since it has only just received the branch instruction from memory.

Impact of pipeline on the branch instruction

If the branch is taken, we need to discard (flush) these instructions

Solution 1: Stall

 If we can move the branch decision earlier in the pipeline, we have less
delay.

 Assume that we put in enough extra hardware so that we can test
registers, calculate the branch address, and update the PC during the
second stage of the pipeline. Even with this costly extra hardware, we
still have a stall:
– The lw instruction, executed if the branch fails, is stalled one extra
200 ps clock cycle before starting.
– The cost of this extra hardware is too high for most computers.
Solution 2: Branch Prediction

 To solve branch hazards, we use branch prediction:

– One simple approach is to always predict that branches will not be
taken.
• When the prediction is correct (the branch is not taken), the pipeline
proceeds at full speed.
• When branches are taken, we have some stalls.

Branch prediction as a solution to control hazards
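
The cost of always predicting "not taken" can be estimated with a one-line model. This is a back-of-the-envelope sketch, assuming an ideal CPI of 1 and a fixed penalty paid only by taken branches; the fractions in the usage line are made-up illustrative numbers, not measurements.

```python
def average_cpi(branch_fraction, taken_fraction, penalty_cycles=1):
    """Ideal CPI of 1 plus the stall cycles caused by mispredicted
    (that is, taken) branches under a predict-not-taken policy."""
    return 1 + branch_fraction * taken_fraction * penalty_cycles

# Hypothetical mix: 20% of instructions are branches, half of them taken.
print(average_cpi(0.20, 0.5))
```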

Branch prediction

 A more sophisticated version of branch prediction would have some
branches predicted as not taken, and some as taken:
– For an ordinary branch like the previous example, we predict the branch
as not taken.
– Branches at the bottom of loops usually jump back to the top of the
loop, so these branches are predicted as taken.
– Dynamic hardware predictor:
• May change predictions for a branch over the life of a program.
• Keeps a history table recording each branch as taken or not taken, then
uses the past behavior to predict the future.
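
A history table of this kind can be sketched in a few lines. The version below is the simplest possible one: it remembers only the last outcome of each branch (a 1-bit predictor) and predicts "not taken" for a branch it has not seen. The class name, the default prediction, and the loop pattern in the usage example are assumptions made for illustration.

```python
class OneBitPredictor:
    """Predict that each branch repeats its most recent outcome."""

    def __init__(self):
        self.history = {}  # branch address -> last outcome (True = taken)

    def predict(self, pc):
        return self.history.get(pc, False)  # unseen branch: predict not taken

    def update(self, pc, taken):
        self.history[pc] = taken

def count_mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses

# A loop branch: taken 9 times, then falls through; the loop runs twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(OneBitPredictor(), 0x400, loop_outcomes))
```

For this pattern the 1-bit table mispredicts only at the first and last iteration of each loop run, whereas a static "not taken" prediction would be wrong on every taken iteration.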

Solution 3: Delayed decision (used by MIPS)

 The delayed branch always executes the next sequential instruction,
with the branch taking place after that one-instruction delay.
 The compiler and assembler try to place an instruction that always
executes after the branch in the branch delay slot.

Before scheduling:
add t1,t2,t4
beq s0,s1,LABEL
or t5,t6,t7
LABEL:
....

After scheduling (add moved into the branch delay slot):
beq s0,s1,LABEL
add t1,t2,t4
or t5,t6,t7
LABEL:
....

Scheduling the branch delay slot

Pipeline summary

 We have seen three models of execution: single cycle, multicycle, and
pipelined.
 Pipelined control strives for 1 clock cycle per instruction, like
single cycle, but also for a fast clock cycle, like multicycle.

FINISH
