Computer Architecture
Textbooks:
– Computer Architecture - Behrooz Parhami
– Computer Organization and Design - John L. Hennessy & David A. Patterson
Reference books:
– Computer Architecture and Organization - John P. Hayes
– Computer Organization and Architecture – Designing for Performance - William Stallings
– Computer Architecture: Single and Parallel Systems - Mehdi R. Zargham
– Assembly Language Programming and Organization of the IBM-PC - Ytha Yu, Charles Marut
Page 3 © NTK 2009
About
Lecturer:
– Nguyen Thanh Kien
– Department of Computer Engineering, Faculty of Information
Technology, Hanoi University of Technology
– Mobile: +84 983588135
– Email: kiennt-fit@mail.hut.edu.vn, thanhkien84@yahoo.com
– Address:
• Room 322, C1, Hanoi University of Technology
• No.1, Dai Co Viet, Hai Ba Trung, Hanoi.
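[Figure: Memory and processing subsystems of MiniMIPS. Memory: up to 2^30 words, 4 B per location, byte addresses Loc 0, Loc 4, Loc 8, …, Loc m − 8, Loc m − 4, where m = 2^32. Processor: integer unit with registers up to $31, ALU, and mul/div unit with Hi and Lo; floating-point unit with registers up to $31 and FP arithmetic; trap & memory unit (TMU, Coprocessor 0) with BadVaddr, Status, Cause, and EPC. Chapter labels: 10, 11, 12.]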
Byte = 8 bits
Halfword = 2 bytes
Word = 4 bytes
Doubleword = 8 bytes
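[Figure: Instruction execution path: PC, instruction cache, register file, ALU, register file, data cache (not used); the operands come from registers $17 and $18 and the result goes to $24.]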
R format:
Field:  op      rs                 rt                 rd                    sh            fn
Bits:   31–26   25–21              20–16              15–11                 10–6          5–0
Width:  6 bits  5 bits             5 bits             5 bits                5 bits        6 bits
Use:    Opcode  Source register 1  Source register 2  Destination register  Shift amount  Opcode extension

I format:
Field:  op      rs                 rt                   operand / offset
Bits:   31–26   25–21              20–16                15–0
Width:  6 bits  5 bits             5 bits               16 bits
Use:    Opcode  Source or base     Destination or data  Immediate operand or address offset
The arithmetic instructions add and sub have a format that is common
to all two-operand ALU instructions. For these, the fn field specifies the
arithmetic/logic operation to be performed.
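A minimal Python sketch of how the fields above are packed into one 32-bit word. The helper names `encode_r` and `encode_i` and the small register table are illustrative assumptions, not part of MiniMIPS; the register numbers follow the usual MIPS conventions ($t0 = 8, $s0 to $s3 = 16 to 19, $t8 = 24).

```python
# Register numbers (standard MIPS conventions); only the ones used below.
REG = {"$zero": 0, "$t0": 8, "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19, "$t8": 24}

def encode_r(op: int, rs: int, rt: int, rd: int, sh: int, fn: int) -> int:
    """Pack the six R-format fields: op(6) rs(5) rt(5) rd(5) sh(5) fn(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sh << 6) | fn

def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack the I-format fields; imm is a signed 16-bit value kept in two's complement."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# add $t8,$s2,$s1 : op = 0, fn = 32 (the fn field selects the ALU operation)
add_word = encode_r(0, REG["$s2"], REG["$s1"], REG["$t8"], 0, 32)
# lw $t0,40($s3) : op = 35, rs = base register, rt = data register
lw_word = encode_i(35, REG["$s3"], REG["$t0"], 40)
print(f"{add_word:08x} {lw_word:08x}")
```

Masking `imm` with 0xFFFF is what lets a negative offset fit into the 16-bit field.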
lw / sw instruction (I format):
– op = 10x011: lw = 35, sw = 43
– rs = 10011: base register
– rt = 01000: data register
– operand / offset = 0000 0000 0010 1000: offset relative to base
Content of $s0 after the instruction is executed: 0000 0000 0011 1101 0000 0000 0000 0000
The lui instruction allows us to load an arbitrary 16-bit value into the
upper half of a register while setting its lower half to 0s.
Example
Show how each of these bit patterns can be loaded into $s0:
0010 0001 0001 0000 0000 0000 0011 1101
1111 1111 1111 1111 1111 1111 1111 1111
Solution
The first bit pattern has the hex representation: 0x2110003d
lui $s0,0x2110          # put the upper half in $s0
ori $s0,$s0,0x003d      # put the lower half in $s0
The same can be done for the second bit pattern, with the immediate values changed to 0xffff. However, the following is simpler and faster:
nor $s0,$zero,$zero     # because NOR(0, 0) = 1 in every bit position
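The register effects of lui, ori, and nor can be modeled on 32-bit integers. This is a minimal sketch; the function names mirror the mnemonics but are hypothetical helpers, and it assumes (as in MIPS) that ori zero-extends its immediate.

```python
MASK32 = 0xFFFFFFFF

def lui(imm16: int) -> int:
    """Upper half = imm16, lower half = 0s."""
    return (imm16 & 0xFFFF) << 16

def ori(rs_val: int, imm16: int) -> int:
    """Logical immediates are zero-extended, so only the lower half can change."""
    return (rs_val | (imm16 & 0xFFFF)) & MASK32

def nor(a: int, b: int) -> int:
    """Bitwise NOR restricted to 32 bits."""
    return ~(a | b) & MASK32

s0 = lui(0x2110)          # $s0 = 0x21100000
s0 = ori(s0, 0x003D)      # $s0 = 0x2110003d
all_ones = nor(0, 0)      # nor $s0,$zero,$zero
print(hex(s0), hex(all_ones))
```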
[Figure: Effective target address (32 bits) of a jump: xxxx 0000 0011 1101 0000 0000 0000 0000, where the upper 4 bits (xxxx) come from the PC.]
jr instruction (R format):
– op = 000000: ALU instruction
– rs = 11111: source register
– rt = 00000, rd = 00000, sh = 00000: unused
– fn = 001000: jr = 8
bltz instruction (I format):
– op = 000001: bltz = 1
– rs = 10001: source
– rt = 00000: zero
– operand / offset = 0000 0000 0011 1101: relative branch distance in words
beq / bne instruction (I format):
– op = 00010x: beq = 4, bne = 5
– rs = 10001: source 1
– rt = 10010: source 2
– operand / offset = 0000 0000 0011 1101: relative branch distance in words
slt instruction (R format):
– op = 000000: ALU instruction
– rs = 10010: source 1 register
– rt = 10011: source 2 register
– rd = 10001: destination
– sh = 00000: unused
– fn = 101010: slt = 42
slti instruction (I format):
– op = 001010: slti = 10
– rs = 10010: source
– rt = 10001: destination
– operand / offset = 0000 0000 0011 1101: immediate operand
Example
Show a sequence of MiniMIPS instructions corresponding to:
if (i <= j) { x = x + 1; z = 1; } else { y = y - 1; z = 2 * z; }
Solution
Example
The simple while loop: while (A[i]==k) i=i+1;
Assuming that: i, A, k are stored in $s1,$s2,$s3
Solution
Example
The simple switch:
switch (test) {
  case 0:
    a = a + 1; break;
  case 1:
    a = a - 1; break;
  case 2:
    b = 2 * b; break;
  default:
    break;
}
Assuming that: test, a, b are stored in $s1, $s2, $s3 (and, by assumption, $t0, $t1, $t2 hold 0, 1, 2)
        beq  $s1,$t0,case_0
        beq  $s1,$t1,case_1
        beq  $s1,$t2,case_2
        b    default
case_0: addi $s2,$s2,1     # a = a + 1
        b    continue
case_1: sub  $s2,$s2,$t1   # a = a - 1 (uses $t1 = 1)
        b    continue
case_2: add  $s3,$s3,$s3   # b = 2 * b
        b    continue
default:
continue:
2.1.6 Addressing Modes
Addressing mode is the method by which the location of an operand
is specified within an instruction. MiniMIPS uses six addressing modes:
– Immediate (addi, ori, …): the operand is in the instruction, extended if required.
– Register: register specifier → register file → register data.
– Base (lw, sw, …): register base + constant offset → memory address → memory → memory data.
– Pseudodirect (j): PC bits and address field → memory address → memory → memory data.
Schematic representation of addressing modes in MiniMIPS.
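The address arithmetic behind base and pseudodirect addressing can be sketched as follows. This is an illustrative model, not the hardware; the function names are hypothetical, and it assumes the 16-bit offset is sign-extended (as for lw/sw) and that the jump field holds a word address (hence the shift by 2).

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def base_address(base_value: int, offset16: int) -> int:
    """Base addressing (lw, sw): register contents + sign-extended 16-bit offset."""
    return (base_value + sign_extend16(offset16)) & MASK32

def pseudodirect_target(pc: int, addr26: int) -> int:
    """Pseudodirect addressing (j): 4 high-order PC bits, the 26-bit field, then 00.
    (Real MIPS hardware takes the 4 bits from the already incremented PC.)"""
    return (pc & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)

print(hex(base_address(0x10008000, 40)))         # positive offset
print(hex(base_address(0x10008000, 0xFFFC)))     # offset field 0xFFFC means -4
print(hex(pseudodirect_target(0x40001234, 0x00F40000)))
```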
Finding the Maximum Value in a List of Integers
Example
List A is stored in memory beginning at the address given in $s1.
List length is given in $s2.
Find the largest integer in the list and copy it into $t0.
Solution
      lw   $t0,0($s1)       # initialize maximum to A[0]
      addi $t1,$zero,0      # initialize index i to 0
loop: addi $t1,$t1,1        # increment index i by 1
      beq  $t1,$s2,done     # if all elements examined, quit
      add  $t2,$t1,$t1      # compute 2i in $t2
      add  $t2,$t2,$t2      # compute 4i in $t2
      add  $t2,$t2,$s1      # form address of A[i] in $t2
      lw   $t3,0($t2)       # load value of A[i] into $t3
      slt  $t4,$t0,$t3      # maximum < A[i]?
      beq  $t4,$zero,loop   # if not, repeat with no change
      addi $t0,$t3,0        # if so, A[i] is the new maximum
      j    loop             # change completed; now repeat
done: ...                   # continuation of the program
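To check the logic of the loop, here is a line-by-line Python mirror of it, with memory modeled as a dictionary from byte addresses to words. The base address and list contents are hypothetical; like the assembly, it assumes the list length n is at least 1.

```python
def find_max(memory: dict, base: int, n: int) -> int:
    """Mirror of the MiniMIPS loop; memory maps byte addresses to words."""
    t0 = memory[base]             # lw   $t0,0($s1)   maximum = A[0]
    i = 0                         # addi $t1,$zero,0
    while True:
        i += 1                    # addi $t1,$t1,1
        if i == n:                # beq  $t1,$s2,done
            break
        addr = 4 * i + base       # 2i, then 4i, then + base address
        t3 = memory[addr]         # lw   $t3,0($t2)
        if t0 < t3:               # slt $t4,$t0,$t3 ; beq $t4,$zero,loop
            t0 = t3               # addi $t0,$t3,0
    return t0

BASE = 0x1000                     # hypothetical address held in $s1
A = [3, -7, 12, 5]
MEMORY = {BASE + 4 * k: v for k, v in enumerate(A)}
print(find_max(MEMORY, BASE, len(A)))
```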
Table 5.1 The 20 MiniMIPS Instructions Covered So Far
Class             Instruction              Usage            op   fn
Copy              Load upper immediate     lui rt,imm       15
Arithmetic        Add                      add rd,rs,rt     0    32
                  Subtract                 sub rd,rs,rt     0    34
                  Set less than            slt rd,rs,rt     0    42
                  Add immediate            addi rt,rs,imm   8
                  Set less than immediate  slti rt,rs,imm   10
Logic             AND                      and rd,rs,rt     0    36
                  OR                       or rd,rs,rt      0    37
                  XOR                      xor rd,rs,rt     0    38
                  NOR                      nor rd,rs,rt     0    39
                  AND immediate            andi rt,rs,imm   12
                  OR immediate             ori rt,rs,imm    13
                  XOR immediate            xori rt,rs,imm   14
Memory access     Load word                lw rt,imm(rs)    35
                  Store word               sw rt,imm(rs)    43
Control transfer  Jump                     j L              2
                  Jump register            jr rs            0    8
                  Branch less than 0       bltz rs,L        1
                  Branch equal             beq rs,rt,L      4
                  Branch not equal         bne rs,rt,L      5
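[Figure: Relationship between the main program and a procedure: main prepares to call, then jal proc transfers control (PC) to proc; proc saves registers etc., does its work, restores, and returns with jr $ra; main then prepares to continue.]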
Example
Procedure to find the absolute value of an integer.
$v0 ← |($a0)|
Solution
The absolute value of x is –x if x < 0 and x otherwise.
abs: sub $v0,$zero,$a0 # put -($a0) in $v0;
# in case ($a0) < 0
bltz $a0,done # if ($a0)<0 then done
add $v0,$a0,$zero # else put ($a0) in $v0
done: jr $ra # return to calling program
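A Python mirror of the three-instruction abs procedure, useful for checking it on sample inputs. This is a sketch; `to_signed32` is a hypothetical helper that wraps results to 32 bits. One caveat: for −2^31 the Python model wraps to −2^31, whereas the real sub instruction raises an overflow exception.

```python
def to_signed32(x: int) -> int:
    """Wrap an integer to a 32-bit two's complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def abs_proc(a0: int) -> int:
    v0 = to_signed32(0 - a0)   # sub $v0,$zero,$a0  (put -($a0) in $v0)
    if a0 < 0:                 # bltz $a0,done
        return v0              # done: jr $ra
    v0 = a0                    # add $v0,$a0,$zero
    return v0                  # jr $ra

print([abs_proc(x) for x in (-5, 7, 0)])
```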
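[Figure: Nested procedure calls: main calls procedure abc with jal abc; abc saves registers, calls procedure xyz with jal xyz, restores, and returns with jr $ra; xyz also returns with jr $ra. Note: the textbook version of this figure is incorrect.]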
[Figure: Stack contents before and after push c and pop x; sp points to the top element.]
Push c:
sp = sp – 4
mem[sp] = c
Pop x:
x = mem[sp]
sp = sp + 4
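The push and pop rules translate directly into a small model of a stack that grows toward lower addresses. The class name and starting address are hypothetical; memory is a dictionary keyed by byte address.

```python
class WordStack:
    """Word stack growing toward lower addresses; sp points to the top item."""

    def __init__(self, sp: int):
        self.sp = sp
        self.mem = {}

    def push(self, value):
        self.sp -= 4               # sp = sp - 4
        self.mem[self.sp] = value  # mem[sp] = c

    def pop(self):
        value = self.mem[self.sp]  # x = mem[sp]
        self.sp += 4               # sp = sp + 4
        return value

s = WordStack(sp=0x1000)           # hypothetical initial sp
for item in ("a", "b", "c"):
    s.push(item)
x = s.pop()                        # x == "c"; a and b remain on the stack
```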
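[Figure: Overview of the MiniMIPS memory address space: text segment (program, 63 M words); data segment starting at 0x10000000, with static data (addresses 0x10000000 to 0x1000ffff are addressable with a 16-bit signed offset from 0x10008000) followed by dynamic data; stack ending at 0x7ffffffc.]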
[Figure: Use of the stack by a procedure: the frame for the current procedure lies between $fp and $sp and holds the old ($fp), saved registers, and local variables (y, z, …); the frame for the previous procedure (a, b, c, …) lies above it.]
Saving $fp, $ra, and $s0 onto the stack and restoring
them at the end of the procedure
Type      8-bit value  Decimal  32-bit extended value
Unsigned  0010 1011    43       0000 0000 0000 0000 0000 0000 0010 1011
Unsigned  1010 1011    171      0000 0000 0000 0000 0000 0000 1010 1011
Signed    0010 1011    +43      0000 0000 0000 0000 0000 0000 0010 1011
Signed    1010 1011    –85      1111 1111 1111 1111 1111 1111 1010 1011
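The table amounts to two rules: unsigned bytes are zero-extended (as lbu does) and signed bytes are sign-extended by copying bit 7 into the upper 24 bits (as lb does). A minimal sketch with hypothetical helper names:

```python
def zero_extend8(byte: int) -> int:
    """Unsigned byte: the upper 24 bits become 0."""
    return byte & 0xFF

def sign_extend8(byte: int) -> int:
    """Signed byte: bit 7 is copied into the upper 24 bits."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte

for b in (0b00101011, 0b10101011):
    print(f"{b:08b}  unsigned -> {zero_extend8(b):032b}  signed -> {sign_extend8(b):032b}")
```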
[ASCII table excerpt, rows e and f, columns 0–7:]
e   SO  RS  .  >  N  ^  n  ~
f   SI  US  /  ?  O  _  o  DEL
lb / lbu / sb instruction (I format):
– op = 10xx00: lb = 32, lbu = 36, sb = 40
– rs = 10011: base register
– rt = 01000: data register
– immediate / offset = 0000 0000 0000 1000: address offset
Bit pattern 0000 0010 0001 0001 0100 0000 0010 0000 = (02114020)hex can be interpreted as:
– an add instruction
– a positive integer
– a four-character string
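The three interpretations can be checked by decoding the same word three ways. This sketch assumes big-endian byte order (most significant byte first) for the string reading.

```python
word = 0x02114020

# 1. As an R-format instruction: split into op, rs, rt, rd, sh, fn
op, rs, rt = word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F
rd, sh, fn = (word >> 11) & 0x1F, (word >> 6) & 0x1F, word & 0x3F

# 2. As an unsigned (here positive) integer
as_int = word

# 3. As four bytes, i.e. four ASCII character codes
as_bytes = word.to_bytes(4, "big")

print(op, rs, rt, rd, sh, fn)
print(as_int)
print(list(as_bytes))
```

With op = 0 and fn = 32 the fields read as add $t0,$s0,$s1 (rd = 8, rs = 16, rt = 17); the bytes 0x02, 0x11, 0x40, 0x20 are the ASCII codes STX, DC1, '@', and space.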
Index: Use a register that holds the index i and increment the register in
each step to effect moving from element i of the list to element i + 1
Pointer: Use a register that points to (holds the address of) the list element
being examined and update it in each step to point to the next element
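The two styles differ only in what the loop register holds. This sketch sums a list both ways over a dictionary that models word memory; the task (summing) and all names are hypothetical illustrations.

```python
def sum_by_index(mem: dict, base: int, n: int) -> int:
    """Index style: keep i, recompute the address base + 4*i every step."""
    total, i = 0, 0
    while i < n:
        total += mem[base + 4 * i]
        i += 1
    return total

def sum_by_pointer(mem: dict, base: int, n: int) -> int:
    """Pointer style: p already holds the element's address; advance it by 4 bytes."""
    total, p, end = 0, base, base + 4 * n
    while p != end:
        total += mem[p]
        p += 4
    return total

BASE = 0x100                                  # hypothetical list address
MEM = {BASE + 4 * k: v for k, v in enumerate([4, 8, 15, 16])}
```

The pointer version saves the address computation (2i, 4i, add base) inside the loop.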
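[Figure: One iteration of selection sort on list A (from first to last): at the start of the iteration, the maximum x is identified; at the end of the iteration, x has been swapped with the last element y.]
[Figure residue: inputs to and outputs from procedure max; labels $a0, x, y.]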
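Selection sort repeats that iteration: find the maximum of A[first..last], swap it with A[last], then exclude the last position. A short Python sketch of the algorithm (not a translation of any particular MiniMIPS code):

```python
def selection_sort(a: list) -> list:
    """Sort in place: each pass moves the maximum of a[first..last] to a[last]."""
    first, last = 0, len(a) - 1
    while last > first:
        m = max(range(first, last + 1), key=lambda k: a[k])  # index of maximum x
        a[m], a[last] = a[last], a[m]                          # swap x with y = a[last]
        last -= 1                                              # a[last] is now in place
    return a
```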
mfhi / mflo instruction (R format):
– op = 000000: ALU instruction
– rs = 00000, rt = 00000: unused
– rd = 01000: destination register
– sh = 00000: unused
– fn = 0100x0: mfhi = 16, mflo = 18
sllv / srlv instruction (R format):
– op = 000000: ALU instruction
– rs = 10000: amount register
– rt = 10001: source register
– rd = 01000: destination register
– sh = 00000: unused
– fn = 0001x0: sllv = 4, srlv = 6
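The variable shifts take the shift amount from a register instead of the sh field. In MIPS only the low 5 bits of rs are used; this sketch (hypothetical helper names) models that on 32-bit values.

```python
MASK32 = 0xFFFFFFFF

def sllv(rt_val: int, rs_val: int) -> int:
    """Shift left logical variable: amount = low 5 bits of rs; result kept to 32 bits."""
    return (rt_val << (rs_val & 0x1F)) & MASK32

def srlv(rt_val: int, rs_val: int) -> int:
    """Shift right logical variable: zeros enter from the left."""
    return (rt_val & MASK32) >> (rs_val & 0x1F)
```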
Set less than immediate      slti rt,rs,imm
XOR immediate                xori rt,rs,imm
Multiply unsigned            multu rs,rt
Shift left logical variable  sllv rd,rt,rs
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
$$N_{(10)} = a_n b^n + a_{n-1} b^{n-1} + \dots + a_1 b^1 + a_0 b^0 + a_{-1} b^{-1} + \dots + a_{-m} b^{-m} = \sum_{i=-m}^{n} a_i b^i$$
Decimal:
– b = 10
– Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
$$N_{(10)} = \sum_{i=-m}^{n} a_i \cdot 10^i$$
– Eg:
$$539.45_{(10)} = 5\times10^2 + 3\times10^1 + 9\times10^0 + 4\times10^{-1} + 5\times10^{-2}$$
Binary:
– b = 2
– bit: binary digit
– Digits: 0, 1
$$N_{(10)} = \sum_{i=-m}^{n} a_i \cdot 2^i$$
– Eg:
$$1011.011_{(2)} = 1\times2^3 + 0\times2^2 + 1\times2^1 + 1\times2^0 + 0\times2^{-1} + 1\times2^{-2} + 1\times2^{-3}$$
Number Representation
Binary (cont.)
– Which range can an n-bit binary number represent?
• $a_{n-1} \dots a_1 a_0$ ranges from 0 to $2^n - 1$
Octal:
– b = 8
$$N_{(8)} = a_n a_{n-1} \dots a_1 a_0 . a_{-1} a_{-2} \dots a_{-m}$$
– Digits: 0, 1, 2, 3, 4, 5, 6, 7
• $a_i = 0..7$
– Eg:
$$503.071_{(8)} = 5\times8^2 + 0\times8^1 + 3\times8^0 + 0\times8^{-1} + 7\times8^{-2} + 1\times8^{-3}$$
Eg:
– $1010.11_{(2)} = 1\times2^3 + 0\times2^2 + 1\times2^1 + 0\times2^0 + 1\times2^{-1} + 1\times2^{-2} = 10.75_{(10)}$
– $1010.11_{(8)} = ?$
– $A12_{(16)} = ?$
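The formula $N = \sum a_i b^i$ can be evaluated digit by digit. This sketch uses Python's `fractions.Fraction` so that fractional digits are exact (no binary floating-point rounding); the function names are hypothetical and only digits 0–9, A–F are supported.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def _digit(ch: str, base: int) -> int:
    d = DIGITS.index(ch)          # raises ValueError for characters outside 0-9, A-F
    if d >= base:
        raise ValueError(f"digit {ch!r} is not valid in base {base}")
    return d

def to_decimal(numeral: str, base: int) -> Fraction:
    """Evaluate N = sum of a_i * b**i for a numeral such as "1010.11" in `base`."""
    int_part, _, frac_part = numeral.upper().partition(".")
    value = Fraction(0)
    # Integer digits a_0, a_1, ... (read right to left) have weights b**0, b**1, ...
    for i, ch in enumerate(reversed(int_part)):
        value += _digit(ch, base) * base ** i
    # Fraction digits a_-1, a_-2, ... (read left to right) have weights b**-1, b**-2, ...
    for i, ch in enumerate(frac_part, start=1):
        value += Fraction(_digit(ch, base), base ** i)
    return value

print(float(to_decimal("1010.11", 2)))
```

Calling `to_decimal("1010.11", 8)` and `to_decimal("A12", 16)` gives a way to check answers to the two exercises.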
Convert from base 10 to base b
Eg: $6.625_{(10)} = 110.101_{(2)}$
Eg: $37A.B_{(16)} = ?_{(2)}$
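A sketch of the usual conversion method, assuming a non-negative input: the integer part is divided repeatedly by b (the remainders are the digits, least significant first), and the fraction part is multiplied repeatedly by b (the integer parts are the digits, most significant first). For hexadecimal to binary, each hex digit maps to exactly 4 bits. The function names and the 12-digit cutoff for non-terminating fractions are assumptions.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def from_decimal(value, base: int, max_frac_digits: int = 12) -> str:
    """Convert a non-negative value (int, Fraction, or decimal string) to base b."""
    value = Fraction(value)
    int_part = value.numerator // value.denominator
    frac = value - int_part
    int_digits = ""
    while True:                                  # repeated division by b
        int_part, r = divmod(int_part, base)
        int_digits = DIGITS[r] + int_digits      # remainders give a_0, a_1, ...
        if int_part == 0:
            break
    frac_digits = ""
    while frac and len(frac_digits) < max_frac_digits:   # repeated multiplication by b
        frac *= base
        d = frac.numerator // frac.denominator
        frac_digits += DIGITS[d]                 # integer parts give a_-1, a_-2, ...
        frac -= d
    return int_digits + ("." + frac_digits if frac_digits else "")

def hex_to_bin(numeral: str) -> str:
    """Each hexadecimal digit maps to exactly four bits."""
    return "".join(ch if ch == "." else format(int(ch, 16), "04b") for ch in numeral)

print(from_decimal("6.625", 2))
```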
Value of A:
$$A = a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \dots + a_1 2^1 + a_0 2^0 = \sum_{i=0}^{n-1} a_i 2^i$$
Range of representation:
– Using n bits to represent unsigned numbers
– Range: 0 to $2^n - 1$
Value of A:
$$A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$$
Range of representation:
– Using n bits to represent 2’s complement numbers
– Range: $-2^{n-1}$ to $2^{n-1} - 1$
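+10 + (-10) = ?
Eg: the 2’s complement (negation) of a number is found by inverting all bits and adding 1:
–5 = 11111011 → invert: 00000100 → + 1: 00000101 = 5
–1 = 11111111 → invert: 00000000 → + 1: 00000001 = 1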
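Both the value formula $A = -a_{n-1}2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$ and the invert-and-add-1 rule can be checked on bit strings. A minimal sketch with hypothetical helper names; the carry out of the most significant bit is discarded, as in n-bit hardware.

```python
def value_of(bits: str) -> int:
    """Two's complement value: A = -a_{n-1}*2**(n-1) + sum of a_i*2**i for i < n-1."""
    n = len(bits)
    low = sum(int(b) << i for i, b in enumerate(reversed(bits[1:])))
    return -int(bits[0]) * 2 ** (n - 1) + low

def negate(bits: str) -> str:
    """Invert every bit, then add 1, keeping n bits."""
    n = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    return format((int(inverted, 2) + 1) % (1 << n), f"0{n}b")

ten = "00001010"
print(negate(ten), (int(ten, 2) + int(negate(ten), 2)) % 256)   # +10 + (-10) in 8 bits
```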
Overflow
– Occurs when the result of an addition is out of the range of representation (the result cannot be stored in the predefined number of bits)
– Eg (8-bit unsigned addition):
X = 1001 0110 = 150        X = 1100 0101 = 197
Y = 0001 0011 = 19         Y = 0100 0110 = 70
S = 1010 1001 = 169        S = 0000 1011 = 11 (should be 267)
Cout = 0                   Cout = 1 (carry-out)
– When does it occur?
• When adding two numbers of opposite sign?
• When adding two positive numbers?
• When adding two negative numbers?
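A sketch of how an 8-bit adder can flag both kinds of overflow: the carry-out indicates overflow when the bits are read as unsigned numbers, while for two's complement operands overflow is flagged when both operands have the same sign and the sum's sign differs (with opposite signs it cannot happen). The function name and the 8-bit width are assumptions; inputs are taken as bit patterns 0 to 255.

```python
def add8(x: int, y: int):
    """8-bit addition of bit patterns x, y in 0..255.
    Returns (sum bits, carry-out, signed overflow flag)."""
    total = x + y
    s = total & 0xFF
    carry_out = total >> 8                 # 1 => unsigned overflow
    sx, sy, ss = x >> 7, y >> 7, s >> 7    # sign bits
    overflow = sx == sy and ss != sx       # same-sign operands, different-sign result
    return s, carry_out, overflow

print(add8(197, 70))    # the second example: sum bits 11, carry-out 1
```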
Principle:
– Subtraction is addition of the negative number: a – b = a + (–b)
Eg: 7 – 5 = ?
5 = 0101 → invert: 1010 → + 1: 1011 = –5
7 + (–5): 0111 + 1011 = 1 0010 → discard the carry-out → 0010 = 2
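The same principle lets an adder perform subtraction. This minimal sketch (hypothetical name `sub4`, 4-bit width as in the example) forms the two's complement of b and keeps only 4 bits of the sum, which discards the carry-out.

```python
def sub4(a: int, b: int) -> int:
    """a - b on a 4-bit adder: add a to the two's complement of b."""
    neg_b = (~b + 1) & 0xF      # invert the bits of b, then add 1
    return (a + neg_b) & 0xF    # keep 4 bits; the carry-out is discarded

print(format(sub4(7, 5), "04b"))
```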
IEEE 754 standard
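– Single precision (32 bits): S = bit 31, e = bits 30–23, m = bits 22–0
– Double precision (64 bits): S = bit 63, e = bits 62–52, m = bits 51–0
– Extended precision (80 bits): S = bit 79, e = bits 78–64, m = bits 63–0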
[Figure: example bit fields with S = 1, e = 130]
Solution:
$$X = (-1)^S \times 1.m \times 2^{e-127} = (-1)^0 \times 1.0 \times 2^{127-127} = 1$$
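The single-precision fields and the formula $X = (-1)^S \times 1.m \times 2^{e-127}$ can be checked with Python's standard `struct` module, which packs a float into its 32-bit IEEE 754 pattern. A sketch with hypothetical helper names; `value32` handles normalized numbers only (0 < e < 255).

```python
import struct

def fields32(x: float):
    """Split x into IEEE 754 single-precision fields (S, e, m)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def value32(s: int, e: int, m: int) -> float:
    """X = (-1)**S * 1.m * 2**(e - 127), for normalized numbers only."""
    return (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e - 127)

print(fields32(1.0), value32(0, 127, 0))
```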
[Figure: Floating-point number line (−∞, −b, −a, −0, +0, a, b, +∞): results beyond ±b overflow, and nonzero results between −a and +a underflow.]
Abstract view of a basic MIPS implementation – v1.0
Benefits:
– Simple, easy to understand
– Single-cycle datapath
Requirement:
– Instruction memory and data memory are separate because:
• Format of data and instruction is different
• Having separate memories is less expensive
• The processor operates in one cycle and cannot use a single-ported memory for two different accesses within that cycle.
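lw $t1, offset_value($t2)
sw $t1, offset_value($t2)
1. Compute the memory address by adding the base register $t2 to the 16-bit sign-extended field offset_value.
2. For sw: read the data from register $t1 and write it to the computed address in data memory. For lw: read the data memory at that address and write the value into register $t1.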
beq $t1,$t2,offset
Compute the branch target address by adding the sign-extended offset to the PC.
– Two notes on the definition of the branch instruction:
• The instruction set architecture specifies that the base for the branch address calculation is the address of the instruction following the branch. Since PC + 4 is already computed during instruction fetch, it can be reused here.
• The offset field is shifted left 2 bits so that it is a word offset.
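Putting the two notes together: target = (PC + 4) + (sign-extended offset shifted left by 2). A minimal sketch; the function names and the example PC value are hypothetical.

```python
def sign_extend16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

print(hex(branch_target(0x00400000, 61)))    # branch forward 61 words
print(hex(branch_target(0x00400000, 0xFFFF)))  # offset -1: branch back to itself
```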
Use the ALU to evaluate the branch condition:
– Read the two registers $t1, $t2 from the register file into the ALU inputs
– Subtract the two ALU inputs; the ALU's Zero output signals whether they are equal, which decides whether the branch is taken
Single-cycle datapath:
– Executes every instruction in one clock cycle.
– No resource can be used more than once per instruction, so any element needed more than once must be duplicated.
• Separate memories for instruction and data.
[Figure: datapath in which the control signals are not yet connected]
4. CPU Organization
ALU Control
Main Control Unit
Depending on the instruction class, the ALU performs one of its first five functions (NOR is needed for other parts of the instruction set):
– Load, store instructions: the ALU computes the memory address by addition
– R-type instructions: the ALU performs one of five actions (add, sub, and, or, slt) based on the 6-bit function field in the instruction
– Branch beq: the ALU performs subtraction
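A sketch of the ALU control logic described above, assuming the 2-bit ALUOp signal and the 4-bit ALU control codes used in Patterson & Hennessy's textbook (listed in the references); those code values are that book's convention, not something this section defines. The funct values match the fn column of Table 5.1.

```python
# 4-bit ALU control codes (Patterson & Hennessy convention; NOR = 0b1100 is not used here)
AND, OR, ADD, SUB, SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# R-type funct field -> ALU action (add = 32, sub = 34, and = 36, or = 37, slt = 42)
FUNCT_TO_ALU = {0b100000: ADD, 0b100010: SUB, 0b100100: AND, 0b100101: OR, 0b101010: SLT}

def alu_control(alu_op: int, funct: int) -> int:
    if alu_op == 0b00:          # lw, sw: compute the address by addition
        return ADD
    if alu_op == 0b01:          # beq: compare by subtraction
        return SUB
    if alu_op == 0b10:          # R-type: decided by the 6-bit funct field
        return FUNCT_TO_ALU[funct]
    raise ValueError("unsupported ALUOp")
```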
The opcode is always contained in bits 31:26. We refer to this field as Opcode[5:0].
The two registers to be read (R-type, beq, sw) are always in bits 25:21 and 20:16.
The base register for load and store instructions is always in bits 25:21.
The 16-bit offset for load/store/beq is always in bits 15:0.
The destination register is in one of two places:
– For load, it is in bits 20:16
– For R-type, it is in bits 15:11
A simple datapath with all control lines identified
Multicycle implementation:
– Break each instruction into a series of steps corresponding to the
functional unit operations needed.
• Each step in the execution will take one clock cycle. Each instruction
will take different numbers of clock cycles.
• Allow a functional unit to be used more than once per instruction =>
hardware share.
Key elements:
– A shared memory unit for both instructions & data
– A single ALU
– Require additional registers: IR, Memory data register, A,B, ALUout
– Require additional multiplexers
Multicycle implementation
ftp://dce.hut.edu.vn/kiennt/
Exception:
– Is an unexpected event from within the processor
– Ex: arithmetic overflow, using an undefined instruction...
Interrupt:
– Is an event that causes an unexpected change in control flow but
comes from outside of the processor.
– Interrupts are used by IO devices to communicate with the processor.
– Ex: timer interrupt...
Introduction
Pipelined Datapath
Pipelined Control
Pipelined Hazards
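[Figure: Pipeline timing diagram: Inst 1 to Inst 7 each pass through the stages FI, DI, EX, FO, WR, one stage per clock cycle, each instruction starting one cycle after the previous one (clock cycles 1 to 12 shown).]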
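The timing diagram can be generated for an ideal pipeline (no stalls, one stage per cycle), where n instructions on a k-stage pipeline finish in k + n − 1 cycles instead of k × n. A sketch with a hypothetical function name, using the five stage names from the figure.

```python
def pipeline_chart(n_instr: int, stages=("FI", "DI", "EX", "FO", "WR")):
    """Ideal pipeline: instruction i (0-based) is in stage s during clock cycle i + s + 1."""
    k = len(stages)
    total_cycles = k + n_instr - 1
    rows = []
    for i in range(n_instr):
        row = ["  "] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(f"Inst {i + 1}: " + " ".join(row))
    return total_cycles, rows

cycles, rows = pipeline_chart(7)
print(f"{cycles} cycles")
for r in rows:
    print(r)
```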
Because each resource is used during only one of the five stages of an instruction, it can be shared by other instructions during the other four stages.
=> To retain the values of an individual instruction for its other four stages, those values must be saved in registers.
Pipelined datapath of MIPS
Registers
Registers must be wide enough to store all the data corresponding to the lines that go through them: IF/ID: 64 bits, ID/EX: 128 bits, EX/MEM: 97 bits, MEM/WB: 64 bits.
How are portions of the datapath used during an instruction?
– Example of a load instruction
Structural Hazards
Data Hazards
– Forwarding
– Stall
Control Hazards
[Figure annotation: conflict when both read memory]
Instead of waiting until the fifth stage of the add instruction for the result in register $s0, forwarding uses extra hardware (an additional connection) to pass the output of the EX stage of the add instruction directly to the input of the EX stage of the sub instruction.
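The decision to forward can be expressed as a simple condition on the pipeline registers. This sketch follows the EX-hazard check described in Patterson & Hennessy for ALU input A (the rs operand); the class and field names are hypothetical. The usage assumes the pair add $s0,… followed by sub …,$s0,… ($s0 = register 16).

```python
from dataclasses import dataclass

@dataclass
class ExMem:
    reg_write: bool   # the instruction now in EX/MEM will write a register
    rd: int           # its destination register number

def forward_a(ex_mem: ExMem, id_ex_rs: int) -> bool:
    """Forward the EX/MEM result to ALU input A if it is the register the next instruction reads."""
    return ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == id_ex_rs

# add writes $s0 (16); the following sub reads $s0 as rs -> forward
print(forward_a(ExMem(reg_write=True, rd=16), id_ex_rs=16))
```

The `rd != 0` test matters because $zero is never really written, so a result aimed at it must not be forwarded.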
[Figures: pipeline stall]
Notes:
– The name “forwarding” comes from the idea that the result is passed forward from an earlier instruction to a later instruction. “Bypassing” comes from passing the result around the register file directly to the desired unit.
Stalls are inserted into the pipeline
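When forwarding is not enough (a load followed immediately by an instruction that uses the loaded register), a stall must be inserted. This sketch follows the load-use hazard check described in Patterson & Hennessy; the function and parameter names are hypothetical. The usage assumes lw $s0,… followed by sub …,$s0,$t3 ($s0 = 16, $t3 = 11).

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """Load-use hazard: the load in EX writes rt, and the next instruction reads that register."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)

# lw writes $s0 (16); the next instruction reads $s0 and $t3 (11) -> one stall cycle
print(must_stall(True, 16, 16, 11))
```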
In a branch instruction:
– We need to fetch the instruction following the branch on the next clock cycle to keep the pipeline full.
– But the pipeline cannot possibly know what the next instruction should be, since it has only just received the branch instruction from memory.
[Figure annotation: if the branch is taken, we need to discard (flush) these instructions]