
Lect2 PDF


Instructions: Language of the Machine

Instructions: Overview

Language of the machine


More primitive than higher-level languages, e.g., no sophisticated
control flow such as while or for loops
Very restrictive
e.g., MIPS arithmetic instructions
We'll be working with the MIPS instruction set architecture
inspired most architectures developed since the 1980s
used by NEC, Nintendo, Silicon Graphics, Sony
the name is not related to millions of instructions per second!
it stands for Microprocessor without Interlocked Pipeline Stages!
Design goals: maximize performance, minimize cost, and reduce design time

MIPS Arithmetic

All MIPS arithmetic instructions have 3 operands


Operand order is fixed (e.g., destination first)

Example:

C code:

A = B + C

MIPS code:

add $s0, $s1, $s2

It is the compiler's job to associate variables with registers

MIPS Arithmetic

Design Principle 1: simplicity favors regularity.


Translation: Regular instructions make for simple hardware!

Simpler hardware reduces design time and manufacturing cost.


Of course this complicates some things...
C code:

A = B + C + D;
E = F - A;

MIPS code
(arithmetic):

add $t0, $s1, $s2


add $s0, $t0, $s3
sub $s4, $s5, $s0

Allowing a variable number of operands would simplify the assembly
code but complicate the hardware.

Performance penalty: high-level code translates to denser machine code.

MIPS Arithmetic

Operands must be in registers; only 32 registers are provided
(so 5 bits are required to select one register). Reason for the small
number of registers:

Design Principle 2: smaller is faster.

Why?

Electronic signals have to travel further on a physically larger chip,
increasing clock cycle time.
Smaller is also cheaper!

Registers vs. Memory

Arithmetic instruction operands must be in registers

MIPS has 32 registers

Compiler associates variables with registers

What about programs with lots of variables (arrays, etc.)? Use
memory: load/store operations transfer data between memory and
registers; if there are not enough registers, spill registers to memory

MIPS is a load/store architecture

[Figure: components of a computer: Processor (Control and Datapath), Memory, and I/O (Input and Output)]

Memory Organization

Viewed as a large single-dimension array with access by address


A memory address is an index into the memory array
Byte addressing means that the index points to a byte of
memory, and that the unit of memory accessed by a load/store
is a byte
[Figure: memory drawn as an array indexed by byte addresses 0, 1, 2, 3, 4, 5, 6, ..., each entry holding 8 bits of data]

Memory Organization

Bytes are load/store units, but most data items use larger words
For MIPS, a word is 32 bits or 4 bytes.

Registers correspondingly hold 32 bits of data

[Figure: memory drawn as an array indexed by word addresses 0, 4, 8, 12, ..., each entry holding 32 bits of data]

2^32 bytes with byte addresses from 0 to 2^32 - 1
2^30 words with byte addresses 0, 4, 8, ..., 2^32 - 4
i.e., words are aligned


what are the least 2 significant bits of a word address?
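
One way to check the answer to the question above is a small Python sketch. The helper names `is_word_aligned` and `word_index` are illustrative assumptions, not part of MIPS or SPIM. A word address is a multiple of 4, so its two least significant bits are 00.

```python
def is_word_aligned(address: int) -> bool:
    """True when a byte address is a multiple of 4 (low two bits are 00)."""
    return address & 0b11 == 0


def word_index(address: int) -> int:
    """Map an aligned byte address to its word number (address / 4)."""
    if not is_word_aligned(address):
        raise ValueError(f"address {address} is not word aligned")
    return address >> 2


# Highest word address in a 2^32-byte address space.
last_word_address = 2**32 - 4

for addr in (0, 4, 8, 12, last_word_address):
    # Print each address in binary to show the trailing 00.
    print(f"{addr:>10} = {addr:032b}  aligned={is_word_aligned(addr)}")
```

Dropping the two low bits with a right shift gives the word number, which is why 2^32 bytes hold exactly 2^30 words.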

Load/Store Instructions

Load and store instructions

Example:
C code:

A[8] = h + A[8];

MIPS code:

lw $t0, 32($s3)      # load: $t0 receives the value; 32 is the offset, $s3 holds the address
add $t0, $s2, $t0    # arithmetic
sw $t0, 32($s3)      # store

Load word has destination first, store has destination last

Remember MIPS arithmetic operands are registers, not memory
locations

therefore, words must first be moved from memory to registers
using loads before they can be operated on; then the result can be
stored back to memory

A MIPS Example

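Can we figure out the assembly code?

swap(int v[], int k)
{ int temp;
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
}

$4 = Base address of v[]
$5 = k

swap:
  muli $2, $5, 4      # $2 = k * 4 (byte offset of v[k])
  add  $2, $4, $2     # $2 = address of v[k]
  lw   $15, 0($2)     # $15 = v[k]
  lw   $16, 4($2)     # $16 = v[k+1]
  sw   $16, 0($2)     # v[k] = old v[k+1]
  sw   $15, 4($2)     # v[k+1] = old v[k]
  jr   $31            # return to caller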

So far we've learned:

MIPS
loading words but addressing bytes
arithmetic on registers only

Instruction            Meaning
add $s1, $s2, $s3      $s1 = $s2 + $s3
sub $s1, $s2, $s3      $s1 = $s2 - $s3
lw $s1, 100($s2)       $s1 = Memory[$s2+100]
sw $s1, 100($s2)       Memory[$s2+100] = $s1

Machine Language

Instructions, like registers and words of data, are also 32 bits long

Example: add $t0, $s1, $s2

registers are numbered, e.g., $t0 is 8, $s1 is 17, $s2 is 18

Instruction Format R-type (R for aRithmetic):

op      rs      rt      rd      shamt   funct
000000  10001   10010   01000   00000   100000
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

op: opcode (operation)
rs: first register source operand
rt: second register source operand
rd: register destination operand
shamt: shift amount
funct: function field, selects variant of operation

MIPS Encoding: R-Type

Bits:   31-26   25-21   20-16   15-11   10-6    5-0
Field:  opcode  rs      rt      rd      shamt   funct

add $4, $3, $2      (rd = $4, rs = $3, rt = $2)

opcode  rs      rt      rd      shamt   funct
000000  00011   00010   00100   00000   100000

Encoding = 0x00622020
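
The packing shown above can be sketched in Python. This is a minimal illustration, not an assembler: the function name `encode_r_type` and the table `R_TYPE_FIELDS` are assumptions introduced here, and only the field widths and the `add` function code 100000 come from the slides.

```python
# Field widths of the R-type format, most significant field first.
R_TYPE_FIELDS = (("opcode", 6), ("rs", 5), ("rt", 5),
                 ("rd", 5), ("shamt", 5), ("funct", 6))


def encode_r_type(rs: int, rt: int, rd: int, shamt: int = 0,
                  funct: int = 0b100000, opcode: int = 0) -> int:
    """Pack the six R-type fields into one 32-bit instruction word."""
    values = {"opcode": opcode, "rs": rs, "rt": rt,
              "rd": rd, "shamt": shamt, "funct": funct}
    word = 0
    for name, width in R_TYPE_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        # Shift what we have so far left and append the next field.
        word = (word << width) | value
    return word


# add $4, $3, $2  ->  rd = 4, rs = 3, rt = 2
add_word = encode_r_type(rs=3, rt=2, rd=4)
print(f"0x{add_word:08X}")  # prints 0x00622020
```

Note that the assembly operand order (destination first) differs from the field order in the machine word (rs, rt, then rd).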

Machine Language

Consider the load-word and store-word instructions:

what would the regularity principle have us do?
we would have only 5 or 6 bits to determine the offset from a base
register - too little

Design Principle 3: Good design demands a compromise

Introduce a new type of instruction format:
I-type (I for Immediate) for data transfer instructions

Example: lw $t0, 1002($s2)

op      rs      rt      16 bit offset
100011  10010   01000   0000001111101010
6 bits  5 bits  5 bits  16 bits

MIPS Encoding: I-Type

Bits:   31-26   25-21   20-16   15-0
Field:  opcode  rs      rt      Immediate Value

lw $5, 3000($2)     (rt = $5, Immediate = 3000, rs = $2)

opcode  rs      rt      Immediate Value
100011  00010   00101   0000101110111000

Encoding = 0x8C450BB8

MIPS Encoding: I-Type

Bits:   31-26   25-21   20-16   15-0
Field:  opcode  rs      rt      Immediate Value

sw $5, 3000($2)     (rt = $5, Immediate = 3000, rs = $2)

opcode  rs      rt      Immediate Value
101011  00010   00101   0000101110111000

Encoding = 0xAC450BB8
The immediate value is signed
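
The two I-type encodings above can be reproduced with a short Python sketch. The function name `encode_i_type` is an assumption for illustration; the opcodes 100011 (lw) and 101011 (sw) are the ones shown in the slides. A negative immediate is stored in 16-bit two's complement, which is what "signed" means here.

```python
LW_OPCODE = 0b100011  # load word, as shown in the lw example
SW_OPCODE = 0b101011  # store word, as shown in the sw example


def encode_i_type(opcode: int, rs: int, rt: int, immediate: int) -> int:
    """Pack opcode, rs, rt and a signed 16-bit immediate into 32 bits."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register numbers must fit in 5 bits")
    if not -(1 << 15) <= immediate < (1 << 15):
        raise ValueError("immediate must fit in 16 signed bits")
    # Masking with 0xFFFF yields the two's complement bit pattern.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


# lw $5, 3000($2) and sw $5, 3000($2): rs = 2 (base), rt = 5
print(f"0x{encode_i_type(LW_OPCODE, 2, 5, 3000):08X}")  # prints 0x8C450BB8
print(f"0x{encode_i_type(SW_OPCODE, 2, 5, 3000):08X}")  # prints 0xAC450BB8
```

The two words differ only in the opcode field, which is the regularity the format is designed for.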


Stored Program Concept

Instructions are bit sequences, just like data

Programs are stored in memory
to be read or written just like data

[Figure: Processor connected to Memory; memory holds data, programs, compilers, editors, etc.]

Fetch & Execute Cycle

instructions are fetched and put into a special register


bits in the register control the subsequent actions (= execution)
fetch the next instruction and repeat

SPIM the MIPS simulator

SPIM (MIPS spelt backwards!) is a MIPS simulator that

reads MIPS assembly language files and translates them to machine language
executes the machine language instructions
shows contents of registers and memory
works as a debugger (supports break-points and single-stepping)
provides basic OS-like services, like simple I/O

SPIM is freely available on-line

Memory Organization:
Big/Little Endian Byte Order

Bytes in a word can be numbered in two ways:

byte 0 at the leftmost (most significant) to byte 3 at the rightmost
(least significant), called big-endian: 0 1 2 3
byte 3 at the leftmost (most significant) to byte 0 at the rightmost
(least significant), called little-endian: 3 2 1 0

Big-endian Memory
Byte 0 Byte 1 Byte 2 Byte 3 Word 0
Byte 4 Byte 5 Byte 6 Byte 7 Word 1

Little-endian Memory
Byte 3 Byte 2 Byte 1 Byte 0 Word 0
Byte 7 Byte 6 Byte 5 Byte 4 Word 1
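
The two byte orders can be seen directly with Python's built-in `int.to_bytes`. This is a small sketch; the sample value `0x0A0B0C0D` is an arbitrary choice made here so that each byte is easy to recognise.

```python
word = 0x0A0B0C0D  # arbitrary 32-bit value; 0x0A is the most significant byte

# Index 0 of each bytes object is the byte at the lowest memory address.
big = word.to_bytes(4, byteorder="big")
little = word.to_bytes(4, byteorder="little")

print("big-endian:   ", [f"{b:02X}" for b in big])
print("little-endian:", [f"{b:02X}" for b in little])

# Reading the bytes back with the matching order recovers the word.
assert int.from_bytes(big, byteorder="big") == word
assert int.from_bytes(little, byteorder="little") == word
```

In big-endian order the most significant byte sits at byte 0; in little-endian order the least significant byte does.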

Control: Conditional Branch

Decision making instructions

alter the control flow,
i.e., change the next instruction to be executed

MIPS conditional branch instructions:

bne $t0, $t1, Label
beq $t0, $t1, Label

Example:

if (i==j) h = i + j;

       bne $s0, $s1, Label
       add $s3, $s0, $s1
Label: ....

I-type instructions

beq $t0, $t1, Label   (= addr. 100)
000100 01000 01001 0000000000011001

word-relative addressing:
25 words = 100 bytes;
also PC-relative (more)

Addresses in Branch

Instructions:
bne $t4,$t5,Label     Next instruction is at Label if $t4 != $t5
beq $t4,$t5,Label     Next instruction is at Label if $t4 = $t5

Format:
I   op | rs | rt | 16 bit offset

16 bits is too small a reach in a 2^32 address space

Solution: specify a register (as for lw and sw) and add it to the
offset

use PC (= program counter), called PC-relative addressing, based on the
principle of locality: most branches are to instructions near the current
instruction (e.g., loops and if statements)

Addresses in Branch

Further extend reach of branch by observing that all MIPS
instructions are a word (= 4 bytes), therefore word-relative
addressing:
MIPS branch destination address = (PC + 4) + (4 * offset)
because hardware typically increments PC early
in the execute cycle to point to the next instruction

so offset = (branch destination address - PC - 4)/4

BEQ/BNE uses I-Type

Bits:   31-26   25-21   20-16   15-0
Field:  opcode  rs      rt      Signed Offset Value (encoded in words, e.g. 4-bytes)

beq $0, $9, 40      (rs = $0, rt = $9, Offset encoded by 40/4 = 10)

opcode  rs      rt      Immediate Value
000100  00000   01001   0000000000001010


Encoding = 0x1009000A
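
The branch arithmetic, destination = (PC + 4) + (4 * offset), can be sketched in Python together with the beq encoding above. The function names are assumptions made for this illustration; the beq opcode 000100 is taken from the slides.

```python
BEQ_OPCODE = 0b000100  # as shown in the beq encoding


def branch_target(branch_pc: int, offset: int) -> int:
    """Destination address: (PC + 4) + 4 * offset, offset counted in words."""
    return (branch_pc + 4) + 4 * offset


def branch_offset(branch_pc: int, target: int) -> int:
    """Word offset to store in the 16-bit field: (target - PC - 4) / 4."""
    delta = target - branch_pc - 4
    if delta % 4 != 0:
        raise ValueError("target is not word aligned relative to PC + 4")
    offset = delta // 4
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("target is out of reach for a 16-bit offset")
    return offset


def encode_beq(rs: int, rt: int, offset: int) -> int:
    """I-type word for beq with a signed word offset."""
    return (BEQ_OPCODE << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


# beq $0, $9, 40: a byte distance of 40 is stored as 40 / 4 = 10 words.
print(f"0x{encode_beq(0, 9, 40 // 4):08X}")  # prints 0x1009000A
```

Backward branches, as in loops, simply produce a negative offset, which is why the field is signed.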

Control: Unconditional Branch (Jump)

MIPS unconditional branch instructions:

j Label

Example:
if (i!=j)
    h=i+j;
else
    h=i-j;

      beq $s4, $s5, Lab1
      add $s3, $s4, $s5
      j Lab2
Lab1: sub $s3, $s4, $s5
Lab2: ...

J-type (J for Jump) instruction format

Example: j Label # addr. Label = 100

op      26 bit number
000010  00000000000000000000011001
6 bits  26 bits

word-relative addressing: 25 words = 100 bytes

Addresses in Jump

Word-relative addressing also for jump instructions

J   op | 26 bit address

MIPS jump j instruction replaces the lower 28 bits of the PC with
A00, where A is the 26 bit address; it never changes the upper 4 bits

Example: if PC = 1011X (where X = 28 bits), it is replaced with
1011A00
there are 16 (= 2^4) partitions of the 2^32 size address space, each
partition of size 256 MB (= 2^28), such that, in each partition, the upper
4 bits of the address are the same.
if a program crosses an address partition, then a j that reaches a
different partition has to be replaced by jr with a full 32-bit address
first loaded into the jump register
therefore, OS should always try to load a program inside a single
partition
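
The rule above can be sketched in Python under the slide's simplified model, in which the upper 4 bits come from the PC of the jump itself (real hardware takes them from the address of the following instruction). The function names `jump_target` and `jump_field` are assumptions for this illustration.

```python
UPPER4_MASK = 0xF0000000   # the 4 bits a j instruction never changes
FIELD26_MASK = 0x03FFFFFF  # the 26-bit address field


def jump_target(pc: int, address26: int) -> int:
    """Keep the upper 4 bits of the PC and replace the lower 28 with A00."""
    return (pc & UPPER4_MASK) | ((address26 & FIELD26_MASK) << 2)


def jump_field(pc: int, target: int) -> int:
    """26-bit field for a j from pc to target, if both share a partition."""
    if target % 4 != 0:
        raise ValueError("jump target must be word aligned")
    if (pc & UPPER4_MASK) != (target & UPPER4_MASK):
        raise ValueError("target is in another 256 MB partition; use jr")
    return (target >> 2) & FIELD26_MASK


print(jump_field(0x00000000, 100))             # prints 25
print(hex(jump_target(0xB0000040, 25)))        # prints 0xb0000064
```

The `ValueError` for a cross-partition target mirrors the case where the slide says a `jr` with a full 32-bit address is needed.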

Constants

Small constants are used quite frequently (50% of operands)


e.g.,
A = A + 5;
B = B + 1;
C = C - 18;
Solutions? Will these work?

create hard-wired registers (like $zero) for constants like 1


put program constants in memory and load them as required

MIPS Instructions:
addi $29, $29, 4
slti $8, $18, 10
andi $29, $29, 6
ori $29, $29, 4

How to make this work?

Immediate Operands

Make operand part of instruction itself!

Design Principle 4: Make the common case fast

Example: addi $sp, $sp, 4 # $sp = $sp + 4

op      rs      rt      16 bit number
001000  11101   11101   0000000000000100
6 bits  5 bits  5 bits  16 bits

How about larger constants?

First we need to load a 32 bit constant into a register

Must use two instructions for this: first, the new load upper immediate
instruction for the upper 16 bits

lui $t0, 1010101010101010

$t0 after lui:  1010101010101010 0000000000000000   (lower 16 bits filled with zeros)

Then get the lower 16 bits in place:

ori $t0, $t0, 1010101010101010

$t0 before:     1010101010101010 0000000000000000
immediate:      0000000000000000 1010101010101010
ori result:     1010101010101010 1010101010101010

Now the constant is in place, use register-register arithmetic
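
The lui/ori pair can be modelled in a few lines of Python. The functions `lui`, `ori` and `split_constant` below are illustrative stand-ins for the two instructions and for the split an assembler would perform; they are assumptions, not SPIM features.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16: int) -> int:
    """Load upper immediate: imm16 in the upper half, zeros in the lower."""
    return ((imm16 & 0xFFFF) << 16) & MASK32


def ori(register: int, imm16: int) -> int:
    """OR immediate: the 16-bit immediate is zero extended before the OR."""
    return (register | (imm16 & 0xFFFF)) & MASK32


def split_constant(value: int) -> tuple[int, int]:
    """Split a 32-bit constant into the (upper, lower) 16-bit halves."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


# The example from the slide: both halves are 1010101010101010.
t0 = lui(0b1010101010101010)
t0 = ori(t0, 0b1010101010101010)
print(f"{t0:032b}")  # prints 10101010101010101010101010101010
```

Because ori zero extends its immediate, the upper half written by lui is left untouched.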

So far

Instruction          Format  Meaning
add $s1,$s2,$s3      R       $s1 = $s2 + $s3
sub $s1,$s2,$s3      R       $s1 = $s2 - $s3
lw $s1,100($s2)      I       $s1 = Memory[$s2+100]
sw $s1,100($s2)      I       Memory[$s2+100] = $s1
bne $s4,$s5,Lab1     I       Next instr. is at Lab1 if $s4 != $s5
beq $s4,$s5,Lab2     I       Next instr. is at Lab2 if $s4 = $s5
j Lab3               J       Next instr. is at Lab3

Formats:
R   op | rs | rt | rd | shamt | funct
I   op | rs | rt | 16 bit address
J   op | 26 bit address

Logical Operations
Shift Logical Left (SLL $S1,$S2,10)
Shift Logical Right (SRL $S1,$S2,10)
AND (AND $S1,$S2,$S3)

OR (OR $S1,$S2,$S3)
NOR (NOR $S1,$S2,$S3)

ANDI (ANDI $S1,$S2,100)


ORI (ORI $S1,$S2,100)

Shift Operations

op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

shamt: how many positions to shift

Shift left logical
Shift left and fill with 0 bits
sll by i bits multiplies by 2^i

Shift right logical
Shift right and fill with 0 bits
srl by i bits divides by 2^i (unsigned only)
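
The two shift rules can be checked with a short Python model of a 32-bit register. The functions `sll` and `srl` are illustrative assumptions named after the instructions; Python integers are unbounded, so the sketch masks results to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def sll(value: int, shamt: int) -> int:
    """Shift left logical: fill with zeros, drop bits shifted past bit 31."""
    return (value << shamt) & MASK32


def srl(value: int, shamt: int) -> int:
    """Shift right logical: treat the register as unsigned, fill with zeros."""
    return (value & MASK32) >> shamt


print(sll(5, 3))                  # prints 40, i.e. 5 * 2^3
print(srl(40, 3))                 # prints 5, i.e. 40 / 2^3
print(hex(srl(0xFFFFFFF8, 1)))    # prints 0x7ffffffc
```

The last line shows why the division rule holds for unsigned values only: the bit pattern of -8 becomes a large positive number, not -4.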

AND Operations

Useful to mask bits in a word

Select some bits, clear others to 0

and $t0, $t1, $t2


$t2 0000 0000 0000 0000 0000 1101 1100 0000
$t1 0000 0000 0000 0000 0011 1100 0000 0000
$t0 0000 0000 0000 0000 0000 1100 0000 0000

OR Operations

Useful to include bits in a word

Set some bits to 1, leave others unchanged

or $t0, $t1, $t2


$t2 0000 0000 0000 0000 0000 1101 1100 0000
$t1 0000 0000 0000 0000 0011 1100 0000 0000
$t0 0000 0000 0000 0000 0011 1101 1100 0000

NOT Operations

Useful to invert bits in a word

Change 0 to 1, and 1 to 0

MIPS has NOR 3-operand instruction

a NOR b == NOT ( a OR b )

nor $t0, $t1, $zero    # Register 0 ($zero): always read as zero

$t1 0000 0000 0000 0000 0011 1100 0000 0000


$t0 1111 1111 1111 1111 1100 0011 1111 1111
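
The three bit patterns shown for AND, OR and NOR can be verified with Python's bitwise operators. The variable names mirror the registers in the slides; the helper `nor` is an assumption added here because Python has no NOR operator.

```python
MASK32 = 0xFFFFFFFF

t1 = 0b0000_0000_0000_0000_0011_1100_0000_0000
t2 = 0b0000_0000_0000_0000_0000_1101_1100_0000


def nor(a: int, b: int) -> int:
    """a NOR b == NOT (a OR b), kept to 32 bits."""
    return ~(a | b) & MASK32


and_result = t1 & t2      # and $t0, $t1, $t2
or_result = t1 | t2       # or  $t0, $t1, $t2
not_result = nor(t1, 0)   # nor $t0, $t1, $zero

for name, value in (("and", and_result), ("or", or_result), ("nor", not_result)):
    print(f"{name:>3} {value:032b}")
```

NOR with $zero acts as NOT because OR with zero leaves the other operand unchanged.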

Control Flow

We have: beq, bne. What about branch-if-less-than?

New instruction:
slt $t0, $s1, $s2

if $s1 < $s2 then
    $t0 = 1
else
    $t0 = 0

Can use this instruction to build blt $s1, $s2, Label

how? We generate more than one instruction: a pseudo-instruction
can now build general control structures
slti is also available

The assembler needs a register to manufacture instructions
from pseudo-instructions
There is a convention (not mandatory) for use of registers
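
How an assembler might turn `blt` into real instructions can be sketched as a small Python function that returns the expansion as text. The function name `expand_blt` is hypothetical, and the use of `$at` as the scratch register follows the convention that register 1 is reserved for the assembler.

```python
def expand_blt(rs: str, rt: str, label: str, scratch: str = "$at") -> list[str]:
    """Expand the pseudo-instruction blt rs, rt, label.

    slt sets the scratch register to 1 when rs < rt; bne then branches
    when the scratch register is not zero.
    """
    return [
        f"slt {scratch}, {rs}, {rt}",
        f"bne {scratch}, $zero, {label}",
    ]


for line in expand_blt("$s1", "$s2", "Label"):
    print(line)
```

One pseudo-instruction becomes two machine instructions here, which matters when counting instructions for performance.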

Branch Instruction Design

Why not blt, bge, etc?

Hardware for <, >=, ... is slower than for =, !=
Combining with branch involves more work
per instruction, requiring a slower clock
All instructions penalized!

beq and bne are the common case

This is a good design compromise

Signed vs. Unsigned

Signed comparison: slt, slti

Unsigned comparison: sltu, sltiu
Example

$s0 = 1111 1111 1111 1111 1111 1111 1111 1111
$s1 = 0000 0000 0000 0000 0000 0000 0000 0001

slt $t0, $s0, $s1 # signed
-1 < +1, so $t0 = 1

sltu $t0, $s0, $s1 # unsigned
+4,294,967,295 > +1, so $t0 = 0
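
The same bit pattern gives different answers under the two comparisons, which a short Python model makes concrete. The functions `to_signed32`, `slt` and `sltu` are illustrative assumptions named after the instructions.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(bits: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def slt(rs: int, rt: int) -> int:
    """Set on less than, comparing the operands as signed values."""
    return 1 if to_signed32(rs) < to_signed32(rt) else 0


def sltu(rs: int, rt: int) -> int:
    """Set on less than, comparing the operands as unsigned values."""
    return 1 if (rs & MASK32) < (rt & MASK32) else 0


s0 = 0xFFFFFFFF  # -1 when signed, 4,294,967,295 when unsigned
s1 = 0x00000001

print(slt(s0, s1))   # prints 1
print(sltu(s0, s1))  # prints 0
```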

Policy-of-Use Convention for Registers


Name      Register number   Usage
$zero     0                 the constant value 0
$v0-$v1   2-3               values for results and expression evaluation
$a0-$a3   4-7               arguments
$t0-$t7   8-15              temporaries
$s0-$s7   16-23             saved
$t8-$t9   24-25             more temporaries
$gp       28                global pointer
$sp       29                stack pointer
$fp       30                frame pointer
$ra       31                return address

Register 1, called $at, is reserved for the assembler; registers 26-27,
called $k0 and $k1, are reserved for the operating system.

Assembly Language vs. Machine Language

Assembly provides convenient symbolic representation
much easier than writing down numbers
regular rules: e.g., destination first

Machine language is the underlying reality
e.g., destination is no longer first

Assembly can provide pseudo-instructions
e.g., move $t0, $t1 exists only in assembly
would be implemented using add $t0, $t1, $zero

When considering performance you should count the actual number
of machine instructions that will execute

Procedure Calling

Steps required
1. Place parameters in registers
2. Transfer control to procedure
3. Acquire storage for procedure
4. Perform procedure's operations
5. Place result in register for caller
6. Return to place of call

Procedure Call Instructions

Procedure call: jump and link

jal ProcedureLabel
Address of following instruction put in $ra
Jumps to target address

Procedure return: jump register

jr $ra
Copies $ra to program counter
Can also be used for computed jumps
e.g., for case/switch statements

Using a Frame Pointer


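[Figure: three panels (a, b, c) showing the stack between High address and Low address with the positions of $fp and $sp. The frame holds, from the top: saved argument registers (if any), saved return address, saved saved registers (if any), and local arrays and structures (if any).]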

Variables that are local to a procedure but do not fit into registers (e.g., local arrays, structures, etc.) are also stored in the stack. This area of the stack is the frame. The frame pointer
$fp points to the top of the frame and the stack pointer to the bottom. The frame pointer does
not change during procedure execution, unlike the stack pointer, so it is a stable base
register from which to compute offsets to local variables.
Use of the frame pointer is optional. If there are no local variables to store in the stack it is
not efficient to use a frame pointer.

Byte/Halfword Operations

Could use bitwise operations

MIPS byte/halfword load/store
String processing is a common case

lb rt, offset(rs)     lh rt, offset(rs)
Sign extend to 32 bits in rt

lbu rt, offset(rs)    lhu rt, offset(rs)
Zero extend to 32 bits in rt

sb rt, offset(rs)     sh rt, offset(rs)

Store just rightmost byte/halfword
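
The difference between sign extension (lb, lh) and zero extension (lbu, lhu) can be shown with two small Python helpers. The names `sign_extend` and `zero_extend` are assumptions for this sketch; results are given as 32-bit register patterns.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Copy the top bit of a bits-wide value into all higher register bits."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    # XOR then subtract turns the sign bit into a negative weight.
    return ((value ^ sign_bit) - sign_bit) & MASK32


def zero_extend(value: int, bits: int) -> int:
    """Fill all higher register bits with zeros."""
    return value & ((1 << bits) - 1)


byte = 0x80  # top bit set: -128 as a signed byte, 128 as an unsigned byte
print(hex(sign_extend(byte, 8)))  # prints 0xffffff80, as lb would load
print(hex(zero_extend(byte, 8)))  # prints 0x80, as lbu would load
```

For bytes whose top bit is clear, such as ASCII characters, both loads produce the same register value.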

MIPS Addressing Modes


1. Immediate addressing
   op | rs | rt | Immediate

2. Register addressing
   op | rs | rt | rd | ... | funct     -> Register (in Registers)

3. Base addressing
   op | rs | rt | Address              -> Register + Address -> Memory (Byte, Halfword, Word)

4. PC-relative addressing
   op | rs | rt | Address              -> PC + Address -> Memory (Word)

5. Pseudodirect addressing
   op | Address                        -> PC combined with Address -> Memory (Word)

Overview of MIPS

Simple instructions, all 32 bits wide
Very structured, no unnecessary baggage
Only three instruction formats

R   op | rs | rt | rd | shamt | funct
I   op | rs | rt | 16 bit address
J   op | 26 bit address

Summarize MIPS:

MIPS operands

Name               Example                               Comments
32 registers       $s0-$s7, $t0-$t9, $zero,              Fast locations for data. In MIPS, data must be in registers to perform
                   $a0-$a3, $v0-$v1, $gp,                arithmetic. MIPS register $zero always equals 0. Register $at is
                   $fp, $sp, $ra, $at                    reserved for the assembler to handle large constants.
2^30 memory words  Memory[0], Memory[4], ...,            Accessed only by data transfer instructions. MIPS uses byte addresses, so
                   Memory[4294967292]                    sequential words differ by 4. Memory holds data structures, such as arrays,
                                                         and spilled registers, such as those saved on procedure calls.

MIPS assembly language

Category            Instruction              Example              Meaning                                Comments
Arithmetic          add                      add $s1, $s2, $s3    $s1 = $s2 + $s3                        Three operands; data in registers
                    subtract                 sub $s1, $s2, $s3    $s1 = $s2 - $s3                        Three operands; data in registers
                    add immediate            addi $s1, $s2, 100   $s1 = $s2 + 100                        Used to add constants
Data transfer       load word                lw $s1, 100($s2)     $s1 = Memory[$s2 + 100]                Word from memory to register
                    store word               sw $s1, 100($s2)     Memory[$s2 + 100] = $s1                Word from register to memory
                    load byte                lb $s1, 100($s2)     $s1 = Memory[$s2 + 100]                Byte from memory to register
                    store byte               sb $s1, 100($s2)     Memory[$s2 + 100] = $s1                Byte from register to memory
                    load upper immediate     lui $s1, 100         $s1 = 100 * 2^16                       Loads constant in upper 16 bits
Conditional branch  branch on equal          beq $s1, $s2, 25     if ($s1 == $s2) go to PC + 4 + 100     Equal test; PC-relative branch
                    branch on not equal      bne $s1, $s2, 25     if ($s1 != $s2) go to PC + 4 + 100     Not equal test; PC-relative
                    set on less than         slt $s1, $s2, $s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0   Compare less than; for beq, bne
                    set less than immediate  slti $s1, $s2, 100   if ($s2 < 100) $s1 = 1; else $s1 = 0   Compare less than constant
Unconditional jump  jump                     j 2500               go to 10000                            Jump to target address
                    jump register            jr $ra               go to $ra                              For switch, procedure return
                    jump and link            jal 2500             $ra = PC + 4; go to 10000              For procedure call

Alternative Architectures

Design alternative:

provide more powerful operations

goal is to reduce number of instructions executed

danger is a slower cycle time and/or a higher CPI

Sometimes referred to as R(educed)ISC vs. C(omplex)ISC

A dominant architecture: 80x86

1978: The Intel 8086 is announced (16 bit architecture)
1980: The 8087 floating point coprocessor is added
1982: The 80286 increases address space to 24 bits, +instructions
1985: The 80386 extends to 32 bits, new addressing modes
1989-1995: The 80486, Pentium, Pentium Pro add a few instructions
(mostly designed for higher performance)
1997: MMX is added

A dominant architecture: 80x86

Complexity

instructions from 1 to 17 bytes long


one operand must act as both a source and destination
one operand may come from memory
several complex addressing modes

Saving grace:

the most frequently used instructions are not too difficult to build
compilers avoid the portions of the architecture that are slow

Summary

Instruction complexity is only one variable

lower instruction count vs. higher CPI / lower clock rate

Design Principles:

simplicity favors regularity


smaller is faster
good design demands compromise
make the common case fast
