Lect3 ISAReview PDF

Lecture-3
Instruction Set Architecture (ISA)

Classifications and Addressing Modes
Recap
EE/CS520- Comp. Archi.
9/6/2012
What's Performance?
Two common measures
Latency (Total time taken to do task X)

Also called response time and execution time
Interesting for a desktop user (generally)
Throughput (how often can it do X in a given time)

Interesting for a large data processing center admin
Measuring Performance
Benchmarks
Real applications and application suites
E.g., SPEC CPU2000, SPEC CPU2006, TPC-C, etc.
Kernels
Key pieces of real applications
Easier and quicker to set up and run
Often not really representative of the entire app
Toy programs, synthetic benchmarks, etc.
Not very useful for reporting
Sometimes used to test/stress specific functions/features
Synthetic benchmarks
Fake programs designed to imitate real applications
The last 3 are discredited as they can be manipulated to make a product look faster!
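Amdahl's Law

$$\text{Speedup} = \frac{\text{Execution Time}_{\text{without Enhancement}}}{\text{Execution Time}_{\text{with Enhancement}}} = \frac{\text{Execution Time}_{\text{old}}}{\text{Execution Time}_{\text{new}}}$$

What if enhancement does not enhance everything?

$$\text{Speedup} = \frac{\text{Execution Time without using Enhancement at all}}{\text{Execution Time using Enhancement when Possible}}$$

$$\text{Execution Time}_{\text{new}} = \text{Execution Time}_{\text{old}} \times \left( (1 - \text{Fraction}_{\text{Enhanced}}) + \frac{\text{Fraction}_{\text{Enhanced}}}{\text{Speedup}_{\text{Enhanced}}} \right)$$

$$\text{Overall Speedup} = \frac{1}{(1 - \text{Fraction}_{\text{Enhanced}}) + \dfrac{\text{Fraction}_{\text{Enhanced}}}{\text{Speedup}_{\text{Enhanced}}}}$$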
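A minimal sketch of the overall-speedup formula, to make the effect of a partial enhancement concrete. The function name `amdahl_speedup` and the sample numbers (40% of the time sped up 10x) are illustrative assumptions, not values from the lecture.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of the execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New time, relative to the old time: untouched part + shortened part.
    relative_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / relative_time


# Hypothetical case: 40% of the execution time is made 10x faster.
example_speedup = amdahl_speedup(0.4, 10.0)
print(f"overall speedup = {example_speedup:.4f}")
```

Even with a 10x enhancement, the untouched 60% of the time keeps the overall speedup in this case well below 2.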
Price Vs. Performance
CPU Performance: Example 2

Freq. of FP ops = 25%
CPIavg of FP ops = 4.0
CPIavg of other ops = 1.33

Freq. of FPSqr = 2%
CPI of FPSqr = 20
Either
Option 1: decrease CPI of FPSqr to 2
Option 2: decrease CPI of all FP ops to 2.5
Result: Option-2 is better than option-1
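The result can be checked with a short calculation. This is a sketch under two assumptions that the slide does not state: the instruction count and clock rate stay the same under both options (so speedup is just the ratio of CPIs), and the non-FP frequency is 1 - 25% = 75%. Variable names are illustrative.

```python
# Figures from the example above.
freq_fp, cpi_fp = 0.25, 4.0
freq_other, cpi_other = 0.75, 1.33   # assumed: everything that is not an FP op
freq_fpsqr, cpi_fpsqr = 0.02, 20.0

# Average CPI of the original machine.
cpi_original = freq_fp * cpi_fp + freq_other * cpi_other

# Option 1: only FPSqr improves, from CPI 20 to CPI 2.
cpi_option1 = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Option 2: all FP ops improve, from average CPI 4.0 to 2.5.
cpi_option2 = freq_other * cpi_other + freq_fp * 2.5

# Same instruction count and clock assumed, so speedup = CPI ratio.
speedup_option1 = cpi_original / cpi_option1
speedup_option2 = cpi_original / cpi_option2

print(f"CPI original = {cpi_original:.4f}")
print(f"Option 1: CPI = {cpi_option1:.4f}, speedup = {speedup_option1:.3f}")
print(f"Option 2: CPI = {cpi_option2:.4f}, speedup = {speedup_option2:.3f}")
```

Option 2 ends up with the lower CPI because it touches 25% of the instructions, while Option 1 gives a larger improvement to only 2% of them.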

Price vs. Performance Trade-Off

Without optimized FPSqr
System costs PKR. 40,000 to manufacture
Selling price is PKR. 55,000, i.e. PKR. 15K profit per system
If we sell 10,000 systems, that's PKR. 150M in profit
With optimized FPSqr
System costs extra PKR. 10,000
Selling price is PKR. 70,000, i.e. PKR. 20K profit per system
But only a few people care for buying that system:
We only sell 4000 systems and make PKR. 80M in profit
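The arithmetic behind the two scenarios, as a small sketch. The helper `total_profit` is a hypothetical name, and the model ignores R&D, testing and marketing costs.

```python
def total_profit(unit_cost: int, unit_price: int, units_sold: int) -> int:
    """Total profit in PKR: per-system margin times the number of systems sold."""
    return (unit_price - unit_cost) * units_sold


# Without optimized FPSqr: PKR 40,000 cost, PKR 55,000 price, 10,000 systems.
profit_baseline = total_profit(40_000, 55_000, 10_000)

# With optimized FPSqr: PKR 10,000 extra cost, PKR 70,000 price, 4,000 systems.
profit_with_fpsqr = total_profit(40_000 + 10_000, 70_000, 4_000)

print(f"baseline:   PKR {profit_baseline:,}")
print(f"with FPSqr: PKR {profit_with_fpsqr:,}")
```

The higher per-system margin does not make up for the drop in volume, which is the point of the trade-off.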
Price vs. Performance Trade-Off
How much effective performance do I get out of it?
10x speedup for a small fraction of instructions isn't that efficient
How much more do I have to invest in it?
R&D, testing, marketing costs
How much more can I charge for it?
Does the market even care?
How does the price change affect the volume?
[Figure: Price vs. Performance Trade-Off, with price labels $3346, $3099, $2907, $5201, $2145]
Instruction Set Architecture
Classes of Computer
Desktop Computing
First and still the largest market in monetary terms

Well organized in terms of applications and benchmarks
Price vs. Performance is the most critical comparison
Servers
Large-scale and more reliable computing services

Dependability
Scalability
Computational capacity, storage, I/O bandwidth, memory
Embedded Computing
The fastest growing portion of computing market

Numerous applications, stringent constraints:
Power consumption, price, memory (area), response time
Instruction Set Architecture

ISA is the portion of computer visible to the
programmer or the compiler writer

Basic domains of computer applications
Desktop computing
Concentrates on integer and floating point (FP) ops.
Little regard for program size or power
Servers
Concentrates on integer ops and character strings
Embedded computing
Targets code size (memory footprint) and power
FP ops can be omitted if not needed
ISAs for all three are pretty similar
For the most part, MIPS serves all three
Hybrid ISAs
Example: 80x86 (CISC) and RISC
Pentium 4 uses HW to translate 80x86 insts into RISC insts

Programmer writes 80x86 program code (externally)
Processor executes RISC insts (internally)
ISA Classifications
Stack based
Accumulator based
Register-based
Register-Register (aka Load-Store) based

Register-Memory based
Memory-Memory based
Stack based ISA

Implicit operands on Top of the Stack (ToS)
Code C=A+B
PUSH A
PUSH B
ADD
POP C
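To see how the implicit operands work, here is a toy interpreter for the four instructions above. It is only a sketch: the function `run_stack_program`, the dictionary used as memory and the values of A and B are assumptions for illustration.

```python
def run_stack_program(program, memory):
    """Execute PUSH/POP/ADD instructions; ADD takes both operands from the stack."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "PUSH":            # memory -> top of stack
            stack.append(memory[args[0]])
        elif op == "POP":           # top of stack -> memory
            memory[args[0]] = stack.pop()
        elif op == "ADD":           # operands are implicit: the top two entries
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


# Hypothetical initial values for A and B.
stack_memory = {"A": 2, "B": 3, "C": 0}
run_stack_program(["PUSH A", "PUSH B", "ADD", "POP C"], stack_memory)
print(stack_memory["C"])
```

Note that ADD names no operands at all; only PUSH and POP carry a memory address.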
Accumulator based ISA

One implicit operand is accumulator itself
Code C=A+B
Load A     # mem to accum
Add B
Store C    # accum to mem
Register-Memory based ISA

Explicit operands are used
Code C=A+B
Load R1, A
Add R3, R1, B
Store R3, C
Register-Register based ISA

Explicit operands are used
Code C=A+B
Load R1, A
Load R2, B
Add R3, R1, R2
Store R3, C
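A toy interpreter for the load-store sequence above, counting memory accesses. This is a sketch only: `run_load_store`, the dictionary-based memory and the values of A and B are illustrative assumptions, and only the three opcodes used on the slide are handled.

```python
def run_load_store(program, memory):
    """Run Load/Add/Store; only Load and Store are allowed to touch memory."""
    regs = {}
    mem_accesses = 0
    for instr in program:
        op, operands = instr.split(maxsplit=1)
        args = [a.strip() for a in operands.split(",")]
        if op == "Load":            # Load Rd, X   : Rd = Mem[X]
            regs[args[0]] = memory[args[1]]
            mem_accesses += 1
        elif op == "Store":         # Store Rs, X  : Mem[X] = Rs
            memory[args[1]] = regs[args[0]]
            mem_accesses += 1
        elif op == "Add":           # Add Rd, Rs1, Rs2 : registers only
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return regs, mem_accesses


# Hypothetical initial values for A and B.
ls_memory = {"A": 2, "B": 3, "C": 0}
ls_program = ["Load R1, A", "Load R2, B", "Add R3, R1, R2", "Store R3, C"]
ls_regs, ls_accesses = run_load_store(ls_program, ls_memory)
print(ls_memory["C"], ls_accesses)
```

The ALU instruction itself never accesses memory; once A and B sit in R1 and R2 they can be reused without further loads.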
What's the Popular Choice?

Register-based architecture (aka GPR* architecture)
Load-Store (Register-Register)
Virtually every architecture since 1980
Why?
Registers are internal to processor, so faster than memory
Registers can hold variables
Once the variables are loaded in regs, memory traffic is reduced
Program code density improves, as regs can be named with fewer bits than memory locations
e.g. 32 regs (encoded in 5 bits) vs. 128 MB of memory (addresses encoded in 27 bits)
*GPR = General Purpose Register
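The bit counts in the code-density argument follow from a base-2 logarithm. A small sketch, assuming byte addressing for the memory (under which 128 MB = 2^27 bytes works out to 27 address bits); the helper name `bits_to_name` is hypothetical.

```python
def bits_to_name(count: int) -> int:
    """Minimum number of bits needed to give each of `count` items a unique code."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


register_bits = bits_to_name(32)             # 32 general purpose registers
memory_bits = bits_to_name(128 * 2**20)      # 128 MB, byte addressed (assumption)

print(f"register specifier: {register_bits} bits")
print(f"memory address:     {memory_bits} bits")
```

An instruction with three register operands therefore spends 15 bits on operand names, less than a single full memory address in this setting.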
GPR-Architecture (1)
Two major ISA characteristics
No. of operands supported by ALU (2 or 3)

Add R1, R2        # R1 is both src1 and dst
Add R1, R2, R3    # R1 is dst, R2 & R3 are src
How many operands may be memory addresses
# of mem adr.   # of operands   Archi. type   Examples
0               3               Load-Store    Alpha, ARM, MIPS, PowerPC
1               2               Reg-Mem       IBM 360/370, Intel 80x86
2               2               Mem-Mem       VAX (obsolete)
3               3               Mem-Mem       VAX (obsolete)
GPR-Architecture (2)
Type (#mem, #ops): Reg-Reg (0,3)
Advantages: Simple, fixed-length, equal CPI
Disadvantages: Higher IC, larger program size

Type (#mem, #ops): Reg-Mem (1,2)
Advantages: Extra load inst not needed, easy to encode, good code density
Disadvantages: Operands are not equivalent since one op is destroyed, CPI varies by operand location

Type (#mem, #ops): Mem-Mem (2,2) or (3,3)
Advantages: Most compact
Disadvantages: Large variations in inst size, large CPI, memory bottleneck
Memory Addressing (1)

How memory addresses are interpreted
Little Endian
Byte at address xx000 is put at the least significant position
Big Endian
Byte at address xx000 is put at the most significant position
Little Endian ordering does not match the order of character strings: strings are organized in Big Endian fashion
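The two byte orders can be inspected directly with Python's built-in integer conversions. This is a sketch; the 32-bit value 0x0A0B0C0D and the string "ABCD" are arbitrary illustrative choices.

```python
value = 0x0A0B0C0D

# Byte sequence as it would sit in memory, lowest address first.
little_bytes = value.to_bytes(4, byteorder="little")
big_bytes = value.to_bytes(4, byteorder="big")

print("little endian:", little_bytes.hex(" "))
print("big endian:   ", big_bytes.hex(" "))

# A string is stored first character at the lowest address. Reading those
# four bytes as one word gives a different value under each convention.
text = b"ABCD"
word_if_little = int.from_bytes(text, byteorder="little")
word_if_big = int.from_bytes(text, byteorder="big")

print(f"'ABCD' read as little endian word: {word_if_little:#010x}")
print(f"'ABCD' read as big endian word:    {word_if_big:#010x}")
```

Read as a big endian word, 'A' (0x41) lands in the most significant byte, so the word reads in string order; read as little endian it comes out reversed.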
Memory Addressing (2)
Accesses to objects larger than a byte must be aligned
An access to an object of s bytes at byte address A is said to be aligned when A mod s = 0
A 32-bit (4B) object has to be placed at an address that is evenly divisible by 4, e.g. 0x2000 or 0x2004, but not 0x2001
Complications with misaligned memory access?
Memories are aligned on multiples of word or double word boundaries
A misaligned reference is inefficient as it needs multiple aligned memory references to implement a single access
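The alignment rule and its cost can be sketched in a few lines. The helper names and the 4-byte memory width used in the demo are assumptions for illustration; real memory systems differ in width and in how they handle misaligned accesses.

```python
def is_aligned(address: int, size: int) -> bool:
    """An access of `size` bytes at `address` is aligned when address mod size == 0."""
    return address % size == 0


def aligned_refs_needed(address: int, size: int, width: int) -> int:
    """Number of width-byte aligned memory blocks that the access touches."""
    first_block = address // width
    last_block = (address + size - 1) // width
    return last_block - first_block + 1


# 4-byte objects on a memory assumed to be 4 bytes wide.
for addr in (0x2000, 0x2001, 0x2004):
    print(f"{addr:#06x}: aligned={is_aligned(addr, 4)}, "
          f"references={aligned_refs_needed(addr, 4, 4)}")
```

The word at 0x2001 straddles two aligned words, so one logical access turns into two memory references.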
[Figure: aligned and misaligned accesses for object widths 1B, 2B (HW), 4B (W) and 8B (DW) at different byte offsets]
Addressing Modes (1)
Addressing modes can reduce the inst count (IC) by folding work into a single complex inst; however, they can increase the resulting CPI and the hardware complexity
[Figure: memory addressing mode frequency on VAX for three different programs]
Immediate and Displacement are the dominant modes
Displacement Mode
add R1, 100(R2)
Displacement values are widely distributed
[Figure: percentage of displacements vs. no. of bits of displacement]
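A sketch of what `add R1, 100(R2)` does: the effective address is the base register plus the displacement, and the value found there is added to R1. The register contents, the memory contents and the helper name `effective_address` are made up for illustration.

```python
def effective_address(displacement: int, base_register: str, regs: dict) -> int:
    """Displacement mode: address = contents of base register + constant offset."""
    return regs[base_register] + displacement


# Hypothetical machine state.
dm_regs = {"R1": 7, "R2": 0x1000}
dm_memory = {0x1000 + 100: 35}

# add R1, 100(R2)  ->  R1 = R1 + Mem[100 + R2]
dm_address = effective_address(100, "R2", dm_regs)
dm_regs["R1"] = dm_regs["R1"] + dm_memory[dm_address]

print(f"effective address = {dm_address:#x}, R1 = {dm_regs['R1']}")
```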
Immediate Mode (1)

Widely used in arithmetic ops
In comparisons for example
cmp R1, #400
Also in moves
When constant value is needed in a reg.

Both constants written in the code and address constants
mov R1, #400
Immediate Mode (2)
In integer programs, loads and ALU ops that use immediate mode make up about 1/5 of all instructions
Immediate Mode (3)
Small imm. values are mostly used. Large imm. values are seldom used
[Figure: percentage of immediates vs. no. of bits needed for immediate]
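The number of bits an immediate needs can be computed as below. This sketch assumes immediates are encoded as two's-complement signed values, which the slides do not state; the helper names are hypothetical.

```python
def signed_bits_needed(value: int) -> int:
    """Bits needed to hold `value` as a two's-complement immediate (assumed encoding)."""
    if value >= 0:
        return value.bit_length() + 1      # magnitude bits plus a sign bit
    return (~value).bit_length() + 1


def fits_immediate(value: int, field_bits: int) -> bool:
    """True when `value` fits in an immediate field of `field_bits` bits."""
    return signed_bits_needed(value) <= field_bits


# The constant 400 from the cmp/mov examples, against a 16-bit field.
print(signed_bits_needed(400), fits_immediate(400, 16))
print(signed_bits_needed(100_000), fits_immediate(100_000, 16))
```

A constant that does not fit the field has to be built with extra instructions or loaded from memory, which is why the distribution of immediate sizes matters when choosing the field width.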
Operations in the ISA

Operator type       Examples
Arith. or Logical   Integer arithmetic and logical ops: add, sub, and, or, shift
Data transfer       Load, store
Control             Branch, jump, procedure call, procedure return
System              OS call, virtual memory management inst
FP                  FP ops: add, sub, mul
Decimal             Decimal add, decimal mul, decimal-to-character conversion
String              String move, string compare, string search
Graphics            Pixel and vertex operations, (de)compression
Top 10 instructions in 80x86
Occurrence of RISC Instructions
[Figure: inst frequency of different RISC insts: load, store, add, sub, or, and, xor, sl, sr, mult, div, sqrt]
RISC Machines
90-10 Rule:
By profiling program performance, we note that
Only 10% of instructions are used 90% of the time
The other 90% of instructions are rarely used, yet costly in time and silicon area
Idea: we limit the number of instructions in an ISA
To those that are most frequently used
We will carry out the complex instructions by combination of simple instructions
Instructions executable in 1 cycle, hence higher clock frequency
Multiple simple instructions of 1 cycle each, instead of 1 complex instruction in n cycles
RISC processor: Reduced Instruction Set Computer
Assignment -1: Review of Assembly Language

Assignment:
Some C code to be converted into MIPS assembly code

Will be available on LMS under assignments tab on Thursday, 6th
September 2012 (today) by 12:00 pm
Helping material:
A PDF about basics of assembly language and C->MIPS conversion

Will be available on LMS under reading material
You can discuss it with TA during tutorial slot on Friday
Submission Deadline: Thursday, 13th September 2012 12:00 pm

Late assignments: 25% marks deduction per day
Submission Format: Hard copy to TA or me in my office