Lect3 ISAReview PDF
Recap
9/6/2012
What's Performance?
Two common measures
Measuring Performance
Benchmarks
Real applications and application suites
E.g., SPEC CPU2000, SPEC CPU2006, TPC-C, etc.
Kernels
Key pieces of real applications
Easier and quicker to set up and run
Often not really representative of the entire app
Toy programs, synthetic benchmarks, etc.
Not very useful for reporting
Sometimes used to test/stress specific functions/features
Synthetic benchmarks
Fake programs designed to imitate real applications
The last three are discredited, as the compiler writer and architect can conspire to make the computer look faster on them than on real applications
Amdahl's Law
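\[
\text{Speedup} = \frac{\text{Execution Time}_{\text{old}}}{\text{Execution Time}_{\text{new}}}
\]
\[
\text{Execution Time}_{\text{new}} = \text{Execution Time}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{Enhanced}}) + \frac{\text{Fraction}_{\text{Enhanced}}}{\text{Speedup}_{\text{Enhanced}}} \right]
\]
\[
\text{Overall Speedup} = \frac{1}{(1 - \text{Fraction}_{\text{Enhanced}}) + \dfrac{\text{Fraction}_{\text{Enhanced}}}{\text{Speedup}_{\text{Enhanced}}}}
\]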
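As a quick check of Amdahl's Law, the following is a minimal sketch that evaluates the overall speedup. The function name and the sample numbers (40% of the time enhanced, 10x faster) are illustrative assumptions, not values from the slides.

```python
def overall_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's Law: overall speedup when only a fraction of the
    execution time benefits from an enhancement."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # The unenhanced part runs at the old speed, the enhanced part runs faster.
    new_time_ratio = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_ratio


# Hypothetical case: 40% of the time is sped up by a factor of 10.
print(round(overall_speedup(0.4, 10), 4))  # 1.5625
```

Even a 10x enhancement yields well under 2x overall here, because the 60% that is not enhanced dominates the new execution time.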
With FPSqr
Classes of Computer
Desktop Computing
Servers
Embedded Computing
Desktop computing
Concentrates on integer and floating point (FP) ops.
Little regard for program size or power
Servers
Concentrates on integer ops and character strings
Embedded computing
Targets code size (memory footprint) and power
FP ops can be omitted if not needed
Hybrid ISAs
Example: 80x86 combines CISC and RISC
Presents CISC insts to software (externally)
Processor executes RISC-like insts (internally)
ISA Classifications
Stack-based
Accumulator-based
Register-based
Memory-memory based
Code for C = A + B (stack)
PUSH A
PUSH B
ADD
POP C
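The stack sequence above can be traced with a tiny interpreter. This is a minimal sketch, assuming a three-operation machine (PUSH, ADD, POP) and a dictionary standing in for memory; the function name and sample values are hypothetical.

```python
def run_stack_program(program, memory):
    """Execute PUSH/ADD/POP instructions against an operand stack."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "PUSH":
            # Copy a memory operand onto the top of the stack.
            stack.append(memory[args[0]])
        elif op == "ADD":
            # Operands are implicit: pop two values, push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":
            # Move the top of the stack back to memory.
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


memory = {"A": 2, "B": 3, "C": 0}
run_stack_program(["PUSH A", "PUSH B", "ADD", "POP C"], memory)
print(memory["C"])  # 5
```

Note that ADD names no operands at all, which is what distinguishes the stack style from the other classes.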
Code for C = A + B (accumulator)
Load A     # mem to accum
Add B
Store C
Code for C = A + B
Load R1, A     # R1 = A
Code for C = A + B
Load R1, A
Load R2, B
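The two loads above look like the start of a load-store (register-register) sequence. The sketch below completes it with an `Add R3, R1, R2` and a `Store R3, C`; those two instructions, their operand order, and the register R3 are assumptions for illustration, since the remaining lines are not shown here.

```python
def run_load_store_program(program, memory):
    """Execute a tiny load-store program; only Load and Store touch memory."""
    regs = {}
    for instr in program:
        op, operands = instr.split(maxsplit=1)
        args = [a.strip() for a in operands.split(",")]
        if op == "Load":
            # Load Rd, X : register <- memory
            regs[args[0]] = memory[args[1]]
        elif op == "Add":
            # Add Rd, Rs1, Rs2 : all three operands are registers
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "Store":
            # Store Rs, X : memory <- register (assumed operand order)
            memory[args[1]] = regs[args[0]]
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


memory = {"A": 2, "B": 3, "C": 0}
program = ["Load R1, A", "Load R2, B", "Add R3, R1, R2", "Store R3, C"]
run_load_store_program(program, memory)
print(memory["C"])  # 5
```

The arithmetic instruction never references memory, so once A and B sit in registers they can be reused without further memory traffic.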
Why?
Registers are internal to processor, so faster than memory
Registers can hold variables
Once the variables are loaded in regs, memory traffic is reduced
Program code density improves, as regs can be named with fewer bits than memory locations
o e.g. 32 regs (encoded in 5 bits) vs. 128 MB of byte-addressed memory (encoded in 27 bits)
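The bit counts can be checked directly. This is a small sketch, assuming byte-addressed memory and a hypothetical helper name; 128 MB is 2^27 bytes, so a full byte address takes 27 bits.

```python
def bits_to_name(num_items):
    """Smallest number of bits that can give each of num_items a unique code."""
    if num_items < 1:
        raise ValueError("need at least one item")
    # (n - 1).bit_length() is an exact integer form of ceil(log2(n)).
    return (num_items - 1).bit_length()


register_bits = bits_to_name(32)             # 32 general-purpose registers
memory_bits = bits_to_name(128 * 2**20)      # 128 MB, one address per byte

print(register_bits)  # 5
print(memory_bits)    # 27
```

A three-operand register instruction therefore spends 15 bits on operand names, where three full memory addresses would need 81.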
GPR-Architecture (1)
Two major ISA characteristics: # of operands and # of mem adr.
# of mem adr.    # of operands    Archi. type
0                3                Load-Store
1                2                Reg-Mem
2                2                Mem-Mem
3                3                Mem-Mem
Examples
GPR-Architecture (2)
Type       (#mem, #ops)
Reg-Reg    (0,3)
Reg-Mem    (1,2)
Mem-Mem    (2,2) or (3,3)
Advantages
Disadvantages
Little Endian
Byte at xx000 is put at the least significant position
Big Endian
Byte at xx000 is put at the most significant position
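The two byte orders can be seen by laying out the same value both ways. This is a minimal sketch using Python's built-in `int.to_bytes`; the 32-bit value 0x11223344 is an arbitrary example, and index 0 of each byte string plays the role of the lowest address (xx000).

```python
value = 0x11223344

little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")

# Index 0 is the byte stored at the lowest address.
print(little.hex())  # 44332211 -> lowest address holds the least significant byte
print(big.hex())     # 11223344 -> lowest address holds the most significant byte

# Reading the bytes back with the matching order recovers the value.
assert int.from_bytes(little, byteorder="little") == value
assert int.from_bytes(big, byteorder="big") == value
```

Reading the bytes back with the wrong order gives 0x44332211 instead, which is the classic symptom when data moves between machines of different endianness.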
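Alignment: an object of size s bytes at byte address A is aligned if A mod s = 0, i.e. it sits on its natural boundaries
A misaligned reference is inefficient, as it needs multiple aligned memory references to implement a single access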
Width of object    Value of 3 low-order bits of byte address
                   0    1    2    3    4    5    6    7
1B                 A    A    A    A    A    A    A    A
2B (HW)            A    M    A    M    A    M    A    M
4B (W)             A    M    M    M    A    M    M    M
8B (DW)            A    M    M    M    M    M    M    M
A = Aligned, M = Misaligned
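The aligned/misaligned pattern follows from one rule: an access of s bytes at byte address A is aligned when A mod s = 0. The sketch below applies that rule to the eight possible values of the three low-order address bits; the function name and the A/M output format are illustrative choices.

```python
def is_aligned(address, size):
    """An access of `size` bytes is aligned if the address is a multiple of size."""
    return address % size == 0


widths = [(1, "1B"), (2, "2B (HW)"), (4, "4B (W)"), (8, "8B (DW)")]

for size, name in widths:
    # Try every value of the three low-order bits of the byte address (0..7).
    row = " ".join("A" if is_aligned(offset, size) else "M" for offset in range(8))
    print(f"{name:<8} {row}")
```

For power-of-two sizes, hardware can make the same check by testing that the low log2(s) address bits are all zero.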
Addressing modes can reduce the instruction count by folding the address computation into a single, more complex inst;
however, they can increase the resulting CPI and hardware complexity
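To illustrate what an addressing mode folds into one instruction, here is a minimal sketch of effective-address computation for three common modes (register indirect, displacement, indexed). The function names, register names, and values are hypothetical, and the set of modes is an assumption rather than the list from the slides.

```python
def ea_register_indirect(regs, base):
    """Mem[Regs[base]]: the register itself holds the address."""
    return regs[base]


def ea_displacement(regs, base, displacement):
    """Mem[displacement + Regs[base]]: constant offset from a base register."""
    return regs[base] + displacement


def ea_indexed(regs, base, index):
    """Mem[Regs[base] + Regs[index]]: sum of two registers."""
    return regs[base] + regs[index]


regs = {"R1": 1000, "R2": 8}

print(ea_register_indirect(regs, "R1"))   # 1000
print(ea_displacement(regs, "R1", 100))   # 1100
print(ea_indexed(regs, "R1", "R2"))       # 1008
```

Without displacement or indexed modes, each of those additions would be a separate instruction before the memory access, which is the instruction-count saving being traded against CPI and hardware complexity.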
Memory addressing mode frequency on the VAX for three different programs
Displacement Mode
Percentage of displacement
Also in moves
Percentage of immediate
Operator type        Examples
Arith. or logical
Data transfer        Load, store
Control
Decimal
System
FP
String
Graphics
Inst Frequency
(Chart of instruction frequency for: load, store, add, sub, or, and, xor, sl, sr, mult, div, sqrt)
RISC Machines
90-10 Rule:
By profiling the program performance, we note that
1 complex instruction
in n cycles
Processor RISC
Helping material: