CMSC 611: Advanced Computer Architecture

Instruction Set Architecture
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides
Some material adapted from Hennessy & Patterson / 2003 Elsevier Science
Lecture Overview
Last Week
Different performance metrics (response time, throughput, CPU time)
Performance reports, summary and comparison (experiment reproducibility, arithmetic and weighted arithmetic means)
Widely used benchmark programs (SPEC, Whetstone and Dhrystone)
Example industry metrics (e.g. MIPS, MFLOPS, etc.)
This Week
Classifications of instruction set architectures
Different addressing modes
Instruction types, operands and operations
Introduction
[Figure: the instruction set as the layer between software and hardware]
To command a computer's hardware, you must speak its language
Instructions: the words of a machine's language
Instruction set: its vocabulary
The MIPS instruction set is used as a case study
Figure: Dave Patterson
Instruction Set Architecture

Once you learn one machine language, it is easy to pick up others:
Common fundamental operations
All designers have the same goals: simplify building hardware, maximize performance, minimize cost
Goals:
Introduce design alternatives
Present a taxonomy of ISA alternatives
+ some qualitative assessment of pros and cons
Present and analyze some instruction set measurements
Address the issue of languages and compilers and their bearing on instruction set architecture
Show some example ISAs
Interface Design
A good interface:
Lasts through many implementations (portability, compatibility)
Is used in many different ways (generality)
Provides convenient functionality to higher levels
Permits an efficient implementation at lower levels
Design decisions must take into account:
Technology
Machine organization
Programming languages
Compiler technology
Operating systems
[Figure: one interface over time, with several uses above it and implementations imp 1, imp 2, imp 3 below it]
Slide: Dave Patterson
Memory ISAs
Terms
Result = Operand <operation> Operand
Stack
Operate on top stack elements, push result back on stack
Memory-Memory
Operands (and possibly also result) in memory
Register ISAs
Accumulator Architecture
Common in early stored-program computers when hardware was expensive
Machine has only one register (accumulator) involved in all math & logic operations
Accumulator = Accumulator op Memory
Extended Accumulator Architecture (8086)
Dedicated registers added for specific operations, e.g. stack and array index registers
General-Purpose Register Architecture (MIPS)
Register flexibility
Can further divide these into:
Register-memory: allows for one operand to be in memory
Register-register (load-store): all operands in registers
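As a rough illustration of how these classes differ, the sketch below runs C = A + B on three toy interpreters (stack, accumulator, load-store). The mnemonics, the tuple encoding of instructions and the dictionary used as memory are assumptions made for this example; they do not correspond to any real machine's instruction set.

```python
# Toy interpreters for three ISA classes. Mnemonics and encoding are
# hypothetical; they only show where the operands live in each class.

def run_stack(program, mem):
    """Stack machine: operands are implicitly the top stack elements."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "pop":
            mem[args[0]] = stack.pop()
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return mem

def run_accumulator(program, mem):
    """Accumulator machine: Accumulator = Accumulator op Memory."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc = acc + mem[addr]
        elif op == "store":
            mem[addr] = acc
    return mem

def run_load_store(program, mem):
    """Load-store machine: arithmetic only touches registers."""
    regs = {}
    for op, *args in program:
        if op == "load":        # load Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "store":     # store Rs, addr
            mem[args[1]] = regs[args[0]]
        elif op == "add":       # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
    return mem

STACK_PROGRAM = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACCUMULATOR_PROGRAM = [("load", "A"), ("add", "B"), ("store", "C")]
LOAD_STORE_PROGRAM = [("load", "R1", "A"), ("load", "R2", "B"),
                      ("add", "R3", "R1", "R2"), ("store", "R3", "C")]

for run, program in [(run_stack, STACK_PROGRAM),
                     (run_accumulator, ACCUMULATOR_PROGRAM),
                     (run_load_store, LOAD_STORE_PROGRAM)]:
    memory = {"A": 3, "B": 4, "C": 0}   # made-up values
    print(run.__name__, len(program), "instructions, C =", run(program, memory)["C"])
```

All three programs compute the same result; what changes is the instruction count and how many operands each instruction names explicitly.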
ISA Operations
Stack
Memory-Memory
Accumulator Architecture
Extended Accumulator Architecture
General-Purpose Register Architecture
Famous ISA
Machine           # general-purpose registers   Architecture style                Year
Motorola 6800     2                             Accumulator                       1974
DEC VAX           16                            Register-memory, memory-memory    1977
Intel 8086        1                             Extended accumulator              1978
Motorola 68000    16                            Register-memory                   1980
Intel 80386       8                             Register-memory                   1985
PowerPC           32                            Load-store                        1992
DEC Alpha         32                            Load-store                        1992
Other types of Architecture

High-Level-Language Architecture
In the 1960s, systems software was rarely written in high-level languages
virtually every commercial operating system before Unix was written in assembly
Some people blamed the code density on the instruction set rather than the programming language
A machine design philosophy advocated making the hardware more like high-level languages
The effectiveness of high-level languages, memory size limitation and lack of efficient compilers doomed this philosophy to a historical footnote
Other types of Architecture

Reduced Instruction Set Architecture
With the recent development in compiler technology and expanded memory sizes, fewer programmers are using assembly-level coding
Drives ISA to favor benefit for compilers over ease of manual programming
RISC architecture favors simplified hardware design over rich instruction set
Rely on compilers to perform complex operations
Virtually all new architectures since 1982 follow the RISC philosophy:
fixed instruction lengths, load-store operations, and limited addressing modes
Compact Code
Scarce memory or limited transmit time (JVM)
Variable-length instructions (Intel 80x86)
Match instruction length to operand specification
Minimize code size
Stack machines abandon registers altogether
Stack machines simplify compilers
Lend themselves to a compact instruction encoding
BUT limit compiler optimization
Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
High-level Language Based (B5000 1963)
Concept of a Family (IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets (VAX, Intel 432 1977-80)
Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC (MIPS, SPARC, IBM RS6000, ... 1987)
Slide: Dave Patterson
Register-Memory Arch
# memory addresses   Max. number of operands   Examples
0                    3                         SPARC, MIPS, PowerPC, ALPHA
1                    2                         Intel 80x86, Motorola 68000
2                    2                         VAX (also has 3 operands format)
3                    3                         VAX (also has 2 operands format)
Effect of the number of memory operands:

Type: Reg-Reg (0,3)
Advantages:
- Fixed length instruction encoding
- Simple code generation model
- Similar execution time (pipeline)
Disadvantages:
- Higher instruction count
- Some instructions are short leading to wasteful bit encoding
Type: Reg-Mem (1,2)
Advantages:
- Direct access without loading
- Easy instruction encoding
Disadvantages:
- Can restrict # registers available for use
- Clocks per instr. varies by operand type
- Source operands are destroyed
Type: Mem-Mem (3,3)
Advantages:
- No temporary register usage
- Compact code
Disadvantages:
- Less potential for compiler optimization
- Can create memory access bottleneck
Memory Addressing
The address of a word matches the byte address of one of its 4 bytes
The addresses of sequential words differ by 4 (word size in bytes)
Words' addresses are multiples of 4 (alignment restriction)
Misalignment (if allowed) complicates memory access and causes programs to run slower
Object addressed   Aligned at byte offsets   Misaligned at byte offsets
Byte               0,1,2,3,4,5,6,7           Never
Half word          0,2,4,6                   1,3,5,7
Word               0,4                       1,2,3,5,6,7
Double word        0                         1,2,3,4,5,6,7
[Figure: processor and memory; words holding data 1, 101, 10, 100 at addresses 0, 4, 8, 12]
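The alignment rule can be stated as "an object of size s bytes is aligned when its address is a multiple of s". The following minimal sketch checks that rule for the object sizes discussed here; the function names are made up for this example.

```python
# Object sizes in bytes for the four object types discussed above.
SIZES = {"byte": 1, "half word": 2, "word": 4, "double word": 8}

def is_aligned(address, size):
    """An access is aligned when the address is a multiple of the object size."""
    return address % size == 0

def aligned_offsets(size, span=8):
    """Byte offsets within a span-byte block at which an object is aligned."""
    return [offset for offset in range(span) if is_aligned(offset, size)]

for name, size in SIZES.items():
    print(f"{name:12s} aligned at byte offsets {aligned_offsets(size)}")
```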
Byte Order
Given N bytes, which is the most significant, which is the least significant?
Big Endian
Leftmost / most significant byte = word address
Little Endian
Rightmost / least significant byte = word address
Byte ordering can be a problem when exchanging data among different machines
Can also affect array index calculation or any other operation that treats the same data as both byte and word
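A small sketch of the two byte orders, using Python's built-in int.to_bytes and int.from_bytes. The 32-bit value is an arbitrary example; index 0 of each byte string stands for the byte stored at the word address.

```python
value = 0x0A0B0C0D  # arbitrary 32-bit example word

# Index 0 of each bytes object plays the role of the word address.
big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print("big endian   :", big.hex())
print("little endian:", little.hex())

# Exchanging raw bytes between machines without agreeing on the order
# silently changes the value that is read back.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))
```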
Addressing Modes
How to specify the location of an operand (effective address)
Addressing modes have the ability to:
Significantly reduce instruction counts
Increase the average CPI
Increase the complexity of building a machine
VAX machine is used for benchmark data since it supports a wide range of memory addressing modes
Can classify based on:
source of the data (register, immediate or memory)
the address calculation (direct, indirect, indexed)
Example of Addressing Modes

Address. mode: Register
Example: ADD R4, R3
Meaning: Regs[R4] = Regs[R4] + Regs[R3]
When used: When a value is in a register

Address. mode: Immediate
Example: ADD R4, #3
Meaning: Regs[R4] = Regs[R4] + 3
When used: For constants

Address. mode: Register indirect
Example: ADD R4, (R1)
Meaning: Regs[R4] = Regs[R4] + Mem[Regs[R1]]
When used: Accessing using a pointer or a computed address

Address. mode: Direct or absolute
Example: ADD R4, (1001)
Meaning: Regs[R4] = Regs[R4] + Mem[1001]
When used: Sometimes useful for accessing static data; address constant may need to be large

Address. mode: Displacement
Example: ADD R4, 100(R1)
Meaning: Regs[R4] = Regs[R4] + Mem[100 + Regs[R1]]
When used: Accessing local variables

Address. mode: Indexed
Example: ADD R4, (R1 + R2)
Meaning: Regs[R4] = Regs[R4] + Mem[Regs[R1] + Regs[R2]]
When used: Sometimes useful in array addressing: R1 = base of the array; R2 = index amount

Address. mode: Autoincrement
Example: ADD R4, (R2)+
Meaning: Regs[R4] = Regs[R4] + Mem[Regs[R2]]; Regs[R2] = Regs[R2] + d
When used: Useful for stepping through arrays within a loop. R2 points to start of the array; each reference increments R2 by d

Address. mode: Autodecrement
Example: ADD R4, -(R2)
Meaning: Regs[R2] = Regs[R2] - d; Regs[R4] = Regs[R4] + Mem[Regs[R2]]
When used: Same use as autoincrement. Autodecrement/increment can also act as push/pop to implement a stack

Address. mode: Scaled
Example: ADD R4, 100(R2)[R3]
Meaning: Regs[R4] = Regs[R4] + Mem[100 + Regs[R2] + Regs[R3] * d]
When used: Used to index arrays
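The "Meaning" entries translate almost directly into code. The sketch below models a few of the memory addressing modes as Python functions over a register dictionary and a memory dictionary. The register contents, the memory contents and the element size d = 4 are made-up values for the demonstration.

```python
# Hypothetical machine state; the values are invented for this example.
def make_state():
    regs = {"R1": 8, "R2": 16, "R3": 2, "R4": 10}
    mem = {addr: addr * 10 for addr in range(0, 256, 4)}
    return regs, mem

def add_register_indirect(regs, mem, dst, ptr):       # ADD R4, (R1)
    regs[dst] += mem[regs[ptr]]

def add_displacement(regs, mem, dst, disp, base):     # ADD R4, 100(R1)
    regs[dst] += mem[disp + regs[base]]

def add_indexed(regs, mem, dst, base, index):         # ADD R4, (R1 + R2)
    regs[dst] += mem[regs[base] + regs[index]]

def add_autoincrement(regs, mem, dst, ptr, d=4):      # ADD R4, (R2)+
    regs[dst] += mem[regs[ptr]]
    regs[ptr] += d                                    # step after the access

def add_autodecrement(regs, mem, dst, ptr, d=4):      # ADD R4, -(R2)
    regs[ptr] -= d                                    # step before the access
    regs[dst] += mem[regs[ptr]]

def add_scaled(regs, mem, dst, disp, base, index, d=4):  # ADD R4, 100(R2)[R3]
    regs[dst] += mem[disp + regs[base] + regs[index] * d]

regs, mem = make_state()
add_displacement(regs, mem, "R4", 100, "R1")
print("after ADD R4, 100(R1):", regs["R4"])
```

Autoincrement followed by autodecrement on the same pointer register returns the pointer to where it started, which is why the pair can serve as pop and push.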
Addressing Mode Use
Focus on immediate and displacement modes since they are used the most
[Figure: addressing mode use. Based on SPEC89 on VAX]
Displacement Addressing Modes
The range of displacement supported affects the length of the instruction
[Figure: percentage of displacement versus number of bits needed for a displacement value in SPEC2000 benchmark. Data is based on SPEC2000 on Alpha (only 16-bit displacement allowed)]
Immediate Addressing Modes
Immediate values for what operations?
[Figure: percentage of immediate values. Statistics are based on SPEC2000 benchmark on Alpha]
Distribution of Immediate Values
Range affects instruction length
Similar measurements on the VAX (with 32-bit immediate values) showed that 20-25% of immediate values were longer than 16 bits
[Figure: number of bits needed for an immediate value in SPEC2000 benchmark. Measurements were taken on Alpha (only 16-bit immediate values allowed)]
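The "number of bits needed" axis in these measurements can be made concrete. The sketch below computes the smallest two's-complement field that holds a given displacement or immediate and checks it against a 16-bit field; the sample values are invented and are not taken from SPEC2000.

```python
def bits_needed(value):
    """Smallest two's-complement width (in bits) that can hold value."""
    if value >= 0:
        return value.bit_length() + 1      # +1 for the sign bit
    return (~value).bit_length() + 1

def fits(value, bits):
    """True when value is representable in a signed field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

samples = [0, 4, -8, 255, 32767, -32768, 40000]   # made-up values
for v in samples:
    print(f"{v:7d} needs {bits_needed(v):2d} bits, fits in 16 bits: {fits(v, 16)}")
```

A value that does not fit the field has to be built with extra instructions, which is the trade-off between instruction length and instruction count discussed above.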
Addressing Mode for Signal Processing
DSP offers special addressing modes to better serve popular algorithms
Special features require either hand coding or a compiler that uses such features
Modulo addressing:
Since DSP deals with continuous data streams, circular buffers are common
Circular or modulo addressing: automatic increment and decrement / reset pointer at end of buffer
Reverse addressing:
Address is the reverse order of the current address
Expedites access / otherwise requires a number of logical instructions or extra memory accesses
Fast Fourier Transform (index in decimal, 3-bit binary in parentheses):
0 (000) -> 0 (000)
1 (001) -> 4 (100)
2 (010) -> 2 (010)
3 (011) -> 6 (110)
4 (100) -> 1 (001)
5 (101) -> 5 (101)
6 (110) -> 3 (011)
7 (111) -> 7 (111)
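Both DSP modes are easy to express in software, which also shows the work the hardware saves on every access. This is a minimal sketch; the function names and the buffer base and length are assumptions for the example.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index, as used to reorder FFT data."""
    return int(format(index, f"0{bits}b")[::-1], 2)

def circular_next(pointer, base, length, step=1):
    """Advance a pointer inside a circular buffer [base, base + length)."""
    return base + (pointer - base + step) % length

# Bit-reversed order for an 8-point FFT (3 address bits).
print([bit_reverse(i, 3) for i in range(8)])

# Walk a hypothetical 8-entry circular buffer that starts at address 100.
pointer = 106
for _ in range(3):
    pointer = circular_next(pointer, base=100, length=8)
    print(pointer)
```

Without hardware support, each access pays for the reversal or the modulo with extra instructions, which is the cost the special addressing modes remove.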
Summary of MIPS Addressing Modes

1. Immediate addressing: op | rs | rt | Immediate
2. Register addressing: op | rs | rt | rd | ... | funct (operand is in a register)
3. Base addressing: op | rs | rt | Address (register + address selects a byte, halfword or word in memory)
4. PC-relative addressing: op | rs | rt | Address (PC + address selects a word in memory)
5. Pseudodirect addressing: op | Address (address concatenated with PC selects a word in memory)
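The three address computations (base, PC-relative, pseudodirect) can be sketched as follows. This assumes the usual MIPS32 conventions: 16-bit fields are sign-extended, branch offsets count words relative to the instruction after the branch, and the 26-bit jump field is shifted left by 2 and combined with the upper 4 bits of PC + 4. The function names and the example addresses are made up.

```python
def sign_extend_16(imm16):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def base_address(base_register_value, imm16):
    """Base addressing: register + sign-extended 16-bit address field."""
    return (base_register_value + sign_extend_16(imm16)) & 0xFFFFFFFF

def branch_target(pc, imm16):
    """PC-relative addressing: word offset relative to PC + 4."""
    return (pc + 4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF

def jump_target(pc, addr26):
    """Pseudodirect addressing: upper 4 bits of PC + 4 joined with addr26 * 4."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)

pc = 0x00400000  # hypothetical address of the current instruction
print(hex(base_address(0x10010000, 0xFFFC)))
print(hex(branch_target(pc, 3)))
print(hex(jump_target(pc, 0x0100000)))
```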
Operations of the Computer Hardware

There must certainly be instructions for performing the fundamental arithmetic operations.
Burks, Goldstine and von Neumann, 1947
MIPS assembler allows only one instruction per line and ignores comments following # until end of line
Example:
Translation of a segment of a C program to MIPS assembly instructions:
C: f = (g + h) - (i + j)
(pseudo)MIPS:
add t0, g, h     # temp. variable t0 contains "g + h"
add t1, i, j     # temp. variable t1 contains "i + j"
sub f, t0, t1    # f = t0 - t1 = (g + h) - (i + j)
Operations in the Instruction Set
Operator type            Examples
Arithmetic and logical   Integer arithmetic and logical operations: add, and, subtract, or
Data Transfer            Loads-stores (move instructions on machines with memory addressing)
Control                  Branch, jump, procedure call and return, trap
System                   Operating system call, virtual memory management instructions
Floating point           Floating point instructions: add, multiply
Decimal                  Decimal add, decimal multiply, decimal to character conversion
String                   String move, string compare, string search
Graphics                 Pixel operations, compression/decompression operations
Arithmetic, logical, data transfer and control are almost standard categories for all machines
System instructions are required for multiprogramming environment although support for system functions varies
Others can be primitives (e.g. decimal and string on IBM 360 and VAX), provided by a co-processor, or synthesized by compiler
Operations for Media & Signal Process.
Partitioned Add:
Partition a single register into multiple data elements (e.g. 4 16-bit words in 1 64-bit register)
Perform the same operation independently on each
Increases ALU throughput for multimedia applications
Paired single operations
Perform multiple independent narrow operations on one wide ALU (e.g. 2 32-bit float ops)
Handy in dealing with vertices and coordinates
Multiply and accumulate
Very handy for calculating dot products of vectors (signal processing) and matrix multiplication
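To make the partitioned add and multiply-accumulate operations concrete, here is a software sketch of both. It assumes wraparound (non-saturating) lane arithmetic, which is only one of the behaviours real media instruction sets offer; the function names and operand values are made up.

```python
def partitioned_add(a, b, lane_bits=16, lanes=4):
    """Add two registers lane by lane; carries never cross a lane boundary."""
    mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * lane_bits
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift   # drop the carry out of the lane
    return result

def multiply_accumulate(xs, ys):
    """Dot product as a chain of multiply-accumulate steps: acc = acc + x * y."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

a = 0x0001_FFFF_0003_0004   # four 16-bit lanes in one 64-bit value
b = 0x0001_0001_0001_0001
print(f"{partitioned_add(a, b):016X}")
print(multiply_accumulate([1, 2, 3], [4, 5, 6]))
```

The lane holding 0xFFFF wraps to 0x0000 without disturbing its neighbour, which is exactly what an ordinary 64-bit add would get wrong.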
Frequency of Operations Usage
The most widely executed instructions are the simple operations of an instruction set
Average usage in SPECint92 on Intel 80x86:
Rank   80x86 Instruction        Integer Average (% total executed)
1      Load                     22%
2      Conditional branch       20%
3      Compare                  16%
4      Store                    12%
5      Add                      8%
6      And                      6%
7      Sub                      5%
8      Move register-register   4%
9      Call                     1%
10     Return                   1%
Total                           96%
Make the common case fast by focusing on these operations
Control Flow Instructions
Jump: unconditional change in the control flow
Branch: conditional change in the control flow
Procedure calls and returns
Data is based on SPEC2000 on Alpha
Destination Address Definition

Data is based on SPEC2000 on Alpha
PC-relative addressing
Good for short position-independent forward & backward jumps
Register indirect addressing

Good for dynamic libraries, virtual functions & packed case statements
Condition Evaluation
Name: Condition Code (CC)
How condition is tested: Special bits are set by ALU operations, possibly under program control
Advantages: Sometimes condition is set for free
Disadvantages: CC is extra state. Condition codes constrain instruction ordering since they pass info. from one instruction to a branch

Name: Condition register
How condition is tested: Test arbitrary register with the result of a comparison
Advantages: Simple
Disadvantages: Uses up a register

Name: Compare & branch
How condition is tested: Compare is part of the branch
Advantages: One instruction rather than two for a branch
Disadvantages: May be too much work per instruction
Based on SPEC92 on MIPS
Remember to focus on the common case
Frequency of Types of Comparison
Data is based on SPEC2000 on Alpha
Different benchmark and machine set new design priority
DSPs support a repeat instruction for "for" loops (vectors) using 3 registers
Type and Size of Operands

Operand type encoded in instruction opcode
The type of an operand effectively gives its size
Common types include character, half word and word size integer, single- and double-precision floating point
Characters are almost always in ASCII, though 16-bit Unicode (for international characters) is gaining popularity
Integers in 2's complement
Floating point in IEEE 754
Unusual Types
Business Applications
Binary Coded Decimal (BCD)
Exactly represents all decimal fractions (binary doesn't!)
DSP
Fixed point
Good for limited range numbers: more mantissa bits
[Figure: floating point with 8-bit exponent and 24-bit mantissa compared with fixed point with fixed exponent and 32-bit mantissa]
Block floating point
Single shared exponent for multiple numbers
Graphics
4-element vector operations (RGBA or XYZW)
8-bit, 16-bit or single-precision floating point
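The point about decimal fractions can be checked directly. The sketch below compares binary floating point with Python's standard decimal module and packs an integer into BCD, one decimal digit per 4-bit nibble; the helper name to_bcd is made up for this example.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
print(0.1 + 0.2 == 0.3)

# Decimal arithmetic keeps decimal fractions exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

def to_bcd(number):
    """Pack a non-negative integer into BCD, one decimal digit per nibble."""
    result, shift = 0, 0
    while True:
        number, digit = divmod(number, 10)
        result |= digit << shift
        shift += 4
        if number == 0:
            return result

print(hex(to_bcd(1234)))
```

In BCD the hexadecimal digits of the stored value read the same as the decimal digits of the number, which is what makes decimal arithmetic and conversion to characters cheap.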
Size of Operands
Frequency of reference by size based on SPEC2000 on Alpha
Double-word: double-precision floating point + addresses in 64-bit machines
Words: most integer operations + addresses in 32-bit machines
For the mix in SPEC, word and double-word data types dominate
Instruction Representation
All data in computer systems is represented in binary
Instructions are no exception
The program that translates the human-readable code to numeric form is called an Assembler
Hence machine-language or assembly-language
Example:
Assembly: ADD $t0, $s1, $s2
M/C language (binary): 000000 10001 10010 01000 00000 100000
                       0000 0010 0011 0010 0100 0000 0010 0000
M/C language (hex): 0x02324020
Note: MIPS compiler by default maps $s0, ..., $s7 to reg. 16-23 and $t0, ..., $t7 to reg. 8-15
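As a sketch of what the assembler does for this example, the code below packs the fields of a MIPS R-type instruction (op, rs, rt, rd, shamt, funct with widths 6, 5, 5, 5, 5, 6) into one 32-bit word. The register numbers follow the note above; the helper name encode_r_type and the lookup table are assumptions made for the example.

```python
# Register numbers from the note above: $t0-$t7 -> 8-15, $s0-$s7 -> 16-23.
REGISTER_NUMBERS = {f"$t{i}": 8 + i for i in range(8)}
REGISTER_NUMBERS.update({f"$s{i}": 16 + i for i in range(8)})

ADD_FUNCT = 0b100000   # funct code of add; the opcode of R-type instructions is 0

def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# ADD $t0, $s1, $s2: destination rd = $t0, sources rs = $s1 and rt = $s2.
word = encode_r_type(rs=REGISTER_NUMBERS["$s1"],
                     rt=REGISTER_NUMBERS["$s2"],
                     rd=REGISTER_NUMBERS["$t0"],
                     shamt=0,
                     funct=ADD_FUNCT)
print(f"{word:032b}")
print(f"0x{word:08X}")
```

Note that the destination register comes first in the assembly text but sits in the third register field of the encoded word.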
Encoding an Instruction Set

Affects the size of the compiled program
Also complexity of the CPU implementation
Operation in one field called opcode
Addressing mode in opcode or separate field
Must balance:
Desire to support as many registers and addressing modes as possible
Effect of operand specification on the size of the instruction (and program)
Desire to simplify instruction fetching and decoding during execution
Fixed size instruction encoding simplifies CPU design but limits addressing choices
Encoding Examples
MIPS Instruction Formats

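opcodes (rows: upper 3 bits, columns: lower 3 bits, - marks an unused code)
        000     001     010     011     100     101     110     111
000     R-type  -       j       jal     beq     bne     blez    bgtz
001     addi    addiu   slti    sltiu   andi    ori     xori    -
010     -       -       -       -       -       -       -       -
011     llo     lhi     trap    -       -       -       -       -
100     lb      lh      -       lw      lbu     lhu     -       -
101     sb      sh      -       sw      -       -       -       -
110     -       -       -       -       -       -       -       -
111     -       -       -       -       -       -       -       -
funct codes (rows: upper 3 bits, columns: lower 3 bits, - marks an unused code)
        000     001     010     011     100     101     110     111
000     sll     -       srl     sra     sllv    -       srlv    srav
001     jr      jalr    -       -       -       -       -       -
010     mfhi    mthi    mflo    mtlo    -       -       -       -
011     mult    multu   div     divu    -       -       -       -
100     add     addu    sub     subu    and     or      xor     nor
101     -       -       slt     sltu    -       -       -       -
110     -       -       -       -       -       -       -       -
111     -       -       -       -       -       -       -       -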
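Decoding reverses the lookup: the top 6 bits of the word select the opcode and, for R-type instructions (opcode 000000), the bottom 6 bits select the funct code. The sketch below covers only a handful of standard MIPS entries; the dictionary and function names are made up for the example.

```python
# A small subset of the opcode and funct tables (standard MIPS encodings).
OPCODES = {
    0b000010: "j", 0b000011: "jal", 0b000100: "beq", 0b000101: "bne",
    0b001000: "addi", 0b100011: "lw", 0b101011: "sw",
}
FUNCT_CODES = {
    0b100000: "add", 0b100010: "sub", 0b100100: "and",
    0b100101: "or", 0b101010: "slt",
}

def decode_mnemonic(word):
    """Return the mnemonic of a 32-bit instruction word, or 'unknown'."""
    opcode = (word >> 26) & 0x3F
    if opcode == 0:                       # R-type: operation is in funct
        return FUNCT_CODES.get(word & 0x3F, "unknown")
    return OPCODES.get(opcode, "unknown")

for word in (0x02324020, 0x8C000000, 0xAC000000, 0x08000000):
    print(f"0x{word:08X} -> {decode_mnemonic(word)}")
```

Because every instruction is 32 bits wide with the opcode in a fixed position, decoding needs only shifts, masks and table lookups, which illustrates why fixed-size encoding simplifies CPU design.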
The Stored Program Concept

Today's computers are built on two key principles:
Instructions are represented as numbers
Programs can be stored in memory to be read or written just like numbers
Memory can contain:
the source code for an editor
the compiled m/c code for the editor
the text that the compiled program is using
the compiler that generated the code
[Figure: processor connected to a memory holding an accounting program (machine code), editor program (machine code), C compiler (machine code), payroll data, book text, and source code in C for editor program]
Conclusion
Summary
Type and size of operands
(common data types, effect of operand size on complexity)
Encoding the instruction set

(Fixed, variable and hybrid encoding, stored program)
Next Week
Role and effect of compilers on ISA
Pipelined execution of instructions
Pipeline hazards
