

ISAs and Y86-64

Samira Khan
Agenda
• ISA vs Microarchitecture
• ISA Tradeoffs
• Y86-64 ISA
• Y86-64 Format
• Y86-64 Encoding/Decoding
LEVELS OF TRANSFORMATION
• ISA
• Agreed-upon interface between software and hardware
• SW/compiler assumes, HW promises
• What the software writer needs to know to write system/user programs
• Microarchitecture
• Specific implementation of an ISA
• Not visible to the software
• Microprocessor
• ISA, uarch, circuits
• “Architecture” = ISA + microarchitecture
[Figure: levels of transformation, top to bottom: Problem, Algorithm, Program/Language, ISA, Microarchitecture, Logic, Circuits]
ISA VS. MICROARCHITECTURE
• What is part of ISA vs. Uarch?
• Gas pedal: interface for “acceleration”
• Internals of the engine: implements “acceleration”
• Add instruction vs. Adder implementation

• Implementation (uarch) can vary as long as it satisfies the specification (ISA)
• Bit serial, ripple carry, carry lookahead adders
• x86 ISA has many implementations: 286, 386, 486, Pentium, Pentium Pro, …
• Uarch usually changes faster than ISA
• Few ISAs (x86, SPARC, MIPS, Alpha) but many uarchs
• Why?
ISA
• Instructions
• Opcodes, Addressing Modes, Data Types
• Instruction Types and Formats
• Registers, Condition Codes

• Memory
• Address space, Addressability, Alignment
• Virtual memory management
• Call, Interrupt/Exception Handling
• Access Control, Priority/Privilege
• I/O
• Task Management
• Power and Thermal Management
• Multi-threading support, Multiprocessor support

Example ISAs
• x86 — dominant in desktops, servers
• ARM — dominant in mobile devices
• POWER — Wii U, IBM supercomputers and some servers
• MIPS — common in consumer wifi access points
• SPARC — some Oracle servers, Fujitsu supercomputers
• z/Architecture — IBM mainframes
• Z80 — TI calculators
• SHARC — some digital signal processors
• Itanium — some HP servers (being retired)
• RISC V — some embedded
• …
Agenda
• ISA vs Microarchitecture
• ISA Tradeoffs
• Y86-64 ISA
• Y86-64 Format
• Y86-64 encoding/decoding
ISA: INSTRUCTION LENGTH
• Fixed length: Length of all instructions the same
+ Easier to decode single instruction in hardware
+ Easier to decode multiple instructions concurrently
-- Wasted bits in instructions (Why is this bad?)
-- Harder-to-extend ISA (how to add new instructions?)

• Variable length: Length of instructions different (determined by opcode and sub-opcode)
+ Compact encoding (Why is this good?)
Intel 432: Huffman encoding (sort of). 6 to 321 bit instructions. How?
-- More logic to decode a single instruction
-- Harder to decode multiple instructions concurrently
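
The concurrent-decode point can be sketched in Python. Both encodings here are hypothetical toys, not any real ISA: with fixed-length instructions every start offset is known up front, while with variable-length instructions each start depends on the length of the instruction before it.

```python
def fixed_length_starts(code: bytes, width: int = 4) -> list[int]:
    """Start offsets are known without inspecting any instruction bytes."""
    return list(range(0, len(code), width))


def variable_length_starts(code: bytes) -> list[int]:
    """Toy rule (an assumption for illustration): the low 4 bits of an
    instruction's first byte give its total length in bytes."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = code[pc] & 0x0F
        if length == 0:
            raise ValueError(f"invalid length at offset {pc}")
        pc += length  # the next start is only known after decoding this one
    return starts
```

In the fixed-length case several decoders could each be handed an offset immediately; in the variable-length case the loop is inherently sequential unless extra hardware guesses or pre-marks the boundaries.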

ISA: ADDRESSING MODES
• Addressing mode specifies how to obtain an operand of an instruction
• Register
• Immediate
• Memory (displacement, register indirect, indexed, absolute, memory indirect,
autoincrement, autodecrement, …)

• x86-64: 10(%r11,%r12,4)
• ARM: %r11 << 3 (shift register value by constant)
• VAX: ((%r11)) (register value is pointer to pointer)
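
A minimal Python sketch of what the three example operands compute. The function names are made up for illustration, and memory is modeled as a dict from address to value, which is an assumption of this sketch.

```python
def x86_64_address(disp: int, base: int, index: int, scale: int) -> int:
    """disp(base,index,scale), e.g. 10(%r11,%r12,4): disp + base + index * scale."""
    return disp + base + index * scale


def shifted_register(value: int, shift: int) -> int:
    """Operand is a register value shifted left by a constant."""
    return value << shift


def memory_indirect(memory: dict, reg_value: int) -> int:
    """Register holds a pointer to a pointer: two memory reads reach the operand."""
    return memory[memory[reg_value]]
```

For example, with %r11 = 0x1000 and %r12 = 3, the operand 10(%r11,%r12,4) refers to address 0x1000 + 10 + 3 * 4 = 0x1016.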

ISA: Condition Codes
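• x86-64: a compare sets the condition codes, then a conditional jump tests them:
cmpq %r11, %r12
je somewhere

• could instead compare and branch in one instruction:
/* Branch if EQual */
beq %r11, %r12, somewhere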
ISA-LEVEL TRADEOFFS: SEMANTIC GAP
• Where to place the ISA? Semantic gap
• Closer to high-level language (HLL) or closer to hardware control signals? → Complex vs. simple instructions
• RISC vs. CISC vs. HLL machines
• FFT, QUICKSORT, POLY, FP instructions?
• VAX INDEX instruction (array access with bounds checking)
• e.g., A[i][j][k] one instruction with bound check

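SEMANTIC GAP
[Figure: High-Level Language at the top, Control Signals at the bottom, ISA in between; software sits above the ISA and hardware below it; the semantic gap separates the high-level language from the ISA]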
SEMANTIC GAP
[Figure: High-Level Language at the top, Control Signals at the bottom, with two ISA placements: CISC nearer the high-level language, RISC nearer the control signals]
ISA-LEVEL TRADEOFFS: SEMANTIC GAP
• Where to place the ISA? Semantic gap
• Closer to high-level language (HLL) or closer to hardware control signals? → Complex vs. simple instructions
• RISC vs. CISC vs. HLL machines
• FFT, QUICKSORT, POLY, FP instructions?
• VAX INDEX instruction (array access with bounds checking)
• Tradeoffs:
• Simple compiler, complex hardware vs. complex compiler, simple hardware
• Burden of backward compatibility
• Performance?
• Optimization opportunity: Example of VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization?
• Instruction size, code size
SMALL SEMANTIC GAP EXAMPLES IN VAX
• FIND FIRST
• Find the first set bit in a bit field
• Helps OS resource allocation operations
• SAVE CONTEXT, LOAD CONTEXT
• Special context switching instructions
• INSQUEUE, REMQUEUE
• Operations on doubly linked list
• INDEX
• Array access with bounds checking
• STRING Operations
• Compare strings, find substrings, …
• Cyclic Redundancy Check Instruction
• EDITPC
• Implements editing functions to display fixed format output

• Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78.

CISC vs. RISC

x86: REP MOVS DEST SRC

X:
MOV
ADD
COMP
MOV
ADD
JMP X

Which one is easy to optimize?
SMALL VERSUS LARGE SEMANTIC GAP
• CISC vs. RISC
• Complex instruction set computer → complex instructions
• Initially motivated by “not good enough” code generation
• Reduced instruction set computer → simple instructions
• John Cocke, mid 1970s, IBM 801
• Goal: enable better compiler control and optimization

• RISC motivated by
• Memory stalls (no work done in a complex instruction when there is a memory stall?)
• When is this correct?
• Simplifying the hardware → lower cost, higher frequency
• Enabling the compiler to optimize the code better
• Find fine-grained parallelism to reduce stalls
Typical RISC ISA properties
• fewer, simpler instructions
• separate instructions to access memory
• fixed-length instructions
• more registers
• no instructions with two memory operands
• few addressing modes
Agenda
• ISA vs Microarchitecture
• ISA Tradeoffs
• Y86-64 ISA
• Y86-64 Format
• Y86-64 encoding/decoding
Y86-64 instruction set
• based on x86
• omits most of the 1000+ instructions
addq jmp pushq
subq jCC popq
andq cmovCC movq (renamed)
xorq call hlt (renamed)
nop ret
• much, much simpler encoding
Y86-64: movq

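• x86-64 movq is split by operand kind: first letter = source, second letter = destination (i = immediate, r = register, m = memory)
• Valid: irmovq, rrmovq, rmmovq, mrmovq
• Not available: immovq, iimovq, rimovq, mmmovq, mimovq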
Y86-64: cmovCC
• conditional move
• (Conditionally) copy value from source to destination register
• Y86-64: register-to-register only
• instead of:
jle skip_move
rrmovq %rax, %rbx
skip_move:
// ...
• can do:
cmovg %rax, %rbx
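
A small Python sketch comparing the two versions, using the simplified flag model of this course (ZF and SF only). The function names are made up for illustration.

```python
def greater(zf: bool, sf: bool) -> bool:
    """Condition 'g' in the simplified model: SF = 0 and ZF = 0."""
    return (not sf) and (not zf)


def with_branch(rax: int, rbx: int, zf: bool, sf: bool) -> int:
    """jle skip_move; rrmovq %rax, %rbx; skip_move: -- returns the new %rbx."""
    if not greater(zf, sf):  # jle taken: the move is skipped
        return rbx
    rbx = rax                # rrmovq %rax, %rbx
    return rbx


def with_cmov(rax: int, rbx: int, zf: bool, sf: bool) -> int:
    """cmovg %rax, %rbx -- returns the new %rbx, with no jump involved."""
    return rax if greater(zf, sf) else rbx
```

Both functions leave the same value in %rbx for every flag combination; the cmov form simply has no control transfer.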
Y86-64: halt
• (x86-64 instruction called hlt)
• Y86-64 instruction halt
• stops the processor
• otherwise the processor would keep executing whatever is in memory “after” the program!
• real processors: the equivalent instruction is reserved for the OS
Y86-64: specifying addresses
• rmmovq %r11, 10(%r12)
• memory[10 + r12] ← r11

• r12 ← memory[10 + r11] + r12
mrmovq 10(%r11), %r11
/* overwrites %r11 */
addq %r11, %r12
Y86-64: accessing memory
• r12 ← memory[10 + 8 * r11] + r12
/* replace %r11 with 8*%r11 */
addq %r11, %r11
addq %r11, %r11
addq %r11, %r11
mrmovq 10(%r11), %r11
addq %r11, %r12
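
The five-instruction sequence can be traced with a short Python sketch. Registers are a dict, memory is a dict from address to 64-bit value, and the function name is made up; all of these are assumptions of the sketch, not part of Y86-64.

```python
MASK = (1 << 64) - 1  # registers hold 64-bit values


def load_scaled(regs: dict, memory: dict) -> dict:
    """Trace: three doublings of %r11, a load, then an add into %r12."""
    r = dict(regs)
    for _ in range(3):
        r["r11"] = (r["r11"] + r["r11"]) & MASK  # addq %r11, %r11
    r["r11"] = memory[(10 + r["r11"]) & MASK]    # mrmovq 10(%r11), %r11
    r["r12"] = (r["r11"] + r["r12"]) & MASK      # addq %r11, %r12
    return r
```

With r11 = 2 and r12 = 5, the load reads address 10 + 8 * 2 = 26, and the original value of %r11 is lost along the way.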
Y86-64 constants
• irmovq $100, %r11
• only instruction with non-address constant operand

• r12 ← r12 + 1
• Invalid: addq $1, %r12
• Instead, need an extra register:
irmovq $1, %r11
addq %r11, %r12
Y86-64: condition codes
• ZF — value was zero?
• SF — sign bit was set? i.e. value was negative?
• this course: no OF, CF (to simplify assignments)
• set by addq, subq, andq, xorq
• not set by anything else
Y86-64: using condition codes

subq SECOND, FIRST (value = FIRST - SECOND)

j__ or cmov__   condition code bit test   value test
le              SF = 1 or ZF = 1          value <= 0
l               SF = 1                    value < 0
e               ZF = 1                    value = 0
ne              ZF = 0                    value != 0
ge              SF = 0                    value >= 0
g               SF = 0 and ZF = 0         value > 0
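
The table can be written out as a minimal Python sketch, assuming the simplified model of this course (ZF and SF only, no OF or CF). The names subq_flags, CONDITIONS and condition_holds are made up for illustration.

```python
MASK = (1 << 64) - 1


def subq_flags(first: int, second: int):
    """subq SECOND, FIRST: value = FIRST - SECOND with 64-bit wraparound.
    Returns (value, ZF, SF)."""
    value = (first - second) & MASK
    zf = value == 0
    sf = (value >> 63) == 1  # sign bit of the 64-bit result
    return value, zf, sf


# One entry per row of the table: suffix -> test on (ZF, SF).
CONDITIONS = {
    "le": lambda zf, sf: sf or zf,
    "l": lambda zf, sf: sf,
    "e": lambda zf, sf: zf,
    "ne": lambda zf, sf: not zf,
    "ge": lambda zf, sf: not sf,
    "g": lambda zf, sf: (not sf) and (not zf),
}


def condition_holds(suffix: str, zf: bool, sf: bool) -> bool:
    return CONDITIONS[suffix](zf, sf)
```

Because this model has no OF, a subtraction whose true result overflows the signed 64-bit range is classified by the wrapped sign bit, so the value tests in the right-hand column only hold when no overflow occurs.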
push/pop
pushq %rbx
%rsp ← %rsp − 8
memory[%rsp] ← %rbx

popq %rbx
%rbx ← memory[%rsp]
%rsp ← %rsp + 8
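
A minimal Python sketch of the two register-transfer sequences. Registers and memory are plain dicts (an assumption of this sketch), and memory is indexed by the address of each 8-byte word.

```python
def pushq(regs: dict, memory: dict, src: str) -> None:
    value = regs[src]            # read the source before %rsp changes
    regs["rsp"] -= 8             # %rsp <- %rsp - 8
    memory[regs["rsp"]] = value  # memory[%rsp] <- value


def popq(regs: dict, memory: dict, dst: str) -> None:
    value = memory[regs["rsp"]]  # read the word at the top of the stack
    regs["rsp"] += 8             # %rsp <- %rsp + 8
    regs[dst] = value            # destination written last
```

Reading the source first and writing the destination last is a choice made in this sketch; the order only matters when the operand is %rsp itself.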
Agenda
• ISA vs Microarchitecture
• ISA Tradeoffs
• Y86-64 ISA
• Y86-64 Format
• Y86-64 encoding/decoding
Y86-64 Instruction Set #1
Instruction         Byte 0   Byte 1   Bytes 2-9
halt                0 0
nop                 1 0
cmovXX rA, rB       2 fn     rA rB
irmovq V, rB        3 0      F rB     V
rmmovq rA, D(rB)    4 0      rA rB    D
mrmovq D(rB), rA    5 0      rA rB    D
OPq rA, rB          6 fn     rA rB
jXX Dest            7 fn     Dest (bytes 1-8)
call Dest           8 0      Dest (bytes 1-8)
ret                 9 0
pushq rA            A 0      rA F
popq rA             B 0      rA F
Y86-64 Instruction Set #2
cmovXX rA, rB is encoded as 2 fn rA rB; function codes (fn):
Instruction   Byte 0
rrmovq        2 0
cmovle        2 1
cmovl         2 2
cmove         2 3
cmovne        2 4
cmovge        2 5
cmovg         2 6
Y86-64 Instruction Set #3
OPq rA, rB is encoded as 6 fn rA rB; function codes (fn):
Instruction   Byte 0
addq          6 0
subq          6 1
andq          6 2
xorq          6 3
Y86-64 Instruction Set #4
jXX Dest is encoded as 7 fn Dest; function codes (fn):
Instruction   Byte 0
jmp           7 0
jle           7 1
jl            7 2
je            7 3
jne           7 4
jge           7 5
jg            7 6
Encoding Registers
• Each register has 4-bit ID
Register   ID     Register      ID
%rax       0      %r8           8
%rcx       1      %r9           9
%rdx       2      %r10          A
%rbx       3      %r11          B
%rsp       4      %r12          C
%rbp       5      %r13          D
%rsi       6      %r14          E
%rdi       7      No Register   F

• Same encoding as in x86-64
• Register ID 15 (0xF) indicates “no register”
• Will use this in our hardware design in multiple places
Instruction Example
• Addition Instruction
Generic form: addq rA, rB
Encoded representation: 6 0 rA rB

• Add value in register rA to that in register rB
• Store result in register rB
• Note that Y86-64 only allows addition to be applied to register data
• Set condition codes based on result
• e.g., addq %rax,%rsi Encoding: 60 06
• Two-byte encoding
• First indicates instruction type
• Second gives source and destination registers
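
The two-byte layout can be checked with a minimal Python sketch. The register IDs and function codes are the ones given in these slides; the dict and function names are made up for illustration.

```python
REGISTER_IDS = {
    "%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
    "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7,
    "%r8": 0x8, "%r9": 0x9, "%r10": 0xA, "%r11": 0xB,
    "%r12": 0xC, "%r13": 0xD, "%r14": 0xE,
}
OPQ_FUNCTION_CODES = {"addq": 0, "subq": 1, "andq": 2, "xorq": 3}


def encode_opq(op: str, ra: str, rb: str) -> bytes:
    """Byte 0: instruction code 6 in the high nibble, function code in the low.
    Byte 1: rA in the high nibble, rB in the low."""
    first = (0x6 << 4) | OPQ_FUNCTION_CODES[op]
    second = (REGISTER_IDS[ra] << 4) | REGISTER_IDS[rb]
    return bytes([first, second])
```

Under these assumptions, encode_opq("addq", "%rax", "%rsi") gives the bytes 60 06, matching the example above.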
Arithmetic and Logical Operations
Byte 0 = instruction code (high 4 bits) + function code (low 4 bits)

Operation               Instruction    Encoding
Add                     addq rA, rB    6 0 rA rB
Subtract (rA from rB)   subq rA, rB    6 1 rA rB
And                     andq rA, rB    6 2 rA rB
Exclusive-Or            xorq rA, rB    6 3 rA rB

• Refer to generically as “OPq”
• Encodings differ only by “function code”
• Low-order 4 bits of the first instruction byte
• Set condition codes as side effect
Move Operations
Operation              Instruction         Encoding
Register → Register    rrmovq rA, rB       2 0 rA rB
Immediate → Register   irmovq V, rB        3 0 F rB V
Register → Memory      rmmovq rA, D(rB)    4 0 rA rB D
Memory → Register      mrmovq D(rB), rA    5 0 rA rB D

• Like the x86-64 movq instruction
• Simpler format for memory addresses
• Give different names to keep them distinct
Conditional Move Instructions
Operation                    Instruction      Encoding
Move Unconditionally         rrmovq rA, rB    2 0 rA rB
Move When Less or Equal      cmovle rA, rB    2 1 rA rB
Move When Less               cmovl rA, rB     2 2 rA rB
Move When Equal              cmove rA, rB     2 3 rA rB
Move When Not Equal          cmovne rA, rB    2 4 rA rB
Move When Greater or Equal   cmovge rA, rB    2 5 rA rB
Move When Greater            cmovg rA, rB     2 6 rA rB

• Refer to generically as “cmovXX”
• Encodings differ only by “function code”
• Based on values of condition codes
• Variants of rrmovq instruction
• (Conditionally) copy value from source to destination register
Jump Instructions
Jump (Conditionally)
jXX Dest    7 fn Dest

• Refer to generically as “jXX”
• Encodings differ only by “function code” fn
• Based on values of condition codes
• Same as x86-64 counterparts
• Encode full destination address
• Unlike PC-relative addressing seen in x86-64
Jump Instructions
Operation                    Instruction   Encoding
Jump Unconditionally         jmp Dest      7 0 Dest
Jump When Less or Equal      jle Dest      7 1 Dest
Jump When Less               jl Dest       7 2 Dest
Jump When Equal              je Dest       7 3 Dest
Jump When Not Equal          jne Dest      7 4 Dest
Jump When Greater or Equal   jge Dest      7 5 Dest
Jump When Greater            jg Dest       7 6 Dest
Stack Operations
pushq rA    A 0 rA F
• Decrement %rsp by 8
• Store word from rA to memory at %rsp
• Like x86-64

popq rA    B 0 rA F
• Read word from memory at %rsp
• Save in rA
• Increment %rsp by 8
• Like x86-64
Subroutine Call and Return
call Dest    8 0 Dest
• Push address of next instruction onto stack
• Start executing instructions at Dest
• Like x86-64

ret    9 0
• Pop value from stack
• Use as address for next instruction
• Like x86-64
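
A minimal Python sketch of call and ret as updates to the program counter and stack. The state and memory dicts and the function names are assumptions of this sketch; the 9-byte length follows from the encoding (1 byte for 8 0 plus 8 bytes of Dest).

```python
CALL_LENGTH = 9  # 1 byte of instruction/function code + 8 bytes of Dest


def call(state: dict, memory: dict, dest: int) -> None:
    return_address = state["pc"] + CALL_LENGTH  # address of the next instruction
    state["rsp"] -= 8
    memory[state["rsp"]] = return_address       # push it onto the stack
    state["pc"] = dest                          # continue at Dest


def ret(state: dict, memory: dict) -> None:
    state["pc"] = memory[state["rsp"]]          # popped value is the next address
    state["rsp"] += 8
```

A call located at address 0x20 therefore pushes 0x29, and the matching ret resumes execution there.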
Miscellaneous Instructions
nop    1 0
• Don’t do anything

halt    0 0
• Stop executing instructions
• x86-64 has comparable instruction, but can’t execute it in user mode
• We will use it to stop the simulator
• Encoding ensures that program hitting memory initialized to zero will halt
Agenda
• ISA vs Microarchitecture
• ISA Tradeoffs
• Y86-64 ISA
• Y86-64 Format
• Y86-64 Encoding/Decoding
Y86-64 encoding
long addOne(long x) {
return x + 1;
}
• x86-64:
movq %rdi, %rax
addq $1, %rax
ret
• Y86-64:
irmovq $1, %rax
addq %rdi, %rax
ret
Y86-64 encoding
addOne:
irmovq $1, %rax
addq %rdi, %rax
ret
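
The bytes for addOne can be worked out with a minimal Python sketch built from the instruction table. The helper names are made up, and only the two registers used here are listed.

```python
REG = {"%rax": 0x0, "%rdi": 0x7}  # only the registers this example needs
NO_REG = 0xF                      # "no register" ID


def irmovq(value: int, rb: str) -> bytes:
    """3 0 | F rB | V: the constant V is 8 bytes, little endian."""
    return bytes([0x30, (NO_REG << 4) | REG[rb]]) + value.to_bytes(8, "little", signed=True)


def addq(ra: str, rb: str) -> bytes:
    """6 0 | rA rB"""
    return bytes([0x60, (REG[ra] << 4) | REG[rb]])


def ret() -> bytes:
    """9 0"""
    return bytes([0x90])


add_one = irmovq(1, "%rax") + addq("%rdi", "%rax") + ret()
print(add_one.hex(" "))
```

Under these assumptions the three instructions take 10 + 2 + 1 = 13 bytes: 30 f0 01 00 00 00 00 00 00 00, then 60 70, then 90.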
Y86-64 encoding
doubleTillNegative:
/* suppose at address 0x123 */
addq %rax, %rax
jge doubleTillNegative
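
A minimal Python sketch of the encoding for this loop, assuming the label sits at address 0x123 as stated. The helper names are made up for illustration; the jump stores the full 8-byte destination address in little-endian order rather than an offset from the current instruction.

```python
REG = {"%rax": 0x0}
LOOP_ADDRESS = 0x123  # address of doubleTillNegative assumed above


def addq(ra: str, rb: str) -> bytes:
    """6 0 | rA rB"""
    return bytes([0x60, (REG[ra] << 4) | REG[rb]])


def jge(dest: int) -> bytes:
    """7 5 | Dest: absolute 8-byte address, little endian."""
    return bytes([0x75]) + dest.to_bytes(8, "little")


loop = addq("%rax", "%rax") + jge(LOOP_ADDRESS)
print(loop.hex(" "))
```

The addq occupies addresses 0x123 and 0x124, and the 9-byte jge starts at 0x125; its Dest bytes begin 23 01 because the low-order byte comes first.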
Y86-64 decoding
20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 20 12 20 01 70 68 00 00 00 00 00 00 00

rrmovq %rcx, %rax
• 0 as cc: always
• 1 as reg: %rcx
• 0 as reg: %rax
addq %rdx, %rax
subq %rbx, %rdi
• 0 as fn: add
• 1 as fn: sub
jl 0x84
• 2 as cc: l (less than)
• hex 84 00… as little endian Dest: 0x84
rrmovq %rcx, %rdx
rrmovq %rax, %rcx
jmp 0x68
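
The decoding steps can be sketched as a small Python disassembler. It is deliberately partial: it handles only the instruction kinds that occur in this byte sequence (cmovXX, OPq, jXX), and the table and function names are made up for illustration.

```python
REGISTERS = ["%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
             "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14"]
CMOV = ["rrmovq", "cmovle", "cmovl", "cmove", "cmovne", "cmovge", "cmovg"]
OPQ = ["addq", "subq", "andq", "xorq"]
JUMP = ["jmp", "jle", "jl", "je", "jne", "jge", "jg"]


def decode(code: bytes) -> list[str]:
    out = []
    pc = 0
    while pc < len(code):
        icode, fn = code[pc] >> 4, code[pc] & 0xF  # split byte 0 into nibbles
        if icode in (0x2, 0x6):                    # cmovXX / OPq: 2 bytes
            ra, rb = code[pc + 1] >> 4, code[pc + 1] & 0xF
            name = (CMOV if icode == 0x2 else OPQ)[fn]
            out.append(f"{name} {REGISTERS[ra]}, {REGISTERS[rb]}")
            pc += 2
        elif icode == 0x7:                         # jXX: 1 + 8 bytes
            dest = int.from_bytes(code[pc + 1:pc + 9], "little")
            out.append(f"{JUMP[fn]} {dest:#x}")
            pc += 9
        else:
            raise NotImplementedError(f"instruction code {icode:#x} not handled in this sketch")
    return out


program = bytes.fromhex(
    "20 10 60 20 61 37 72 84 00 00 00 00 00 00 00 "
    "20 12 20 01 70 68 00 00 00 00 00 00 00"
)
for line in decode(program):
    print(line)
```

The first nibble of each instruction decides how many bytes to consume, which is why the sequence has to be decoded from the start.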
