Emulation - Interpretation
Emulation
Interpretation
basic, threaded, direct threaded
other issues
Binary translation
code discovery, code location
other issues
Key VM Technologies
Emulation
binary in one ISA is executed in processor supporting a different ISA
Dynamic Optimization
binary is improved for higher performance
may be done as part of emulation
may optimize same ISA (no emulation needed)
[Figure: emulation example with x86 apps and Windows on Alpha; optimization example with HP apps and HP-UX on the HP PA ISA]
Simulation
method for modeling a (sub)system's operation
objective is to study the process, not just to imitate the function
typically emulation is part of the simulation process
Definitions
Guest
environment being supported by underlying platform
Host
underlying platform that provides guest environment
[Figure: Guest, supported by, Host]
Definitions (2)
Source ISA or binary
original instruction set or binary
the ISA to be emulated
[Figure: Source, emulated by, Target]
Emulation
Required for implementing many VMs.
Process of implementing the interface and functionality of one (sub)system on a (sub)system having a different interface and functionality
terminal emulators, such as for VT100, xterm, PuTTY
Binary translation
complex implementation
high initial translation cost, small execution cost
selective compilation
Interpreter State
An interpreter needs to maintain the complete architected state of the machine implementing the source ISA
registers
memory: code, data, stack
[Figure: interpreter state: Program Counter, registers up to Reg n-1, and memory holding Code, Data, and Stack]
Interpreter Code
    while (!halt && !interrupt) {
        inst = code[PC];
        opcode = extract(inst, 31, 6);
        switch (opcode) {
            case LoadWordAndZero: LoadWordAndZero(inst); break;
            case ALU: ALU(inst); break;
            case Branch: Branch(inst); break;
            . . .
        }
    }
Instruction function list: LoadWordAndZero(inst), ALU(inst), Branch(inst), . . .
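The decode-dispatch loop above can be fleshed out into a small runnable sketch. Everything specific here is an assumption made for illustration and is not from the slides: the toy three-opcode ISA, its field layout, and the names Machine, ENCODE, interpret, and run_sample.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy ISA, not from the slides: 32-bit instructions with the
   opcode in bits 31..26, rt in bits 25..21, ra in bits 20..16, and a
   16-bit immediate in bits 15..0. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2 };

#define ENCODE(op, rt, ra, imm)                              \
    (((uint32_t)(op) << 26) | ((uint32_t)(rt) << 21) |       \
     ((uint32_t)(ra) << 16) | ((uint32_t)(imm) & 0xFFFFu))

typedef struct {
    uint32_t pc;       /* index into code[], not a byte address */
    uint32_t regs[32]; /* architected registers of the toy source ISA */
    int halt;
} Machine;

/* Return the len-bit field of inst whose most significant bit is bit hi. */
static uint32_t extract(uint32_t inst, int hi, int len) {
    return (inst >> (hi - len + 1)) & ((1u << len) - 1u);
}

/* Decode-dispatch loop: fetch, extract the opcode, switch, repeat. */
void interpret(Machine *m, const uint32_t *code, uint32_t code_len) {
    while (!m->halt) {
        assert(m->pc < code_len);
        uint32_t inst = code[m->pc];
        uint32_t rt = extract(inst, 25, 5);
        uint32_t ra = extract(inst, 20, 5);
        switch (extract(inst, 31, 6)) {
        case OP_LOADI: m->regs[rt] = extract(inst, 15, 16); m->pc++; break;
        case OP_ADD:   m->regs[rt] += m->regs[ra];          m->pc++; break;
        default:       m->halt = 1; break; /* OP_HALT or unknown opcode */
        }
    }
}

/* Usage: r1 = 5; r2 = 7; r1 = r1 + r2; halt. Returns the final r1. */
uint32_t run_sample(void) {
    const uint32_t prog[] = {
        ENCODE(OP_LOADI, 1, 0, 5), ENCODE(OP_LOADI, 2, 0, 7),
        ENCODE(OP_ADD, 1, 2, 0),   ENCODE(OP_HALT, 0, 0, 0),
    };
    Machine m = {0};
    interpret(&m, prog, 4);
    return m.regs[1];
}
```

Every instruction passes through the same fetch, extract, and switch sequence, which is the per-instruction overhead that the later techniques try to reduce.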
Advantages
binary intermediate code still portable
improves efficiency over basic interpretation
Disadvantages
code replication increases interpreter size
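The code replication noted above is characteristic of threaded interpretation, where each routine ends with its own copy of the dispatch code instead of returning to a central loop. A minimal sketch follows, assuming a hypothetical three-opcode accumulator machine (T_INC, T_DOUBLE, T_HALT are invented for illustration) and the GCC/Clang labels-as-values extension, which is not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration only. */
enum { T_INC = 0, T_DOUBLE = 1, T_HALT = 2 };

/* Runs an opcode stream and returns the accumulator.
   Assumes every opcode is valid and the stream ends with T_HALT. */
long run_threaded(const unsigned char *code, size_t len) {
    /* GCC/Clang extension: &&label yields the address of a label. */
    static void *dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    long acc = 0;
    size_t tpc = 0;

    assert(len > 0 && code[len - 1] == T_HALT);
    goto *dispatch[code[tpc]];

do_inc:
    acc += 1;
    tpc++;
    goto *dispatch[code[tpc]];   /* dispatch code replicated here ... */
do_double:
    acc *= 2;
    tpc++;
    goto *dispatch[code[tpc]];   /* ... and here, in every routine */
do_halt:
    return acc;
}
```

With only three routines the replication is negligible, but it grows with the number of instruction routines in a real interpreter.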
[Figure: control flow of Decode-dispatch interpretation, which passes through a dispatch loop, compared with Threaded interpretation]
Predecoding
Parse each instruction into a pre-defined structure to facilitate interpretation
separate opcode, operands, etc.
reduces shifts / masks significantly
more useful for CISC ISAs
changes to the input binary damage portability
[Figure: example of predecoded instructions with separate opcode and operand fields]
Predecoding (2)
    struct instruction {
        unsigned long op;
        unsigned char dest, src1, src2;
    } code[CODE_SIZE];

    Load Word and Zero:
        RT = code[TPC].dest;
        RA = code[TPC].src1;
        displacement = code[TPC].src2;
        if (RA == 0) source = 0;
        else source = regs[RA];
        address = source + displacement;
        regs[RT] = (data[address] << 32) >> 32;
        SPC = SPC + 4;
        TPC = TPC + 1;
        if (halt || interrupt) goto exit;
        opcode = code[TPC].op;
        routine = dispatch[opcode];
        goto *routine;
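The routine above reads fields that a predecoder has already separated. A minimal sketch of that predecoding pass is given here. The source field layout (opcode in bits 31..26, dest in bits 25..21, src1 in bits 20..16, src2 in bits 7..0) and the function name predecode are assumptions for illustration, not the encoding of any real ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the predecoded structure used above. */
struct instruction {
    unsigned long op;
    unsigned char dest, src1, src2;
};

/* Parse n source words once, up front, so the interpreter routines can
   read ready-made fields instead of shifting and masking every time.
   The bit positions below are a hypothetical layout. */
void predecode(const uint32_t *source, struct instruction *code, size_t n) {
    assert(source != NULL && code != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t w = source[i];
        code[i].op   = (w >> 26) & 0x3Fu;
        code[i].dest = (unsigned char)((w >> 21) & 0x1Fu);
        code[i].src1 = (unsigned char)((w >> 16) & 0x1Fu);
        code[i].src2 = (unsigned char)(w & 0xFFu);
    }
}
```

The cost of the shifts and masks is paid once per static instruction rather than once per executed instruction.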
[Figure: predecoder]
[Figure: Dispatch to Shared Routines]
Binary Translation
Translate source binary program to target binary before execution
is the logical conclusion of predecoding
get rid of parsing and jumps altogether
allows optimizations on the native code
achieves higher performance than interpretation
needs mapping of source state onto the host state (state mapping)
r1 points to x86 register context block
r2 points to x86 memory image
r3 contains x86 ISA PC value
[Figure: binary translator]
State Mapping
Maintaining the state of the source machine on the host (target) machine.
state includes source registers and memory contents
source registers can be held in host registers or in host memory
reduces loads/stores significantly
easier if target registers > source registers
Register Mapping
Map source registers to target registers
spill registers if needed
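A fixed mapping with spilling can be sketched as a small lookup function. The register counts, the first mapped target register, and the 4-byte slot size are all assumed values chosen for illustration. Location and map_register are hypothetical names.

```c
#include <assert.h>

#define NUM_SOURCE_REGS  8   /* assumed size of the source register file */
#define NUM_MAPPED       6   /* assumed number of target registers to spare */
#define FIRST_TARGET_REG 4   /* assumed: lower target registers are reserved
                                for emulator use (context pointers, PC) */

typedef struct {
    int in_register; /* 1: held in a target register, 0: spilled */
    int where;       /* target register number, or byte offset of the
                        slot in the source register block */
} Location;

/* Map a source register to its home on the target machine. */
Location map_register(int src) {
    Location loc;
    assert(src >= 0 && src < NUM_SOURCE_REGS);
    if (src < NUM_MAPPED) {
        loc.in_register = 1;
        loc.where = FIRST_TARGET_REG + src;
    } else {
        loc.in_register = 0;   /* spilled to the source register block */
        loc.where = src * 4;
    }
    return loc;
}
```

A translator would consult such a mapping for each operand and emit a load or store only for the spilled registers.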
[Figure: mapping of source ISA state (Source Register Block, program counter, ..., reg n) onto target ISA registers (R1, R2, R3, R5, R6, ..., RN+4)]
[Figure: x86 byte sequence 31 c0 8b b5 00 00 03 08 8b bd 00 00 03 00, with a candidate decoding "... %esi, 0x08030000(%ebp)" marked "??"]
[Figure: indirect jump example, instructions partly lost]
    ...          ;load jump address from memory
    ...          ;jump indirect through %eax
    ...          ;compute x86 address
    ...          ;get x86 jump address from x86 memory image
    mtctr r4     ;move to count register
    bctr         ;jump indirect through ctr
Simplified Solutions
Fixed-width RISC ISA: instructions are always aligned on fixed boundaries
Use special instruction sets (Java)
no jumps/branches to arbitrary locations
no data or pads mixed with instructions
all code can then be discovered
Translate code
incrementally, as it is discovered
place translated code in code cache
use lookup table to save source to target PC mappings
Emulation process
execute translated block
look up next source PC in lookup table
if translated, jump to target PC
else, interpret and translate
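The lookup step can be sketched as a small hash table from source PC (SPC) to target PC (TPC). This is an illustration under stated assumptions: the table size, the hash on SPC, linear probing, and the stand-in translate_block (which only reserves 16 units of code cache per block) are all invented here.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 64          /* power of two, assumed large enough */
#define EMPTY UINT32_MAX       /* marks an unused table slot */

typedef struct {
    uint32_t spc[TABLE_SIZE];  /* source PC keys */
    uint32_t tpc[TABLE_SIZE];  /* matching positions in the code cache */
    uint32_t next_tpc;         /* next free position in the code cache */
    unsigned misses;           /* translations performed so far */
} MapTable;

void map_init(MapTable *t) {
    for (int i = 0; i < TABLE_SIZE; i++) t->spc[i] = EMPTY;
    t->next_tpc = 0;
    t->misses = 0;
}

/* Stand-in for the real translator: reserves code cache space only. */
static uint32_t translate_block(MapTable *t, uint32_t spc) {
    (void)spc;
    uint32_t tpc = t->next_tpc;
    t->next_tpc += 16;
    t->misses++;
    return tpc;
}

/* Hit: return the saved TPC. Miss: translate, record the mapping. */
uint32_t lookup_or_translate(MapTable *t, uint32_t spc) {
    assert(spc != EMPTY);
    assert(t->misses < TABLE_SIZE);            /* keeps one slot free */
    uint32_t i = (spc >> 2) & (TABLE_SIZE - 1);
    while (t->spc[i] != EMPTY) {
        if (t->spc[i] == spc) return t->tpc[i];
        i = (i + 1) & (TABLE_SIZE - 1);        /* linear probing */
    }
    t->spc[i] = spc;
    t->tpc[i] = translate_block(t, spc);
    return t->tpc[i];
}
```

Only the first lookup of a given SPC pays the translation cost, and later lookups return the cached TPC.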
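[Figure: incremental translation: a lookup table hit leads to Translation Memory, a miss leads to the translator]
[Figure: code divided into blocks: block 2, block 3, block 4, block 5]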
Flow of Control
Even after all blocks are translated, control flows between translated blocks and the emulation manager (EM).
The EM connects the translated blocks during execution.
Optimizations can reduce the overhead of going through the EM between every pair of translation blocks.
[Figure: control passing between translation blocks through the Emulation Manager]
General approach
translated code returns to EM via branch-and-link (BL)
SPC placed in stub immediately after BL
EM uses link register to find SPC and hash to next target code block
[Figure: Hash Table pointing to Code Blocks]
[Figure: emulation manager flowchart. Hit in table? If no: use SPC to read instructions from the source memory image; interpret, translate, and place into translation memory; write new SPC -> TPC mapping into table. Other step shown: get SPC for next block.]
Translation Chaining
Translation blocks are linked into chains
If the successor block has not yet been translated
code is inserted to jump to the EM
later, after jumping to the EM, if the EM finds that the successor block has been translated, the jump is modified to point directly to the successor
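The patching step can be modelled in plain C by giving each block a link that starts empty (exit to the EM) and is filled in once the successor is found. This is a simulation under assumptions: real chaining rewrites a branch in translated code, and the names Block, find_block, and run_chain, the single-successor blocks, and the linear search are simplifications invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Block {
    uint32_t spc;         /* source PC where this block starts */
    uint32_t next_spc;    /* source PC of its single successor */
    struct Block *chain;  /* direct link to successor; NULL = exit to EM */
} Block;

/* Linear search standing in for the SPC -> TPC lookup table. */
static Block *find_block(Block *blocks, size_t n, uint32_t spc) {
    for (size_t i = 0; i < n; i++)
        if (blocks[i].spc == spc) return &blocks[i];
    return NULL;
}

/* Follow control flow from start until a successor is untranslated.
   Returns how many times control passed through the EM.
   Assumes the blocks form no cycle, so the walk terminates. */
unsigned run_chain(Block *blocks, size_t n, Block *start) {
    unsigned em_entries = 0;
    Block *cur = start;
    assert(cur != NULL);
    for (;;) {
        if (cur->chain != NULL) {   /* chained: bypass the EM */
            cur = cur->chain;
            continue;
        }
        em_entries++;               /* unchained exit: enter the EM */
        Block *succ = find_block(blocks, n, cur->next_spc);
        if (succ == NULL) return em_entries;
        cur->chain = succ;          /* patch the jump for next time */
        cur = succ;
    }
}
```

Running the same path twice shows the effect: the first pass enters the EM at every block boundary, while the second pass enters it only at the unchained final exit.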
[Figure: With Chaining: chains of translation blocks, with the VMM shown between chains]
[Figure: setting up a chain: Lookup Successor, Set up chain, Successor]
Register Architectures
GPRs of the target ISA are used for
holding source ISA GPRs
holding source ISA special-purpose registers
pointing to register context block and memory image
holding intermediate emulator values
Issues
target ISA registers < source ISA registers
prioritizing the use of target ISA registers
Condition Codes
Condition codes are not used uniformly
IA-32 ISA sets CC implicitly
SPARC and PowerPC set CC explicitly
MIPS ISA does not use CC
Optimizations can also analyze code to detect cases where a generated CC will never be used
    genZF: add   r29,r28,r27   ;add opcode to jump table base
           mtctr r29           ;copy to counter register
           bctr                ;branch via jump table
           ...
           ...
    add:   add.  r24,r25,r26   ;perform PowerPC add, set cr0
           blr                 ;return
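The genZF routine above re-runs a saved operation to produce a flag only when one is needed. The same idea can be sketched in C: record the operation and operands, and compute a condition code on demand. The names LazyCC, emu_add, emu_sub, zero_flag, and carry_flag are hypothetical, and only add and subtract are modelled, so this is a sketch and not a full IA-32 flag model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CC_ADD, CC_SUB } CcOp;

/* Saved inputs of the most recent flag-setting operation. */
typedef struct {
    CcOp op;
    uint32_t a, b;
} LazyCC;

/* Perform the arithmetic and remember the operands, but do not
   compute any flags yet. */
uint32_t emu_add(LazyCC *cc, uint32_t a, uint32_t b) {
    assert(cc != NULL);
    cc->op = CC_ADD; cc->a = a; cc->b = b;
    return a + b;
}

uint32_t emu_sub(LazyCC *cc, uint32_t a, uint32_t b) {
    assert(cc != NULL);
    cc->op = CC_SUB; cc->a = a; cc->b = b;
    return a - b;
}

/* Flags are derived only when a later instruction asks for them. */
int zero_flag(const LazyCC *cc) {
    uint32_t r = (cc->op == CC_ADD) ? cc->a + cc->b : cc->a - cc->b;
    return r == 0;
}

/* Carry out of a 32-bit add, or borrow of a subtract. */
int carry_flag(const LazyCC *cc) {
    if (cc->op == CC_ADD) return (uint32_t)(cc->a + cc->b) < cc->a;
    return cc->a < cc->b;
}
```

If no instruction reads the flags before the next flag-setting operation overwrites the saved operands, the flag computation is never performed at all.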
Byte Order
Ordering of bytes within a word may differ
little endian and big endian
Target code must perform byte ordering
Guest data image is generally maintained in the same byte order as assumed by the source ISA
Emulation software modifies addresses when bytes within words are addressed
can be very inefficient
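One way for target code to perform the byte ordering itself is to assemble each word from individual bytes of the guest image. The sketch below assumes a little-endian guest kept in a byte array. The names load_word_le and store_word_le are hypothetical. It works the same on a big-endian or little-endian host, at the cost of several byte accesses and shifts per word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian word from the guest image at byte
   address addr, independent of the host's own byte order. */
uint32_t load_word_le(const uint8_t *image, size_t addr) {
    assert(image != NULL);
    return (uint32_t)image[addr]
         | ((uint32_t)image[addr + 1] << 8)
         | ((uint32_t)image[addr + 2] << 16)
         | ((uint32_t)image[addr + 3] << 24);
}

/* Write a 32-bit word into the guest image in little-endian order. */
void store_word_le(uint8_t *image, size_t addr, uint32_t value) {
    assert(image != NULL);
    image[addr]     = (uint8_t)(value & 0xFFu);
    image[addr + 1] = (uint8_t)((value >> 8) & 0xFFu);
    image[addr + 2] = (uint8_t)((value >> 16) & 0xFFu);
    image[addr + 3] = (uint8_t)((value >> 24) & 0xFFu);
}
```

The four byte accesses that replace a single word load illustrate why byte-order emulation can be costly.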