Lec13 X86asm
Instructions
Assembled into machine code by the assembler
Executed at runtime by the CPU
Member of the Intel IA-32 instruction set
Four parts:
  Label (optional)
  Mnemonic (required)
  Operand (usually required)
  Comment (optional)

  Label:    Mnemonic    Operand(s)    ;Comment
Labels
Act as place markers
  mark the address (offset) of code and data
Easier to memorize and more flexible
  mov ax, [0020]  ->  mov ax, val
Follow identifier rules
Data label
  must be unique
  example: myArray BYTE 10
Identifiers
1-247 characters, including digits
case insensitive (by default)
first character must be a letter, _, @, or $
examples:
  var1    Count     $first
  _main   MAX       open_file
  @@myfile  xVal    _12345
Operands
constant (immediate value): 96
constant expression: 2+4
register: eax
memory (data label): count

Number of operands: 0 to 3
  stc                 ; set Carry flag
  inc ax              ; add 1 to ax
  mov count, bx       ; move BX to count
Directives
Commands that are recognized and acted upon by the assembler
Part of the assembler's syntax but not part of the Intel instruction set
Used to declare code and data areas, select the memory model, declare procedures, etc.
Case insensitive
Examples:
  .data
  .code
  PROC
Comments
Comments are good!
  explain the program's purpose
  tricky coding techniques
  application-specific explanations
Single-line comments
  begin with semicolon (;)
Block comments
  begin with the COMMENT directive and a programmer-chosen character, and end with the same programmer-chosen character

  COMMENT !
    This is a comment
    and this line is also a comment
  !
; This program adds and subtracts 32-bit integers.

INCLUDE Irvine32.inc        ; copy definitions from Irvine32.inc
.code                       ; code segment (3 segments: code, data, stack)
main PROC                   ; beginning of a procedure
    mov eax,10000h          ; EAX = 10000h (eax is the destination, 10000h the source)
    add eax,40000h          ; EAX = 50000h
    sub eax,20000h          ; EAX = 30000h
    call DumpRegs           ; display registers
    exit                    ; defined in Irvine32.inc to end a program
main ENDP
END main                    ; marks the last line and names the startup procedure
Example output
Program output, showing registers and flags:

  EAX=00030000  EBX=7FFDF000  ECX=00000101  EDX=FFFFFFFF
  ESI=00000000  EDI=00000000  EBP=0012FFF0  ESP=0012FFC4
  EIP=00401024  EFL=00000206  CF=0  SF=0  ZF=0  OF=0
; This program adds and subtracts 32-bit integers.
.386
.MODEL flat,stdcall
.STACK 4096
ExitProcess PROTO, dwExitCode:DWORD
DumpRegs PROTO
.code
main PROC
    mov eax,10000h
    add eax,40000h
    sub eax,20000h
    call DumpRegs
    INVOKE ExitProcess,0
main ENDP
END main
Program template
TITLE Program Template          (Template.asm)

; Program Description:
; Author:
; Creation Date:
; Revisions:
; Date:              Modified by:

INCLUDE Irvine32.inc            ; needed for exit
.data
    ; (insert variables here)
.code
main PROC
    ; (insert executable instructions here)
    exit
main ENDP
    ; (insert additional procedures here)
END main
[Figure: build and run steps. Surviving labels: Step 3: linker; Step 4: OS loader; Output]
Defining data
(1 of 2)
  WORD, SWORD      16-bit unsigned & signed integer
  DWORD, SDWORD    32-bit unsigned & signed integer
  QWORD            64-bit integer
  TBYTE            80-bit integer
(2 of 2)
  REAL8            8-byte IEEE long real
  REAL10           10-byte IEEE extended real
Integer constants
[{+|-}] digits [radix]
Optional leading + or - sign
Binary, decimal, hexadecimal, or octal digits
Common radix characters:
  h   hexadecimal
  d   decimal (default)
  b   binary
  r   encoded real
  o   octal
Examples: 30d, 6Ah, 42, 42o, 1101b
Hexadecimal beginning with letter: 0A5h
Integer expressions
Operators and precedence levels:
Examples:
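As a sketch of how such expressions behave, the following program uses constant integer expressions as immediate operands. The assembler evaluates each expression at assembly time, so only the result is encoded in the instruction. The specific expressions are illustrative choices, and the program assumes the Irvine32 library used throughout these slides.

```asm
; Sketch: constant integer expressions are evaluated by the assembler.
; The expressions are illustrative; Irvine32.inc is assumed to be available.
INCLUDE Irvine32.inc
.code
main PROC
    mov eax,16 / 5                  ; integer division: EAX = 3
    mov ebx,-(3 + 4) * (6 - 1)      ; parentheses first: EBX = -35 (FFFFFFDDh)
    mov ecx,-3 + 4 * 6 - 1          ; * before + and -: ECX = 20 (14h)
    mov edx,25 MOD 3                ; remainder: EDX = 1
    call DumpRegs                   ; display the registers
    exit
main ENDP
END main
```

Because the arithmetic happens at assembly time, none of these expressions costs anything at run time.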
double: 1 sign bit (S), 11 exponent bits (E), 52 mantissa bits (M)
Embedded quotes:
  'Say "Goodnight," Gracie'
  "This isn't a test"
value4 SBYTE -128       ; smallest signed byte
value5 SBYTE +127       ; largest signed byte
value6 BYTE  ?          ; uninitialized byte
Defining strings
(1 of 2)
A string is implemented as an array of characters
For convenience, it is usually enclosed in quotation marks
It usually has a null byte at the end
Examples:

str1 BYTE "Enter your name",0
str2 BYTE 'Error: halting program',0
str3 BYTE 'A','E','I','O','U'
greeting1 BYTE "Welcome to the Encryption Demo program "
          BYTE "created by Kip Irvine.",0
greeting2 \
    BYTE "Welcome to the Encryption Demo program "
    BYTE "created by Kip Irvine.",0
Defining strings
(2 of 2)
End-of-line character sequence:
  0Dh = carriage return
  0Ah = line feed

str1 BYTE "Enter your name: ",0Dh,0Ah
     BYTE "Enter your address: ",0
newLine BYTE 0Dh,0Ah,0
Idea: Define all strings used by your program in the same area of the data segment.
; start with 10000h
; add 40000h
; subtract 20000h
; store the result (30000h)
; display the registers
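The five comments above appear to annotate a version of the add/subtract program that keeps its operands in memory. A minimal sketch of such a program follows. The variable names val1, val2, val3, and finalVal are assumptions, since the original instructions did not survive.

```asm
; Sketch: add and subtract 32-bit integers stored in variables.
; Variable names are assumed, not taken from the original slide.
INCLUDE Irvine32.inc
.data
val1     DWORD 10000h
val2     DWORD 40000h
val3     DWORD 20000h
finalVal DWORD ?
.code
main PROC
    mov eax,val1            ; start with 10000h
    add eax,val2            ; add 40000h
    sub eax,val3            ; subtract 20000h
    mov finalVal,eax        ; store the result (30000h)
    call DumpRegs           ; display the registers
    exit
main ENDP
END main
```

The arithmetic still goes through EAX because x86 instructions cannot take two memory operands.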
Within the .data? segment, declare variables with "?" initializers (they will not be assembled into the .exe).
Advantage: the program's EXE file size is reduced.

.data
smallArray DWORD 10 DUP(0)
.data?
bigArray   DWORD 5000 DUP(?)
Symbolic constants
Equal-sign directive
name = expression
  expression is a 32-bit integer (expression or constant)
  may be redefined
  name is called a symbolic constant
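To illustrate the "may be redefined" point, here is a small sketch. The symbol name rowCount and its values are hypothetical. Each use of the symbol is replaced by whatever value it holds at that point in the source file.

```asm
; Sketch: a symbol defined with = can be redefined later in the source.
; rowCount is a made-up name; Irvine32.inc is assumed to be available.
INCLUDE Irvine32.inc
rowCount = 5
.code
main PROC
    mov al,rowCount         ; assembled as mov al,5
rowCount = 10               ; redefinition is allowed with =
    mov al,rowCount         ; assembled as mov al,10
    call DumpRegs
    exit
main ENDP
END main
```

The redefinition affects only the assembler's symbol table. No storage is reserved and nothing changes at run time.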
list BYTE 10,20,30,40
var2 BYTE 20 DUP(?)
ListSize = ($ - list)           ; $ is the current location counter; var2 lies between, so this yields 24

myString BYTE "This is a long string."
myString_len = ($ - myString)
EQU directive
name EQU expression
name EQU symbol
name EQU <text>

Define a symbol as either an integer or text expression.
Can be useful for non-integer constants
Cannot be redefined
EQU directive
PI EQU <3.1416>
pressKey EQU <"Press any key to continue...",0>
.data
prompt BYTE pressKey

matrix1 EQU 10*10
matrix2 EQU <10*10>
.data
M1 WORD matrix1         ; M1 WORD 100
M2 WORD matrix2         ; M2 WORD 10*10
Addressing
Addressing Modes
Operand types
Three basic types of operands:
  Immediate: a constant integer (8, 16, or 32 bits)
    value is encoded within the instruction
  Register: the name of a register
    register name is converted to a number and encoded within the instruction
  Memory: reference to a location in memory
    memory address is encoded within the instruction, or a register holds the address of a memory location
; AL = 10h
; AL = 10h
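The two AL = 10h comments above seem to come from a direct memory operand example, in which a data label names the memory location to read. A sketch follows, assuming a byte variable named var1. Both the name and the instructions are an assumed reconstruction.

```asm
; Sketch: a direct memory operand refers to memory through a data label.
; var1 is a hypothetical variable; Irvine32.inc is assumed to be available.
INCLUDE Irvine32.inc
.data
var1 BYTE 10h
.code
main PROC
    mov al,var1             ; AL = 10h
    mov al,[var1]           ; AL = 10h (brackets are optional here)
    call DumpRegs
    exit
main ENDP
END main
```

In MASM both forms assemble to the same instruction. The brackets only make the dereference explicit.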
Direct-offset operands
A constant offset is added to a data label to produce an effective address (EA). The address is dereferenced to get the value inside its memory location. (no range checking)

.data
arrayB BYTE 10h,20h,30h,40h
.code
mov al,arrayB+1         ; AL = 20h
mov al,[arrayB+1]       ; alternative notation
mov al,arrayB+3         ; AL = 40h
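For arrays whose elements are wider than a byte, the constant offset has to be a byte count, not an element index. The following sketch shows this with two hypothetical arrays, arrayW and arrayD, whose names and contents are chosen only for illustration.

```asm
; Sketch: direct-offset operands with WORD and DWORD arrays.
; Offsets are in bytes, so element n of a WORD array is at label + 2*n.
INCLUDE Irvine32.inc
.data
arrayW WORD  1000h,2000h,3000h
arrayD DWORD 1,2,3,4
.code
main PROC
    mov ax,[arrayW+2]       ; second word: AX = 2000h
    mov ax,[arrayW+4]       ; third word: AX = 3000h
    mov eax,[arrayD+4]      ; second doubleword: EAX = 00000002h
    call DumpRegs
    exit
main ENDP
END main
```

Since there is no range checking, an offset such as arrayW+6 would assemble without complaint and read whatever follows the array in memory.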
OFFSET Operator
OFFSET returns the distance, in bytes, of a label from the beginning of its enclosing segment.
The Protected-mode programs we write only have a single segment (we use the flat memory model).
OFFSET Examples
Let's assume that bVal is located at 00404000h:

.data
bVal  BYTE ?
wVal  WORD ?
dVal  DWORD ?
dVal2 DWORD ?
.code
mov esi,OFFSET bVal     ; ESI = 00404000
mov esi,OFFSET wVal     ; ESI = 00404001
mov esi,OFFSET dVal     ; ESI = 00404003
mov esi,OFFSET dVal2    ; ESI = 00404007
Relating to C/C++
The value returned by OFFSET is a pointer. Compare the following code written for both C++ and assembly language:

; C++ version:
char array[1000];
char * p = array;

; Assembly language version:
.data
array BYTE 1000 DUP(?)
.code
mov esi,OFFSET array    ; ESI is p
TYPE Operator
The TYPE operator returns the size, in bytes, of a single element of a data declaration.

.data
var1 BYTE ?
var2 WORD ?
var3 DWORD ?
var4 QWORD ?
.code
mov eax,TYPE var1       ; 1
mov eax,TYPE var2       ; 2
mov eax,TYPE var3       ; 4
mov eax,TYPE var4       ; 8
LENGTHOF Operator
The LENGTHOF operator counts the number of elements in a single data declaration.

.data                               ; LENGTHOF
byte1    BYTE 10,20,30              ; 3
array1   WORD 30 DUP(?),0,0         ; 32
array2   WORD 5 DUP(3 DUP(?))       ; 15
array3   DWORD 1,2,3,4              ; 4
digitStr BYTE "12345678",0          ; 9
.code
mov ecx,LENGTHOF array1             ; 32
SIZEOF Operator
The SIZEOF operator returns a value that is equivalent to multiplying LENGTHOF by TYPE.

.data                               ; SIZEOF
byte1    BYTE 10,20,30              ; 3
array1   WORD 30 DUP(?),0,0         ; 64
array2   WORD 5 DUP(3 DUP(?))       ; 30
array3   DWORD 1,2,3,4              ; 16
digitStr BYTE "12345678",0          ; 9
.code
mov ecx,SIZEOF array1               ; 64
ALIGN Directive
ALIGN bound aligns a variable on a byte, word, doubleword, or paragraph boundary for efficiency. (bound can be 1, 2, 4, or 16.)

bVal  BYTE ?            ; 00404000
ALIGN 2
wVal  WORD ?            ; 00404002
bVal2 BYTE ?            ; 00404004
ALIGN 4
dVal  DWORD ?           ; 00404008
dVal2 DWORD ?           ; 0040400C
PTR Operator
Overrides the default type of a label (variable). Provides the flexibility to access part of a variable.

.data
myDouble DWORD 12345678h
.code
mov ax,myDouble                 ; error: operand sizes differ
mov ax,WORD PTR myDouble        ; loads 5678h
mov WORD PTR myDouble,4321h     ; saves 4321h

To understand how this works, we need to know about little endian ordering of data in memory.
Little endian order: the doubleword 12345678h is stored in memory as the bytes 78 56 34 12 (lowest address first).
When integers are loaded from memory into registers, the bytes are automatically re-reversed into their correct positions.
[Figure: myDouble = 12345678h in memory. The word 5678h is stored as bytes 78 56, and the word 1234h as bytes 34 12]
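Because of little endian storage, BYTE PTR and WORD PTR can pick individual bytes and words out of myDouble. The following sketch is consistent with the byte layout shown above (78 56 34 12). The exact instructions on the original slide were not preserved, so treat these as an assumed reconstruction.

```asm
; Sketch: reading parts of a doubleword with the PTR operator.
; Instructions are an assumed reconstruction; Irvine32.inc is assumed available.
INCLUDE Irvine32.inc
.data
myDouble DWORD 12345678h
.code
main PROC
    mov al,BYTE PTR myDouble        ; AL = 78h (lowest byte)
    mov al,BYTE PTR [myDouble+1]    ; AL = 56h
    mov al,BYTE PTR [myDouble+2]    ; AL = 34h
    mov ax,WORD PTR myDouble        ; AX = 5678h (low word)
    mov ax,WORD PTR [myDouble+2]    ; AX = 1234h (high word)
    call DumpRegs
    exit
main ENDP
END main
```

The offsets count bytes from the lowest address, which is why offset 0 yields the least significant byte.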
Your turn . . .
Write down the value of each destination operand:

.data
varB BYTE 65h,31h,02h,05h
varW WORD 6543h,1202h
varD DWORD 12345678h
.code
mov ax,WORD PTR [varB+2]        ;
mov bl,BYTE PTR varD            ;
mov bl,BYTE PTR [varW+2]        ;
mov ax,WORD PTR [varD+2]        ;
mov eax,DWORD PTR varW          ;
(1 of 2)
A data declaration spans multiple lines if each line (except the last) ends with a comma. The LENGTHOF and SIZEOF operators include all lines belonging to the declaration:

.data
array WORD 10,20,
           30,40,
           50,60
.code
mov eax,LENGTHOF array          ; 6
mov ebx,SIZEOF array            ; 12
(2 of 2)
In the following example, array identifies only the first WORD declaration. Compare the values returned by LENGTHOF and SIZEOF here to those in the previous slide:

.data
array WORD 10,20
      WORD 30,40
      WORD 50,60
.code
mov eax,LENGTHOF array          ; 2
mov ebx,SIZEOF array            ; 4
LABEL Directive
Assigns an alternate label name and type to an existing storage location
LABEL does not allocate any storage of its own; it is just an alias.
Removes the need for the PTR operator

.data
dwList   LABEL DWORD
wordList LABEL WORD
intList  BYTE 00h,10h,00h,20h
.code
mov eax,dwList          ; 20001000h
mov cx,wordList         ; 1000h
mov dl,intList          ; 00h
Indirect operands
(1 of 2)
An indirect operand holds the address of a variable, usually an array or string. It can be dereferenced (just like a pointer).
[reg] uses reg as a pointer to access memory

.data
val1 BYTE 10h,20h,30h
.code
mov esi,OFFSET val1
mov al,[esi]            ; dereference ESI (AL = 10h)
inc esi
mov al,[esi]            ; AL = 20h
inc esi
mov al,[esi]            ; AL = 30h
Indirect operands
(2 of 2)
Use PTR when the size of a memory operand is ambiguous.

.data
myCount WORD 0
.code
mov esi,OFFSET myCount
inc [esi]               ; error: ambiguous (unable to determine the size from the context)
inc WORD PTR [esi]      ; ok
.data
arrayW WORD 1000h,2000h,3000h
.code
mov esi,OFFSET arrayW
mov ax,[esi]
add esi,2               ; or: add esi,TYPE arrayW
add ax,[esi]
add esi,2               ; increment ESI by 2
add ax,[esi]            ; AX = sum of the array
Indexed operands
An indexed operand adds a constant to a register to generate an effective address. There are two notational forms:
  [label + reg]
  label[reg]

.data
arrayW WORD 1000h,2000h,3000h
.code
mov esi,0
mov ax,[arrayW + esi]   ; AX = 1000h
mov ax,arrayW[esi]      ; alternate format
add esi,2
add ax,[arrayW + esi]
etc.
Index scaling
You can scale an indirect or indexed operand to the offset of an array element. This is done by multiplying the index by the array's TYPE:

.data
arrayB BYTE  0,1,2,3,4,5
arrayW WORD  0,1,2,3,4,5
arrayD DWORD 0,1,2,3,4,5
.code
mov esi,4
mov al,arrayB[esi*TYPE arrayB]      ; 04
mov bx,arrayW[esi*TYPE arrayW]      ; 0004
mov edx,arrayD[esi*TYPE arrayD]     ; 00000004
Pointers
You can declare a pointer variable that contains the offset of another variable.

.data
arrayW WORD 1000h,2000h,3000h
ptrW   DWORD arrayW
.code
mov esi,ptrW
mov ax,[esi]            ; AX = 1000h