16 MachineLang
Machine Language
Instruction Set Architecture (ISA)
There are many kinds of computer chips out there:
• ARM
• Intel x86 series
• IBM PowerPC
• RISC-V
• MIPS
• (and, in the old days, dozens more)
Each of these different “machine architectures” understands a different machine language.
The Build Process
mypgm.c
   → Preprocess → mypgm.i
   → Compile → mypgm.s
   → Assemble → mypgm.o
   → Link (with libc.a) → mypgm
Preprocess and Compile: covered in COS 320: Compiling Techniques
Assemble and Link: covered here
Agenda
Machine Language: 1000 1011 0000 0011 0000 0000 0100 0001
AARCH64 Machine Language
Instruction formats
• Variety of ways different instructions are encoded
• We’ll go over quickly in class, to give you a flavor
• Refer to slides as reference for Assignment 5!
(Every instruction format you’ll need is in the following slides… we think…)
AARCH64 Instruction Format
msb: bit 31 lsb: bit 0
Branch
• Relative address of branch target in bits 0-25 for unconditional
branch (b) and function call (bl)
• Relative address of branch target in bits 5-23 for conditional branch
• Because all instructions are 32 bits long and are 4-byte aligned,
relative addresses end in 00. So, the values in the instruction must
be shifted left by 2 bits. This provides more range with fewer bits!
• Type of conditional branch encoded in bits 0-3
AARCH64 Instruction Format
msb: bit 31 lsb: bit 0
Load / store
• Access width (size of the data loaded or stored) in bits 30-31:
00 = 8-bit, 01 = 16-bit, 10 = 32-bit, 11 = 64-bit
• For [Xn,Xm] addressing mode: second source register (Xm) in bits 16-20
• For [Xn,offset] addressing mode: offset in bits 10-21,
shifted left by 3 bits for 64-bit, 2 bits for 32-bit, 1 bit for 16-bit
• First source register (the base, Xn) in bits 5-9
• Destination register in bits 0-4 (for a store, the register whose value is stored)
• Remaining bits encode additional information about instruction
An Example Program
A simple (nonsensical) program, in C and assembly:

#include <stdio.h>
int main(void)
{  printf("Type a char: ");
   if (getchar() == 'A')
      printf("Hi\n");
   return 0;
}

   .section .rodata
msg1: .string "Type a char: "
msg2: .string "Hi\n"
   .section .text
   .global main
main:
   sub sp, sp, 16
   str x30, [sp]
   adr x0, msg1
   bl printf
   bl getchar
   cmp w0, 'A'
   bne skip
$ gcc217 -c detecta.s
$ objdump --full-contents --section .rodata detecta.o
Examining Machine Lang: TEXT
Run objdump to see instructions:
$ objdump --disassemble --reloc detecta.o
detecta.o: file format elf64-littleaarch64
0000000000000000 <main>:
0: d10043ff sub sp, sp, #0x10
4: f90003fe str x30, [sp]
8: 10000000 adr x0, 0 <main>
8: R_AARCH64_ADR_PREL_LO21 .rodata
c: 94000000 bl 0 <printf>
c: R_AARCH64_CALL26 printf
10: 94000000 bl 0 <getchar>
10: R_AARCH64_CALL26 getchar
14: 7101041f cmp w0, #0x41
18: 54000061 b.ne 24 <skip>
1c: 10000000 adr x0, 0 <main>
1c: R_AARCH64_ADR_PREL_LO21 .rodata+0xe
20: 94000000 bl 0 <printf>
20: R_AARCH64_CALL26 printf
0000000000000024 <skip>:
24: 52800000 mov w0, #0x0
28: f94003fe ldr x30, [sp]
2c: 910043ff add sp, sp, #0x10
30: d65f03c0 ret
Reading the listing:
• Machine language: the hex word after each offset (e.g., d10043ff)
• Assembly language: the text after the hex word (e.g., sub sp, sp, #0x10)
• Relocation records: the lines beginning with R_AARCH64_
Let’s examine one line at a time…
sub sp, sp, #0x10
0: d10043ff sub sp, sp, #0x10
msb: bit 31 lsb: bit 0
str x30, [sp]
4: f90003fe str x30, [sp]
msb: bit 31 lsb: bit 0
adr x0, 0 <main>
8: 10000000 adr x0, 0 <main>
msb: bit 31 lsb: bit 0
Relocation Record 1
8: R_AARCH64_ADR_PREL_LO21 .rodata
The R_AARCH64 prefix is always the same:
it’s the name of the machine architecture!
Dear Linker,
Sincerely,
Assembler
bl 0 <printf>
c: 94000000 bl 0 <printf>
msb: bit 31 lsb: bit 0
Relocation Record 2
c: R_AARCH64_CALL26 printf
Dear Linker,
Sincerely,
Assembler
bl 0 <getchar>
10: 94000000 bl 0 <getchar>
msb: bit 31 lsb: bit 0
Relocation Record 3
10: R_AARCH64_CALL26 getchar
Dear Linker,
Sincerely,
Assembler
cmp w0, #0x41
14: 7101041f cmp w0, #0x41
msb: bit 31 lsb: bit 0
b.ne 24 <skip>
18: 54000061 b.ne 24 <skip>
msb: bit 31 lsb: bit 0
Relocation Record 4
1c: R_AARCH64_ADR_PREL_LO21 .rodata+0xe
Dear Linker,
Sincerely,
Assembler
Another printf, with relocation record…
20: 94000000 bl 0 <printf>
20: R_AARCH64_CALL26 printf
mov w0, #0x0
24: 52800000 mov w0, #0x0
msb: bit 31 lsb: bit 0
Everything Else is Similar…
Exercise for you: using information from these slides, create a bitwise
breakdown of these instructions, and convince yourself that the hex values
are correct!
$ objdump --disassemble --reloc detecta.o
0000000000000000 <main>:
0: d10043ff sub sp, sp, #0x10
4: f90003fe str x30, [sp]
8: 10000000 adr x0, 0 <main>
8: R_AARCH64_ADR_PREL_LO21 .rodata
c: 94000000 bl 0 <printf>
c: R_AARCH64_CALL26 printf
10: 94000000 bl 0 <getchar>
10: R_AARCH64_CALL26 getchar
14: 7101041f cmp w0, #0x41
18: 54000061 b.ne 24 <skip>
1c: 10000000 adr x0, 0 <main>
1c: R_AARCH64_ADR_PREL_LO21 .rodata+0xe
20: 94000000 bl 0 <printf>
20: R_AARCH64_CALL26 printf
0000000000000024 <skip>:
24: 52800000 mov w0, #0x0
28: f94003fe ldr x30, [sp]
2c: 910043ff add sp, sp, #0x10
30: d65f03c0 ret
From Assembler to Linker
Linker Resolution
Resolution
• Linker resolves references
Linker Relocation
Relocation
• Linker patches (“relocates”) code
• Linker traverses relocation records, patching code as specified
Examining Machine Lang: RODATA
Addresses, not offsets
• RODATA is at 0x400710
• Starts with some header info
• Real start of RODATA is at 0x400720
• "Type a char: " starts at 0x400720
• "Hi\n" starts at 0x40072e
Examining Machine Lang: TEXT
Run objdump to see instructions:
$ objdump --disassemble --reloc detecta
detecta: file format elf64-littleaarch64
...
0000000000400650 <main>:
400650: d10043ff sub sp, sp, #0x10
400654: f90003fe str x30, [sp]
400658: 10000640 adr x0, 400720 <msg1>
40065c: 97ffffa1 bl 4004e0 <printf@plt>
400660: 97ffff9c bl 4004d0 <getchar@plt>
400664: 7101041f cmp w0, #0x41
400668: 54000061 b.ne 400674 <skip>
40066c: 50000600 adr x0, 40072e <msg2>
400670: 97ffff9c bl 4004e0 <printf@plt>
0000000000400674 <skip>:
400674: 52800000 mov w0, #0x0
400678: f94003fe ldr x30, [sp]
40067c: 910043ff add sp, sp, #0x10
400680: d65f03c0 ret
Reading the listing:
• Addresses, not offsets
• The "..." stands for additional code
adr x0, 400720 <msg1>
400658: 10000640 adr x0, 400720 <msg1>
msb: bit 31 lsb: bit 0
bl 4004e0 <printf@plt>
40065c: 97ffffa1 bl 4004e0 <printf@plt>
msb: bit 31 lsb: bit 0
Everything Else is Similar…
A Program
#include <stdio.h>
int main(void)
{
   char name[12], c;
   int i = 0, magic = 42;
   printf("What is your name?\n");
   while ((c = getchar()) != '\n')
      name[i++] = c;
   name[i] = '\0';
   printf("Thank you, %s.\n", name);
   printf("The answer to life, the universe, "
          "and everything is %d\n", magic);
   return 0;
}
$ ./a.out
What is your name?
John Smith
Thank you, John Smith.
The answer to life, the universe, and everything is 42
Why People With Long Names Have Problems
$ ./a.out
What is your name?
Szymon Rusinkiewicz ???!!?!
Thank you, Szymon Rusinkie ?
icz.
The answer to life, the universe, and everything is 8020841
Explanation: Stack Frame Layout
When there are too many characters, program carelessly writes beyond
space “belonging” to name.
• Overwrites other variables
• This is a buffer overrun, or stack smash
• The program has a security bug!
Stack frame layout (address 0 at the top, addresses increasing downward):
SP → Return addr
     name
     . . .
     c
     magic
     i
Old SP →
It Gets Worse…
Buffer overrun can overwrite return address of a previous stack frame!
• Value can be an invalid address, leading to a segfault,…
Stack frame layout (address 0 at the top, addresses increasing downward):
SP → Return addr
     name
     . . .
     c
     magic
     i
Old SP → Return addr (of the previous stack frame; overwritten with, e.g., 0x0042)
It Gets Much, Much Worse…
Buffer overrun can overwrite return address of a previous stack frame!
• Value can be an invalid address, leading to a segfault, or it can cleverly
point to malicious code
• Malicious code here... (on the stack)
• Or here... (in .bss)
Attacking a Web Server
• URLs
• Input in web forms
for(i=0;p[i];i++)
   search[i]=p[i];
Attacking a Web Browser
• HTML keywords
• Images
• Image names
• URLs
• etc.
for(i=0;p[i];i++)
   gif[i]=p[i];
Web Server
Client PC
www.badguy.com
@ badguy.com
Attacking Everything in Sight
for(i=0;p[i];i++)
gif[i]=p[i];
The Internet
Client PC
@ badguy.com
E-mail client
PDF viewer
Operating-system kernel
TCP/IP stack
Any application that ever sees input directly from the outside
Defenses Against This Attack
Best: program in languages that make
array-out-of-bounds impossible (Java, C#,
ML, Python, ...)
$ ./grader
What is your name?
Bob
D is your grade.
Thank you, Bob.
$ ./grader
What is your name?
Andrew Appel
B is your grade.
Thank you, Andrew Appel.
Asgt. 5: Attack the “Grader” Program
int main(void) {
   getname();
   if (strcmp(name, "Andrew Appel") == 0)
      grade = 'B';
   printf("%c is your grade.\n", grade);
   printf("Thank you, %s.\n", name);
   return 0;
}
$ ./grader
What is your name?
Bob\0(#@&$%*#&(*^!@%*!!(&#$%(@*
B is your grade.
Thank you, Bob.
$ ./grader
What is your name?
Susan\0?!*!????*???!*!%!?!(!*%(*^^?
A is your grade.
Thank you, Susan.
Summary
AARCH64 Machine Language
• 32-bit instructions
• Formats have conventional locations for opcodes, registers, etc.
Assembler
• Reads assembly language file
• Generates TEXT, RODATA, DATA, BSS sections
• Containing machine language code
• Generates relocation records
• Writes object (.o) file
Linker
• Reads object (.o) file(s)
• Does resolution: resolves references to make code complete
• Does relocation: traverses relocation records to patch code
• Writes executable binary file