15_AssemblyFunctions
Assembly Language:
Function Calls
Goals of this Lecture
Function Call Problems
(1) Calling and returning
• How does caller function jump to callee function?
• How does callee function jump back to the right place in caller
function?
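(2) Passing arguments
• How does caller function pass arguments to callee function?
• How does callee function accept parameters from caller function?
(3) Storing local variables
• Where does callee function store its local variables?
(4) Returning a value
• How does callee function send return value back to caller function?
• How does caller function access return value?
(5) Optimization
• How do caller and callee functions minimize memory access?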
Running Example
Agenda
Calling and returning
Passing arguments
Storing local variables
Returning a value
Optimization
Problem 1: Calling and Returning
How does caller jump to callee?
• i.e., Jump to the address of the callee’s first instruction
How does the callee jump back to the right place in caller?
• i.e., Jump to the instruction immediately following the
most-recently-executed call instruction
Caller:
    …
    absadd(3L, -4L);
    …
Callee:
    long absadd(long a, long b)
    {
        long absA, absB, sum;
        absA = labs(a);
        absB = labs(b);
        sum = absA + absB;
        return sum;
    }
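For experimenting with the running example, the callee can be written as a complete C function. This is a minimal sketch: the `#include` line and the note about `LONG_MIN` are additions, not part of the slide.

```c
#include <stdlib.h>  /* labs */

/* Returns |a| + |b|.
   Assumption: neither argument is LONG_MIN, for which labs() is undefined. */
long absadd(long a, long b)
{
    long absA, absB, sum;

    absA = labs(a);     /* first call: control jumps to labs and comes back here */
    absB = labs(b);     /* second call: same callee, different return point */
    sum = absA + absB;
    return sum;         /* jumps back to whichever caller invoked absadd */
}
```

Calling `absadd(3L, -4L)` as in the caller fragment yields 7.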
Attempted Solution: jmp Instruction
Attempted solution: caller and callee use jmp instruction
f:
    …
    jmp g              # Call g
fReturnPoint:
    …

g:
    …
    jmp fReturnPoint   # Return
Attempted Solution: jmp Instruction
Problem: callee may be called by multiple callers
f1:
    …
    jmp g              # Call g
f1ReturnPoint:
    …

f2:
    …
    jmp g              # Call g
f2ReturnPoint:
    …

g:
    …
    jmp ???            # Return
Attempted Solution: Use Register
Attempted solution: Store return address in register
f1:
    movq $f1ReturnPoint, %rax
    jmp g              # Call g
f1ReturnPoint:
    …

f2:
    movq $f2ReturnPoint, %rax
    jmp g              # Call g
f2ReturnPoint:
    …

g:
    …
    jmp *%rax          # Return (special form of jmp instruction)
Attempted Solution: Use Register
Problem: Cannot handle nested function calls
f:
    movq $fReturnPoint, %rax
    jmp g              # Call g
fReturnPoint:
    …

g:
    movq $gReturnPoint, %rax
    jmp h              # Call h
gReturnPoint:
    …
    jmp *%rax          # Return

h:
    …
    jmp *%rax          # Return

Problem if f() calls g(), and g() calls h():
return address g() -> f() is lost
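The lost return address can be reproduced in a small C model. This is only a sketch: `link_register` stands in for RAX, the two enum constants are hypothetical labels standing in for real addresses, and ordinary C calls replace the jmp instructions.

```c
/* Hypothetical labels standing in for return addresses. */
enum { F_RETURN_POINT = 1, G_RETURN_POINT = 2 };

/* Models the single register (RAX in the slides) that holds the return address. */
static int link_register;

static void h(void)
{
    /* h returns through the register without changing it. */
}

static int g(void)
{
    link_register = G_RETURN_POINT;  /* g: movq $gReturnPoint, %rax (overwrites f's value) */
    h();                             /* g: jmp h */
    return link_register;            /* g: jmp *%rax, using whatever the register holds now */
}

/* Returns the "address" g() ends up jumping to when it tries to return to f(). */
int where_does_g_return(void)
{
    link_register = F_RETURN_POINT;  /* f: movq $fReturnPoint, %rax */
    return g();                      /* f: jmp g */
}
```

In this model `where_does_g_return()` yields `G_RETURN_POINT`, not `F_RETURN_POINT`: one register cannot hold two live return addresses.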
x86-64 Solution: Use the Stack
Observations:
• May need to store many return addresses
• The number of nested function calls is not known in advance
• A return address must be saved for as long as the invocation of
this function is live, and discarded thereafter
• Stored return addresses are destroyed in
reverse order of creation
• f() calls g() ⇒ return addr for g is stored
• g() calls h() ⇒ return addr for h is stored
[Stack diagram: saved RIP for g pushed first, saved RIP for h pushed on top of it]
x86-64 solution:
• Use the STACK section of memory, usually accessed via RSP
• Via call and ret instructions
call and ret Instructions
ret instruction “knows” the return address
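f:
    …
    call h
    …
    call g
    …

g:
    …
    call h
    …
    ret

h:
    …
    ret

Control flow: (1) f calls h, (2) h returns to f, (3) f calls g,
(4) g calls h, (5) h returns to g, (6) g returns to f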
Stack operations
RSP (stack pointer) register points to top of stack

Instruction    Equivalent to
pushq src      subq $8, %rsp
               movq src, (%rsp)
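The push equivalence, and the matching pop described in these slides, can be modeled in C as arithmetic on a stack pointer. This is a sketch under stated assumptions: `stack_mem`, `STACK_BYTES`, and `current_rsp` are hypothetical names, and a byte index stands in for the address held in RSP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

static unsigned char stack_mem[STACK_BYTES];  /* hypothetical STACK section */
static size_t rsp = STACK_BYTES;              /* empty stack: RSP just past the end */

/* pushq src  ==  subq $8, %rsp ; movq src, (%rsp) */
void pushq(uint64_t src)
{
    assert(rsp >= 8);                   /* model only: no room left means overflow */
    rsp -= 8;                           /* stack grows toward lower addresses */
    memcpy(&stack_mem[rsp], &src, 8);
}

/* popq dest  ==  movq (%rsp), dest ; addq $8, %rsp */
uint64_t popq(void)
{
    uint64_t dest;

    assert(rsp + 8 <= STACK_BYTES);     /* model only: popping an empty stack */
    memcpy(&dest, &stack_mem[rsp], 8);
    rsp += 8;
    return dest;
}

/* Exposes the modeled stack pointer for inspection. */
size_t current_rsp(void)
{
    return rsp;
}
```

Pushing 3 and then 4 lowers the modeled RSP by 16; the two pops that follow return 4 and then 3, the reverse of the push order.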
Implementation of call
RIP (instruction pointer) register points to next instruction to be executed

Instruction    Equivalent to
pushq src      subq $8, %rsp
               movq src, (%rsp)
popq dest      movq (%rsp), dest
               addq $8, %rsp
call addr      pushq %rip
               jmp addr

Note: Can’t really access RIP directly, but this is implicitly what call does
Implementation of ret
RIP (instruction pointer) register points to next instruction to be executed

Instruction    Equivalent to
pushq src      subq $8, %rsp
               movq src, (%rsp)
ret            popq %rip
               (implicitly; RIP can’t really be accessed directly)
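How call and ret cooperate through the stack can be sketched as a toy machine in C. Everything here is a model, not real hardware behavior: `do_call`, `do_ret`, and `current_rip` are hypothetical names, the stack holds one `long` per 8-byte word, and instruction addresses are plain numbers chosen by the caller of the model.

```c
#include <assert.h>

#define MAX_DEPTH 8

static long stack_mem[MAX_DEPTH];  /* each slot models one 8-byte stack word */
static int rsp = MAX_DEPTH;        /* index of the top of stack; grows downward */
static long rip;                   /* models the instruction pointer */

/* call addr: push the address of the next instruction, then jump. */
void do_call(long next_instruction, long target)
{
    assert(rsp > 0);                      /* model only: too many nested calls */
    stack_mem[--rsp] = next_instruction;  /* conceptually: pushq %rip */
    rip = target;                         /* conceptually: jmp addr */
}

/* ret: pop the most recently saved return address into RIP. */
void do_ret(void)
{
    assert(rsp < MAX_DEPTH);              /* model only: ret with nothing saved */
    rip = stack_mem[rsp++];               /* conceptually: popq %rip */
}

/* Exposes the modeled instruction pointer for inspection. */
long current_rip(void)
{
    return rip;
}
```

With made-up addresses, `do_call(101, 200)` followed by `do_call(201, 300)` models f calling g and g calling h; the two `do_ret()` calls that follow restore RIP to 201 and then 101, so nested calls unwind in reverse order.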
Running Example
Problem 2: Passing Arguments
Problem:
• How does caller pass arguments to callee?
• How does callee accept parameters from caller?
x86-64 Solution 1: Use the Stack
x86-64 Solution 2: Use Registers
x86-64 solution:
• Pass first 6 (integer or address) arguments in registers for efficiency
• RDI, RSI, RDX, RCX, R8, R9
• More than 6 arguments ⇒
• Pass arguments 7, 8, … on the stack
• (Beyond scope of COS 217)
• Arguments are structures ⇒
• Pass arguments on the stack
• (Beyond scope of COS 217)
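A C function with more than six integer parameters makes the split concrete. The function `sum8` is a hypothetical example; the register assignment in the comment restates the convention above and is applied by the compiler, so the C source itself looks the same either way.

```c
/* Sums eight integer arguments.
   Under the convention above, a1..a6 arrive in RDI, RSI, RDX, RCX, R8, R9,
   while a7 and a8 are passed on the stack. */
long sum8(long a1, long a2, long a3, long a4,
          long a5, long a6, long a7, long a8)
{
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}
```

Compiling a call such as `sum8(1, 2, 3, 4, 5, 6, 7, 8)` with `gcc -S` and reading the generated assembly is one way to see where each argument is placed.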
Running Example
# long absadd(long a, long b)
absadd:
    pushq %rdi         # Push a
    pushq %rsi         # Push b
Problem 3: Storing Local Variables
x86-64 Solution: Use the Stack
Observations (déjà vu again!):
• May need to store many local var sets
• The number of local var sets is not known in advance
• Local var set must be saved for as long as the invocation of this
function is live, and discarded thereafter
• Stored local var sets are destroyed in reverse order of creation
• LIFO data structure (stack) is appropriate
x86-64 solution:
• Use the STACK section of memory
• Or maybe not!
• See later this lecture
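The observation that each live invocation needs its own local var set can be checked from C. This is a sketch with hypothetical names (`descend`, `locals_are_distinct`, `seen`); it records the address of one local variable per nested invocation and compares them while all invocations are still live.

```c
#include <assert.h>

#define MAX_DEPTH 16

static const long *seen[MAX_DEPTH];  /* address of `local` in each live invocation */

static int descend(int depth, int limit)
{
    long local = depth;              /* one copy per invocation */
    int i, j;

    seen[depth] = &local;
    if (depth + 1 < limit)
        return descend(depth + 1, limit);

    /* Deepest call: every invocation above this one is still live. */
    for (i = 0; i <= depth; i++) {
        if (*seen[i] != i)
            return 0;                /* some invocation's local was overwritten */
        for (j = i + 1; j <= depth; j++)
            if (seen[i] == seen[j])
                return 0;            /* two invocations shared storage */
    }
    return 1;
}

/* Returns 1 if `limit` nested invocations each had their own local variable. */
int locals_are_distinct(int limit)
{
    assert(limit >= 1 && limit <= MAX_DEPTH);
    return descend(0, limit);
}
```

Because the address of `local` is taken, the compiler has to give each invocation a memory location for it, which on x86-64 is normally a slot in that invocation's part of the stack.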
Running Example
# long absadd(long a, long b)
absadd:
    pushq %rdi         # Push a
    pushq %rsi         # Push b
Problem 4: Return Values
Problem:
• How does callee function send return value back to caller function?
• How does caller function access return value?
x86-64 Solution: Use RAX
In principle
• Store return value in stack frame of caller
x86-64 convention
• Integer or address:
• Store return value in RAX
• Floating-point number:
• Store return value in floating-point register
• (Beyond scope of COS 217)
• Structure:
• Store return value on stack
• (Beyond scope of COS 217)
Running Example
# long absadd(long a, long b)
absadd:
    pushq %rdi         # Push a
    pushq %rsi         # Push b
Agenda
Calling and returning
Passing arguments
Storing local variables
Returning a value
Optimization
Problem 5: Optimization
Observation: Accessing memory is expensive
• More expensive than accessing registers
• For efficiency, want to store parameters and local variables in
registers (and not in memory) when possible
x86-64 Solution: Register Conventions
Callee-save registers
• RBX, RBP, R12, R13, R14, R15
• Callee function must preserve contents
• If necessary…
• Callee saves to stack near beginning
• Callee restores from stack near end
Caller-save registers
• RDI, RSI, RDX, RCX, R8, R9, RAX, R10, R11
• Callee function can change contents
• If necessary…
• Caller saves to stack before call
• Caller restores from stack after call
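The two halves of the convention can be acted out in a C model. This is a sketch only: `struct regs` is a hypothetical register file with one register from each group, local variables stand in for stack slots, and the constants are arbitrary.

```c
#include <assert.h>

/* Hypothetical register file with one register from each group. */
struct regs {
    long r13;  /* callee-save */
    long rsi;  /* caller-save */
};

/* A callee that uses both registers as scratch space. */
static void callee(struct regs *r)
{
    long saved_r13 = r->r13;  /* callee saves R13 near the beginning */

    r->r13 = 111;             /* scratch use of a callee-save register */
    r->rsi = 222;             /* free to change a caller-save register */
    r->r13 = saved_r13;       /* callee restores R13 near the end */
}

/* A caller that needs both values after the call. Returns 1 if both survive. */
int caller_keeps_values(void)
{
    struct regs r = { 13, 42 };
    long saved_rsi = r.rsi;   /* caller saves RSI before the call */

    callee(&r);
    assert(r.rsi == 222);     /* in this model the callee did clobber RSI */
    r.rsi = saved_rsi;        /* caller restores RSI after the call */
    return r.r13 == 13 && r.rsi == 42;
}
```

The callee pays the save and restore cost only for callee-save registers it actually uses, and the caller pays only for caller-save registers whose values it still needs after the call.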
Running Example
Local variable handling in unoptimized version:
• At beginning, absadd() allocates space for local variables
(absA, absB, sum) on stack
• Body of absadd() uses stack
• At end, absadd() pops local variables from stack
Running Example
# long absadd(long a, long b)
absadd:
    pushq %r13         # Save R13, use for absA
    pushq %r14         # Save R14, use for absB
    pushq %r15         # Save R15, use for sum
    # absA = labs(a)
    pushq %rsi         # Save RSI
    call labs
    movq %rax, %r13
    popq %rsi          # Restore RSI
    # absB = labs(b)
    movq %rsi, %rdi
    call labs
    movq %rax, %r14
    # sum = absA + absB
    movq %r13, %r15
    addq %r14, %r15
    # return sum
    movq %r15, %rax
    popq %r15          # Restore R15
    popq %r14          # Restore R14
    popq %r13          # Restore R13
    ret

absadd() stores local vars in R13, R14, R15, not in memory
absadd() cannot destroy contents of R13, R14, R15
So absadd() must save R13, R14, R15 near beginning and restore near end
Running Example
Parameter handling in unoptimized version:
• absadd() accepts parameters (a and b) in RDI and RSI
• At beginning, absadd() copies contents of RDI and RSI to stack
• Body of absadd() uses stack
• At end, absadd() pops parameters from stack
Running Example
# long absadd(long a, long b)
absadd:
    pushq %r13         # Save R13, use for absA
    pushq %r14         # Save R14, use for absB
    pushq %r15         # Save R15, use for sum
    # absA = labs(a)
    pushq %rsi         # Save RSI
    call labs
    movq %rax, %r13
    popq %rsi          # Restore RSI
    # absB = labs(b)
    movq %rsi, %rdi
    call labs
    movq %rax, %r14
    # sum = absA + absB
    movq %r13, %r15
    addq %r14, %r15
    # return sum
    movq %r15, %rax
    popq %r15          # Restore R15
    popq %r14          # Restore R14
    popq %r13          # Restore R13
    ret

absadd() keeps a and b in RDI and RSI, not in memory
labs() can change RDI and/or RSI
absadd() must retain contents of RSI (value of b) across 1st call of labs()
So absadd() must save RSI before call and restore RSI after call
Non-Optimized vs. Optimized Patterns
Unoptimized pattern
• Parameters and local variables strictly in memory (stack) during
function execution
• Pro: Always possible
• Con: Inefficient
• gcc uses this pattern when invoked without the -O option
Optimized pattern
• Parameters and local variables mostly in registers during function
execution
• Pro: Efficient
• Con: Sometimes impossible
• More than 6 local variables
• Local variable is a structure or array
• Function computes address of parameter or local variable
• gcc uses this pattern when invoked with the -O option, when it can!
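The case "function computes address of parameter or local variable" can be illustrated with a variant of the running example. This is a hypothetical rewrite, not the lecture's code: `store_abs` and `absadd_by_address` are made-up names, and the comment about memory is the general expectation, not a guarantee about any particular compiler.

```c
/* Stores |x| through a pointer supplied by the caller.
   Assumption: x is not LONG_MIN, whose negation overflows. */
static void store_abs(long x, long *out)
{
    *out = (x < 0) ? -x : x;
}

/* Like absadd(), but absA and absB have their addresses taken,
   so each needs a memory location while store_abs() runs
   (unless the compiler inlines the calls away). */
long absadd_by_address(long a, long b)
{
    long absA, absB;

    store_abs(a, &absA);
    store_abs(b, &absB);
    return absA + absB;
}
```

A register has no address, so `&absA` and `&absB` can only refer to memory; this is why the optimized pattern is sometimes impossible.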
Hybrid Patterns
Hybrids are possible
• Example
• Parameters in registers
• Local variables in memory (stack)
Summary
Passing arguments
• Caller copies args to caller-saved registers (in prescribed order)
• Unoptimized pattern:
• Callee pushes args to stack
• Callee uses args as positive offsets from RSP
• Callee pops args from stack
• Optimized pattern:
• Callee keeps args in caller-saved registers
• Be careful!
Summary (cont.)
Returning values
• Callee places return value in RAX
• Caller accesses return value in RAX
Putting it all together
Add up the keys of a tree

struct tree {
    int key;
    struct tree *left;
    struct tree *right;
};

int sum (struct tree *t) {
    if (t==NULL)
        return 0;
    else return t->key +
        sum(t->left) +
        sum(t->right);
}

    .text
    .globl sum
sum:
    # LOCAL VARIABLES:
    # %r12=t, %r13d=partial sum
    pushq %r12
    pushq %r13
    movq %rdi, %r12
    cmpq $0, %r12
    jne .L2
    movl $0, %eax
    jmp .L3
.L2:
    movl 0(%r12), %r13d
    movq 8(%r12), %rdi
    call sum
    addl %eax, %r13d
    movq 16(%r12), %rdi
    call sum
    addl %eax, %r13d
    movl %r13d, %eax
.L3:
    popq %r13
    popq %r12
    ret

[Figure: example tree pointed to by t, with node keys 3, 4, and 5]
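The displacements 0, 8, and 16 in the assembly correspond to the field offsets of `struct tree`. The sketch below exposes those offsets and sums a three-node tree; the helper names (`key_offset`, `left_offset`, `right_offset`, `sum_example`) and the tree shape are assumptions made for illustration, and the 0/8/16 layout assumes a 4-byte int and 8-byte pointers, as on x86-64.

```c
#include <stddef.h>  /* NULL, offsetof, size_t */

struct tree {
    int key;
    struct tree *left;
    struct tree *right;
};

/* Returns the sum of all keys in the tree rooted at t (0 for an empty tree). */
int sum(struct tree *t)
{
    if (t == NULL)
        return 0;
    return t->key + sum(t->left) + sum(t->right);
}

/* Field offsets: 0, 8, and 16 on x86-64, because the 4-byte key is
   followed by padding so that the pointers are 8-byte aligned. */
size_t key_offset(void)   { return offsetof(struct tree, key); }
size_t left_offset(void)  { return offsetof(struct tree, left); }
size_t right_offset(void) { return offsetof(struct tree, right); }

/* Builds a hypothetical tree (root 3 with children 4 and 5) and sums its keys. */
int sum_example(void)
{
    struct tree four = { 4, NULL, NULL };
    struct tree five = { 5, NULL, NULL };
    struct tree root = { 3, &four, &five };

    return sum(&root);
}
```

Because `sum` calls itself, the parameter `t` and the partial sum must survive each recursive call, which is why the assembly version keeps them in the callee-save registers R12 and R13.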