Jakub Břečka
Prague 2016
I declare that I carried out this master thesis independently, and only with the
cited sources, literature and other professional sources.
I understand that my work relates to the rights and obligations under the Act
No. 121/2000 Sb., the Copyright Act, as amended, in particular the fact that the
Charles University has the right to conclude a license agreement on the use of this
work as a school work pursuant to Section 60 subsection 1 of the Copyright Act.
Title: A Decompiler for Objective-C
Abstract:
I would like to thank my wife-to-be, Michaela, for her endless support, and my
supervisor, Jakub Yaghob, for his patience, advice and help with this thesis.
Contents
Introduction
1 Background
1.1 Purposes of software reverse engineering
1.2 Existing disassemblers
1.3 Existing decompilers
4.3 Data flow analysis
4.3.1 Value definitions and uses
4.3.2 SSA form
4.3.3 Value propagation
4.3.4 Constant folding
4.3.5 Dead code elimination
4.4 AST generation
4.4.1 AST transformations
4.4.2 Printing the source code
7 Evaluation
7.1 Methodology
7.2 Results
7.3 Discussion
8 Conclusion
8.1 Future work
9 References
Introduction
In recent years, reverse engineering has become a widely popular branch of soft-
ware engineering. Main uses of reverse engineering include security analysis of
closed-source products or the need to debug code without source code. While dis-
assembling an unknown binary certainly can be hard and no current disassembler
produces a 100% perfect result every time, when we deal with compiler-generated
(and not deliberately obfuscated) code, the problem is considered solved, or at
least solved for the vast majority of common cases.
Decompilation, on the other hand, is still a problem that resists being
solved in a general way, even for relatively simple programs. Some high-level
programming languages, for example Java, do not compile to machine code and
are, therefore, significantly easier to decompile, because the binary preserves a lot
of information about functions, variables and types. That is not the case for C
and C++, which leave very few traces for a decompiler to use.
Another programming language similar to C and C++ is Objective-C
(Obj-C for short), which has become very popular recently due to the rise of the
iOS and OS X platforms. Since Objective-C is a strict superset of C, all the
difficulties in decompilation of C are inherited as well, at least in the general
case. There are, however, specifics of the Objective-C language that cause more
information about the original source code to be preserved. This information can
either be used directly or reconstructed more easily by heuristic analysis. These
specifics include the dynamic nature of the language, its runtime, and the fact
that common Obj-C source code typically does not resemble pure C, but uses
high-level language constructs heavily.
In this thesis, we will be analyzing how the general problem of decompilation
changes when we focus on the Objective-C language, the OS X and iOS platforms
and the AArch64 and x86 architectures (these are the only supported CPU archi-
tectures on OS X and iOS). The explicit goal of this thesis is to create a tool for
decompiling binary Objective-C programs compiled for these architectures. The
tool will be used primarily as an aid in manual malware analysis and implemen-
tation of security software – the ultimate motivation is to allow an independent
researcher to analyze and verify an implementation of a closed-source system, for
example to search for a hidden backdoor in a security-critical product.
The thesis is divided into several chapters. The first chapter, Background,
provides the necessary overview of reverse engineering, disassembling and decom-
pilation, currently available tools and techniques and discusses how well they work.
The second chapter, The Objective-C Language and Runtime, deals with
the external and internal features of Objective-C and how they could help
decompilation. The next chapter, Low-level Reverse Engineering Techniques,
describes what tasks are needed to build the low-level part of a reverse-engineering
tool. The fourth chapter, High-level Code Reconstruction, describes the
major decompilation challenges and what individual problems must be solved
to implement a decompiler. The next chapter, Objective-C Specific Decom-
pilation, focuses on the Objective-C language from the point of view of the
decompiler and describes what features can help decompilation as well as what
new problems we face. The sixth chapter, The “Cricket” Objective-C De-
compiler, introduces our actual implementation of an Objective-C decompilation
tool and describes the major architecture and design points. The seventh chapter,
Evaluation, compares our expectations to the actual results of our decompiler.
The last chapter, Conclusion, evaluates how well the set goals were achieved
and discusses possible future work.
Chapter 1
Background
This chapter provides the reader with the background necessary to understand
the current state of the reverse engineering industry, what goals we
are trying to achieve, and what is and is not possible.
In this chapter and in the whole thesis, we will only consider languages,
compilers, and tools that work on and compile to physical CPU instructions. This
is how most software written in C, C++ and Objective-C is distributed. We will
not describe scripting languages that are interpreted or languages that distribute
their code in bytecode format (e.g. Java).
• All software has bugs. Without access to the source code, debugging is
extremely hard.
• Security and privacy related software often makes promises about its
behavior, for example that it will not have backdoors, that it will store data
only in encrypted form, or that it will not leak the user’s data to the vendor’s
servers. Verifying these properties in closed-source programs is extremely
difficult.
• Some platforms impose a limitation on what APIs are allowed to be used,
even when this is technically not enforced. The source code would immedi-
ately show what APIs are called.
• Malware needs to be analyzed by security researchers to correctly assess
its impact and level of threat. This task, which is often manual, would be
greatly simplified and sped up when done on source-code level rather than
on the level of machine instructions.
Furthermore, there are valid use cases when even with the source code of a
program, it is beneficial to analyze the binary form using reverse engineering
techniques:
• Compiler engineers often need to analyze the compiled output of a compiler
to verify that the optimizations used are correct.
• Designing a software protection scheme (DRM, copy protection, etc.) re-
quires knowing what the attacker can and will do.
The area of defending against reverse engineering is out of scope of this thesis,
but it is covered in great detail in [1], which describes various techniques including
code obfuscation, tamperproofing, software watermarking, plagiarism detection,
birthmarking and hardware-based protection.
For our purposes, we will only focus on compiler-generated code that has not
been intentionally obfuscated.
This shows the list of instructions from the function called rl_vi_put in the
libreadline shared library, which is compiled for the x86-64 architecture. The
complete reference and manual for the instruction set of x86-64 is available on
Intel’s website as Intel® 64 and IA-32 Architectures Software Developer Manuals
[2], which also contains a detailed description of the architecture itself. For
assembly listings, we will always use the so-called “Intel syntax” (which is also
used in Intel reference manuals).
The standard command-line disassembler tools provide a quick way to get
a complete disassembly of a binary, but they have many limitations, do not
provide any non-trivial analysis, and are unsuitable for more serious reverse
engineering work. IDA – Interactive Disassembler (formerly named IDA Pro) is
a commercial interactive disassembler that also serves as a
full reverse engineering IDE [3].
It has some very advanced features and it performs various analyses on the
disassembled programs to discover individual functions, global variables, stack
variables, function parameters and their types. It understands a lot of idioms that
are commonly used on the platforms it supports and decodes them. One of the
most useful features is the ability to get a list of cross-references, which shows all
the places (both code and data) where a symbol is referenced from. This allows
the engineer, for example, to quickly find all callers of a function, or to find which
code uses a particular string constant.
Another powerful feature is the ability to view graphs of basic blocks. This
visualizes the control flow of a function and gives a good overview of how complex the
function is.
IDA supports dozens of processor architectures and platforms, and due to its
long history and success, it is well known among reverse engineers.
Hopper is a newer interactive disassembler that focuses on x86 and ARM
architectures [4]. It shares many features with IDA, but it comes at a
much more affordable price.
Both IDA and Hopper are end-user products, but another important set of
Figure 1.2: A graph of basic blocks in IDA
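from capstone import *  # Python bindings of the Capstone disassembly framework

CODE = b"\x55\x48\x8b\x05\xb8\x13\x00\x00"
md = Cs(CS_ARCH_X86, CS_MODE_64)
# disassemble CODE as if it were loaded at address 0x1000
for i in md.disasm(CODE, 0x1000):
    print("0x%x:\t%s\t%s" % (i.address, i.mnemonic, i.op_str))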
[self.HTTPBodyParts addObject:bodyPart];
}
The decompiled output detected method calls, but their arguments are wrong
(addObject is called with 0x0 as a parameter). However, in general, Hopper gives
a decent overview of a function in pseudo-code, and it will later be used as a
competitor product for a comparison with our implementation of an Objective-C
specific decompiler.
Chapter 2
// Objective-C:
@implementation MyClass
- (void)shutdown { ... }
- (int)randomNumberUpTo:(int)maximalNumber { ... }
- (int)divideNumber:(int)numerator byNumber:(int)denominator { ... }
+ (id)singletonInstance { ... }
@end
// C++:
class MyClass {
public:
void shutdown() { ... }
int randomNumberUpTo(int maximalNumber) { ... }
int divideNumberByNumber(int numerator, int denominator) { ... }
static MyClass *singletonInstance() { ... }
};
// C++:
this->shutdown();
int x = this->randomNumberUpTo(42);
int result = this->divideNumberByNumber(42, 6);
MyClass *instance = MyClass::singletonInstance();
void *o = NSObject::alloc()->init();
(static method), where the receiver is the class object itself. The last example
shows a nested call, where the result of [NSObject alloc] is used as a receiver
for the call to the init method.
Note that all of this can still be mixed with C code, and it is not unusual to
do that. For example, NSLog, probably the most commonly used function, is a C function
that prints Obj-C objects to the console output:
• The BOOL type is a standard boolean type with values YES or NO.
• When dealing with objects, nil is the keyword equivalent to the NULL value
in C.
• Inside a method implementation, self refers to the current object, super
refers to the superclass object.
• id is a generic type that can refer to an object of any class.
// Cat.h
@interface Cat : Animal // class name and superclass
@end
The class has a name (Cat), a superclass (Animal), properties (name and
numberOfLives), and methods. Furthermore, a class can have instance
variables (ivars). Defining a property automatically generates a hidden instance
variable, which is used as a backing store for the value of the property. The
initWithName: method is an initializer.
An implementation file, for example Cat.m, contains the source code of the
defined methods:
// Cat.m
@implementation Cat
- (id)initWithName:(NSString *)name {
    self = [super init];
    if (self) {
        self.name = name;
        self.numberOfLives = 9; // a cat has 9 lives, initially
    }
    return self;
}
- (void)jump {
    NSLog(@"%@ has jumped", self.name);
    if (rand() % 6 == 0) {
        // this jump did not work out as planned
        self.numberOfLives = self.numberOfLives - 1;
    }
}
@end
- (void)printAllElementsFromStringArray:(NSArray *)array {
    for (NSString *element in array) {
        NSLog(@"%@", element);
    }
}
2.1.3 Blocks
Blocks (or dispatch blocks) are an Objective-C language extension, which
implements closures capturing the lexical context from the place where they are
defined, very similar to C++11 lambda functions. Blocks are defined by using the
caret symbol (^) and arguments in parentheses, for example:
// Objective-C:
dispatch_block_t my_block = ^() { NSLog(@"hello from block"); };
// C++:
auto my_lambda = [] { NSLog(@"hello from C++ lambda"); };
The dispatch_block_t type represents a block with no arguments and a void
return type, but in fact it is just a typedef for the real block type syntax, which is
similar to C function pointers:
typedef void (^dispatch_block_t)(void); // no-input void-returning block
typedef void (*func_ptr_t)(void); // no-input void-returning C function
For a full comparison of blocks with C++ lambda functions, see [16].
Blocks can have different arguments and return types. Invoking a block is
done by using parentheses on the variable that contains a reference to the block:
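typedef void (^int_arg_block_t)(int);
typedef int (^int_returning_block_t)(void);
// illustrative definitions of the two blocks (bodies chosen arbitrarily):
int_arg_block_t b = ^(int x) { NSLog(@"x = %d", x); };
int_returning_block_t b2 = ^{ return 42; };
// invoking blocks:
b(10);
int output = b2();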
Perhaps the most interesting feature of a block is that it can capture variables.
Variables can be captured by value, which is the value at the moment of the
creation of the block. This means that any subsequent modifications do not affect
the value inside the block, and the block cannot modify that variable either:
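int a = 10;
dispatch_block_t b = ^() {
    NSLog(@"a = %d", a); // prints the captured value, 10
    // a = 42; // would not compile: the captured copy of "a" is read-only
};
a = 20; // does not affect the value captured by the block
b();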
__block int a = 10;
dispatch_block_t b = ^() {
    NSLog(@"a = %d", a);
    a = 42; // this *does* change the value of "a"
};
b();
NSLog(@"a = %d", a); // prints 42
dispatch_async(dispatch_get_main_queue(), ^{
// this is run in the main thread again:
[object redraw];
});
});
2.3.1 Dynamic dispatch
One of the main tasks of the runtime is to dispatch method calls to the method
implementations. This is due to Objective-C being a dynamically dispatched
language, where each method call is only resolved at runtime.
In contrast, a C function call is typically compiled into a CALL instruction, which
simply transfers control flow directly into the code of the callee. The address of the
callee is known at compile time and directly embedded into the CALL instruction.
However, when an Objective-C method is called, the address of the callee is not
computed at compile-time. Instead, a general dispatcher called objc_msgSend is
invoked, which resolves the selector into a function implementation pointer and
transfers control to it:
// Method call in Objective-C:
id object = ...;
[object methodWithNumber:42];
@end
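The lookup performed by the dispatcher can be pictured with a small Python model. This is only a sketch under our own assumptions: the ObjCClass type, the msg_send function and the sample classes are hypothetical names invented for illustration, and the real objc_msgSend additionally caches lookups and works on the runtime's own data structures.

```python
# Toy model of dynamic dispatch; every name here is hypothetical.
class ObjCClass:
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = methods or {}   # selector string -> implementation


def msg_send(receiver_class, receiver, selector, *args):
    """Resolve the selector by walking the class hierarchy, then call it."""
    cls = receiver_class
    while cls is not None:
        imp = cls.methods.get(selector)
        if imp is not None:
            # The receiver and the selector are passed as the first two arguments.
            return imp(receiver, selector, *args)
        cls = cls.superclass
    raise AttributeError("unrecognized selector: " + selector)


animal = ObjCClass("Animal", methods={
    "description": lambda self, _cmd: "an animal",
})
cat = ObjCClass("Cat", superclass=animal, methods={
    "methodWithNumber:": lambda self, _cmd, n: n + 1,
})

result = msg_send(cat, object(), "methodWithNumber:", 42)
inherited = msg_send(cat, object(), "description")
```

The point of the model is that the call site only names a selector; which implementation runs is decided by a table lookup at the time of the call.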
In this case, erasing a reference to a child automatically deallocates the whole
subtree, and there is no need to explicitly unset the parent references.
Chapter 3
3.1 Traditional compilation transformations
Let us demonstrate an example of a very simple program and what representations
it might go through when being compiled. Here is the source code in C:
#include <stdio.h>
int main(int argc, char *argv[]) {
    printf("Hello, %s\n", argv[0]);
    return 0;
}
When the compiler starts translating this program, it will convert it into a
token stream. While doing so, it will also preprocess the source code, resolving
all preprocessor macros and directives. If we for now ignore the inclusion of the
stdio.h system header file, the token stream can look like this:
int 'int', identifier 'main', l_paren '(', int 'int', identifier 'argc',
comma ',', char 'char', star '*', identifier 'argv', l_square '[', r_square ']',
r_paren ')', l_brace '{'
r_brace '}'
eof ''
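To make the idea of a token stream concrete, the following Python sketch splits a line of C into (kind, text) pairs. It is not how Clang's lexer works; the token table is our own simplification that covers only this example, and the kind names merely imitate those in the listing above.

```python
import re

# Simplified token table; real lexers handle many more cases.
TOKEN_SPEC = [
    ("keyword",          r"\b(?:int|char|return)\b"),
    ("identifier",       r"[A-Za-z_]\w*"),
    ("numeric_constant", r"\d+"),
    ("string_literal",   r'"(?:\\.|[^"\\])*"'),
    ("punct",            r"[()\[\]{},;*]"),
    ("skip",             r"\s+"),
]
PUNCT_NAMES = {
    "(": "l_paren", ")": "r_paren", "[": "l_square", "]": "r_square",
    "{": "l_brace", "}": "r_brace", ",": "comma", ";": "semi", "*": "star",
}
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) tuples, terminated by an eof token."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "keyword":
            kind = text                   # keywords are their own kind
        elif kind == "punct":
            kind = PUNCT_NAMES[text]
        tokens.append((kind, text))
    tokens.append(("eof", ""))
    return tokens


tokens = tokenize("int main(int argc, char *argv[]) {")
```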
The AST representation is still very close to the original source code. Some in-
formation is lost, for example types are usually turned into canonical forms: Declar-
ing a variable as unsigned int would be indistinguishable from just unsigned.
When the code is present in the AST form in the compiler front-end, several
optimizations can be done. In Clang, however, a lot of optimizing techniques are
not used (e.g. constant folding) on the AST level, both because Clang wants to be
able to map the AST very closely to the original source code and also because the
LLVM back-end will perform the optimization anyway (and maybe in a better
way).
The AST is then transformed into an intermediate representation (IR).
For Clang this is the LLVM IR, which is a well-defined language that consists of
code in low-level instructions, but it also knows about high-level concepts, such
as functions and modules.
This is a major transformation, because the resulting code does not resemble
the original source code anymore. The instructions now represent very simple
steps, e.g. a single memory access, a function call or a single addition, etc. Any
non-trivial expression in the source code is likely to be expressed with several
LLVM instructions. LLVM IR is also based on static single assignment (SSA)
form and provides an unlimited number of virtual registers. This means that the
code can use any number of local variables (as virtual registers; their names start
with the percent symbol), but each can be assigned only once.
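The single-assignment property can be illustrated on straight-line code with a short Python sketch. It only renames variables so that every assignment creates a fresh name; branches and phi nodes are deliberately left out, and the to_ssa function and its name.version naming scheme are our own illustrative choices, not LLVM's.

```python
import re


def to_ssa(statements):
    """Rename variables in straight-line code so each name is assigned once.

    Each statement is a (target, expression) pair of strings.
    """
    version = {}      # variable name -> latest version number
    result = []

    def current(match):
        name = match.group(0)
        if name in version:
            return "%%%s.%d" % (name, version[name])
        return "%" + name   # never assigned here: treated as an input

    for target, expr in statements:
        renamed = re.sub(r"[A-Za-z_]\w*", current, expr)
        version[target] = version.get(target, -1) + 1
        result.append(("%%%s.%d" % (target, version[target]), renamed))
    return result


ssa = to_ssa([("x", "a + 1"), ("x", "x * 2"), ("y", "x + a")])
# ssa == [("%x.0", "%a + 1"), ("%x.1", "%x.0 * 2"), ("%y.0", "%x.1 + %a")]
```

After renaming, every use refers to exactly one definition, which is what makes later data flow analyses simpler.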
LLVM can do many optimizations that operate on the IR level and the output
is in the same format. It should be obvious that the shown code is sub-optimal: It
contains several unnecessary operations and variables, e.g. the %retval variable is
allocated in main memory (via the alloca instruction), then a zero is stored into
it, and then it is never used again. An optimized version of the same function
could look like this:
PUSH64r %RBP, %RSP, %RSP ; flags: FrameSetup
%RBP = MOV64rr %RSP ; flags: FrameSetup
%RSI = MOV64rm %RSI, 1, 0
%RDI = LEA64r %RIP, 1, <ga:@.str>
%EAX = XOR32rr %EAX, %EAX
CALL64pcrel32 <ga:@printf>
%EAX = XOR32rr %EAX, %EAX
%RBP = POP64r %RSP, %RSP
RETQ %EAX
These are the final machine-level instructions, but still in the internal LLVM
format. We lost some information when the meta-instructions that were adjusting
the top of the stack were turned into PUSH64r and POP64r. But more
importantly, the information about uses of registers is also lost. While in the
previous form, we knew that the function call uses the %RDI and %RSI registers, in
the current form, we cannot tell which registers are actually used by the
callee. It might even look like the assignments to the %RDI and %RSI registers are
not necessary, because they are not used anywhere else in the function. However,
they are used by the printf function.
The next representation is an assembly file. Here is the code using Intel
syntax:
_main: ## @main
PUSH RBP
MOV RBP, RSP
MOV RSI, QWORD PTR [RSI]
LEA RDI, [RIP + L_.str]
XOR EAX, EAX
CALL _printf
XOR EAX, EAX
POP RBP
RET
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "Hello, %s\n"
; Section __text
_main:
0x100000f50 PUSH RBP
0x100000f51 MOV RBP, RSP
0x100000f54 MOV RSI, QWORD PTR [RSI]
0x100000f57 LEA RDI, QWORD PTR [0x100000f8a]
0x100000f5e XOR EAX, EAX
0x100000f60 CALL 0x100000f6a
0x100000f65 XOR EAX, EAX
0x100000f67 POP RBP
0x100000f68 RET
; Section __stubs
0x100000f6a JMP QWORD PTR [0x100001010]
; Section __cstring
0x100000f8a DB "Hello, %s\n", 0
; Section __la_symbol_ptr
0x100001010 DQ ...
Notice that there are no more symbolic names at all, and everything is referred
to by an integer constant or offset. There is one more final form of this code, and
that is the actual binary representation inside the executable on the disk (here
shown as a hexadecimal listing):
55 48 89 E5 48 8B 36 48
8D 3D 2C 00 00 00 31 C0
E8 05 00 00 00 31 C0 5D
C3 90
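The relative operands in these bytes can be checked by hand. The Python sketch below recomputes the targets of the LEA and CALL instructions from the raw bytes; the base address and instruction offsets are taken from the disassembly listing above, while the rel32_target helper is a hypothetical name introduced only for this illustration.

```python
import struct

# The bytes from the hexadecimal listing above.
CODE = bytes.fromhex(
    "55 48 89 E5 48 8B 36 48 "
    "8D 3D 2C 00 00 00 31 C0 "
    "E8 05 00 00 00 31 C0 5D "
    "C3 90"
)
BASE = 0x100000F50   # address of the first byte, as shown in the listing


def rel32_target(offset, insn_length, disp_offset):
    """Address of the next instruction plus the signed 32-bit displacement."""
    (disp,) = struct.unpack_from("<i", CODE, offset + disp_offset)
    return BASE + offset + insn_length + disp


# LEA RDI, [RIP + disp32] starts at offset 7: 48 8D 3D, then the displacement.
lea_target = rel32_target(offset=7, insn_length=7, disp_offset=3)
# CALL rel32 starts at offset 16: E8, then the displacement.
call_target = rel32_target(offset=16, insn_length=5, disp_offset=1)
```

The results, 0x100000f8a and 0x100000f6a, match the string constant and the stub addresses in the disassembly.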
segname __PAGEZERO segname __TEXT
vmaddr 0x0000000000000000 vmaddr 0x0000000100000000
vmsize 0x0000000100000000 vmsize 0x0000000000001000
...
Section Section
sectname __text sectname __cstring
segname __TEXT segname __TEXT
addr 0x0000000100000f50 addr 0x0000000100000f8a
size 0x0000000000000019 size 0x000000000000000b
offset 3920 offset 3978
...
The load commands are instructions for the operating system’s loader and
they reveal important properties about the program. It is a 64-bit executable, and
the first load command describes a __PAGEZERO segment, which makes the
first 4 GiB of address space (starting at address 0x0) inaccessible, so that any
access to it triggers a segmentation fault. The second command sets up a 4
KiB __TEXT segment starting at 0x100000000, where code and data are present.
The two sections, __text and __cstring, instruct the loader to map parts of the
contents of the file to the respective virtual addresses. The __text section (which
usually contains executable instructions) will be created from 0x19 bytes starting
at offset 3920 in the file and mapped into the process address space starting at
address 0x100000f50; and similarly for the __cstring section (which contains
string constants used by the program).
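The mapping between virtual addresses and file offsets is simple arithmetic, sketched below in Python. The section values are copied from the load commands above; the Section tuple and the vaddr_to_file_offset function are hypothetical helpers, not part of any existing tool.

```python
from collections import namedtuple

Section = namedtuple("Section", "name addr size offset")

# Values copied from the section descriptions above.
SECTIONS = [
    Section("__text",    0x100000F50, 0x19, 3920),
    Section("__cstring", 0x100000F8A, 0x0B, 3978),
]


def vaddr_to_file_offset(vaddr, sections=SECTIONS):
    """Return (section name, file offset) for an address, or None if unmapped."""
    for s in sections:
        if s.addr <= vaddr < s.addr + s.size:
            return s.name, s.offset + (vaddr - s.addr)
    return None


string_location = vaddr_to_file_offset(0x100000F8A)
lea_location = vaddr_to_file_offset(0x100000F57)
```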
The naming and types of sections do not actually need to follow these rules,
and code obfuscation and other software protection schemes will intentionally try
to mislead us. However, binaries produced by standard compilers and linkers will
follow a lot of customary conventions, and we can deduce information from them.
For example, the __cstring section will contain only NULL-terminated string
literals and constants, and we would not expect code instructions to live in this
section. We should even be able to dump a list of all strings used by a binary
very easily by simply splitting the full contents of this section at each NULL
terminator. The section description above conveniently shows the file offset and
length of this data:
$ xxd -s 3978 -l 0xb a.out
0000f8a: 4865 6c6c 6f2c 2025 730a 00 Hello, %s..
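Splitting the section contents at the terminators takes only a few lines. The following Python sketch operates on the 11 bytes shown by xxd; split_cstrings is a hypothetical helper name, and the sketch assumes that the section holds nothing but terminated strings.

```python
def split_cstrings(section_bytes):
    """Split the raw contents of a string section at the zero terminators."""
    return [
        chunk.decode("utf-8", errors="replace")
        for chunk in section_bytes.split(b"\x00")
        if chunk   # note: this also drops genuinely empty string literals
    ]


# The bytes from the xxd output above.
data = bytes.fromhex("48656c6c6f2c2025730a00")
strings = split_cstrings(data)
```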
The code section (named __text) can be assumed to contain raw machine
instructions, and after we dump it in a similar way, we can use a disassembler
to try to reproduce the assembly listing. We will look into that in one of the
following sections, but for now, let us see what other sections and information we
can gather from the binary.
uses from them. Only after all references are resolved (which is a recursive task,
because libraries can depend on other libraries), the actual initialization of the
program starts. In Mach-O binaries on modern OS X, a segment called
__LINKEDIT contains a blob of data with the necessary information for
the dynamic linker, called dyld. A third-party tool called jtool can show
everything that is contained in this segment:
Most of the contained data is in the form of structures that reference others,
but the very last item in the list, “String Table” contains again a list of NULL-
terminated strings that the other structures use:
We can immediately spot the name of the function in our source code, main
(prefixed with an underscore), and also the external function we are calling,
printf. Of course, there is a better way to view these exported and imported
symbols:
$ nm -m a.out
100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
100000f30 (__TEXT,__text) external _main
(undefined) external _printf (from libSystem)
(undefined) external dyld_stub_binder (from libSystem)
It might be somewhat unexpected that we can see the main symbol here,
because it is an internal function in our source code and there is no need for this
symbol to be externally visible or importable. The reason is that our binary is not
stripped, which is typical for debug versions of binaries. We probably would not
be able to see internal symbol names for release and production binaries, where
the strip command is usually run on the final binary.
The imported function (_printf), however, has to be visible and its name
has to be contained in the binary, because the dynamic linker uses the name to
resolve and link this external symbol. Using nm on any dynamically linked program
(i.e. almost all programs today) will reveal the symbol names and libraries of all
externally linked functions and other symbols.
Based on how a binary is compiled (optimized vs. non-optimized, debug
vs. release, stripped vs. non-stripped, etc.), more sections can be present, e.g.
__unwind_info is used to store compact unwind information, and __eh_frame is
needed to support zero-cost exception handling in C++. Debugging data
(source files and line information, variable and procedure descriptions, etc.) can
be stored in another section.
3.4 Disassembly
Notice how much information we were able to retrieve without ever seeing the
actual instructions in the code sections. To see the instructions, we can run a
disassembler on the __text section:
$ otool -t a.out
0000000100000f50 55 48 89 e5 48 8b 36 48 8d 3d 2c 00 00 00 31 c0
0000000100000f60 e8 05 00 00 00 31 c0 5d c3
$ otool -t -v a.out
0000000100000f50 PUSH RBP
0000000100000f51 MOV RBP, RSP
0000000100000f54 MOV RSI, QWORD PTR [RSI]
0000000100000f57 LEA RDI, QWORD PTR [RIP + 0x2c]
0000000100000f5e XOR EAX, EAX
0000000100000f60 CALL 0x100000f6a
0000000100000f65 XOR EAX, EAX
0000000100000f67 POP RBP
0000000100000f68 RET
In our case, we were able to successfully and accurately disassemble the whole
code section, and an important fact to notice is that even at first glance, some
instructions appear to be of higher importance than others. The CALL and RET
actually reveal some high-level structure of the function and the program. We
should also be able to recognize the function prologue and epilogue, where the
first two instructions (PUSH RBP and MOV RBP, RSP) form the standard prologue
that sets up the stack frame. The last two instructions, POP RBP and RET, tear
the stack frame down and return. We will deal with these later when we
discuss ABI and calling conventions.
We can safely tell that the whole listing is a single function, and another piece
of high-level information that can be deduced is the use of registers as inputs to the
function. The MOV RSI, QWORD PTR [RSI] instruction reads the value of RSI,
but the function never sets this value beforehand. From this, we can infer that RSI is
an input to the function.
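This read-before-write rule can be sketched in Python for a single basic block. The register read and write sets below are annotated by hand for the body of our sample function and are assumptions of this sketch: the XOR EAX, EAX idiom is treated as a pure write, the registers read by the callee are left unknown, and the input_registers helper is a hypothetical name.

```python
def input_registers(instructions, ignored=frozenset()):
    """Return registers that are read before any write within one basic block.

    Each instruction is (text, reads, writes), where reads and writes are sets.
    """
    written = set()
    inputs = []
    for _text, reads, writes in instructions:
        for reg in sorted(reads):
            if reg in ignored or reg in written or reg in inputs:
                continue
            inputs.append(reg)
        written |= writes
    return inputs


# Body of the sample function (prologue and epilogue left out),
# with hand-annotated register sets.
BODY = [
    ("MOV RSI, QWORD PTR [RSI]", {"RSI"}, {"RSI"}),
    ("LEA RDI, [RIP + 0x2c]",    {"RIP"}, {"RDI"}),
    ("XOR EAX, EAX",             set(),   {"RAX"}),  # zeroing idiom, no real read
    ("CALL 0x100000f6a",         set(),   {"RAX"}),  # callee's reads not modeled
    ("XOR EAX, EAX",             set(),   {"RAX"}),
]

inputs = input_registers(BODY, ignored={"RIP", "RSP", "RBP"})
```

For our sample the only register reported is RSI, in agreement with the reasoning above.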
Our sample is very simple and contains no branches, and all of the code of the
function is in a single basic block. We will show more complex examples later
in the next chapter.
of a function or mistranslating one instruction can lead to more errors during
disassembly.
Another situation where the job of a disassembler is much harder is when
code is mixed with data inside the same section. The compiler can do this for
example for jump tables or large literals, where for both convenience and
performance, the required data can be embedded between functions, or even
between basic blocks. Alignment padding might also be evaluated as data, but
compiler-generated padding usually uses NOP instructions (to mitigate damage
when such code is accidentally executed).
For further processing, finding all procedures (and their sizes) is crucial. While
correctly and precisely disassembling arbitrary x86 code is a theoretically hard
problem [20], there are several heuristics and other indicators we can use:
• The function prologue and epilogue are commonly very distinguishable and
their instruction sequences are unlikely to appear within function bodies.
• Some procedure beginnings are explicitly listed (e.g. exported functions).
• Data sections can contain pointers to beginnings of procedures, but they
are very unlikely to contain pointers to the inside of a procedure.
• Objective-C metadata explicitly list all methods and pointers to the imple-
mentations.
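The first two indicators can be combined in a few lines of Python. The sketch below scans a code section for the byte pattern of the standard x86-64 prologue and merges the hits with explicitly known entry points. The function names and the second, artificial function in the sample are our own assumptions, and a plain byte search can of course also match inside an instruction or inside embedded data.

```python
PROLOGUE = bytes.fromhex("55 48 89 e5")   # PUSH RBP; MOV RBP, RSP


def find_prologues(code, base):
    """Return candidate function entry addresses found by pattern matching."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits


def merge_candidates(*address_lists):
    """Union of entry points from several sources, sorted by address."""
    return sorted(set().union(*address_lists))


# The sample function from this chapter, followed by NOP padding and a
# made-up second function consisting only of a prologue and an epilogue.
MAIN = bytes.fromhex(
    "55 48 89 e5 48 8b 36 48 8d 3d 2c 00 00 00 31 c0"
    "e8 05 00 00 00 31 c0 5d c3"
)
text = MAIN + b"\x90" * 7 + PROLOGUE + b"\x5d\xc3"

found = find_prologues(text, base=0x100000F50)
exported = [0x100000F50]                   # e.g. taken from the symbol table
entry_points = merge_candidates(found, exported)
```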
3.5 Summary
We have described how a compiler transforms source code into a binary represen-
tation. Analyzing such a binary with existing standard tools can reveal useful
metadata. We have shown a heuristic technique to discover function entry points
and their sizes. This will be used as an input in the next chapter, where we will
reconstruct high-level code for an individual function.
Chapter 4
This saves the RBP register by pushing it onto the stack, and then stores the
current stack pointer (from the RSP register) as a new value into the RBP register,
which serves as the frame base pointer throughout the execution of the whole
function. This allows debuggers to always locate the start of the frame just by
looking at the RBP register. A typical epilogue on x86-64 does the opposite of the
prologue:
0000000100000f67 POP RBP
0000000100000f68 RET
First, we restore the original value of RBP, which will now store the previous
(caller’s) frame base pointer, and as a very last instruction in the function, the
RET instruction will transfer control to the calling function. If all the functions
in the program follow this convention, RBP always points to a valid frame, but
we can also always access the previous frame by looking into memory one word
below the frame, which will contain the caller’s saved frame pointer. This forms
the runtime call stack chain and it is easy for a debugger to walk this list and
display a backtrace of the current thread. The frame pointer serves another
important purpose, since it can be also used to access local variables (located on
the stack). This will be described later in this chapter.
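Walking the chain of frames can be simulated in Python. The sketch assumes the conventional layout produced by the prologue shown earlier: the saved frame pointer of the caller is stored at the address held in RBP, and the return address pushed by CALL is stored one word (8 bytes) above it. The memory contents are made up, and the convention that a zero frame pointer ends the chain is also an assumption of this sketch.

```python
def backtrace(memory, rbp, max_frames=64):
    """Follow saved frame pointers and collect the return addresses.

    memory maps an address to the 8-byte value stored there.
    """
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_frames:
        saved_rbp = memory[rbp]                      # caller's frame pointer
        return_addresses.append(memory[rbp + 8])     # pushed by CALL
        rbp = saved_rbp
    return return_addresses


# Hypothetical stack contents describing three nested frames.
memory = {
    0x7000: 0x7040, 0x7008: 0x100000F65,
    0x7040: 0x7080, 0x7048: 0x100000E10,
    0x7080: 0x0,    0x7088: 0x100000D00,
}
frames = backtrace(memory, rbp=0x7000)
```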
In some cases, the prologue and epilogue can be different, or the frame pointer
might not be set up in the prologue. If the RBP register is used to hold the frame
pointer, it cannot be used as a general purpose register for the function execution.
As an optimization, Clang and GCC have a -fomit-frame-pointer flag, which
will allow using RBP as a regular register for register allocation, so fewer variables
need to be spilled onto the stack. However, the function prologue still needs to
save and restore the original RBP to satisfy the ABI requirements.
The prologue often allocates space on the stack for local variables:
0x100000ef0 <+0>: PUSH RBP
0x100000ef1 <+1>: MOV RBP, RSP
0x100000ef4 <+4>: SUB RSP, 0x10
The SUB instruction moves the stack pointer two words down, thus allocating
16 bytes on the stack. From this information alone, it is impossible to tell whether
these are two 8-byte variables or a 16-byte integer (or another combination which
totals 16 bytes). But the size of the stack frame is important information
that will later be used by various analyses. If a function allocates space on the
stack in the prologue, it will also deallocate it in the epilogue, perhaps with a
corresponding ADD RSP, 0x10 instruction.
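Extracting the frame size from a textual prologue is straightforward. The Python sketch below looks for a SUB RSP instruction with an immediate operand in listing lines such as the ones above; frame_size is a hypothetical helper, and prologues that adjust the stack in another way are not handled.

```python
import re

SUB_RSP = re.compile(r"\bSUB\s+RSP,\s*(0x[0-9A-Fa-f]+|\d+)\s*$")


def frame_size(prologue_lines):
    """Return the number of bytes reserved by 'SUB RSP, imm', or 0 if absent."""
    for line in prologue_lines:
        m = SUB_RSP.search(line)
        if m:
            return int(m.group(1), 0)   # base 0 accepts both 0x10 and 16
    return 0


size = frame_size([
    "0x100000ef0 <+0>: PUSH RBP",
    "0x100000ef1 <+1>: MOV RBP, RSP",
    "0x100000ef4 <+4>: SUB RSP, 0x10",
])
```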
For the purposes of decompilation, the function prologue and epilogue are not
very interesting. They are target-specific low-level concepts and we can completely
skip analyzing the instructions from them. We are only interested in the fact that
the function has a frame and the size of the frame.
up the stack after the call has returned (if any arguments were stored on the
stack). The callee accesses the stack parameters indirectly via the RSP (or RBP)
register, by reading memory below its own function frame.
If we accept the fact that most functions have a low number of arguments
(there are some statistical measurements of this, e.g. [22] claims 2.8 is the average,
and [24] analyzed Windows DLLs with the outcome that most functions have 3 or
fewer arguments, but the measurements always depend on what software is being
analyzed), this means that most functions will only pass arguments in registers.
A typical function call on x86-64 can then look like this:
This stores three constant values into three registers. The CALL instruction
then transfers control to the function at the specified address, while also saving
the return address to the stack (which is used by the RET instruction to restore
control back to the caller). The callee can directly access the three arguments in
the RDI, RSI and RDX registers.
Analyzing such a simple situation is easy, but the content of the registers
at the time of the call might not be so obvious, as assigning the register value
does not necessarily need to happen right before the CALL instruction. A proper
data-flow analysis is necessary for us to be able to deduce the register values,
which will be described later.
The situation is simpler with input parameters. If we know the signature of
the function, then the ABI directly tells us in which registers (or stack locations)
the arguments are stored. If we do not know the signature, we can deduce that a
register is used as an input argument when it is read before any value has been
assigned to it:
4.1.3 Return values
If a function has a simple integer return value, it is returned via a register (RAX
for x86-64). This is achieved by simply leaving a value in that register before
returning from the function (via the RET instruction). The most basic example of
a function that returns a constant value is:
0x100000f0c <+0>: MOV RAX, 0x29A ; store the return value in RAX
0x100000f10 <+4>: RET
Besides the RET instruction, this function only executes a single instruction to
store the return value in RAX. It is also an example where the function does not
have a proper prologue and epilogue, because it does not use any stack space and
it does not even use any registers (except the return value register).
If we know the signature of a function, we can immediately tell which register
we expect to hold the return value. However, if the function signature is unknown,
we might need to make a heuristic guess. It might not be possible to distinguish
between a situation where RAX contains a return value and a situation where it
contains a leftover value from a previous computation that should be discarded.
A simple heuristic can be used: If the last value written to RAX before returning is
never used in another computation, it is more likely to be a return value. Otherwise,
it would be a dead store and the compiler is likely to eliminate such an unnecessary
operation.
Additional ABI rules apply for more complex situations, e.g. when a larger
integer is returned or when returning a floating-point value. One special case is
worth mentioning: If the return value is a large structure, it is usually returned
by writing it to a memory location. However, the callee does not have such
storage readily available, because the stack must be left in the same state as it
was at the beginning of the call. Therefore, the System V ABI specifies that the
caller is responsible for setting up this space, usually in its own stack frame, and
passing an additional hidden argument containing the pointer to this memory.
This hidden argument is placed before the others.
large_struct function_returning_a_struct(int p) {
...
return x;
}
For our purposes, we should note that it is impossible for the analysis to
distinguish between these two variants. Unless we know the signature of the
function beforehand, we again have to make a heuristic guess.
be left in the same state as they were at the beginning of the call, which means
that for the caller it looks as if they were not used and the values in them did not
change. If the callee wants to use these registers, it must save the original values
(usually on the stack), and restore them before transferring control back to the
caller. Scratch registers can be used for any computation and they do not need to
be restored.
We can use this knowledge when analyzing a CALL instruction to make assump-
tions about what registers are clobbered during the call. This will be important
later during data-flow analysis.
This code allocates 16 bytes of stack storage (by moving the stack pointer 16
bytes down). The first stack memory access happens at the address of RBP-1, and
this pointer is accessed as a byte pointer. The same happens for the stack item at
RBP-2, but the third access happens as an 8-byte store at RBP-16. This suggests
that there are two 1-byte wide items and one 8-byte item on the stack. The bytes
at RBP-3 up to RBP-8 are unused due to alignment.
Note that the compiler is unlikely to generate the code mentioned above,
because it only stores constant values to the stack items, unless we explicitly
need the addresses of these stack items, for example when passed to a scanf(fmt,
&var1) call.
The x86-64 System V ABI also reserves a 128-byte red zone directly
below the stack pointer, which a function can use without moving the stack
pointer. This is not too important for us, besides the fact that accessing a stack
item below the stack pointer is possible.
A quite uncommon feature of C/C++ is variable-length arrays and alloca
calls, which allow allocating space on the stack of non-constant size. The compiler
achieves this by adjusting the stack pointer by a dynamic offset, and such a
variable is allocated below all the regular variables. Accesses to this variable are
done via another register that stores a pointer to it. When the function returns,
all of this space is deallocated by simply restoring the stack pointer to the saved
value.
4.1.6 AArch64
The AArch64 [25] architecture shares most of the concepts mentioned in the
previous sections, but the implementation and the ABI are slightly different. On
AArch64, parameters are passed via R0–R7 registers; the FP register is used as a
frame pointer.
The function calling mechanism is different from x86. The BL instruction calls
a subroutine, but instead of pushing the return address onto the stack, it stores
the return address into LR, the link register. Returning from a function is done
via the RET instruction which reads a program address from the LR register and
transfers control to it. This means that the LR register must be explicitly saved
(pushed onto the stack) before a function can make calls, and restored before it
can return.
An example function with a prologue, a function call and an epilogue might
look like this:
The first instruction of the function prologue (STP, store pair) saves the values
of FP and LR to the stack, while also adjusting (pre-indexing) the stack pointer
by 16 bytes. The second instruction then establishes the new frame pointer. The
epilogue then restores FP, LR, and also SP (by adjusting it by 16 bytes up).
4.2 Control flow analysis
Only very simple functions execute all instructions linearly; more interesting
functions have a more complex control flow. On the assembly level, most
control flow is carried out by branches, indirect branches and conditional
branches. These primitives partition a function into basic blocks, each of which
is a (maximal) series of instructions that is always executed completely. This
means that the first instruction of a basic block is the beginning of the function
or a target of some of the branches. No other instruction of a basic block is
a target of a branch. The last instruction of a basic block can be a branch, a
return instruction or a non-control-flow instruction (which means the basic block
performs a fall through at the end). In this definition we do not view function
calls as branches, nor do we care about exception handling, signals, interrupts or
other non-standard control flow.
7. Mark the basic block whose leader is the very first instruction of the
function as the entry basic block.
It should be easy to see that this algorithm finds all basic blocks and constructs
a control-flow graph (directed edges are formed from the succ(B) sets). The
algorithm however does not handle indirect branches (which are a common way of
representing switch statements), because it assumes that each branch has a
single, known target address. We will work towards resolving
this later in the chapter.
Let us take a look at an example of how this algorithm works on the following
function (which implements the Euclidean algorithm to find the greatest common
divisor) for x86-64:
gcd:
0x100000e5a <+0>: PUSH RBP
0x100000e5b <+1>: MOV RBP, RSP
0x100000e5e <+4>: MOV RDX, RSI
0x100000e61 <+7>: MOV RAX, RDI
0x100000e64 <+10>: TEST RDX, RDX
0x100000e67 <+13>: JE 0x100000E7B ; <+33>
0x100000e69 <+15>: MOV RCX, RDX
0x100000e6c <+18>: XOR EDX, EDX
0x100000e6e <+20>: DIV RCX
0x100000e71 <+23>: TEST RDX, RDX
0x100000e74 <+26>: MOV RAX, RCX
0x100000e77 <+29>: JNE 0x100000E69 ; <+15>
0x100000e79 <+31>: JMP 0x100000E7E ; <+36>
0x100000e7b <+33>: MOV RCX, RAX
0x100000e7e <+36>: MOV RAX, RCX
0x100000e81 <+39>: POP RBP
0x100000e82 <+40>: RET
Naturally, the <+0> offset becomes the first leader, and we will add all the
branch targets (<+33>, <+15> and <+36>) to the set of leaders as well. Then
we also add all offsets that are immediately after a conditional jump (<+15> and
<+31>) to get the final set of leaders. When the function is split into basic blocks,
it looks like this:
; --- basic block #1 --- successors = {#2, #4} --------------------------
0x100000e5a <+0>: PUSH RBP
0x100000e5b <+1>: MOV RBP, RSP
0x100000e5e <+4>: MOV RDX, RSI
0x100000e61 <+7>: MOV RAX, RDI
0x100000e64 <+10>: TEST RDX, RDX
0x100000e67 <+13>: JE 0x100000E7B ; <+33>
; --- basic block #2 --- successors = {#2, #3} --------------------------
0x100000e69 <+15>: MOV RCX, RDX
0x100000e6c <+18>: XOR EDX, EDX
0x100000e6e <+20>: DIV RCX
0x100000e71 <+23>: TEST RDX, RDX
0x100000e74 <+26>: MOV RAX, RCX
0x100000e77 <+29>: JNE 0x100000E69 ; <+15>
; --- basic block #3 --- successors = {#5} ------------------------------
0x100000e79 <+31>: JMP 0x100000E7E ; <+36>
; --- basic block #4 --- successors = {#5} ------------------------------
0x100000e7b <+33>: MOV RCX, RAX
; --- basic block #5 --- successors = {} --------------------------------
0x100000e7e <+36>: MOV RAX, RCX
0x100000e81 <+39>: POP RBP
0x100000e82 <+40>: RET
; -----------------------------------------------------------------------
Basic block #2 has an interesting property that it has itself as one of its own
successors. That is perfectly valid and indicates that the block forms a loop. The
graph of basic blocks can be easily visualized, as shown in figure 4.1.
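The leader-based splitting walked through above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the tuple format (offset, mnemonic, branch target) and the name split_basic_blocks are invented for the example, only the JE, JNE, JMP and RET mnemonics of the gcd listing are treated as control flow, and the instruction after any branch (not only a conditional one) is made a leader, which is harmless for this function.

```python
CONDITIONAL = {"JE", "JNE"}      # conditional jumps used in the example
UNCONDITIONAL = {"JMP"}


def split_basic_blocks(instrs):
    """instrs: list of (offset, mnemonic, branch_target_or_None), sorted by
    offset. Returns (leaders, successors); successors maps each leader to a
    sorted list of the leaders of its successor blocks."""
    offsets = [off for off, _, _ in instrs]
    leaders = {offsets[0]}                       # function entry
    for idx, (_, mnem, target) in enumerate(instrs):
        if mnem in CONDITIONAL or mnem in UNCONDITIONAL:
            leaders.add(target)                  # branch target
            if idx + 1 < len(instrs):
                leaders.add(offsets[idx + 1])    # instruction after a branch
    ordered = sorted(leaders)
    successors = {}
    for n, leader in enumerate(ordered):
        nxt = ordered[n + 1] if n + 1 < len(ordered) else None
        body = [i for i in instrs
                if i[0] >= leader and (nxt is None or i[0] < nxt)]
        _, mnem, target = body[-1]
        succ = set()
        if mnem in CONDITIONAL or mnem in UNCONDITIONAL:
            succ.add(target)
        if mnem not in UNCONDITIONAL and mnem != "RET" and nxt is not None:
            succ.add(nxt)                        # fall through
        successors[leader] = sorted(succ)
    return ordered, successors


# The gcd function from the listing above, reduced to offsets and branches.
GCD = [(0, "PUSH", None), (1, "MOV", None), (4, "MOV", None),
       (7, "MOV", None), (10, "TEST", None), (13, "JE", 33),
       (15, "MOV", None), (18, "XOR", None), (20, "DIV", None),
       (23, "TEST", None), (26, "MOV", None), (29, "JNE", 15),
       (31, "JMP", 36), (33, "MOV", None), (36, "MOV", None),
       (39, "POP", None), (40, "RET", None)]

leaders, successors = split_basic_blocks(GCD)
print(leaders)
print(successors)
```

The block starting at offset 15 lists itself among its successors, which corresponds to the self-loop of basic block #2.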
This function has a single exit block (a block with no successors ending with
a RET instruction), but other functions can have more than one of those; in cases
where the function never exits, it can even have no such basic blocks. On the
other hand, each function must have exactly one entry basic block.
The basic block graph can already give us a lot of non-trivial information about
the function. Loops in the graph are very likely to be loops in the original source
code. While technically the graph can have almost any shape, it is very common
that the graph has a linear path from the entry block to the exit block (if it has
only one), and branches on this path only form detours that later merge back to
the linear path. An interesting metric of a function is cyclomatic complexity,
which is defined as the number of linearly independent paths through the function
[26]. The higher this number, the more “complex” the structure of the function’s
control flow is.
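For a control-flow graph with E edges, N nodes and P connected components, the cyclomatic complexity is commonly computed as M = E − N + 2P. The short Python sketch below applies this to the basic block graph of the gcd function; the dictionary representation and the function name are assumptions of this example, and P is taken to be 1 because a single function forms one connected graph.

```python
def cyclomatic_complexity(successors):
    """successors: dict mapping each basic block to a list of its successors.
    Assumes a single connected control-flow graph (P = 1)."""
    nodes = len(successors)
    edges = sum(len(s) for s in successors.values())
    return edges - nodes + 2


# Basic blocks #1 to #5 of the gcd function and their successor sets.
gcd_cfg = {1: [2, 4], 2: [2, 3], 3: [5], 4: [5], 5: []}
print(cyclomatic_complexity(gcd_cfg))
```

The graph has 6 edges and 5 nodes, so the result is 3, which agrees with the two conditional branches of the function plus one.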
4.2.2 Inlining
One of the most common optimizations the compiler performs is inlining, which
embeds the body of a called function into the caller, avoiding the overhead of
a function call, stack frame setup and register saving and restoring. There are
many rules and heuristics about inlining, which behaves differently in different
compilers, under different optimization settings and even in different versions of
the same compiler. The reason is that inlining can hurt performance or bloat code
size. However, we can make some observations:
Figure 4.1: Graph of the basic blocks of the gcd function
Our catalogue certainly does not describe all possible control-flow statements,
but it will suffice to demonstrate our analyses. A much more complete pattern cat-
alogue including many edge cases is presented in Reverse Compilation Techniques
[8].
Analyzing a more complex control-flow graph then consists of finding these
patterns in the graph and then reducing the matched subgraph into one
entity, which represents the high-level language construct and embeds the matched
graph nodes as AST-like subnodes. This step can allow further reductions to
happen, and by repeating the pattern matching we can continue until we end
up with a single-entity graph. An example of such matching is shown in figure
4.3. At the end of the control-flow analysis, we have a single node describing the
function as an AST-like structure.
Finding matches from the pattern catalogue alone can often fail, because
there may not be any subgraph that can be reduced. The reasons for that
include:
Figure 4.3: Example of pattern matching in a control-flow graph. First step
matches an if-else statement, second step recognizes a while loop. Third step
matches a sequence of blocks into a single graph node. Last step shows recognizing
an if statement and a sequence.
In a situation where we cannot match any pattern from the catalogue, we can
sacrifice an edge from the control-flow graph and replace it with a goto statement.
A special case of this operation can be done within a loop, to replace an edge
with a break statement (in case the conditional edge points to the follow-up block
of the loop), or continue statement (the edge points to the loop condition test
node). Selection of which edge to sacrifice is inherently a heuristic decision, but
there are various indicators that can help, and we should never remove an edge
that will break the connectivity of the graph. Backedges (pointing towards
numerically lower addresses) are uncommon and usually indicate ends of loops; if
such an edge does not seem to belong to a particular loop, it is a good candidate
for removal. Pattern matching can also try to detect cases where some simple
structure matches, but there is an extra entry or exit edge. This can correspond
to an actual goto in the source code (e.g. for bailing out of a function with a
cleanup).
When the function has multiple exit basic blocks, we can create an artificial
“final” exit block that would become a single point of function’s exit, if that helps
our pattern matching algorithm. An even more beneficial transformation, however,
would be the opposite, because the compiler often prefers to have a single exit
basic block, and early returns from the function result in multiple unrelated edges
from various blocks into the exit node. In this case, duplicating the exit node
(assuming it is small) will restore the original intent (early return).
Removing an edge from the control-flow graph or duplicating exit nodes creates
new opportunities for catalogue-based pattern matching. If removing one edge
does not unblock pattern matching, we will remove another edge. Eventually, the
pattern matching must succeed, because, in the worst case, the remaining graph will
be a tree, which can always be matched. Control-flow analysis can be summarized
Figure 4.4: Examples of removing an extraneous in-loop edge, and dealing with a
function with multiple exit points
0x100000ded <+13>: MOVSXD RCX, DWORD PTR [RAX + 4*RDI] ; RDI is the index
0x100000df1 <+17>: ADD RCX, RAX
0x100000df4 <+20>: JMP RCX ; indirect jump
0x100000df6 <+22>: ...
0x100000e17 <+55>: RET
; jump table with 6 entries:
0x100000e1c DD 0xffffffda
0x100000e20 DD 0xffffffee
0x100000e24 DD 0xfffffff3
0x100000e28 DD 0xfffffff3
0x100000e2c DD 0xfffffff4
0x100000e30 DD 0xfffffff9
The jump table is placed immediately after the instructions of the function,
in the same section. This is an example of data stored in the code section, which
must be analyzed differently, and skipped by the disassembler.
The LEA instruction simply loads the jump table address into RAX, and the
second instruction indexes into this table using the expression RAX + 4*RDI and
loads the found offset into RCX. This offset is added to the address of the jump
table, but since it is negative, the result will point into the function’s body.
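The address arithmetic described above can be checked with a short Python sketch. It assumes that the table base loaded by LEA is 0x100000e1c, the address of the first entry in the listing; the function name jump_table_targets is invented for this example.

```python
import struct


def jump_table_targets(table_address, raw_entries):
    """Compute branch targets from a table of 32-bit relative offsets."""
    targets = []
    for raw in raw_entries:
        # Reinterpret the unsigned 32-bit value as a signed offset,
        # which is what the sign-extending MOVSXD load does.
        (offset,) = struct.unpack("<i", struct.pack("<I", raw))
        targets.append(table_address + offset)
    return targets


table = [0xffffffda, 0xffffffee, 0xfffffff3,
         0xfffffff3, 0xfffffff4, 0xfffffff9]
targets = jump_table_targets(0x100000e1c, table)
print([hex(t) for t in targets])
```

The first entry (-38) yields 0x100000df6, which is the instruction at <+22> directly after the indirect jump, and all targets lie before the RET at 0x100000e17.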
To be able to distinguish this pattern even before basic block analysis has
been performed, we might need to resort to imprecise heuristics again. One such
method, which has a surprisingly high success rate, is to use a sliding window to find
this pattern, which has several very distinctive properties:
• There is a list of small negative or positive integers at the end of the function,
which often does not disassemble properly.
• There is an indirect branch in the function.
• There is a “register + 4*register” type of expression used.
• An address is loaded into a register, which points just beyond the last
instruction of the function.
These properties are unlikely to be present in other types of code, and they
can also be analyzed by simply looking at individual instructions, without the
need for any higher-level analysis.
To add support for jump tables into the basic block detection algorithm, we
simply have to add the list of jump table targets (calculated from the offsets) to
the set of leaders before the function is partitioned into basic blocks. The sets of
predecessors and successors of the basic block ending with the indirect jump also
need to be updated accordingly. The rest of control-flow analysis needs to be
prepared to find basic blocks with more than two successors.
In certain cases, the compiler might choose not to generate a jump table, but
instead use a different method, for example a comparison-based binary search, a
series of comparisons or a combination of a jump table with other methods. In
any of these cases, control-flow analysis should already be able to recognize such patterns.
If we want to properly fold these methods into a single switch statement, we need
to do that later, after data-flow analysis.
4.3 Data flow analysis
Just as control-flow analysis tries to reconstruct what high-level structures were
used to transfer control, data-flow analysis inspects what happens to all
the data and values that are used throughout a function and tries to deduce what
the high-level semantics of the operations are.
Code represented with machine instructions will necessarily have low-level
aspects that are either impractical or even impossible to represent in high-level
source code. These can include manipulations with the stack pointer, using
multiple arithmetic operations to perform one semantic calculation, using registers
and stack slots, auxiliary address calculations, using CPU flags, etc. We will
perform data-flow based analyses that will try to eliminate or transform low-level
instructions into high-level code. At the end, we want all code to be transformed
into an AST, describing the whole function using high-level statements and
expressions only.
A practical implementation of a decompiler will likely use some sort of in-
termediate representation (IR), a middle-level code that will not contain the
low-level and architecture-specific details (like flags and specific set of registers),
but will still be in the form of instructions. The decompilation is then split into
three parts:
Since all the values are constants, this can be optimized into a single direct
assignment, but if we forget about the optimizations for a while, this could be
transformed into an AST, which is also graphically shown in figure 4.5.
- Sequence
- Assignment("imm", Constant(0xc200))
- Assignment("shifted_value", LeftShift(Variable("imm"), Constant(16)))
- Assignment("register_x9", Variable("shifted_value"))
Figure 4.5: Graphical representation of an AST
In this section, we will not describe a specific IR language nor a specific AST
representation, because the syntax and structure of these can greatly vary. Most
concepts of data-flow analysis can be explained without it, and where necessary
we will use a pseudo-code language.
In this case, the IMUL instruction implicitly uses the EAX register as its second
input (besides EBX), but it also implicitly uses another register for its output,
which is written into the EDX:EAX register pair (the output of a multiplication is
twice the size of the inputs). Secondarily, it also changes the CF and OF flags to
indicate whether a carry and an overflow happened.
All of the semantics, side effects and both explicit and implicit register uses
must be properly analyzed for the data-flow analysis to produce correct results.
This is another place where having an IR, which abstracts these problems away,
helps simplify the analysis. For the rest of the section, we are going to assume
that we can identify all semantics of all instructions.
An instruction then has:
For each instruction, we can also say which variables are live at that point:
If the instruction lies on an execution path that leads from a definition of a
variable to a use of that definition, then the variable is live at this
instruction; otherwise it is not live.
The IR can be structured in a way that if the instruction returns something,
it is always a single value, which may include multiple inner values, but it is a
single entity. In that case we can say that the whole instruction has uses (rather
than individual output variables). If an output variable does not have any uses,
we say that the variable is killed by the instruction.
Computing the definitions and uses of all variables within a single basic block is
straightforward. The following algorithm will find these, but it will also construct
the sets of all input and output variables of a basic block:
1. Initially:
• Let m be an empty multi-map (or an explicit algorithm input
in the later version).
2. For each instruction i in the basic block, do the following:
(a) If i reads from one or more variables v :
• Find key-value pairs (v, j) in m and set used-definitions(i)
to be all such values of j.
(b) If i writes to one or more variables v :
• If the variable is already one (or more) of the keys in m,
remove such entries.
• Add the (v, i) key-value pair into m.
For further enhancements of the algorithm, we will call the initial state of the
multi-map m basic-block-inputs(B), and its final state basic-block-outputs(B).
We can see an example of the output of the algorithm on the following basic block:
; basic-block-inputs(B) = {}
1 $a := 10 ; used-definitions(1) = {}, m = {$a=>1}
2 $b := $a * 2 ; used-definitions(2) = {1}, m = {$a=>1, $b=>2}
3 $c := $x ; used-definitions(3) = {}, m = {$a=>1, $b=>2, $c=>3}
4 $d := $b + 30 ; used-definitions(4) = {2}, m = {$a=>1, $b=>2, $c=>3, $d=>4}
5 $a := 15 ; used-definitions(5) = {}, m = {$a=>5, $b=>2, $c=>3, $d=>4}
; basic-block-outputs(B) = {$a=>5, $b=>2, $c=>3, $d=>4}
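The single-basic-block algorithm can be sketched in Python as follows. The multi-map m is stored as a set of (variable, defining instruction) pairs, and each instruction is given as a tuple of its number, the variables it writes and the variables it reads; both representations, as well as the name analyze_block, are assumptions of this sketch rather than a prescribed design.

```python
def analyze_block(instructions, inputs=()):
    """instructions: list of (number, written_vars, read_vars).
    inputs: iterable of (variable, defining_instruction) pairs.
    Returns (used_definitions, outputs)."""
    m = set(inputs)            # multi-map as a set of (variable, definition)
    used = {}
    for number, writes, reads in instructions:
        # Step (a): collect the definitions of all variables that are read.
        used[number] = {d for (v, d) in m if v in reads}
        # Step (b): a write replaces all earlier definitions of the variable.
        for v in writes:
            m = {(var, d) for (var, d) in m if var != v}
            m.add((v, number))
    return used, m


# The example basic block shown above.
block = [
    (1, ["$a"], []),        # $a := 10
    (2, ["$b"], ["$a"]),    # $b := $a * 2
    (3, ["$c"], ["$x"]),    # $c := $x
    (4, ["$d"], ["$b"]),    # $d := $b + 30
    (5, ["$a"], []),        # $a := 15
]
used, outputs = analyze_block(block)
print(used)
print(sorted(outputs))
```

Instruction 3 reads $x, which has no definition inside the block, so its set of used definitions stays empty, as in the listing.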
The remaining task of finding uses and definitions beyond basic-block bound-
aries is more complex. The basic idea is to propagate the basic-block-outputs of
one BB into the basic-block-inputs map of another BB when there is an edge in
the control-flow graph between these two blocks. This is similar to classic variable
liveness analysis algorithms [28].
1. Initially, set basic-block-inputs(B) and basic-block-outputs(B) to
empty sets for all basic blocks.
2. Run the single-basic-block algorithm on the entry basic block.
3. While basic-block-inputs(B) or basic-block-outputs(B) are chang-
ing, do the following:
• For each basic block pair (A, B) where a control-flow edge
from A to B exists, do:
(a) Add all basic-block-outputs(A) into basic-block-inputs(B).
(b) Run the single-basic-block algorithm on B.
It should now be easy to see why m is a multi-map – several possible definitions
of the same variable can exist at the beginning of a basic block. This happens in
the following example:
bb_entry: 1 $a := 10
2 $b := 0
3 if ($a > 0) goto 5
bb_cond: 4 $b := 42
bb_exit: 5 $c := $b ; used-definitions(5) = {2, 4}
The algorithm will propagate the definition of $b into the last block (bb_exit)
from two sources: Once from bb_entry, and once from bb_cond. This means that
basic-block-inputs(bb_exit) will contain two possible definitions of $b: {$b=>2,
$b=>4}. If loops are present, a definition of a variable might be propagated from
a block to itself; this is perfectly valid and indicates that the block uses a
value from the previous iteration of the loop.
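The propagation across basic blocks can be sketched in Python as a simple fixed-point iteration. As before, the multi-maps are sets of (variable, defining instruction) pairs and instructions are (number, written variables, read variables) tuples; these representations and the function names are assumptions of this sketch, and no attempt is made at the bit-vector optimizations mentioned in the text.

```python
def analyze_block(instructions, inputs):
    """Single-basic-block algorithm; returns (used_definitions, outputs)."""
    m = set(inputs)
    used = {}
    for number, writes, reads in instructions:
        used[number] = {d for (v, d) in m if v in reads}
        for v in writes:
            m = {(var, d) for (var, d) in m if var != v}
            m.add((v, number))
    return used, m


def analyze_function(blocks, edges, entry):
    """blocks: dict name -> instruction list; edges: list of (A, B) pairs."""
    inputs = {name: set() for name in blocks}
    outputs = {name: set() for name in blocks}
    used, outputs[entry] = analyze_block(blocks[entry], inputs[entry])
    changed = True
    while changed:                      # repeat until nothing is added
        changed = False
        for a, b in edges:
            new_inputs = inputs[b] | outputs[a]
            block_used, new_outputs = analyze_block(blocks[b], new_inputs)
            if new_inputs != inputs[b] or new_outputs != outputs[b]:
                changed = True
            inputs[b], outputs[b] = new_inputs, new_outputs
            used.update(block_used)
    return used, inputs, outputs


# The three-block example shown above.
blocks = {
    "bb_entry": [(1, ["$a"], []), (2, ["$b"], []), (3, [], ["$a"])],
    "bb_cond": [(4, ["$b"], [])],
    "bb_exit": [(5, ["$c"], ["$b"])],
}
edges = [("bb_entry", "bb_cond"), ("bb_entry", "bb_exit"),
         ("bb_cond", "bb_exit")]
used, inputs, outputs = analyze_function(blocks, edges, "bb_entry")
print(used[5])
print(sorted(inputs["bb_exit"]))
```

For the example, instruction 5 ends up with the two definitions 2 and 4 of $b, and the inputs of bb_exit contain both ($b, 2) and ($b, 4).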
It is easy to see that the algorithm must finish at some point, because there
is a finite number of key-value pairs in the multi-maps, and the algorithm only
grows the multi-maps, never removes from them. Once there is nothing to add,
the algorithm finishes. The time complexity of the algorithm certainly depends
on how the data structures used are actually implemented, and how sparse the
control-flow graph is. For our purposes, it will be sufficient to mention that bit
vectors and linked lists allow implementing the calculations as vector operations,
which compute the definitions of all variables at once; furthermore, compiler-
generated control-flow graphs usually have far fewer edges than complete graphs
[29].
is defined before all of its uses. This can be beneficial for data-flow analysis [31],
because all names in the function are unique, thus there is no need to search for
definitions or uses of variables. A good example of such an SSA language is the
LLVM IR [30].
Of course, some programs need to write to a variable more than once. These
variables could be promoted to memory accesses, but that would be very inefficient.
Instead, the SSA form allows the use of a special type of instruction, the phi
node (also marked as “φ”). The phi instruction selects a value from a list of
variables or constants based on which basic block was the real predecessor (in the
runtime sense of the word). There are additional restrictions on the use of phi
nodes, for example if they are present they must be before all other instructions
in the same basic block. The previous example could be rewritten into SSA form
as follows:
bb_entry: 1 $a := 10
2 $b1 := 0
3 if ($a > 0) goto 5
bb_cond: 4 $b2 := 42
bb_exit: 5 $c := phi bb_entry=>$b1, bb_cond=>$b2
Once data-flow analysis provides definitions and uses of all variables, it is
simple to perform value propagation. All we have to do is identify which values
should be propagated, for example variables with a single use, assignments with
no arithmetic operations, call arguments. As an example let us take a look at a
previously mentioned IR:
$immediate_value := 0xc200
$shifted_value := left_shift $immediate_value, 16
$register_x9 := $shifted_value
In this example, we cannot replace $a in the third line with $var. Doing so
would change the semantics of the function, because $var has a different value on
line 1 and line 3. This problem can sometimes be solved by renaming some of the
uses of a variable to a different name, but there are situations where this is not
possible either. Note that in SSA form, this problem does not exist, because all
variables are assigned exactly once.
bb_entry: $immediate_value := 0xc200
$shifted_value := 0xc2000000
$register_x9 := 0xc2000000
return $register_x9
In this case, it is easy to see that the first two instructions can be removed,
because they do not have any side-effects, they do not write into memory, and
the values they define are not used by any other instructions.
bb_entry: $register_x9 := 0xc2000000
return $register_x9
We could also perform another propagation and elimination to fold the function
into a single instruction:
bb_entry: return 0xc2000000
The important requirement for safe elimination is that the removed instruction
does not have any side effects.
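The three steps applied to the running example (value propagation, constant folding and dead code elimination) can be combined in a small Python sketch. It is deliberately limited: it handles a single basic block in which every variable is assigned exactly once, it knows only the hypothetical operations const, left_shift, copy and return, and it treats return as the only instruction with a side effect.

```python
def optimize(ir):
    """ir: list of (dest, op, args) tuples; dest is None for 'return'."""
    known = {}                  # variable -> constant value
    folded = []
    for dest, op, args in ir:
        # Value propagation: replace variables by their known constants.
        args = [known.get(a, a) for a in args]
        # Constant folding: evaluate operations on constant operands.
        if op == "left_shift" and all(isinstance(a, int) for a in args):
            op, args = "const", [args[0] << args[1]]
        elif op == "copy" and isinstance(args[0], int):
            op = "const"
        if op == "const":
            known[dest] = args[0]
        folded.append((dest, op, args))
    # Dead code elimination: walk backwards and keep only instructions
    # with a side effect or whose result is still needed.
    live = set()
    result = []
    for dest, op, args in reversed(folded):
        if op == "return" or dest in live:
            result.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    result.reverse()
    return result


ir = [
    ("$immediate_value", "const", [0xc200]),
    ("$shifted_value", "left_shift", ["$immediate_value", 16]),
    ("$register_x9", "copy", ["$shifted_value"]),
    (None, "return", ["$register_x9"]),
]
print(optimize(ir))
```

For this input, only the return instruction with the constant 0xc2000000 remains, which matches the final form of the function shown above.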
Sequence([
- Assignment("rax", Variable("rsi"))
- Assignment("rbx", Variable("rdi"))
- Assignment("zf", Equals(Variable("rbx"), Constant(0)))
- WhileStatement(
Test: Negate(Variable("zf"))
Body: Sequence([
- Assignment("rcx", Variable("rax"))
Figure 4.6: IR with recognized control-flow
During the initial AST generation, we probably want to already apply some
optimizations, because we can use the data-flow analysis results. We can identify
variables that are only used once, but which have not been propagated because the
IR can only perform one operation per instruction. This allows us to fold multiple
calculations into a single complex expression, which can improve readability, but
we have to apply empirical limits so we do not end up generating functions with
only one extremely complex statement.
We also have to choose high-level data types for the variables used in the
AST. This is inherently a heuristic task, because many low-level instructions do
not indicate whether we are operating on signed or unsigned integers. When
the variable type does not match what the high-level language expects in some
expression, casts have to be generated. This can indicate that the type of the
variable is wrong, and we can try to perform an additional pass over the AST
that will try to remove as many casts as possible (since programmers usually try to
avoid using casts).
• If the body of a while loop ends with a statement that is the same as
the statement before the loop, the statement can be moved into the loop
condition:
a = expression(...);
while (a) {
...
a = expression(...);
}
• Goto statements from inside a loop can be changed into break and continue
statements if possible.
• Besides text output, we can also produce ranges for each node, so we
can easily find out which node is under the user’s cursor. An advanced
editor can then allow features like block collapsing or corresponding-bracket
highlighting.
• By having text ranges for nodes, editing the AST can be allowed. The
editor can let the user directly apply transformations to nodes, like swapping
the then and else bodies of an if statement (and negating the condition to
preserve the semantics).
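As an illustration of such a user-driven transformation, the following Python sketch swaps the branches of an if statement and negates its condition. The node classes If and Not are hypothetical and stand in for whatever AST representation the decompiler uses.

```python
from dataclasses import dataclass


@dataclass
class Not:
    operand: object


@dataclass
class If:
    condition: object
    then_body: list
    else_body: list


def negate(condition):
    # Removing an existing negation avoids producing "!!x".
    if isinstance(condition, Not):
        return condition.operand
    return Not(condition)


def swap_branches(node):
    """Return an equivalent If node with then and else bodies exchanged."""
    return If(negate(node.condition), node.else_body, node.then_body)


original = If("a > 0", ["return a;"], ["return -a;"])
swapped = swap_branches(original)
print(swapped)
```

Applying the transformation twice gives back the original node, which is a convenient sanity check for an editor offering this operation.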
Chapter 5
Objective-C Specific
Decompilation
In the previous chapters, we discussed which steps are needed to build a generic
decompiler. Now we will take a look at programs written in Objective-C, what
extra information we can extract from Obj-C binaries, and how we can decompile
specific language constructs.
• The __objc_classlist section contains a list of all classes defined by the
program, as a list of pointers to class descriptors. A class descriptor is a
structure containing the superclass, the class name, instance size, imple-
mented protocols, methods, instance variables layout and list of properties.
• Each instance variable (ivar ) description contains its name, byte offset
in the object and encoded type. Properties also list their names and type.
Methods and class methods include their full selectors, including encoded
types of arguments and the return value.
• Sections __objc_methname and __objc_classname list the names of methods
(selectors) and classes used by the program.
• In the __objc_ivar section, we will find offsets for individual instance
variables for all classes defined by the program.
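To give an idea of what the encoded types mentioned in the list above look like, the following Python sketch decodes a simple method type string. The example string "q24@0:8q16" is an assumed encoding of a method taking and returning a long on a 64-bit target, and the table covers only a handful of single-character type codes; structures, pointers and qualifiers are not handled, and the function name decode_method_types is invented for this example.

```python
import re

TYPE_NAMES = {
    "v": "void", "c": "char", "s": "short", "i": "int", "q": "long long",
    "C": "unsigned char", "S": "unsigned short", "I": "unsigned int",
    "Q": "unsigned long long", "f": "float", "d": "double", "B": "bool",
    "*": "char *", "@": "id", "#": "Class", ":": "SEL",
}


def decode_method_types(encoding):
    """Decode an encoding made of single-character type codes, each
    followed by a decimal number (frame size or argument offset)."""
    parts = re.findall(r"([^0-9])(\d+)", encoding)
    if "".join(code + number for code, number in parts) != encoding:
        raise ValueError("unsupported encoding: " + encoding)
    for code, _ in parts:
        if code not in TYPE_NAMES:
            raise ValueError("unsupported type code: " + code)
    (ret_code, frame_size), args = parts[0], parts[1:]
    return {
        "returns": TYPE_NAMES[ret_code],
        "frame_size": int(frame_size),
        "arguments": [(TYPE_NAMES[c], int(off)) for c, off in args],
    }


decoded = decode_method_types("q24@0:8q16")
print(decoded)
```

The first two decoded arguments are the hidden receiver (id) and selector (SEL) that every method receives; the explicit arguments follow them.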
There are other important metadata preserved in other sections, and a complete
specification of the structures can be found in the Objective-C runtime header files [33]. The
reason why so much type information is stored in the binary is to allow several
advanced features of Obj-C: Classes can be created at runtime, including adding
57
methods and instance variables, they can also be looked up by name during runtime.
Various information about objects and classes can be queried at runtime, and
key-value coding [34], key-value observation [34] and method swizzling
[36] build on these features. In fact, the whole dynamic dispatch approach
requires to be able to list methods available to an unknown object.
This means that we are able to reconstruct almost complete class declarations.
There are existing tools that dump these declarations in the form of header
files, which can then be used to link against libraries which do not provide header
files [37].
For decompilation, the most interesting pieces of information are certainly the
types of method arguments and instance variables, plus the fact
that instance variable offsets are not “hard-coded” but are always read from
a global variable. Consider the following machine instruction listing:
0x100000ef4 <+0>: PUSH RBP
0x100000ef5 <+1>: MOV RBP, RSP
0x100000ef8 <+4>: MOV RBX, QWORD PTR [0x100001130]
0x100000efb <+7>: MOV RBX, QWORD PTR [RDI + RBX]
0x100000eff <+11>: ADD RDX, RBX
0x100000f04 <+16>: MOV RAX, RDX
0x100000f07 <+19>: POP RBP
0x100000f08 <+20>: RET
While it is not particularly hard to analyze the inputs and outputs of this
function, applying the Obj-C metadata to our analysis immediately tells us
that this function is a method of a certain class, that
0x100001130 stores the offset of an ivar, and what the type signature of
the method is. This makes the function much easier to analyze:
; Method signature: (long)my_method:(long)arg;
; (hidden) input argument "self" is in RDI
; (hidden) input argument "_cmd" is in RSI
; input argument "arg" is in RDX
; output is in RAX
0x100000ef4 <+0>: ... ; function prologue
0x100000ef8 <+4>: MOV RBX, _$_MyClass_$_my_ivar_$_offset
0x100000efb <+7>: MOV RBX, QWORD PTR [RDI + RBX] ; read "my_ivar" ivar
; of "self" object
0x100000eff <+11>: ADD RDX, RBX ; add "RBX" to "arg"
0x100000f04 <+16>: MOV RAX, RDX ; return the result
0x100000f07 <+19>: ... ; function epilogue
Now it is very easy to say that this method simply reads the my_ivar instance
variable of the self object, and adds the value to its argument of type long. We
do not have to guess whether the final value of RAX should be discarded or if it is
a real return value, because we know the signature of the method.
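The symbolization step used in the annotated listing can be sketched as a simple table lookup. The following Python sketch is illustrative only: the table contents, the symbolize_operand helper and the symbol naming scheme (copied from the listing above) are assumptions, and a real implementation would fill the table by parsing the class metadata.

```python
# Hypothetical table built from the ivar metadata of the binary:
# address of the global offset variable -> (class name, ivar name)
IVAR_OFFSET_VARIABLES = {
    0x100001130: ("MyClass", "my_ivar"),
}


def symbolize_operand(address, table=IVAR_OFFSET_VARIABLES):
    """Replace a raw address with a symbolic ivar offset name when one is known."""
    entry = table.get(address)
    if entry is None:
        return hex(address)  # unknown address, keep the raw value
    class_name, ivar_name = entry
    return f"_$_{class_name}_$_{ivar_name}_$_offset"
```

With such a table, the memory read from 0x100001130 is printed as a named ivar offset, and any other address is left untouched.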
The type encodings do not describe every single possible type, but they do
include class types, common integer and floating-point types, strings, arrays,
pointers, selectors and structures. A full specification of the encoding is in the
Objective-C Runtime Programming Guide [38], but here are a few examples:
// variable types:
char c; // type encoding: c
long l; // type encoding: q (l on 32-bit platforms)
char *str; // type encoding: *
id obj; // type encoding: @
NSArray *arr; // type encoding: @"NSArray"
SEL selector; // type encoding: :
// function signatures:
- (id)init; // signature: @16@0:8
- (void)dealloc; // signature: v16@0:8 ... "v" is void
- (id)initWithCoder:(NSCoder *)decoder; // signature: @24@0:8@16
Type encodings for simple variables usually use one letter (c for char). The
encodings for full function signatures contain the return type first, and they also
include numbers: the first one is the total size of the arguments, and the following
ones are the offsets of the individual arguments. If we ignore the numbers,
decoding the signature is easy: @16@0:8 without numbers is @@:, which means
a function returning an object (the first @) and taking two arguments: the first is
an object (the second @), the second is a selector. All Objective-C methods have
these first two arguments, which indicate the receiver of the message (hidden self
argument) and the invoked selector (hidden _cmd argument).
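The decoding procedure described above (skip the numbers, read the return type first, then the arguments) can be sketched as follows. This is a minimal Python sketch that covers only a subset of the encodings; the decode_signature name and the textual type names it returns are assumptions made for illustration, and structures, arrays, pointers and type qualifiers are deliberately not handled.

```python
import re

# Subset of the one-letter encodings from the runtime documentation.
SIMPLE_TYPES = {
    "c": "char", "i": "int", "s": "short", "l": "long", "q": "long long",
    "C": "unsigned char", "I": "unsigned int", "S": "unsigned short",
    "L": "unsigned long", "Q": "unsigned long long",
    "f": "float", "d": "double", "B": "bool", "v": "void",
    "*": "char *", "#": "Class", ":": "SEL",
}

# Order matters: a block (@?) and a typed object (@"Name") must be tried before a plain @.
TOKEN = re.compile(r'@\?|@"([^"]+)"|@|[cislqCISLQfdBv*#:]')


def decode_signature(encoding):
    """Split a method type encoding into (return type, [argument types])."""
    types = []
    pos = 0
    while pos < len(encoding):
        if encoding[pos].isdigit():
            pos += 1  # skip the frame size and the argument offsets
            continue
        match = TOKEN.match(encoding, pos)
        if match is None:
            raise ValueError(f"unsupported encoding at position {pos}: {encoding!r}")
        token = match.group(0)
        if token == "@?":
            types.append("block")
        elif match.group(1):
            types.append(match.group(1) + " *")
        elif token == "@":
            types.append("id")
        else:
            types.append(SIMPLE_TYPES[token])
        pos = match.end()
    if not types:
        raise ValueError("empty encoding")
    return types[0], types[1:]
```

For example, decoding @24@0:8@16 yields a method returning id and taking the two hidden arguments (id and SEL) followed by one object argument.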
// C:
char *selector = "setObject:forKey:";
objc_msgSend(dictionary, selector, myObject, myKey);
The objc_msgSend function is the main dispatcher, which looks up the selector
in the method table of the object’s class, and invokes that method (via an optimized
tail call). This dispatcher is heavily optimized, so the fast path (the method is present
in the method cache) executes fewer than 20 instructions on x86-64 [39]. Since
objc_msgSend handles all selectors, it is a variadic function, and in fact a whole
family of these functions exists for various purposes:
id objc_msgSend(id self, SEL op, ...);
id objc_msgSendSuper(objc_super *super, SEL op, ...); // call superclass method
id objc_msgSendSuper2(objc_super *super, SEL op, ...); // newer ABI
void objc_msgSend_stret(id self, SEL op, ...); // returns a struct
void objc_msgSendSuper_stret(objc_super *super, SEL op, ...);
void objc_msgSend_fpret(id self, SEL op, ...); // returns floating-point number
void objc_msgSend_fp2ret(id self, SEL op, ...); // returns two FP values
These prototypes are only “informational”, because the functions sometimes
do not follow them. For example, objc_msgSend is declared to return id,
but it can also return integer numbers or void.
Let us take a look at what a method call looks like in compiled code:
0x100001a63 <+0>: MOV RDI, ... ; set the receiver
0x100001a67 <+4>: MOV RSI, QWORD PTR [0x10003cdb8] ; load the selector
0x100001a6e <+11>: CALL _objc_msgSend ; call objc_msgSend
Calling class methods works very similarly, because all classes are Obj-C
objects as well. The only difference is that the receiver of the message is not an
instance, but it is a pointer to the class (which is really a global variable), which
is loaded from a class reference pointer:
0x1000184c1 <+0>: MOV RDI, _$_CLASS_$_NSMutableSet ; QWORD [0x100003dce]
0x1000184c7 <+6>: MOV RSI, _$_SELECTOR_$_set ; QWORD [0x10003cdb8]
0x1000184cb <+10>: CALL _objc_msgSend
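Once the receiver, the selector and the remaining arguments of an objc_msgSend call are known, printing the call in bracket syntax is mechanical. The following Python sketch shows one possible way to do it; the render_message_send helper is a hypothetical name, its inputs are assumed to be already-printed expressions, and variadic methods are not handled.

```python
def render_message_send(receiver, selector, args):
    """Format a recognized objc_msgSend call in Objective-C bracket syntax."""
    if ":" not in selector:
        # Unary selector such as "init" or "set": no arguments are expected.
        if args:
            raise ValueError("unary selector takes no arguments")
        return f"[{receiver} {selector}]"
    # "setObject:forKey:" -> ["setObject", "forKey"]; the trailing piece is empty.
    keywords = selector.split(":")[:-1]
    if len(keywords) != len(args):
        raise ValueError("argument count does not match the selector")
    body = " ".join(f"{keyword}:{arg}" for keyword, arg in zip(keywords, args))
    return f"[{receiver} {body}]"
```

The same helper covers class methods, because the receiver is then simply the name of the class.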
@implementation MyClass
- (long)myMethod {
    long sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += self.myNumber;
    }
    return sum;
}
@end
The property read inside the loop is actually turned into a method call and a
getter is generated by the compiler:
- (long)myMethod {
    long sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += [self myNumber];
    }
    return sum;
}
- (long)myNumber {
    return self->_myNumber; // ivar access
}
The function call and the fact that it needs to be dynamically dispatched prevent
the compiler from inlining the ivar access and optimizing the loop into a single
multiplication. This is a major difference from C++ code, where method inlining is
very common, and even when a method is invoked via the virtual table mechanism,
there are still optimization opportunities (specialization or the fact that virtual
tables of one object cannot be changed after the object is constructed).
The resulting optimized code of this loop can look like the following assembly
listing, where it is easy to recognize a loop and a function call inside the loop:
...
0x100000e79 <+11>: MOV R14, RDI
0x100000e7c <+14>: MOV R12, 0xA ; R12 starts at 10
0x100000e89 <+27>: MOV RBX, 0 ; RBX is the result
-----------------------------------------------------------------------------
0x100000e8b <+29>: MOV RDI, R14
0x100000e8e <+32>: MOV RSI, _$_SELECTOR_$_myNumber
0x100000e91 <+35>: CALL _objc_msgSend ; call -[self myNumber]
0x100000e97 <+41>: ADD RBX, RAX
0x100000e9a <+44>: DEC R12
0x100000e9d <+47>: JNE 0x100000E8B ; <+29>
-----------------------------------------------------------------------------
0x100000e9f <+49>: MOV RAX, RBX ; return result via RAX
...
In summary, all of this is very good news for decompilation, because certain
patterns are more likely to be preserved in Obj-C compiled code than in compiled
code written in other languages. Based on the fact that method calls and property
accesses are very common in most Obj-C code, we will see that automatic
decompilation can often reconstruct very high-quality source code.
to zero, the object is deallocated. On top of that, Objective-C provides a concept
of autoreleasing, which releases a reference to an object, but only marks the
object to be deallocated later and not immediately.
There are two major compiler modes for reference counting:
• Manual Retain-Release (MRR) means that the programmer is completely
responsible for managing the retain count of all objects used in their code. This is done
via calls to objc_retain and objc_release (from C code) or by sending the
retain and release messages to Obj-C objects.
A strict set of memory-management rules exists and the programmer must
follow them in order to avoid memory corruptions, leaks and undefined behavior
[40]. The rules influence naming of functions, for example when a function contains
“Create” in its name, it will always return a “+1” reference count [41].
• Automatic Reference Counting (ARC) is a modern compiler technology
which implements the strict rules for maintaining reference counts in the
compiler itself and it is what most modern Objective-C code uses. Whenever
an object’s reference is stored to a local variable, global variable or an ivar,
the compiler automatically retains the object. Similarly, when a reference is
removed (or replaced), the previous object is released. Variables with this
behavior are called strong references, and the language also provides a
concept of non-retaining variables, called weak references. These are often
used to avoid cyclic object references (retain cycles) which cause memory
leaks.
Developers are actively discouraged from writing code with manual retain
count management, because it is error-prone and makes code less readable, so we
will also focus on ARC.
When analyzing an Obj-C binary, we will often see calls to retain count management
functions, such as objc_storeStrong, objc_autoreleaseReturnValue,
objc_retainAutoreleasedReturnValue, objc_retain and objc_release, even
when the original source code contained no such calls. This is the result of
ARC; the exact rules about how the compiler behaves and what function calls it
generates are described in Clang’s documentation [42]. It is desirable to produce
decompiled source code which does not manually call the ARC functions, so let
us take a look at an example of IR code and how it can be transformed:
; Function prototype:
; - (void)methodWithObject:(id)object;
; Input "object" is in register RDX.
bb_entry: $local_object := $register_rdx
call objc_retain, $local_object
$result := call objc_msgSend, $local_object, _$_SELECTOR_$_method
$result2 := call objc_retainAutoreleasedReturnValue, $result
call objc_release, $local_object
return $result2
In this case, we can recognize that the retain-release pair on $local_object
and the retaining of the autoreleased return value from the method call are
both results of ARC and we can remove them. The retain and release calls can be
simply stripped off, and the call to objc_retainAutoreleasedReturnValue can
be changed to return the object that was passed to it:
; - (void)methodWithObject:(id)object;
bb_entry: $local_object := $register_rdx
$result := call objc_msgSend, $local_object, _$_SELECTOR_$_method
$result2 := $result
return $result2
Both the “old” and “new” examples are valid decompilation results, but the
latter is more readable, so a decompiler should prefer it.
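The rewrite shown above can be expressed as a small pass over the IR. The following Python sketch assumes a hypothetical IR representation, a list of (destination, operation, operands) tuples, which is not the representation used elsewhere in this text; it also assumes that every matched call was generated by ARC, which a real pass would have to verify first.

```python
# Runtime functions that return their argument unchanged (besides adjusting
# the retain count), so the call can be replaced by a plain assignment.
RETURNS_ARGUMENT = {
    "objc_retain",
    "objc_autorelease",
    "objc_retainAutoreleasedReturnValue",
    "objc_autoreleaseReturnValue",
}


def strip_arc_calls(instructions):
    """Remove ARC bookkeeping calls from a list of (dest, op, operands) tuples."""
    result = []
    for dest, op, operands in instructions:
        if op == "call" and operands[0] == "objc_release":
            continue  # releasing has no visible result
        if op == "call" and operands[0] in RETURNS_ARGUMENT:
            if dest is not None:
                # Keep the data flow: the result is simply the argument.
                result.append((dest, "assign", [operands[1]]))
            continue
        result.append((dest, op, operands))
    return result
```

The remaining assignment of $result to $result2 is then removed by the ordinary value propagation pass, so no ARC-specific handling is needed for it.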
The instancetype keyword indicates that the method returns an object of
the same class as the class that the method is defined in, or one of its subclasses.
It helps the compiler in static type checking. The second unusual thing is the
assignment to self. It might look like it has some special semantics, however,
self is a simple local argument to the function, and during the execution of the
function, it acts as an ordinary local variable.
Detecting this pattern is again straightforward. The following assembly example
shows the call to [super init] on x86-64, which stores the result in RAX. This
register then holds the self variable for the rest of the function, and is then
returned. During AST generation or rewriting, we will simply rename this local
variable to self.
0x100000e50 <+0>: ... ; function prologue
0x100000e67 <+23>: MOV RSI, _$_SELECTOR_$_init
0x100000e6e <+30>: MOV RDI, ... ; super-call options
0x100000e72 <+34>: CALL _objc_msgSendSuper2 ; call [super init]
0x100000e77 <+39>: TEST RAX, RAX ; RAX is self
0x100000e7a <+42>: JE 0x100000e8b ; <+59>
----------------------------------------------------------------------
0x100000e7c <+44>: ... additional instance setup
----------------------------------------------------------------------
0x100000e8b <+59>: ... ; function epilogue
0x100000e90 <+64>: ret ; returns self in RAX
5.5 Blocks
Blocks are a major feature of Objective-C that is extensively used both by library
vendors and end users. Many popular libraries and frameworks exist purely to
provide easy-to-use APIs via blocks, and the language feature is even available
in pure C when using Clang as the compiler. One of the most common uses of
blocks involves Grand Central Dispatch (GCD), a standard library used for
asynchronous execution and multithreading.
Let us take a look at a complex example of declaring a block, and its later
invocation:
__block long shared_variable;
long (^my_block)(long) = ^(long input) {
shared_variable += input;
return 42;
};
...
long output = my_block(10);
This block takes one integer as its input and returns an integer value as well.
However, it also captures a special variable marked with __block, which means
the variable will be promoted to a heap-allocated variable so it is accessible even
when the declaration goes out of scope. Special rules apply if the block operates
with objects or other blocks.
Successfully decompiling blocks is certainly non-trivial, as it involves:
Note that the implementation details of blocks are actually part of a platform’s
ABI, which means that the structure will not be changed in the future. This is
what allows us to pattern match and parse the structure, even though it is not
a public API.
Finding the block descriptor among all other global structures is not obvious,
as there is no list of all blocks in the binary’s metadata. Pattern matching over
static data in the binary can be used with the following hints:
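• The size member holds the size of the whole block literal. On 64-bit platforms,
this is at least 32 bytes for the fixed members, plus the size of the captured
variables (40 for a block which captures a single 8-byte value).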
• The copy_helper and dispose_helper members point to the beginning of
helper functions, which are in the code section and are not part of any
class method. These helper functions are very likely to contain calls to the
_Block_object_assign and _Block_object_dispose APIs.
• The signature pointer points to a string containing a type encoding.
• The address of the descriptor is referenced from the code section.
The last item, a reference from the code section, is important to find as we
will use it to find the pointer to the actual code of the block. If we know where
the block descriptor is referenced from, we can look at nearby instructions, as
they are likely to contain the code pointer. This again needs to be heuristic, but
a pointer to the beginning of an otherwise-unreferenced function is unusual. If
the binary we are analyzing is not stripped (stripping removes local symbol names),
then this job is much easier, because the block’s code will be a function with the
string block_invoke in its name.
As mentioned before, there can be multiple places that reference the block
descriptor. In such a case, all of these instances are creating different blocks of
the same signature.
With this we can build a database of all blocks that are defined by the program,
and for each block we will know its signature, code pointer and places where these
are referenced from.
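A scan for descriptor candidates along these lines can be sketched as follows. This Python sketch makes several simplifying assumptions that should be stated loudly: the data is a single little-endian 64-bit image loaded at a known base address, only descriptors without copy and dispose helpers are considered (so the signature pointer is the third member), the upper bound on the size is arbitrary, and all function names are hypothetical.

```python
import struct


def read_cstring(image, base, address, limit=64):
    """Read a short NUL-terminated ASCII string at an absolute address, or None."""
    offset = address - base
    if not (0 <= offset < len(image)):
        return None
    end = image.find(b"\0", offset, offset + limit)
    if end < 0:
        return None
    try:
        return image[offset:end].decode("ascii")
    except UnicodeDecodeError:
        return None


def looks_like_block_signature(text):
    """Heuristic: a block signature takes the block itself (@?) as an argument."""
    return bool(text) and "@?" in text


def find_block_descriptors(image, base):
    """Return (address, size, signature) for every descriptor-like structure."""
    hits = []
    for offset in range(0, len(image) - 24 + 1, 8):
        reserved, size, pointer = struct.unpack_from("<QQQ", image, offset)
        if reserved != 0 or not (32 <= size <= 4096):
            continue
        signature = read_cstring(image, base, pointer)
        if looks_like_block_signature(signature):
            hits.append((base + offset, size, signature))
    return hits
```

Every hit is only a candidate; it still has to be confirmed by finding a reference to its address from the code section.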
struct Block_literal {
    void *isa = &_NSConcreteStackBlock; // _NSConcreteGlobalBlock for globals
    int flags;
    int reserved;
    void *invoke = ...; // pointer to the block’s code
    Block_descriptor *descriptor = ...; // pointer to the block descriptor
    // captured variables follow
};
Such a large stack item should be easy to recognize as well, mainly because
of the isa pointer, which always points to the _NSConcreteStackBlock or
_NSConcreteGlobalBlock class. The structure of the block literal explains why
we are likely to see references to the block’s code and block descriptors close to
each other. This large structure has to live on the stack and cannot be optimized
into registers.
5.5.3 Captured variables
When the stack block is allocated, it also captures local variables. When a variable
is captured by value, its current value is simply copied to the end of the block
literal. Here is an example assembly listing of a function which defines a block
inside:
Of course, such a code sequence can be optimized in many ways, but once we
recognize the large stack item and its structure, we can identify the individual
member assignments (isa, invoke pointer, descriptor, captured variables).
This means that every generated block literal has a different structure, based
on what variables are captured.
Capturing a value by reference works differently, because it transforms
the __block variable into another large stack structure:
// By-reference declaration:
__block int a;
// Transformed into:
struct block_byref_a {
    void *isa;
    struct block_byref_a *forwarding;
    int flags;
    int size;
    void *copy_helper;
    void *dispose_helper;
    int captured_a; // actual storage
};
A reference to this structure is stored into the block literal. Since we know the
signature of the block, we can identify that we are capturing a by-ref value and
from this we can recognize the by-ref stack structure. All subsequent accesses to
the actual storage are done via the forwarding pointer, which again is something
that we should rewrite into direct assignments:
block_byref_a->forwarding->captured_a = 10;
5.5.4 Block code
The actual body of a block needs to be able to access the captured variables, which
are available through the block literal. A pointer to the block literal is added as
an extra (hidden) argument to the block’s signature. Let us take a look at an
example of a block definition:
long x;
long (^my_block)(long) = ^(long input) {
long a = x; // access to a captured variable
long b = input; // access to an explicit input
...
};
The block body will be translated by the compiler into:
long my_block_invoke(Block_literal *lit, long input) {
long a = lit->x; // access to a captured variable
long b = input; // no change
...
}
When decompiling a block body, we just need to take the extra parameter
into account and recognize accesses via the block literal pointer. We already know
what variables and types are captured in the block literal and at what offsets
these variables are.
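Resolving an access through the literal pointer then amounts to a lookup in the layout of the block literal. The following Python sketch computes such a layout for a 64-bit platform; it assumes that captured variables are laid out in the given order with natural alignment (a real compiler may reorder them), and the function names are hypothetical.

```python
# Fixed members of a block literal on a 64-bit platform: (name, size in bytes).
HEADER = [("isa", 8), ("flags", 4), ("reserved", 4), ("invoke", 8), ("descriptor", 8)]


def block_literal_layout(captured):
    """Map byte offsets to member names; captured is a list of (name, size)."""
    layout = {}
    offset = 0
    for name, size in HEADER + list(captured):
        # Round up to the natural alignment of the member (size is a power of two).
        offset = (offset + size - 1) // size * size
        layout[offset] = name
        offset += size
    return layout


def resolve_access(layout, offset, literal_name="lit"):
    """Turn a dereference of [literal + offset] into a member access."""
    return f"{literal_name}->{layout[offset]}"
```

For a block that captures a single long value, the captured variable ends up at offset 0x20, right after the fixed members.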
// Is equivalent to:
- (void)directlyInvokeBlock:(block_literal *)block_literal {
    block_literal->invoke(block_literal); // the literal is the hidden first argument
}
; Method signature:
; - (void)myMethod:(long)arg;
; Input "arg" is in register RDX
myMethod:
0x100000dd1 <+0>: PUSH RBP
0x100000dd2 <+1>: MOV RBP, RSP
0x100000dd5 <+4>: PUSH R14
0x100000dd7 <+6>: PUSH RBX
0x100000dd8 <+7>: SUB RSP, 0x30
0x100000ddc <+11>: MOV R14, RDX
0x100000ddf <+14>: XOR EDI, EDI
0x100000de1 <+16>: XOR ESI, ESI
0x100000de3 <+18>: CALL _dispatch_get_global_queue
0x100000de8 <+23>: MOV RDI, RAX
0x100000deb <+26>: CALL _objc_retainAutoreleasedReturnValue
0x100000df0 <+31>: MOV RBX, RAX
0x100000df3 <+34>: MOV RAX, &_NSConcreteStackBlock
0x100000dfa <+41>: MOV QWORD PTR [RBP - 0x38], RAX
0x100000dfe <+45>: MOV DWORD PTR [RBP - 0x30], 0xC0000000
0x100000e05 <+52>: MOV DWORD PTR [RBP - 0x2c], 0x0
0x100000e0c <+59>: MOV RAX, &my_block_body
0x100000e13 <+66>: MOV QWORD PTR [RBP - 0x28], RAX
0x100000e17 <+70>: MOV RAX, &my_block_descriptor
0x100000e1e <+77>: MOV QWORD PTR [RBP - 0x20], RAX
0x100000e22 <+81>: MOV QWORD PTR [RBP - 0x18], R14
0x100000e26 <+85>: LEA RSI, [RBP - 0x38]
0x100000e2a <+89>: MOV RDI, RBX
0x100000e2d <+92>: CALL _dispatch_async
0x100000e32 <+97>: MOV RDI, RBX
0x100000e35 <+100>: CALL _objc_release
0x100000e3b <+106>: ADD RSP, 0x30
0x100000e3f <+110>: POP RBX
0x100000e40 <+111>: POP R14
0x100000e42 <+113>: POP RBP
0x100000e43 <+114>: RET
my_block_body:
0x100000e44 <+0>: PUSH RBP
0x100000e45 <+1>: MOV RBP, RSP
0x100000e48 <+4>: MOV RSI, QWORD PTR [RDI + 0x20]
0x100000e4c <+8>: MOV RDI, &format_string
0x100000e53 <+15>: XOR EAX, EAX
0x100000e55 <+17>: POP RBP
0x100000e56 <+18>: JMP _printf
my_block_descriptor:
0x100001060: DQ 0x0000000000000000
0x100001068: DQ 0x0000000000000028
0x100001070: DQ 0x0000000100000f2e "v8@?0"
0x100001078: DQ 0x0000000100000f59 ""
format_string:
0x100000f2a: DB "%ld", 0
It is fairly simple to recognize the block’s body (my_block_body), its descriptor
(my_block_descriptor) and the block creation site (starting at 0x100000dfa in
myMethod). The block descriptor contains a signature, v8@?0, which means “a
function returning void, total argument size of 8 bytes, and a block literal pointer
at offset 0”. The resulting block literal structure looks like this (annotated with
offsets):
struct my_block_literal {
    void *isa; // offset 0x0
    int flags; // offset 0x8
    int reserved; // offset 0xc
    void *invoke; // offset 0x10
    Block_descriptor *descriptor; // offset 0x18
    long arg1; // offset 0x20
};
This allows us to decompile the block’s body, and resolve the RDI + 0x20
pointer dereference into a nice member access. We can rewrite the function directly
into an IR:
; void my_block_body(my_block_literal *lit);
my_block_body: $register_rsi := extract_member $register_rdi, arg1 ; offset 0x20
$register_rdi := &format_string
$register_eax := 0x0
call printf, $register_rdi, $register_rsi
This can be further optimized, and we can generate the AST. Notice the access
to the captured variable via the lit argument.
Now, rewriting the outer function directly into IR results in the following:
; - (void)myMethod:(long)arg;
myMethod: $register_r14 := arg
$register_rdi := 0x0
$register_rsi := 0x0
$register_rax :=
call dispatch_get_global_queue, $register_rdi, $register_rsi
$register_rdi := $register_rax
$register_rax :=
call objc_retainAutoreleasedReturnValue, $register_rdi
$register_rbx := $register_rax
set_member $lit, isa, &_NSConcreteStackBlock ; offset 0x0
set_member $lit, flags, 0xc0000000 ; offset 0x8
set_member $lit, reserved, 0x0 ; offset 0xc
set_member $lit, invoke, &my_block_body ; offset 0x10
set_member $lit, descriptor, &my_block_descriptor ; offset 0x18
set_member $lit, arg1, $register_r14 ; offset 0x20
$register_rsi := address_of $lit
$register_rdi := $register_rbx
call dispatch_async, $register_rdi, $register_rsi
$register_rdi := $register_rbx
call objc_release, $register_rdi
- (void)myMethod:(long)arg {
    long local1 = arg;
    void *local2 = dispatch_get_global_queue(0x0, 0x0);
    void *local3 = objc_retainAutoreleasedReturnValue(local2);
    my_block_literal lit;
    lit.isa = &_NSConcreteStackBlock;
    lit.flags = 0xc0000000;
    lit.reserved = 0x0;
    lit.invoke = &my_block_body;
    lit.descriptor = &my_block_descriptor;
    lit.arg1 = local1;
    void *local4 = &lit;
    dispatch_async(local3, local4);
    objc_release(local3);
}
Now we will combine the two decompiled sources, while also applying some
more simplification (removing ARC memory management calls, propagating and
renaming variables). We will embed the code from the block body into the outer
function and replace all references to lit->arg1 with arg to resolve the variable
capture. The full final result is as simple as the following decompiled code:
- (void)myMethod:(long)arg {
    dispatch_queue_t queue1 = dispatch_get_global_queue(0x0, 0x0);
    dispatch_block_t block1 = ^() {
        printf("%ld", arg);
    };
    dispatch_async(queue1, block1);
}
instance = [[MyClass alloc] init];
... // other initialization of "instance"
};
if (onceToken != 0xffffffffffffffff) {
dispatch_once(&onceToken, block); // a function call, not a macro call
}
return instance;
}
However, this involves at least two function calls per iteration: one to retrieve
the item and one to get the length of the array. Furthermore, the access to
array.count is an opaque statement to the compiler, so it cannot optimize the loop
(e.g. reading the array length cannot be moved out of the loop, and unrolling the loop
is not possible). Therefore, Objective-C offers fast enumeration (also called
for-in loop or for-each loop), which uses a different syntax for the for loop
[44]:
NSArray *array = ...;
for (id item in array) {
...;
}
@protocol NSFastEnumeration
- (NSUInteger)countByEnumeratingWithState:(NSFastEnumerationState *)state
objects:(id [])buffer
count:(NSUInteger)len;
@end
When a collection implements this protocol, the method is supposed to store
a C array of objects in buffer (of up to len elements), and return the number of
elements stored. The function will be called repeatedly until it returns 0, indicating
that we have reached the end of the collection. The state parameter can be
used by the implementation for bookkeeping information (e.g. position in the
array), and can include mechanisms to detect collection mutation during iteration,
which is forbidden and indicates a programmer’s error. The itemsPtr field of the
NSFastEnumerationState structure is the actual pointer to the returned data
– the implementation can choose whether it will set itemsPtr to point to the
user-supplied buffer, or whether it will point it to some internal data structure.
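The calling convention of this protocol can be modelled outside of Objective-C to make the control flow explicit. The following Python sketch is only a simulation under assumptions: EnumerationState, BatchedCollection and for_in are hypothetical stand-ins for NSFastEnumerationState, a collection class and the compiler-generated loop, and mutation detection is left out.

```python
class EnumerationState:
    """Simplified stand-in for NSFastEnumerationState."""

    def __init__(self):
        self.position = 0  # bookkeeping owned by the collection
        self.items = None  # plays the role of itemsPtr


class BatchedCollection:
    """A collection that hands out its items in batches through the caller's buffer."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = 0  # how many times the enumeration method was invoked

    def count_by_enumerating(self, state, buffer, length):
        self.calls += 1
        batch = self._items[state.position:state.position + length]
        buffer[:len(batch)] = batch
        state.items = buffer  # this implementation chooses the supplied buffer
        state.position += len(batch)
        return len(batch)


def for_in(collection, body, buffer_len=16):
    """Drive the enumeration until the collection reports zero items."""
    state = EnumerationState()
    buffer = [None] * buffer_len
    limit = collection.count_by_enumerating(state, buffer, buffer_len)
    while limit != 0:
        for index in range(limit):
            body(state.items[index])
        limit = collection.count_by_enumerating(state, buffer, buffer_len)
```

Iterating over 40 items with a 16-element buffer results in four calls to the enumeration method (batches of 16, 16 and 8 items, followed by the terminating call which returns 0), instead of two calls per item.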
A for-in loop is translated by the compiler into the following:
id item;
NSFastEnumerationState state = { 0 };
id objects[16];
NSUInteger limit;
• The code from the body of the for-in loop is unlikely to be moved out of
the loop or even within the innermost loop of the generated code, because
the item variable is only calculated as the last compiler-generated statement.
It is also calculated in a way which is hard to statically reason about (it is
loaded from a pointer that is modified by a previous opaque method call).
Based on this, we can deduce that most of the generated auxiliary basic blocks
will not contain any decompilation-interesting code. If we can detect which basic
block contains the user-written body of the for-in loop, we may discard the two
generated loops altogether.
As an example, let us take a look at figure 5.1, which shows the CFG of a
function with a for-in loop. It should not be hard to spot the two inner loops,
and also a short block which calls objc_enumerationMutation. We will use this
block as an anchor, because its successor block must be the basic block containing
the user’s for-in body.
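Finding the anchor can be sketched as a simple search over the CFG. In the following Python sketch, the CFG is assumed to be given as two plain dictionaries (successors and called functions per basic block); this representation and the find_for_in_bodies name are assumptions made for illustration.

```python
def find_for_in_bodies(successors, calls):
    """Return the blocks that follow a call to objc_enumerationMutation.

    successors: block label -> list of successor block labels
    calls:      block label -> set of names of functions called in the block
    """
    bodies = []
    for block, targets in successors.items():
        if "objc_enumerationMutation" not in calls.get(block, set()):
            continue
        # The anchor block is expected to fall through to a single successor.
        if len(targets) == 1 and targets[0] not in bodies:
            bodies.append(targets[0])
    return bodies
```

Applied to the blocks of figure 5.1, the only anchor is the block at 0x16e4f, and its successor at 0x16e5e is the block that holds the body of the loop.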
The approach we can take here is to only analyze the inner-most block and
treat the whole CFG subgraph as if the loops were not executed at all (the very
first branch based on the limit variable returns false). In this case, we would be
able to transform the following CFG structure:
- Sequence
- ...
- If
- Test: Basic Block #1 ; contains the reference to the container
- True: Sequence
- Basic Block #2
- While
- Test: Basic Block #3
- Body: While
- Test: Basic Block #4
- Body: Sequence
- If
- Test: Basic Block #5
- True: Basic Block #6 ; contains call to objc_enumerationMutation
- Basic Block #7 ; contains user’s body of the for-in loop
- Basic Block #8
- ...
into the following, much simpler structure:
- Sequence
- ...
- Basic Block #1
- FastEnumeration
- Array: ... ; extracted from Basic Block #1
- Loop variable: ... ; extracted from Basic Block #7
- Body: Basic Block #7
- ...
This only presents the idea of how such a change might work. The full implementation
of such a transformation would require coordination between
CFG recognition, data-flow analysis and post-AST optimization. In case of
_AFQueryStringFromParameters:
push rbp
mov rbp, rsp
sub rsp, 0x150
lea rax, qword [ss:rbp+var_90]
mov rcx, qword [ds:imp___got____stack_chk_guard]
mov rcx, qword [ds:rcx]
mov qword [ss:rbp+var_8], rcx
mov qword [ss:rbp+var_90], 0x0
mov qword [ss:rbp+var_E8], rdi
mov rdi, rax
mov rsi, qword [ss:rbp+var_E8]
call imp___stubs__objc_storeStrong
mov rax, qword [ds:objc_cls_ref_NSMutableArray]
mov rsi, qword [ds:0x3cf08]
mov rdi, rax
call imp___stubs__objc_msgSend
mov rdi, rax
call imp___stubs__objc_retainAutoreleasedReturnValue
xor esi, esi
mov edx, 0x40
lea rcx, qword [ss:rbp+var_E0]
mov qword [ss:rbp+var_98], rax
mov rdi, rcx
call imp___stubs__memset
mov rdi, qword [ss:rbp+var_90]
call _AFQueryStringPairsFromDictionary
mov rdi, rax
call imp___stubs__objc_retainAutoreleasedReturnValue
lea rdx, qword [ss:rbp+var_E0]
lea rcx, qword [ss:rbp+var_88]
mov esi, 0x10
mov r8d, esi
mov rsi, qword [ds:0x3ce80]
mov rdi, rax
mov qword [ss:rbp+var_F0], rax
call imp___stubs__objc_msgSend
cmp rax, 0x0
mov qword [ss:rbp+var_F8], rax
je 0x16f45
0x16ddd:
xor eax, eax
mov ecx, eax
lea rdx, qword [ss:rbp+var_E0]
add rdx, 0x10
mov rsi, qword [ss:rbp+var_D0]
mov rsi, qword [ds:rsi]
mov rdi, qword [ss:rbp+var_F8]
mov qword [ss:rbp+var_100], rsi
mov qword [ss:rbp+var_108], rdx
mov qword [ss:rbp+var_110], rcx
mov qword [ss:rbp+var_118], rdi
0x16e19:
mov rax, qword [ss:rbp+var_118]
mov rcx, qword [ss:rbp+var_110]
mov rdx, qword [ss:rbp+var_108]
mov rsi, qword [ds:rdx]
mov rdi, qword [ss:rbp+var_100]
cmp qword [ds:rsi], rdi
mov qword [ss:rbp+var_120], rax
mov qword [ss:rbp+var_128], rcx
je 0x16e5e
0x16e4f:
mov rax, qword [ss:rbp+var_F0]
mov rdi, rax
call imp___stubs__objc_enumerationMutation
0x16e5e:
mov rax, qword [ss:rbp+var_D8]
mov rcx, qword [ss:rbp+var_128]
mov rax, qword [ds:rax+rcx*8]
mov qword [ss:rbp+var_A0], rax
mov rax, qword [ss:rbp+var_98]
mov rdx, qword [ss:rbp+var_A0]
mov rsi, qword [ds:0x3d618]
mov rdi, rdx
mov qword [ss:rbp+var_130], rax
call imp___stubs__objc_msgSend
mov rdi, rax
call imp___stubs__objc_retainAutoreleasedReturnValue
mov rcx, rax
mov rsi, qword [ds:0x3ce90]
mov rdx, qword [ss:rbp+var_130]
mov rdi, rdx
mov rdx, rcx
mov qword [ss:rbp+var_138], rax
call imp___stubs__objc_msgSend
mov rax, qword [ss:rbp+var_138]
mov rdi, rax
call imp___stubs__objc_release
mov rax, qword [ss:rbp+var_128]
add rax, 0x1
mov rcx, qword [ss:rbp+var_120]
cmp rax, rcx
mov qword [ss:rbp+var_118], rcx
mov qword [ss:rbp+var_110], rax
jb 0x16e19
0x16efe:
lea rdx, qword [ss:rbp+var_E0]
lea rcx, qword [ss:rbp+var_88]
mov eax, 0x10
mov r8d, eax
mov rsi, qword [ds:0x3ce80]
mov rdi, qword [ss:rbp+var_F0]
call imp___stubs__objc_msgSend
xor r9d, r9d
mov ecx, r9d
cmp rax, 0x0
mov qword [ss:rbp+var_110], rcx
mov qword [ss:rbp+var_118], rax
jne 0x16e19
0x16f45:
mov rax, qword [ss:rbp+var_F0]
mov rdi, rax
call imp___stubs__objc_release
lea rax, qword [ds:cfstring___366a8]
mov rcx, qword [ss:rbp+var_98]
mov rsi, qword [ds:0x3d620]
mov rdi, rcx
mov rdx, rax
call imp___stubs__objc_msgSend
mov rdi, rax
call imp___stubs__objc_retainAutoreleasedReturnValue
xor r8d, r8d
mov esi, r8d
lea rcx, qword [ss:rbp+var_98]
mov rdi, rcx
mov qword [ss:rbp+var_140], rax
call imp___stubs__objc_storeStrong
xor r8d, r8d
mov esi, r8d
lea rax, qword [ss:rbp+var_90]
mov rdi, rax
call imp___stubs__objc_storeStrong
mov rax, qword [ss:rbp+var_140]
mov rdi, rax
call imp___stubs__objc_autoreleaseReturnValue
mov rcx, qword [ds:imp___got____stack_chk_guard]
mov rcx, qword [ds:rcx]
cmp rcx, qword [ss:rbp+var_8]
mov qword [ss:rbp+var_148], rax
jne 0x16fe7
0x16fd7:
mov rax, qword [ss:rbp+var_148]
add rsp, 0x150
pop rbp
ret
0x16fe7:
call imp___stubs____stack_chk_fail
Figure 5.1: An example of an optimized compiled for-in loop from the AFNet-
working project. Graphical output from the Hopper program.
nested for-in loops, or if the body of the loop contains other interesting control-flow
statements, we should analyze the basic blocks from the innermost to the
outermost one, as the CFG algorithm presented in the previous chapters did.
Alternatively, the transformation can be done on the AST level. This, however,
has the downside that much of the necessary work is easier to do in
the data-flow analysis on the IR level.
Let us take a look at an example. An already optimized output of the
decompiler could produce an AST like the following:
void *_AFQueryStringFromParameters(void *arg1) {
    NSFastEnumerationState state;
    id objects[16];
    local_mutable_array1 = ...;
    rax = ...;
    void *local1 = &state;
    void *local2 = &objects;
    void *array = rax;
    rax = [array countByEnumeratingWithState:local1 objects:local2 count:0x10];
    if (rax != 0x0) {
        ...
        do {
            do {
                ...
                if (...) {
                    objc_enumerationMutation(...);
                }
                rax = ...; // "rax" becomes the loop variable
                rax = [rax URLEncodedStringValue];
                rax = [rax retain];
                local3 = rax;
                [local_mutable_array1 addObject:rax];
                [local3 release];
                ...
            } while (...);
            rax = [array
                countByEnumeratingWithState:local1 objects:local2 count:0x10];
            ...
        } while (rax != 0x0);
    }
    rax = ...;
    return rax;
}
The transformation can completely ignore the “...” parts of the complex
control-flow structure, and replace the outer if statement with a for-in structure.
The receiver of the first countByEnumeratingWithState... call is the container
that we are iterating. The loop variable is retrieved from the inner-most block.
void *_AFQueryStringFromParameters(void *arg1) {
NSFastEnumerationState state;
id objects[16];
local_mutable_array1 = ...;
rax = ...;
void *local1 = &state;
void *local2 = &objects;
void *array = rax;
rax = [array countByEnumeratingWithState:local1 objects:local2 count:0x10];
for (rax in array) {
rax = [rax URLEncodedStringValue];
rax = [rax retain];
local3 = rax;
[local_mutable_array1 addObject:rax];
[local3 release];
}
rax = ...;
return rax;
}
Further optimizations can eliminate the auxiliary local variables and the
method call to countByEnumeratingWithState..., and also the ARC memory
management calls. The resulting method can look like this:
void *_AFQueryStringFromParameters(void *arg1) {
local_mutable_array1 = ...;
rax = ...;
void *array = rax;
for (rax in array) {
rax = [rax URLEncodedStringValue];
[local_mutable_array1 addObject:rax];
}
rax = ...;
return rax;
}
- (NSArray *)methodReturningArrayWithMyString {
NSMutableArray *array = [NSMutableArray array];
NSString *str = @"my string";
NSString *str2 = [str uppercaseString];
[array addObject:str2];
return array;
}
• Some method names are available on too broad a set of types, such as the
description method, which is present in almost all classes. However, we
can collect the set of selectors that are called on a variable and then find a
type which responds to all of them.
• Properties and ivars contain their class types in the metadata. When we
find an access to a property or an ivar, we can infer the type of the assigned
or read variable. However, the actual variable can still be a superclass or
subclass of the property or ivar class.
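The idea of narrowing a type by the set of selectors called on a variable can be illustrated with a small Python sketch. The class table below is hypothetical and heavily simplified; a real implementation would read the selectors from the Objective-C metadata and would also have to consider superclasses.

```python
# Hypothetical class metadata: class name -> selectors its instances respond to.
# A real decompiler would take this from the binary's Objective-C metadata.
KNOWN_CLASSES = {
    "NSString": {"description", "length", "uppercaseString"},
    "NSArray": {"description", "count", "objectAtIndex:"},
    "NSMutableArray": {"description", "count", "objectAtIndex:", "addObject:"},
}


def candidate_types(selectors_called, classes=KNOWN_CLASSES):
    """Return the sorted class names that respond to every observed selector."""
    wanted = set(selectors_called)
    return sorted(name for name, sels in classes.items() if wanted <= sels)
```

With this table, the description selector alone matches every class, while the combination of description and addObject: leaves only NSMutableArray as a candidate.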
5.8 Summary
The chapter discussed several features of the Objective-C language, runtime and
compiler which directly affect the design of a potential Objective-C decompiler.
Among other things, we have seen that there are specifics about the language and
resulting compiled code, which will make the job of a decompiler both easier and
harder:
• Various metadata about classes, methods and blocks can be used to recon-
struct proper method signatures and identify property accesses, instance
variable accesses, method calls and block instantiation and invocation.
• Blocks, fast enumeration, reference counting and high-level language syntax
are extra features that the decompiler must support, but all are recognizable
using pattern matching on appropriate decompilation levels.
• Function calls and property accesses are opaque statements to the compiler,
which means it cannot optimize beyond them.
• Naming of methods can be used to infer types of variables and parameters.
Chapter 6
Figure 6.1 shows how the main components interact. The analysis core acts
as a back-end component and the other components are possible front-ends to it.
The decompiler’s only software requirements are an installation of the OS
X operating system, 10.10 (named Yosemite) or newer, and an installation of
the Xcode development package (version 6 or higher). Both are provided free of
charge by Apple and can be installed on supported Mac computers. A packaged
distribution of the decompiler (an application bundle named Cricket.app) does
not require any additional software, as it embeds all of its dependencies within
the package.
Figure 6.1: Main components of the Cricket decompiler.
For local development, however, several libraries need to be present on the
developer’s machine:
The application’s README file contains instructions on how to install these
software packages, how to run the GUI application, how to use the command-line
interface and how to run the provided test suite. The source code also contains a
script called deploy.sh which creates a stand-alone distribution of Cricket.app
as described above. Short user documentation is included as an appendix to
this thesis, and developer documentation is provided in the README file and also
as comments in the application’s source code.
Several sample binary executables are provided as well to demonstrate the
capabilities of the decompiler. These include both synthetic examples which show
how individual language constructs are handled, as well as real-world optimized
binaries generated from popular open-source projects.
Naturally, the decompiler was written as an OS X application, because the
Objective-C language is very closely tied to OS X and iOS development, most
of which is done on a desktop OS X system. Although Objective-C code can be
compiled to other binary formats, Mach-O is the only executable and
library format that can be loaded into Cricket. It is also the platform’s standard
format for OS X and iOS programs. It may seem that the software is too tied
to OS X, but Cricket was written in Python and with portability in mind. Most
of the analysis core is completely platform independent, and the GUI framework
used (Qt) also works on most major platforms. This means that creating a port
for other systems would require only small changes, for example in the binary
format parsing.
The choice of Python as the primary language for a decompiler might
seem odd, since compilers and related tools that operate on individual
instructions are traditionally written in statically typed languages. There are
several reasons behind this choice. Firstly, the Python+Qt+PyQt combination
provides a very easy way to design the GUI without writing code and to
conveniently handle GUI events, while remaining completely platform independent.
Secondly, Python is a dynamic
programming language which does not need the source code to be explicitly
compiled before running, which helps speed up development, especially compared
to larger projects in C++, which often suffer from long compilation times. Thirdly,
there is a huge number of readily available Python packages and bindings for
almost any area of software development, including disassemblers and other binary
analysis frameworks.
The GUI front-end provides an environment in which the user can carry out a
complete analysis workflow: opening a binary file (an executable or a library),
selecting a class and method to analyze, overseeing the individual levels of
decompilation, generating the resulting source code and displaying it in an
editor window with syntax highlighting. In all the decompilation steps, the user
can observe and influence the intermediate results. This differs from other
available products, which usually do not allow the user to see unfinished
decompilation output. It proved to be of tremendous help for the development of
the tool itself, while also giving expert users more options to get better
decompilation results.
• Binary loading is responsible for parsing the binary file, finding individual
sections and segments, parsing the Objective-C metadata (class and method
lists, type information, block descriptors) and finding all possible
functions, including unnamed non-Obj-C procedures. It also analyzes external
references (dynamically linked libraries). The techniques for this are
described in chapter 3.
For this purpose, Cricket uses several system-provided and 3rd-party tools,
because binary format parsing is not the main goal of this thesis. For example,
the system tools named otool and dyldinfo are able to list sections, find
Objective-C classes and methods and list external symbols.
Figure 6.2: Architecture of the analysis core.
We also perform two heuristic passes over the code and data sections to detect
additional functions that are not described in the metadata, as described in
section 3.4.1. One pass looks for CALL instructions with a constant target
over the code section. The second pass skims the data sections and looks for
data that look like pointers into the code section, and for any such pointers
we perform a heuristic detection whether the code looks like the beginning of
a function (based on the first few instructions, e.g. a PUSH RBP; MOV RBP,
RSP is an extremely common function prologue on x86-64 but it is also very
unlikely to be used elsewhere). This detects functions accessed from virtual
tables and other data structures.
Based on the results, we cut the code section on each detected function
boundary to get a list of both named and unnamed procedures.
• Basic block detection uses the algorithm described in section 4.2.1 to
detect all basic blocks from the single list of assembly instructions. Special
analysis is done to detect jump tables (see section 4.2.4). A control-flow
graph with successor and predecessor lists is constructed.
that the results will help us resolve function calls and memory accesses. A
special optimization tries to promote accesses to a stack item into a local
variable, which certainly enhances the IR, but sometimes is not possible
(e.g. when a pointer to the stack item is used as an argument to a function
call). The data-flow analysis is then repeated and so on.
• User AST changes are then applied, if the user chooses to make any. There are
many refactorings available to the user, ranging from simple variable
renaming to restructuring control-flow statements. The user can apply these
to further enhance the quality of the decompiled result where the automatic
decompilation did not choose the best option. Local variable names are
one obvious area where the decompilation will often provide sub-optimal
results. Allowing the user to post-process the source code at the AST level
offers a more convenient way of analyzing the code than a regular
text editor.
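The two heuristic passes for function detection described under binary loading can be illustrated with the following Python sketch. It is not Cricket's actual code: the instruction tuples, the function names and the fixed 4-byte prologue are assumptions made for the example, and a real implementation would accept more prologue variants.

```python
import struct

# Byte encoding of "PUSH RBP; MOV RBP, RSP", a very common x86-64 prologue.
PROLOGUE = bytes.fromhex("554889e5")


def call_targets(instructions):
    """Pass 1: collect constant CALL targets from already-disassembled code.

    `instructions` is a hypothetical list of (mnemonic, operand) pairs in
    which a constant target is an int and any other operand is not.
    """
    return {op for mnem, op in instructions
            if mnem == "call" and isinstance(op, int)}


def pointer_targets(data, code, code_base):
    """Pass 2: find 8-byte little-endian words in `data` that point into the
    code section at a location that starts with the known prologue."""
    found = set()
    for off in range(0, len(data) - 7, 8):
        (value,) = struct.unpack_from("<Q", data, off)
        rel = value - code_base
        if 0 <= rel < len(code) and code.startswith(PROLOGUE, rel):
            found.add(value)
    return found


def function_starts(instructions, data, code, code_base):
    """Merge both passes into one sorted list of function start addresses."""
    return sorted(call_targets(instructions)
                  | pointer_targets(data, code, code_base))
```

The resulting addresses are the boundaries on which the code section would be cut into named and unnamed procedures.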
This gives a general overview of the Cricket decompiler and the individual
steps and representations that are used during the decompilation. The following
sections discuss the design and internals of some of the parts of Cricket.
the same compiler infrastructure to be reused for several source code languages
and several result architectures. An option to use some existing IR language and
infrastructure, instead of designing a custom intermediate instruction language,
was considered. For example, the LLVM IR [30] seems to be an interesting
option, because of advantages such as having a set of transforms that already
perform optimizations on the IR. However, it was decided that the benefits of
introducing a decompilation-specific custom IR would outweigh those provided by
the LLVM IR:
The actual syntax and semantics of µCode were designed to be very simple to
read and easy to generate. As a result, a single machine instruction often needs
to be translated to several µCode instructions, and it is the task of subsequent
optimizations to simplify the calculations. A function written in µCode has an
infinite number of registers available for its use, and each register has an
associated size in bytes. For example, a register “val.8” has a size of 8 bytes.
The smallest register size is 1 byte, which is also used to represent boolean 1-bit
variables. There is no maximum size of a register, but we will rarely use sizes
larger than 8 bytes (native register size on 64-bit architectures). Registers are
also called variables and there is no distinction between a variable and a register.
Each µCode instruction has an opcode starting with a lowercase letter “u”,
followed by an uppercase mnemonic. For better readability, instruction opcodes also
show the size of the resulting value. Instructions always explicitly state their
input and output variables, with the exception of the uCALL instruction, which
can be “unresolved” (see below). Unless specifically stated otherwise, they have
no side effects and do not produce other outputs.
Let us take a look at a few examples of move instructions and basic integer
arithmetic operations:
uMOV.8 rbx.8 := 0x1 ; store constant value into "rbx"
uMOV.8 rax.8 := rbx.8 ; store the value of "rbx" into "rax"
uADD.8 var1.8 := rax.8 + rbx.8 ; add "rax"+"rbx", store result in "var1"
uSUB.8 var2.8 := rax.8 - 0x1 ; subtraction
uDIV.8 quotient.8 := rax.8 / 0x2 ; division
uMOD.8 remainder.8 := rax.8 % 0x2 ; modulo
uNOP ; does nothing
The registers used do not need to be explicitly declared and the first use
automatically declares the register and its size. The register sizes are final, for
example, once the program uses rax.8 as a register, it cannot use rax.4 later.
With the exception of function inputs, each register needs to be defined (by an
instruction that writes to the register) before it is used as an input to a subsequent
instruction. Constants do not explicitly state their sizes, they have an implicit
size based on the other inputs and outputs of the instruction.
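The two register rules (a register's size is fixed by its first use, and a register must be defined before it is read) can be checked mechanically. The following Python sketch works on a hypothetical simplification in which each instruction is an (output, inputs) pair of register names and constants are left out; it is not Cricket's internal representation.

```python
def check_registers(instructions, inputs=()):
    """Check two register rules on a list of (output, inputs) pairs.

    Registers are written as "name.size", for example "rax.8". Returns a
    list of human-readable violations, which is empty for a valid program.
    """
    sizes = {}        # register name -> size fixed by its first use
    defined = set()   # names of registers that have been written so far
    errors = []

    def note_size(reg):
        name, size = reg.rsplit(".", 1)
        if sizes.setdefault(name, int(size)) != int(size):
            errors.append(f"{name} used with sizes {sizes[name]} and {size}")
        return name

    for reg in inputs:                     # function inputs count as defined
        defined.add(note_size(reg))
    for output, operands in instructions:
        for reg in operands:
            if note_size(reg) not in defined:
                errors.append(f"{reg} used before definition")
        if output is not None:
            defined.add(note_size(output))
    return errors
```

For the move and arithmetic listing above, the first three instructions would be passed as [("rbx.8", []), ("rax.8", ["rbx.8"]), ("var1.8", ["rax.8", "rbx.8"])], which yields no violations.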
Most instructions require their inputs and outputs to be of the same size. If
a smaller-sized variable is to be used in a larger-sized calculation, it needs to be
zero- or sign-extended, and similarly for truncation:
uEXTEND.8 larger.8 := EXTEND(smaller.1) ; zero-extending
uTRUNC.1 byte.1 := TRUNC(larger.8) ; truncating
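To make the difference between the size conversions concrete, the following Python sketch models them on plain integers, with sizes given in bytes as in the register names. The function names are chosen for this example only; the listing above shows just the zero-extending and truncating instructions.

```python
def zero_extend(value, from_size):
    """Zero-extend: keep the low `from_size` bytes, upper bytes become 0."""
    return value & ((1 << (from_size * 8)) - 1)


def sign_extend(value, from_size, to_size):
    """Sign-extend a `from_size`-byte value into `to_size` bytes."""
    bits = from_size * 8
    value &= (1 << bits) - 1
    if value >> (bits - 1):                  # sign bit set: fill with ones
        value |= ((1 << (to_size * 8)) - 1) & ~((1 << bits) - 1)
    return value


def truncate(value, to_size):
    """Truncate: drop everything above the low `to_size` bytes."""
    return value & ((1 << (to_size * 8)) - 1)
```

The byte 0xff stays 0xff when zero-extended to 8 bytes, but becomes 0xffffffffffffffff when sign-extended, because it is then interpreted as -1.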
Memory accesses are done by using the uSTORE and uLOAD instructions:
uMOV.8 ptr.8 := 0x10000896a ; stores a constant value to "ptr"
uSTORE.1 *(ptr.8) := b.1 ; stores "b" (1 byte) into "*ptr"
uLOAD.8 val.8 := *(otherptr.8) ; loads an 8-byte word into "val"
Notice that the size of the uSTORE instruction indicates how many bytes are
being written to. The pointer size is always the native register size on the used
architecture (8 bytes on 64-bit architectures and 4 bytes on 32-bit architectures),
but the size of the memory access can be different.
Function calls are done via the uCALL instruction. The syntax of this instruction
differs based on the call arguments and return type. If the function call does
not return anything, the instruction does not have any output register either. A
special case is an “unresolved” function call marked by “...” in the parameter
list. In this case, a subsequent analysis needs to resolve the call arguments (either
by matching the function from a database of function prototypes or by heuristic
analysis). Functions can also be called indirectly when the pointer to the function
is stored in a register.
uCALL result.8 := my_function(rdi.8, rsi.8) ; fully resolved call
uCALL other_function() ; void-returning function
uCALL result.8 := unresolved_function(...) ; unresolved function call
uCALL rax.8() ; indirect call
The uBRANCH instruction requires that the target label is a constant. To handle
switch statements, a uSWITCH instruction exists. The uRET instruction exits the
function and optionally returns a register as the result of the function.
As µCode is not meant to be written manually or parsed, it is missing
some explicit information, such as the function header (with the inputs and
outputs and the appropriate registers). The Cricket decompiler keeps track of
those internally, and the editor provides these in generated comments.
Types in µCode are only specified in byte sizes and there is no distinction
between signed and unsigned types. There is no register aliasing and no subregister
accesses are allowed. When several registers need to form a larger structure, we
create a single variable with the total size of the structure and access its
parts using uEXTRACTMEMBER and uSETMEMBER with either integer offsets or field
names of known structures.
Assuming the following is a known C structure:
struct two_ints {
long first_integer;
long second_integer;
};
We can then extract and set members of this structure either directly via
offsets or by names:
uCALL ret.16 := function_returning_two_ints()
uEXTRACTMEMBER.8 first.8 := EXTRACT(ret.16, 0x0)
uEXTRACTMEMBER.8 second.8 := EXTRACT(ret.16, 0x8)
uSETMEMBER.8 ret.16 := SET(ret.16, first_integer, 0x2a)
uSETMEMBER.8 ret.16 := SET(ret.16, second_integer, 0x29a)
Can be translated into this µCode (ignoring the obvious constant folding we
can immediately perform):
uMOV.8 temp1.8 := 0xc200
uMOV.8 temp2.8 := 0x10 ; decimal 16
uSHIFTLEFT.8 x14.8 := temp1.8 << temp2.8 ; perform the left shift
; x86-64:
MOV R14, QWORD PTR [RDX + R15 * 8]
; uCode:
uMOV.8 tempaddr.8 := rdx.8
uMOV.8 tempidx.8 := r15.8
uMUL.8 tempidx.8 := tempidx.8 * 0x8
uADD.8 tempaddr.8 := tempaddr.8 + tempidx.8
uLOAD.8 r14.8 := *(tempaddr.8)
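The expansion of a base-plus-scaled-index memory operand into simple instructions can be sketched as a small Python generator that emits the instruction text. The helper name and its parameters are illustrative assumptions; it handles only 8-byte pointers and no displacement.

```python
def lower_load(dest, base, index=None, scale=1, size=8):
    """Emit instruction text for `MOV dest, [base + index*scale]`.

    Register arguments are plain names such as "rdx". The temporary names
    mirror the listing above; the helper itself is illustrative only.
    """
    out = [f"uMOV.8 tempaddr.8 := {base}.8"]
    if index is not None:
        out.append(f"uMOV.8 tempidx.8 := {index}.8")
        if scale != 1:
            out.append(f"uMUL.8 tempidx.8 := tempidx.8 * {scale:#x}")
        out.append("uADD.8 tempaddr.8 := tempaddr.8 + tempidx.8")
    out.append(f"uLOAD.{size} {dest}.{size} := *(tempaddr.8)")
    return out
```

Calling lower_load("r14", "rdx", "r15", 8) reproduces the five instructions shown above; the later value propagation and constant folding passes are then expected to collapse the temporaries again.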
Secondly, several machine instructions return more than one value, which is
done by writing to multiple registers by one instruction. Sometimes this can
be easily solved by splitting the instruction into more µCode instructions, for
example, the x86-64 integer division instruction (DIV) returns the quotient in RAX
and the remainder in RDX. In µCode this is represented by:
uDIV.8 rax.8 := src1.8 / src2.8
uMOD.8 rdx.8 := src1.8 % src2.8
; uCode:
uMUL.16 tempres.16 := rax.8 * rbx.8
uEXTRACTMEMBER.8 rax.8 := EXTRACTMEMBER(tempres.16, 0x0) ; low 8 bytes
uEXTRACTMEMBER.8 rdx.8 := EXTRACTMEMBER(tempres.16, 0x8) ; high 8 bytes
Alternatively, uTRUNC can be used to extract the lower 8 bytes of the result.
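The split of a 16-byte product into two 8-byte registers can be checked with ordinary integer arithmetic. The Python sketch below assumes an unsigned multiplication and a little-endian layout, so that offset 0x0 holds the low half and offset 0x8 the high half; the function name is chosen for this example.

```python
MASK64 = (1 << 64) - 1


def mul_64x64(rax, rbx):
    """Unsigned 64x64 -> 128-bit multiply, returned as (low, high) halves."""
    tempres = (rax & MASK64) * (rbx & MASK64)   # the 16-byte temporary
    low = tempres & MASK64                      # member at offset 0x0
    high = tempres >> 64                        # member at offset 0x8
    return low, high
```

The low half is exactly what truncating the 16-byte temporary to 8 bytes would produce, which is why uTRUNC is an equivalent way to obtain it.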
Arithmetic as well as explicit comparison instructions often set CPU flags.
The uFLAG instructions are generated to capture all of the semantics:
; x86-64:
TEST RCX, 0x1 ; sets SF, ZF and PF
; based on bitwise (RCX & 0x1)
; uCode:
uFLAG.1 sf.1 := SIGN(rcx.8 & 0x1)
uFLAG.1 zf.1 := ZERO(rcx.8 & 0x1)
uFLAG.1 pf.1 := PARITY(rcx.8 & 0x1)
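The three flag computations can be modeled directly in Python. The sketch below follows the x86 definitions (the sign flag is the top bit of the result, the zero flag marks an all-zero result, and the parity flag is set when the lowest byte contains an even number of set bits); the function name is hypothetical.

```python
def flags_after_test(lhs, rhs, size=8):
    """Return (sf, zf, pf) as the x86 TEST instruction would set them."""
    bits = size * 8
    result = (lhs & rhs) & ((1 << bits) - 1)
    sf = (result >> (bits - 1)) & 1                    # top bit of the result
    zf = int(result == 0)                              # result is all zeros
    pf = int(bin(result & 0xFF).count("1") % 2 == 0)   # even parity, low byte
    return sf, zf, pf
```

In practice a following conditional jump usually reads only one of these flags, so dead code elimination can remove the other two uFLAG instructions.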
In order to generate a reasonable high-level code, the stack item accesses must
be promoted to local variables (registers). Cricket performs a special optimization
pass over the IR to produce the following:
uMOV.8 stackitem_0x8.8 := 0x2a
uMOV.8 stackitem_0x10.8 := 0x29a
...
uMOV.8 rax.8 := stackitem_0x8.8
uRET rax.8
However, there are many restrictions that apply, otherwise, the transformation
can be invalid:
• For a stack item that we want to promote, we must be able to find all of its
uses. If we miss an access during the promotion, it will no longer access the
same variable, which will break the original semantics.
• Pointers to stack items can be used throughout the function and can enter
as arguments to function calls.
• Unresolved function calls can possibly access stack items.
• When a function call uses a stack item pointer, we must be able to distinguish
the extent of the accessed data. For example, if multiple stack items form a
structure or an array, the pointer to the beginning of the structure looks
the same as a pointer to the first item only.
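A deliberately conservative version of the restrictions listed above can be written down in a few lines of Python. The mini-IR used here is hypothetical, and the sketch simply refuses to promote any stack item whose address is taken and gives up entirely when an unresolved call is present; it does not attempt to determine the extent of structures or arrays.

```python
def promotable_stack_items(instructions):
    """Return the stack offsets that are safe to promote to registers.

    `instructions` uses a hypothetical mini-IR:
      ("store", offset)       write to the stack item at `offset`
      ("load", offset)        read from the stack item at `offset`
      ("addr", offset)        the address of the stack item is taken
      ("call_unresolved",)    a call whose arguments are still unknown
    """
    accessed, escaped = set(), set()
    for ins in instructions:
        kind = ins[0]
        if kind in ("store", "load"):
            accessed.add(ins[1])
        elif kind == "addr":
            accessed.add(ins[1])
            escaped.add(ins[1])       # may be read or written via a pointer
        elif kind == "call_unresolved":
            return set()              # the callee might touch any stack item
    return accessed - escaped
```

A less conservative analysis could still promote an item whose address is taken, provided that every use of the resulting pointer is known.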
In such cases, the data-flow analysis will still work properly and even though
the resulting source code will contain the asm statement, in most cases it will
not break the decompilation progress on other parts of the code.
; instruction is 5 bytes long
POP EAX ; EAX now contains the current PC
Cricket detects this pattern in the function prologue and explicitly stores the
PC to EAX when generating the IR.
The x86 architectures often make use of sub-registers, which allow accesses
to smaller-than-native parts of physical registers. This also includes writes to
sub-registers, which leave the rest of the register unaffected. However, there is an
exception to this rule on x86-64: writes to 32-bit registers are zero-extended and
overwrite the full 64-bit register. The following two instructions are therefore
equivalent in behavior on x86-64:
; x86-64:
XOR RAX, RAX ; zero out RAX
XOR EAX, EAX ; zero out EAX, but *also* zero the rest of RAX
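The write semantics of sub-registers on x86-64 can be summarized in a short Python model of a single 64-bit register. The function name is chosen for this example, and the high-byte registers (such as AH) are not covered.

```python
MASK64 = (1 << 64) - 1


def write_subregister(full, value, size):
    """Return the new 64-bit register value after writing `size` low bytes."""
    mask = (1 << (size * 8)) - 1
    value &= mask
    if size in (4, 8):
        return value   # 32-bit writes zero-extend; 64-bit writes replace all
    return (full & ~mask & MASK64) | value   # 8/16-bit writes keep upper bits
```

Because there are no subregister accesses in the IR, an 8-bit or 16-bit write has to be expressed as a combination of the old register value with the new low bytes, while a 32-bit write becomes a plain zero-extension.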
The problem with such sequences is that we need to be able to find references
to code and data for various analyses, even before data-flow analysis is performed,
for example for the discovery of all function beginnings in a binary. Cricket
heuristically detects several such patterns by looking for two- or three-instruction
sequences when skimming the code section.
Fortunately for decompilation, AArch64 does not contain predicated in-
structions, as the 32-bit ARM architectures do. With only a few exceptions (for
explicitly conditional instructions), only control-flow branches can be conditionally
executed.
6.2.7 Limitations
There are some limitations in the instructions supported by Cricket at this time:
• Floating-point support is limited to XMM registers on x86-64 and unsup-
ported on AArch64.
• Vector instructions are unsupported.
This enables various tasks and workflows. A security researcher
can get a quick overview of an unknown binary program just by looking at the available
classes, their methods, and signatures. It will be obvious at first sight whether the
program uses some code or symbol name obfuscation. Clicking a class (instead
of a method) produces a class dump, which lists all of its instance variables,
properties, and methods, further helping the user to understand the purpose and
structure of a particular class.
After basic-block detection is performed, a visual representation of the control-
flow graph is displayed, which offers an indication of how complex the logic of one
function is, as shown in figure 6.4.
Figure 6.4: Control-flow graph visualization in Cricket.
When assembly instructions are transformed into µCode, the result is shown
in an editor window which supports additional features, such as showing
“explanations” of what the original instructions were or showing definitions and
uses of inputs and outputs, which can be seen in figure 6.5. The editor also
allows the user to manually perform various transformations of the IR.
Figure 6.5: Explained generated IR with highlighted current line (blue), definitions
of used values (red lines) and uses of produced values (green lines).
• µCode tests are used to test that the generated and already optimized IR
contains the expected instructions. The following is an abbreviated example
of a µCode test. Note that instead of actual register names, we use a {.*}
regular expression, because we do not care about their names.
// MATCH-UCODE: uMUL.{4|8} {.*} := {.*} * 0x6ed
// MATCH-UCODE: uADD.{4|8} {.*} := {.*} + 0x79a1
// MATCH-UCODE: uRET {.*}
...
- (long)math_test:(long)arg {
long a = arg;
long b = a * 1773;
long c = b + 31137;
return c;
}
syntactically correct and compilable and that the behavior and semantics of
the original code are preserved.
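The MATCH-UCODE directives from the µCode test example can be evaluated with a few lines of Python. The sketch below rests on two assumptions that the text does not confirm: that the text inside braces is a regular expression while everything else is matched literally, and that the directives must match lines of the generated IR in the given order.

```python
import re


def directive_to_regex(pattern):
    """Turn a directive such as 'uMUL.{4|8} {.*} := {.*} * 0x6ed' into a regex.

    Text inside braces is taken as a regular expression; everything else
    is matched literally. This reading of the syntax is an assumption.
    """
    parts = re.split(r"\{(.*?)\}", pattern)
    pieces = [
        f"(?:{part})" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    ]
    return re.compile("".join(pieces))


def matches_in_order(directives, ucode_lines):
    """Check that each directive matches some line, in the given order."""
    lines = iter(ucode_lines)   # shared iterator: matching consumes lines
    return all(
        any(directive_to_regex(d).search(line) for line in lines)
        for d in directives
    )
```

Note that the constants in the directives are the hexadecimal forms of the source literals: 1773 is 0x6ed and 31137 is 0x79a1.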
6.6 Summary
We presented an implementation of Cricket, an Objective-C decompiler, which
supports the major Apple-supported architectures, compiler, and binary format.
We have shown the main design decisions and the architecture of the
complete program, the analysis core, and the GUI.
The next chapter will show how good the decompilation results are, compare
them to a competing product and discuss the goals of this thesis and how well
they were achieved.
Chapter 7
Evaluation
7.1 Methodology
The following software is used in the evaluation:
• The compiler used is Apple LLVM version 7.3.0 (clang-703.0.31),
which is included in Xcode 7.3.1 shipped in May 2016.
• OS X El Capitan 10.11.5 as the operating system.
• Hopper Disassembler 3.11.17 as the competing product used for comparison
(latest version as of July 2016).
• The tested code base is the most popular Objective-C open-source project
on GitHub: AFNetworking, with the git revision 2a53b2c3 (top of master
development as of July 2016).
• The current development version of Cricket.
We will build two versions of the open-source library:
• A debug, unoptimized build of the library for OS X (x86-64 architecture).
• A release, fully optimized build for OS X (x86-64 architecture).
Since Hopper does not support decompilation of ARM/AArch64 code, this
architecture is not included in the evaluation.
The compiled build of AFNetworking contains 20 classes with a total of 441
methods, which have the following properties:
• The library defines 51 distinct blocks. Blocks are used extensively through-
out the library both for internal (e.g. network callbacks are implemented
with blocks) and external purposes (e.g. the API expects the user to supply
a completion handler as a block).
• Most of the methods are property accessors generated by the compiler.
• Most of the methods consist of a single basic block (in the release build).
This includes all of the compiler-generated property accessors, but even if
we exclude them, most of the remaining methods still only have a single
basic block.
• In the source code, most functions have fewer than 10 lines of code.
• How many lines of code (excluding empty lines) does the original method
have? (column “L”)
• How many basic blocks does the compiled function have? If it uses blocks,
how many basic blocks do the defined blocks have? (column “BBs”)
• Is the high-level control-flow reconstructed? Is it exactly the same as the
source code, is it different but valid or is it incorrect? (column “CF”)
• Are the blocks recognized and integrated into the decompiled function?
(column “BR”)
• Are the variables and data, which the function manipulates, used correctly?
Is the decompiled data-flow correct or does it have a significant mistake?
(column “DF”)
• How many lines of code (excluding empty lines) does the decompiled result
have? In case the decompilation is missing a significant part of the method,
for example when it does not include a block’s definition, we leave this
metric blank. (column “LO”)
7.2 Results
This section presents the results of the evaluation. In the following tables, method
names have been shortened and class names omitted for brevity. The full list of
tested methods, including their full names, signatures and source code, is available
in appendix B, which also includes the full decompilation results from both Hopper
and Cricket. Some methods in the table have “0” source code lines, which means
that the method is compiler-generated and there is no corresponding source code.
Bold highlight means Cricket performed better in the test, italic highlight
means we performed worse.
The following table shows the results comparing Hopper and Cricket on a
debug build of AFNetworking:
Method name             L   BBs     Hopper                   Cricket
                                    CF     BR    DF     LO   CF     BR     DF     LO
sharedManager           6   3 (+1)  Valid  Fail  Fail   -    Exact  Exact  Exact  7
managerForDomain        4   1       Exact  -     Fail   9    Exact  -      Valid  11
managerForAddress       4   1       Exact  -     Exact  7    Exact  -      Exact  6
isReachable             1   3       Valid  -     Valid  7    Valid  -      Valid  8
stopMonitoring          4   4       Valid  -     Exact  5    Exact  -      Valid  8
pinnedCertificates      0   1       Exact  -     Exact  2    Exact  -      Exact  2
GET                     1   1       Exact  -     Fail   12   Exact  -      Valid  17
validatesDomainName     0   1       Exact  -     Valid  2    Exact  -      Exact  2
setValidatesDomainName  0   1       Exact  -     Exact  2    Exact  -      Exact  2
initWithBaseURL         1   1       Exact  -     Fail   8    Exact  -      Exact  1
init                    10  4       Valid  -     Valid  38   Valid  -      Valid  24
invalidateSession       7   1 (+4)  Exact  Fail  Fail   -    Exact  Exact  Valid  25
respondsToSelector      10  13      Valid  -     Valid  37   Valid  -      Valid  44
certificatesInBundle    7   9       Valid  -     Fail   45   Exact  -      Valid  25
The following table shows the results comparing Hopper and Cricket on a
release build of AFNetworking:
Method name             L   BBs     Hopper                   Cricket
                                    CF     BR    DF     LO   CF     BR     DF     LO
sharedManager           6   3 (+1)  Valid  Fail  Fail   -    Exact  Exact  Exact  6
managerForDomain        4   1       Exact  -     Valid  10   Exact  -      Valid  6
managerForAddress       4   1       Exact  -     Fail   6    Exact  -      Valid  5
isReachable             1   3       Valid  -     Valid  7    Valid  -      Valid  10
stopMonitoring          4   4       Valid  -     Valid  9    Exact  -      Valid  9
pinnedCertificates      0   1       Exact  -     Exact  2    Exact  -      Exact  2
GET                     1   1       Exact  -     Exact  10   Exact  -      Valid  3
validatesDomainName     0   1       Exact  -     Valid  2    Exact  -      Exact  2
setValidatesDomainName  0   1       Exact  -     Exact  2    Exact  -      Exact  2
initWithBaseURL         1   1       Exact  -     Exact  2    Exact  -      Exact  1
init                    10  3       Valid  -     Valid  24   Exact  -      Exact  19
invalidateSession       7   1 (+4)  Exact  Fail  Fail   -    Exact  Exact  Valid  14
respondsToSelector      10  11      Fail   -     Valid  -    Valid  -      Valid  31
certificatesInBundle    7   9       Valid  -     Valid  38   Exact  -      Valid  22
7.3 Discussion
Let us now analyze the results from the tables above and the outputs in appendix
B. The first important thing to notice is that Hopper completely fails to analyze
functions that contain blocks within them. Although it supports decompiling them,
in this test project any function containing a block completely confused the
decompilation, which even produced invalid statements, such as dispatch_async(...,
NSConcreteStackBlock);. This is one area where Cricket performs significantly
better: in all the evaluated methods, blocks were always recognized correctly.
Secondly, Hopper sometimes fails to recognize the high-level control-flow
structures, and in these cases, it gives up completely and produces a flat function
with goto statements instead of all control flow. Surprisingly, this happens both
for debug and release builds. Cricket always recognizes at least some of the
control-flow statements, inserting only individual gotos. On top of that, Cricket’s
heuristic for early returns produces much more readable code, which in most
cases matches the control-flow of the original source code.
The number of cases where Hopper incorrectly assigns a variable or performs
a calculation on a wrong variable is surprisingly high. Even more unexpectedly,
this happens more often in debug builds than in release builds. The reason
for this seems to be that release builds make much more use of registers
instead of stack items, and Hopper is very imprecise when dealing with local
variables stored on the stack. Specifically, when the address of a stack variable
is used as a parameter of a function call, Hopper tends to produce wrong results.
This often leads to further errors in the output.
Cricket, on the other hand, supports this behavior better and in most cases, it
recognizes the data-flow of the methods correctly.
In terms of output line count, both Hopper and Cricket produce varying results.
Simple and short methods are usually decompiled into just a few lines of code and
are very readable. Larger methods tend to be more confusing to read because of
the missing local variable names.
Overall, we can conclude that Cricket produces significantly more correct
output, which is very important for manual reading of the decompiled output.
Chapter 8
Conclusion
The goal of the thesis was to create an interactive tool for decompilation of
Objective-C applications, with the purpose to support manual work with unknown
binaries (e.g. malware analysis). The implementation of our decompiler called
Cricket meets this goal and provides very interesting results in comparison with
current state-of-the-art competitor products. We have shown that we can pro-
duce much more readable, concise and correct decompilation outputs for typical
Objective-C programs.
The comparison shows that our decompiler can correctly analyze cases where
a major competitor fails. We support complex Objective-C structures, such as
blocks and for-in statements, which is a unique feature of our decompiler and
which greatly improves the results.
Because Cricket is an interactive GUI tool which shows the progress and
individual steps of the decompilation, it can be also used as a learning and
experimental environment. Students can learn the principles of decompiling, but
also how code transformations and common compiler theory algorithms work in
general.
Secondarily, this thesis provides a generic description of how Objective-C binary
programs are structured and how these structures can be recognized into high-level
language constructs. This can serve as valuable input for further research, for
example, protection scheme design (DRM).
• Supporting more CPU architectures. Cricket currently only supports
i386, x86-64 and AArch64 architectures.
Chapter 9
References
[15] Concepts in Objective-C Programming.
https://developer.apple.com/library/mac/documentation/General/Conceptual/CocoaEncyclopedia/Introduction/Introduction.html.
[20] Richard Wartell, Yan Zhou, Kevin W. Hamlen, Murat Kantarcioglu, and
Bhavani Thuraisingham: Differentiating Code from Data in x86 Binaries.
http://www.utdallas.edu/~kxh060100/wartell-pkdd11.pdf.
[25] Procedure Call Standard for the ARM 64-bit Architecture (AArch64).
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/
IHI0055B_aapcs64.pdf.
106
[29] Boissinot, B. et al. Fast Liveness Checking for SSA-Form Pro-
grams. CGO 2008. http://www.rw.cdl.uni-saarland.de/~grund/
papers/cgo08-liveness.pdf.
[31] Van Emmerik, Michael. Static Single Assignment for Decompilation. Uni-
versity of Queensland, 2007. PhD Thesis.
[32] Cytron, Ron; Ferrante, Jeanne; Rosen, Barry K.; Wegman, Mark N. &
Zadeck, F. Kenneth. Efficiently computing static single assignment form
and the control dependence graph. ACM Transactions on Programming
Languages and Systems 13, 1991.
107
[42] Objective-C Automatic Reference Counting (ARC), Clang documentation.
http://clang.llvm.org/docs/AutomaticReferenceCounting.html.
108
List of Abbreviations
• BB — Basic Block
• GCC — GNU Compiler Collection, the major compiler for C and C++ on
Unix systems
• KVO — Key-value Observation
• Obj-C — Objective-C
• PC — Program Counter
Appendix A: Cricket User’s Manual
Running Cricket
This window allows you to select which binary file to open in Cricket. You
can either select a recently opened file, choose one of the demo binaries, or
press the “Browse...” button to open a file selection dialog.
Main window
After a binary is selected, Cricket’s main analysis window will be shown:
The left bar shows a list of all classes and methods found in the binary. You can
switch to view all functions (including C functions) using the “Functions” button.
The “Blocks” button switches the list to show all recognized blocks. “Symbols”
lists all symbols available in the binary.
In the class view you can use the “Show external classes” checkbox to also include
externally-linked classes in the list. “Flat view” switches the list from hierarchical
(where methods are shown as subitems to classes) to a simple flat list.
The next step is to select a method to decompile from the left-side list. After a
method is selected, its full disassembly listing will be shown:
Automatic decompilation
The easiest way to decompile a function is to press the “Auto” button in the
toolbar. This invokes a fully automatic decompilation mode, which analyzes
all referenced blocks and the function itself, performs all the stages of
decompilation and displays the decompilation result:
Manual decompilation
If the function contains any NOP instructions, you can click the “Strip NOPs”
button to remove them. Clicking the “-> BBs” button will perform basic block
detection and removal of low-level architecture-specific idioms (function prologue
and epilogue), and show the results on the next page, “Basic Blocks”:
This lists all the basic blocks in the function and also indicates conditional and
unconditional jumps with arrows. You can click the “switch to graph” button to
show the basic block graph in a visual form:
The “Strip BBs” button will remove basic blocks that are unnecessary
(e.g. compiler-generated stack overflow protection). Clicking the “-> uCode”
button will transform the basic blocks and machine instructions into uCode:
In the uCode tab, there are several possible transformations you can perform:
• “Strip NOPs” removes all NOP instructions (they can be generated by other
transformations).
• “Simplify” performs constant folding and other expression simplification.
• “De-Spill” promotes stack variables to local registers.
• “Propagate” tries to propagate value definitions to their uses.
• “Eliminate” tries to remove instructions which are unnecessary.
• “ARC” performs removal of automatic reference counting function calls.
• “Patterns” matches several useful instruction patterns and transforms them
into simpler ones.
• “Resolve” tries to resolve function parameters and types.
All of these can be performed on the whole uCode by clicking the button in
the toolbar. To perform the transformation on a single instruction, select the
appropriate item from the “Instruction” menu bar.
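As a rough illustration of what “Simplify”, “Propagate” and “Eliminate” do, the following Python sketch applies the three steps to a toy three-address code. It is not Cricket’s implementation: the `Instr` type, the opcode names and the assumption that every name is assigned exactly once are hypothetical and were chosen only to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction of a hypothetical toy IR (not Cricket's uCode)."""
    dest: str    # name of the value being defined
    op: str      # "mov", "add" or "mul"
    args: tuple  # operands: integer constants or names of other values


def simplify(instr):
    """Constant folding: evaluate an operation whose operands are all constants."""
    if instr.op in ("add", "mul") and all(isinstance(a, int) for a in instr.args):
        a, b = instr.args
        value = a + b if instr.op == "add" else a * b
        return Instr(instr.dest, "mov", (value,))
    return instr


def propagate(code):
    """Replace uses of a name by the constant it was defined as.

    Assumes straight-line code in which every name is assigned once.
    """
    known = {}  # name -> constant value
    result = []
    for instr in code:
        args = tuple(known.get(a, a) if isinstance(a, str) else a for a in instr.args)
        instr = simplify(Instr(instr.dest, instr.op, args))
        if instr.op == "mov" and isinstance(instr.args[0], int):
            known[instr.dest] = instr.args[0]
        result.append(instr)
    return result


def eliminate(code, live_out):
    """Dead code elimination: scan backwards and keep only definitions that are used."""
    live = set(live_out)
    kept = []
    for instr in reversed(code):
        if instr.dest in live:
            kept.append(instr)
            live.discard(instr.dest)
            live.update(a for a in instr.args if isinstance(a, str))
    kept.reverse()
    return kept


code = [
    Instr("t1", "mov", (2,)),
    Instr("t2", "add", ("t1", 3)),
    Instr("t3", "mul", ("t2", "x")),
    Instr("t4", "add", ("t1", "t1")),  # never used afterwards
]
optimized = eliminate(propagate(code), live_out={"t3"})
for instr in optimized:
    print(instr)
```

After propagation, `t2` is known to be the constant 5, so the only instruction that survives elimination is the multiplication of 5 by `x`. The order matters in the same way as in the manual workflow: eliminating before propagating would remove nothing, because every definition would still have a use.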
After you click the “-> CFG” button, a control flow reconstruction tab will be shown:
The task of manual decompilation is to collapse multiple nodes into simpler
control-flow statements. If you click a basic block, you will see which patterns
match this basic block in the “CFG” menu or as orange icons in the
bottom-right toolbar. Applying a match will simplify the CFG.
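The collapsing of CFG nodes can be sketched in a few lines of Python. The example is only an illustration under simplifying assumptions: the CFG is a dictionary from node name to successor list, only two patterns (if-then and plain sequence) are known, and the branch condition is printed as the placeholder `cond` without regard to its polarity. None of the names come from Cricket.

```python
def predecessors(cfg, node):
    """Return all nodes that have an edge to `node`."""
    return [n for n, succs in cfg.items() if node in succs]


def collapse_if_then(cfg, code, a):
    """Match 'a -> {b, c}, b -> {c}' and fold b into a as an if statement."""
    succs = cfg[a]
    if len(succs) != 2:
        return False
    for b, c in (succs, succs[::-1]):
        if a not in (b, c) and cfg[b] == [c] and predecessors(cfg, b) == [a]:
            code[a] = f"{code[a]}; if (cond) {{ {code[b]} }}"
            cfg[a] = [c]
            del cfg[b], code[b]
            return True
    return False


def collapse_sequence(cfg, code, a):
    """Match 'a -> b' where b has no other predecessor and append b to a."""
    succs = cfg[a]
    if len(succs) != 1:
        return False
    b = succs[0]
    if b == a or predecessors(cfg, b) != [a]:
        return False
    code[a] = f"{code[a]}; {code[b]}"
    cfg[a] = cfg[b]
    del cfg[b], code[b]
    return True


def structure(cfg, code):
    """Apply the patterns until nothing matches or a single node is left."""
    changed = True
    while changed and len(cfg) > 1:
        changed = False
        for node in list(cfg):
            if node not in cfg:
                continue  # already folded into another node in this pass
            if collapse_if_then(cfg, code, node) or collapse_sequence(cfg, code, node):
                changed = True
    return code


# A diamond without an else branch: A branches to B or C, B falls through to C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
code = {"A": "x = f()", "B": "g(x)", "C": "return x"}
print(structure(cfg, code))
```

In this sketch the patterns are applied automatically in a fixed order. The manual mode corresponds to choosing one matching pattern for one selected node at a time, which is useful when more than one pattern matches.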
Once the CFG is a single node (all control-flow patterns are recognized), you can
click the “-> Source” button to generate the AST:
More transformations are available on the AST level:
• “Ivar Loads” will replace loads including ivar offsets with an explicit ivar
access.
• “Ivar Stores” will replace stores including ivar offsets with an explicit ivar
access.
• “objc_msgSend” matches C calls to objc_msgSend and replaces them with
the Objective-C syntax.
• “Embed Blocks” includes block bodies into the function.
• “Simplify Source Code” will perform various simplifications of the AST
including removal of empty statements.
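To make the “objc_msgSend” transformation more concrete, the following Python sketch turns the parts of a C-style call `objc_msgSend(receiver, selector, args...)` into the bracket syntax. It only illustrates the idea: the function name `format_message_send` and the string-based representation of operands are assumptions made for this example and do not reflect how Cricket represents its AST.

```python
def format_message_send(receiver, selector, args):
    """Render objc_msgSend(receiver, @selector(selector), args...) as a message send.

    Every colon in a selector stands for exactly one argument, so the selector
    "initWithBaseURL:sessionConfiguration:" expects two arguments.
    """
    if selector.count(":") != len(args):
        raise ValueError("selector and argument count do not match")
    if not args:
        # Unary selector, for example "alloc" or "init".
        return f"[{receiver} {selector}]"
    keywords = selector.split(":")[:-1]  # drop the empty piece after the last colon
    parts = " ".join(f"{keyword}:{arg}" for keyword, arg in zip(keywords, args))
    return f"[{receiver} {parts}]"


print(format_message_send("NSMutableData", "data", []))
print(format_message_send("self", "initWithBaseURL:sessionConfiguration:", ["arg_10", "0"]))
```

Nested sends fall out naturally, because the receiver or any argument may itself be the string produced by an earlier call, as in `[[self alloc] initWithReachability:rbx]`.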
Appendix B: Evaluation Source Codes and Results
// +[AFNetworkReachabilityManager managerForDomain:]
+ (instancetype)managerForDomain:(NSString *)domain {
    SCNetworkReachabilityRef reachability = SCNetworkReachabilityCreateWithName(kCFAllocatorDefault, [domain UTF8String]);

    AFNetworkReachabilityManager *manager = [[self alloc] initWithReachability:reachability];

    CFRelease(reachability);

    return manager;
}

// +[AFNetworkReachabilityManager managerForAddress:]
+ (instancetype)managerForAddress:(const void *)address {
    SCNetworkReachabilityRef reachability = SCNetworkReachabilityCreateWithAddress(kCFAllocatorDefault, (const struct sockaddr *)address);
    AFNetworkReachabilityManager *manager = [[self alloc] initWithReachability:reachability];

    CFRelease(reachability);

    return manager;
}

// -[AFNetworkReachabilityManager isReachable]
- (BOOL)isReachable {
    return [self isReachableViaWWAN] || [self isReachableViaWiFi];
}

// -[AFNetworkReachabilityManager stopMonitoring]
- (void)stopMonitoring {
    if (!self.networkReachability) {
        return;
    }

    SCNetworkReachabilityUnscheduleFromRunLoop(self.networkReachability, CFRunLoopGetMain(), kCFRunLoopCommonModes);
}

// -[AFSecurityPolicy pinnedCertificates]
// No source-code, autogenerated.

// -[AFHTTPSessionManager GET:parameters:success:failure:]
- (NSURLSessionDataTask *)GET:(NSString *)URLString
                   parameters:(id)parameters
                      success:(void (^)(NSURLSessionDataTask *task, id responseObject))success
                      failure:(void (^)(NSURLSessionDataTask *task, NSError *error))failure
{

    return [self GET:URLString parameters:parameters progress:nil success:success failure:failure];
}

// -[AFSecurityPolicy validatesDomainName]
// No source-code, autogenerated.

// -[AFSecurityPolicy setValidatesDomainName:]
// No source-code, autogenerated.

// -[AFHTTPSessionManager initWithBaseURL:]
- (instancetype)initWithBaseURL:(NSURL *)url {
    return [self initWithBaseURL:url sessionConfiguration:nil];
}

// -[AFURLSessionManagerTaskDelegate init]
- (instancetype)init {
    self = [super init];
    if (!self) {
        return nil;
    }

    self.mutableData = [NSMutableData data];
    self.uploadProgress = [[NSProgress alloc] initWithParent:nil userInfo:nil];
    self.uploadProgress.totalUnitCount = NSURLSessionTransferSizeUnknown;

    self.downloadProgress = [[NSProgress alloc] initWithParent:nil userInfo:nil];
    self.downloadProgress.totalUnitCount = NSURLSessionTransferSizeUnknown;
    return self;
}

// -[AFURLSessionManager invalidateSessionCancelingTasks:]
- (void)invalidateSessionCancelingTasks:(BOOL)cancelPendingTasks {
    dispatch_async(dispatch_get_main_queue(), ^{
        if (cancelPendingTasks) {
            [self.session invalidateAndCancel];
        } else {
            [self.session finishTasksAndInvalidate];
        }
    });
}

// -[AFURLSessionManager respondsToSelector:]
- (BOOL)respondsToSelector:(SEL)selector {
    if (selector == @selector(URLSession:task:willPerformHTTPRedirection:newRequest:completionHandler:)) {
        return self.taskWillPerformHTTPRedirection != nil;
    } else if (selector == @selector(URLSession:dataTask:didReceiveResponse:completionHandler:)) {
        return self.dataTaskDidReceiveResponse != nil;
    } else if (selector == @selector(URLSession:dataTask:willCacheResponse:completionHandler:)) {
        return self.dataTaskWillCacheResponse != nil;
    } else if (selector == @selector(URLSessionDidFinishEventsForBackgroundURLSession:)) {
        return self.didFinishEventsForBackgroundURLSession != nil;
    }

    return [[self class] instancesRespondToSelector:selector];
}

// +[AFSecurityPolicy certificatesInBundle:]
+ (NSSet *)certificatesInBundle:(NSBundle *)bundle {
    NSArray *paths = [bundle pathsForResourcesOfType:@"cer" inDirectory:@"."];

    NSMutableSet *certificates = [NSMutableSet setWithCapacity:[paths count]];
    for (NSString *path in paths) {
        NSData *certificateData = [NSData dataWithContentsOfFile:path];
        [certificates addObject:certificateData];
    }

    return [NSSet setWithSet:certificates];
}

1 // +[AFNetworkReachabilityManager managerForDomain:]
2 void * +[AFNetworkReachabilityManager managerForDomain:](void * self, void *
_cmd, void * arg2) {
3 objc_storeStrong(0x0, arg2);
4 var_20 = SCNetworkReachabilityCreateWithName(*_kCFAllocatorDefault, [
objc_retainAutorelease(0x0) UTF8String]);
5 var_28 = [[self alloc] initWithReachability:var_20];
6 CFRelease(var_20);
7 var_48 = [var_28 retain];
8 objc_storeStrong(var_28, 0x0);
9 objc_storeStrong(0x0, 0x0);
10 rax = [var_48 autorelease];
11 return rax;
12 }
1 // +[AFNetworkReachabilityManager managerForAddress:]
2 int +[AFNetworkReachabilityManager managerForAddress:](int arg0, int arg1, int
arg2) {
3 var_20 = SCNetworkReachabilityCreateWithAddress(*_kCFAllocatorDefault,
arg2, arg2);
4 var_28 = [[arg0 alloc] initWithReachability:var_20];
5 CFRelease(var_20);
6 var_38 = [var_28 retain];
7 objc_storeStrong(var_28, 0x0);
8 rax = [var_38 autorelease];
9 return rax;
10 }
1 // -[AFNetworkReachabilityManager isReachable]
2 char -[AFNetworkReachabilityManager isReachable](void * self, void * _cmd) {
3 var_8 = self;
4 var_19 = 0x1;
5 if (sign_extend_64([var_8 isReachableViaWWAN]) == 0x0) {
6 var_19 = sign_extend_64([var_8 isReachableViaWiFi]) != 0x0 ? 0x1 :
0x0;
7 }
8 rax = sign_extend_64(var_19 & 0x1 & 0xff);
9 return rax;
10 }
1 // -[AFNetworkReachabilityManager stopMonitoring]
2 void -[AFNetworkReachabilityManager stopMonitoring](void * self, void * _cmd)
{
3 var_8 = self;
4 if ([var_8 networkReachability] != 0x0) {
5 SCNetworkReachabilityUnscheduleFromRunLoop([var_8
networkReachability], CFRunLoopGetMain(), *_kCFRunLoopCommonModes);
6 }
7 return;
8 }
1 // -[AFSecurityPolicy pinnedCertificates]
2 void * -[AFSecurityPolicy pinnedCertificates](void * self, void * _cmd) {
3 rax = self->_pinnedCertificates;
4 return rax;
5 }
1 // -[AFHTTPSessionManager GET:parameters:success:failure:]
2 void * -[AFHTTPSessionManager GET:parameters:success:failure:](void * self,
void * _cmd, void * arg2, void * arg3, void * arg4, void * arg5) {
3 objc_storeStrong(0x0, arg2);
4 objc_storeStrong(0x0, arg3);
5 objc_storeStrong(0x0, arg4);
6 objc_storeStrong(0x0, arg5);
7 *rsp = 0x0;
8 var_78 = [[self GET:0x0 parameters:0x0 progress:0x0 success:0x0 failure:
stack[2031]] retain];
9 objc_storeStrong(0x0, 0x0);
10 objc_storeStrong(0x0, 0x0);
11 objc_storeStrong(0x0, 0x0);
12 objc_storeStrong(0x0, 0x0);
13 rax = [var_78 autorelease];
14 return rax;
15 }
1 // -[AFSecurityPolicy validatesDomainName]
2 char -[AFSecurityPolicy validatesDomainName](void * self, void * _cmd) {
3 rax = sign_extend_64(self->_validatesDomainName);
4 return rax;
5 }
1 // -[AFSecurityPolicy setValidatesDomainName:]
2 void -[AFSecurityPolicy setValidatesDomainName:](void * self, void * _cmd,
char arg2) {
3 self->_validatesDomainName = arg2;
4 return;
5 }
1 // -[AFHTTPSessionManager initWithBaseURL:]
2 void * -[AFHTTPSessionManager initWithBaseURL:](void * self, void * _cmd, void
* arg2) {
3 objc_storeStrong(var_18, arg2);
4 rax = [self initWithBaseURL:0x0 sessionConfiguration:0x0];
5 var_8 = rax;
6 var_20 = [rax retain];
7 objc_storeStrong(0x0, 0x0);
8 objc_storeStrong(var_8, 0x0);
9 rax = var_20;
10 return rax;
11 }
1 // -[AFURLSessionManagerTaskDelegate init]
2 void * -[AFURLSessionManagerTaskDelegate init](void * self, void * _cmd) {
3 rax = var_28;
4 rax = [[rax super] init];
5 var_10 = rax;
6 objc_storeStrong(0x0, rax);
7 if (var_10 == 0x0) {
8 var_8 = 0x0;
9 }
10 else {
11 rax = [NSMutableData data];
12 rax = [rax retain];
13 var_40 = rax;
14 [var_10 setMutableData:rax];
15 [var_40 release];
16 rax = [NSProgress alloc];
17 rax = [rax initWithParent:0x0 userInfo:rcx];
18 var_50 = rax;
19 [var_10 setUploadProgress:rax];
20 [var_50 release];
21 rax = [var_10 uploadProgress];
22 rax = [rax retain];
23 var_58 = rax;
24 [rax setTotalUnitCount:*_NSURLSessionTransferSizeUnknown];
25 [var_58 release];
26 rax = [NSProgress alloc];
27 rax = [rax initWithParent:0x0 userInfo:rcx];
28 var_68 = rax;
29 [var_10 setDownloadProgress:rax];
30 [var_68 release];
31 rax = [var_10 downloadProgress];
32 rax = [rax retain];
33 var_70 = rax;
34 [rax setTotalUnitCount:*_NSURLSessionTransferSizeUnknown];
35 [var_70 release];
36 var_8 = [var_10 retain];
37 }
38 objc_storeStrong(var_10, 0x0);
39 rax = var_8;
40 return rax;
41 }
1 // -[AFURLSessionManager invalidateSessionCancelingTasks:]
2 void -[AFURLSessionManager invalidateSessionCancelingTasks:](void * self, void
* _cmd, char arg2) {
3 var_50 = [objc_retainAutoreleaseReturnValue(__dispatch_main_q) retain];
4 [self retain];
5 dispatch_async(var_50, __NSConcreteStackBlock);
6 [var_50 release];
7 objc_storeStrong(var_40 + 0x20, 0x0);
8 return;
9 }
1 // -[AFURLSessionManager respondsToSelector:]
2 char -[AFURLSessionManager respondsToSelector:](void * self, void * _cmd, void
* arg2) {
3 var_10 = self;
4 var_20 = arg2;
5 if (var_20 == @selector(URLSession:task:willPerformHTTPRedirection:
newRequest:completionHandler:)) {
6 rax = [var_10 taskWillPerformHTTPRedirection];
7 rax = [rax retain];
8 var_1 = (rax != 0x0 ? 0x1 : 0x0) & 0x1 & 0xff;
9 [rax release];
10 }
11 else {
12 if (var_20 == @selector(URLSession:dataTask:didReceiveResponse:
completionHandler:)) {
13 rax = [var_10 dataTaskDidReceiveResponse];
14 rax = [rax retain];
15 var_1 = (rax != 0x0 ? 0x1 : 0x0) & 0x1 & 0xff;
16 [rax release];
17 }
18 else {
19 if (var_20 == @selector(URLSession:dataTask:
willCacheResponse:completionHandler:)) {
20 rax = [var_10 dataTaskWillCacheResponse];
21 rax = [rax retain];
22 var_1 = (rax != 0x0 ? 0x1 : 0x0) & 0x1 & 0xff;
23 [rax release];
24 }
25 else {
26 if (var_20 == @selector(
URLSessionDidFinishEventsForBackgroundURLSession:)) {
27 rax = [var_10
didFinishEventsForBackgroundURLSession];
28 rax = [rax retain];
29 var_1 = (rax != 0x0 ? 0x1 : 0x0) & 0x1 & 0xff;
30 [rax release];
31 }
32 else {
33 var_1 = [[var_10 class]
instancesRespondToSelector:var_20];
34 }
35 }
36 }
37 }
38 rax = sign_extend_64(var_1);
39 return rax;
40 }
1 // +[AFSecurityPolicy certificatesInBundle:]
2 void * +[AFSecurityPolicy certificatesInBundle:](void * self, void * _cmd,
void * arg2) {
3 var_8 = *___stack_chk_guard;
4 objc_storeStrong(0x0, arg2);
5 var_A8 = [[0x0 pathsForResourcesOfType:@"cer" inDirectory:@"."] retain];
6 var_B0 = [[NSMutableSet setWithCapacity:[var_A8 count]] retain];
7 memset(var_F8, 0x0, 0x40);
8 rax = [var_A8 retain];
9 var_110 = rax;
10 rax = [rax countByEnumeratingWithState:var_F8 objects:var_88 count:0x10];
11 var_118 = rax;
12 if (rax != 0x0) {
13 var_120 = *var_E8;
14 var_128 = var_F8 + 0x10;
15 var_130 = 0x0;
16 var_138 = var_118;
17 do {
18 do {
19 var_140 = var_138;
20 var_148 = var_130;
21 if (**var_128 != var_120) {
22 objc_enumerationMutation(var_110);
23 }
24 var_100 = [[NSData dataWithContentsOfFile:*(var_F0
+ var_148 * 0x8)] retain];
25 [var_B0 addObject:var_100];
26 objc_storeStrong(var_100, 0x0);
27 var_138 = var_140;
28 var_130 = var_148 + 0x1;
29 } while (var_148 + 0x1 < var_140);
30 rax = [var_110 countByEnumeratingWithState:var_F8 objects:
var_88 count:0x10];
31 var_130 = 0x0;
32 var_138 = rax;
33 } while (rax != 0x0);
34 }
35 [var_110 release];
36 var_150 = [[NSSet setWithSet:var_B0] retain];
37 objc_storeStrong(var_B0, 0x0);
38 objc_storeStrong(var_A8, 0x0);
39 objc_storeStrong(0x0, 0x0);
40 var_158 = [var_150 autorelease];
41 if (*___stack_chk_guard == var_8) {
42 rax = var_158;
43 }
44 else {
45 rax = __stack_chk_fail();
46 }
47 return rax;
48 }
1 // +[AFNetworkReachabilityManager managerForDomain:]
2 void * +[AFNetworkReachabilityManager managerForDomain:](void * self, void *
_cmd, void * arg2) {
3 r15 = *_kCFAllocatorDefault;
4 r12 = [arg2 retain];
5 rbx = [objc_retainAutorelease(arg2) UTF8String];
6 [r12 release];
7 rbx = SCNetworkReachabilityCreateWithName(r15, rbx);
8 r14 = [[self alloc] initWithReachability:rbx];
9 CFRelease(rbx);
10 rdi = r14;
11 rax = [rdi autorelease];
12 return rax;
13 }
1 // +[AFNetworkReachabilityManager managerForAddress:]
2 int +[AFNetworkReachabilityManager managerForAddress:](int arg0) {
3 rbx = SCNetworkReachabilityCreateWithAddress(*_kCFAllocatorDefault, rdx);
4 r14 = [[arg0 alloc] initWithReachability:rbx];
5 CFRelease(rbx);
6 rdi = r14;
7 rax = [rdi autorelease];
8 return rax;
9 }
1 // -[AFNetworkReachabilityManager isReachable]
2 char -[AFNetworkReachabilityManager isReachable](void * self, void * _cmd) {
3 rbx = self;
4 rcx = 0x1;
5 if ([self isReachableViaWWAN] == 0x0) {
6 rcx = [rbx isReachableViaWiFi] != 0x0 ? 0x1 : 0x0;
7 }
8 rax = rcx & 0xff;
9 return rax;
10 }
1 // -[AFNetworkReachabilityManager stopMonitoring]
2 void -[AFNetworkReachabilityManager stopMonitoring](void * self, void * _cmd)
{
3 rbx = self;
4 r14 = @selector(networkReachability);
5 if (_objc_msgSend(self, r14) != 0x0) {
6 rbx = _objc_msgSend(rbx, r14);
7 rax = CFRunLoopGetMain();
8 rdi = rbx;
9 SCNetworkReachabilityUnscheduleFromRunLoop(rdi, rax, *
_kCFRunLoopCommonModes, _kCFRunLoopCommonModes);
10 }
11 return;
12 }
1 // -[AFSecurityPolicy pinnedCertificates]
2 void * -[AFSecurityPolicy pinnedCertificates](void * self, void * _cmd) {
3 rax = self->_pinnedCertificates;
4 return rax;
5 }
1 // -[AFHTTPSessionManager GET:parameters:success:failure:]
2 void * -[AFHTTPSessionManager GET:parameters:success:failure:](void * self,
void * _cmd, void * arg2, void * arg3, void * arg4, void * arg5) {
3 r12 = [arg2 retain];
4 r13 = [arg3 retain];
5 rbx = [arg4 retain];
6 r14 = [self GET:r12 parameters:r13 progress:0x0 success:rbx failure:arg5];
7 [rbx release];
8 [r13 release];
9 [r12 release];
10 rax = [r14 retain];
11 rax = [rax autorelease];
12 return rax;
13 }
1 // -[AFSecurityPolicy validatesDomainName]
2 char -[AFSecurityPolicy validatesDomainName](void * self, void * _cmd) {
3 rax = sign_extend_64(self->_validatesDomainName);
4 return rax;
5 }
1 // -[AFSecurityPolicy setValidatesDomainName:]
2 void -[AFSecurityPolicy setValidatesDomainName:](void * self, void * _cmd,
char arg2) {
3 self->_validatesDomainName = arg2;
4 return;
5 }
1 // -[AFHTTPSessionManager initWithBaseURL:]
2 void * -[AFHTTPSessionManager initWithBaseURL:](void * self, void * _cmd, void
* arg2) {
3 rax = [self initWithBaseURL:arg2 sessionConfiguration:0x0];
4 return rax;
5 }
1 // -[AFURLSessionManagerTaskDelegate init]
2 void * -[AFURLSessionManagerTaskDelegate init](void * self, void * _cmd) {
3 r14 = [[self super] init];
4 rbx = 0x0;
5 if (r14 != 0x0) {
6 rbx = [[NSMutableData data] retain];
7 [r14 setMutableData:rbx];
8 [rbx release];
9 rbx = [[NSProgress alloc] initWithParent:0x0 userInfo:0x0];
10 [r14 setUploadProgress:rbx];
11 [rbx release];
12 rbx = [[r14 uploadProgress] retain];
13 r12 = *_NSURLSessionTransferSizeUnknown;
14 [rbx setTotalUnitCount:r12];
15 [rbx release];
16 rbx = [[NSProgress alloc] initWithParent:0x0 userInfo:0x0];
17 [r14 setDownloadProgress:rbx];
18 [rbx release];
19 rbx = [[r14 downloadProgress] retain];
20 [rbx setTotalUnitCount:r12];
21 [rbx release];
22 rbx = [r14 retain];
23 }
24 [r14 release];
25 rax = rbx;
26 return rax;
27 }
1 // -[AFURLSessionManager invalidateSessionCancelingTasks:]
2 void -[AFURLSessionManager invalidateSessionCancelingTasks:](void * self, void
* _cmd, char arg2) {
3 var_10 = [self retain];
4 dispatch_async(__dispatch_main_q, __NSConcreteStackBlock);
5 [var_10 release];
6 return;
7 }
1 // -[AFURLSessionManager respondsToSelector:]
2 char -[AFURLSessionManager respondsToSelector:](void * self, void * _cmd, void
* arg2) {
3 rdi = self;
4 rbx = arg2;
5 if (@selector(URLSession:task:willPerformHTTPRedirection:newRequest:
completionHandler:) == rbx) goto loc_b75b;
6
7 loc_b71b:
8 if (@selector(URLSession:dataTask:didReceiveResponse:completionHandler:)
== rbx) goto loc_b764;
9
10 loc_b724:
11 if (@selector(URLSession:dataTask:willCacheResponse:completionHandler:) ==
rbx) goto loc_b76d;
12
13 loc_b72d:
14 if (@selector(URLSessionDidFinishEventsForBackgroundURLSession:) == rbx)
goto loc_b776;
15
16 loc_b736:
17 rbx = [[rdi class] instancesRespondToSelector:rbx];
18 goto loc_b79a;
19
20 loc_b79a:
21 rax = sign_extend_64(rbx);
22 return rax;
23
24 loc_b776:
25 rsi = @selector(didFinishEventsForBackgroundURLSession);
26 goto loc_b77d;
27
28 loc_b77d:
29 rax = _objc_msgSend(rdi, rsi);
30 rax = [rax retain];
31 rbx = rax != 0x0 ? 0x1 : 0x0;
32 [rax release];
33 goto loc_b79a;
34
35 loc_b76d:
36 rsi = @selector(dataTaskWillCacheResponse);
37 goto loc_b77d;
38
39 loc_b764:
40 rsi = @selector(dataTaskDidReceiveResponse);
41 goto loc_b77d;
42
43 loc_b75b:
44 rsi = @selector(taskWillPerformHTTPRedirection);
45 goto loc_b77d;
46 }
1 // +[AFSecurityPolicy certificatesInBundle:]
2 void * +[AFSecurityPolicy certificatesInBundle:](void * self, void * _cmd,
void * arg2) {
3 var_30 = *___stack_chk_guard;
4 r14 = [[arg2 pathsForResourcesOfType:@"cer" inDirectory:@"."] retain];
5 var_F8 = [[NSMutableSet setWithCapacity:[r14 count]] retain];
6 intrinsic_movaps(var_C0, 0x0);
7 intrinsic_movaps(var_D0, 0x0);
8 var_E0 = intrinsic_movaps(var_E0, 0x0);
9 var_F0 = intrinsic_movaps(var_F0, 0x0);
10 rax = [r14 retain];
11 var_100 = rax;
12 r15 = [rax countByEnumeratingWithState:var_F0 objects:var_B0 count:0x10];
13 if (r15 != 0x0) {
14 r13 = *var_E0;
15 do {
16 r12 = 0x0;
17 do {
18 if (*var_E0 != r13) {
19 objc_enumerationMutation(var_100);
20 }
21 r14 = [[NSData dataWithContentsOfFile:*(var_E8 +
r12 * 0x8)] retain];
22 [var_F8 addObject:r14];
23 [r14 release];
24 r12 = r12 + 0x1;
25 } while (r12 < r15);
26 r15 = [var_100 countByEnumeratingWithState:var_F0 objects:
var_B0 count:0x10];
27 } while (r15 != 0x0);
28 }
29 [var_100 release];
30 r15 = [[NSSet setWithSet:var_F8] retain];
31 [var_F8 release];
32 [var_100 release];
33 if (*___stack_chk_guard == var_30) {
34 rdi = r15;
35 rax = [rdi autorelease];
36 }
37 else {
38 rax = __stack_chk_fail();
39 }
40 return rax;
41 }
1 // +[AFNetworkReachabilityManager managerForDomain:]
2 + (id)managerForDomain:(id)arg_10 {
3 long var_18 = 0;
4 long temp_3 = &(var_18);
5 *(temp_3) = arg_10;
6 id rax = [var_18 UTF8String];
7 rax = _SCNetworkReachabilityCreateWithName(*(_kCFAllocatorDefault), rax);
8 long var_20 = rax;
9 rax = [self alloc];
10 [rax initWithReachability:var_20];
11 _CFRelease(var_20);
12 *(temp_3) = 0;
13 return 0;
14 }
1 // +[AFNetworkReachabilityManager managerForAddress:]
2 + (id)managerForAddress:(long)arg_10 {
3 long rax = _SCNetworkReachabilityCreateWithAddress(*(_kCFAllocatorDefault),
arg_10);
4 long var_20 = rax;
5 rax = [self alloc];
6 [rax initWithReachability:var_20];
7 _CFRelease(var_20);
8 return 0;
9 }
1 // -[AFNetworkReachabilityManager isReachable]
2 - (char)isReachable {
3 id rax = [self isReachableViaWWAN];
4 long rcx = rcx && 18446744073709551360 || 1;
5 if (!(rax == 0)) {
6 [self isReachableViaWiFi];
7 }
8 rcx = rcx && 1;
9 rax = rcx;
10 return rax;
11 }
1 // -[AFNetworkReachabilityManager stopMonitoring]
2 - (void)stopMonitoring {
3 long var_8 = self;
4 id rax = [self networkReachability];
5 if (!(rax == 0)) {
6 } else {
7 [var_8 networkReachability];
8 rax = _CFRunLoopGetMain(var_8, @selector(networkReachability));
9 _SCNetworkReachabilityUnscheduleFromRunLoop(rax, rax);
10 }
11 }
1 // -[AFSecurityPolicy pinnedCertificates]
2 - (id)pinnedCertificates {
3 long temp_7 = self + _OBJC_IVAR_$_AFSecurityPolicy._pinnedCertificates;
4 return self->_pinnedCertificates;
5 }
1 // -[AFHTTPSessionManager GET:parameters:success:failure:]
2 - (id)GET:(id)arg_10 parameters:(id)arg_18 success:(void *)arg_20 failure:(
void *)arg_28 {
3 long var_18 = 0;
4 long temp_3 = &(var_18);
5 *(temp_3) = arg_10;
6 long var_20 = 0;
7 long temp_9 = &(var_20);
8 *(temp_9) = arg_18;
9 long var_28 = 0;
10 long temp_14 = &(var_28);
11 *(temp_14) = arg_20;
12 long var_30 = 0;
13 long temp_19 = &(var_30);
14 *(temp_19) = arg_28;
15 *(temp_19) = 0;
16 *(temp_14) = 0;
17 *(temp_9) = 0;
18 *(temp_3) = 0;
19 return [self GET:var_18 parameters:var_20 progress:0 success:var_28
failure:var_30];
20 }
1 // -[AFSecurityPolicy validatesDomainName]
2 - (char)validatesDomainName {
3 long temp_7 = self + _OBJC_IVAR_$_AFSecurityPolicy._validatesDomainName;
4 return self->_validatesDomainName;
5 }
1 // -[AFSecurityPolicy setValidatesDomainName:]
2 - (void)setValidatesDomainName:(char)arg_10 {
3 long temp_17 = self + _OBJC_IVAR_$_AFSecurityPolicy._validatesDomainName;
4 self->_validatesDomainName = arg_10;
5 }
1 // -[AFHTTPSessionManager initWithBaseURL:]
2 - (id)initWithBaseURL:(id)arg_10 {
3 return [0 initWithBaseURL:arg_10 sessionConfiguration:0];
4 }
1 // -[AFURLSessionManagerTaskDelegate init]
2 - (id)init {
3 long temp_28;
4 long var_38;
5 long var_8;
6 long temp_0 = &(0);
7 id rax = [super init];
8 long var_10 = rax;
9 if (!(rax == 0)) {
10 var_8 = 0;
11 } else {
12 var_38->off_0 = var_10;
13 rax = [NSMutableData data];
14 <<TODO uGETMEMBER.8 temp_28.8 := var_38.12[0x0]>>;
15 [temp_28 setMutableData:rax];
16 rax = [[NSProgress alloc] initWithParent:0 userInfo:0];
17 [var_10 setUploadProgress:rax];
18 rax = [var_10 uploadProgress];
19 [rax setTotalUnitCount:*(_NSURLSessionTransferSizeUnknown)];
20 rax = [[NSProgress alloc] initWithParent:0 userInfo:0];
21 [var_10 setDownloadProgress:rax];
22 rax = [var_10 downloadProgress];
23 [rax setTotalUnitCount:*(_NSURLSessionTransferSizeUnknown)];
24 var_8 = var_10;
25 }
26 return var_8;
27 }
1 // -[AFURLSessionManager invalidateSessionCancelingTasks:]
2 - (void)invalidateSessionCancelingTasks:(char)arg_10 {
3 long temp_0 = arg_10;
4 _dispatch_async(__dispatch_main_q, ^() {
5 long rax;
6 long temp_2 = &(temp_0);
7 long var_18 = block_literal;
8 if (temp_0 == 0) {
9 rax = [*(var_18 + 32) session];
10 [rax invalidateAndCancel];
11 } else {
12 rax = [*(var_18 + 32) session];
13 [rax finishTasksAndInvalidate];
14 }
15 });
16 *(^() {
17 long rax;
18 long temp_2 = &(temp_0);
19 long var_18 = block_literal;
20 if (temp_0 == 0) {
21 rax = [*(var_18 + 32) session];
22 [rax invalidateAndCancel];
23 } else {
24 rax = [*(var_18 + 32) session];
25 [rax finishTasksAndInvalidate];
26 }
27 }) = 0;
28 }
1 // -[AFURLSessionManager respondsToSelector:]
2 - (char)respondsToSelector:(char *)arg_10 {
3 long temp_50;
4 long var_10;
5 long temp_88;
6 long temp_71;
7 long temp_29;
8 long temp_8;
9 var_10->off_0 = self;
10 long var_20 = arg_10;
11 BOOL zf = arg_10 == @selector(URLSession:task:willPerformHTTPRedirection:
newRequest:completionHandler:);
12 BOOL branch_condition = !(zf);
13 if (branch_condition) {
14 <<TODO uGETMEMBER.8 temp_8.8 := var_10.15[0x0]>>;
15 rax = [temp_8 taskWillPerformHTTPRedirection];
16 zf = rax == 0;
17 branch_condition = !(zf);
18 arg_10 = branch_condition && 1;
19 } else {
20 zf = var_20 == @selector(URLSession:dataTask:didReceiveResponse:
completionHandler:);
21 branch_condition = !(zf);
22 if (branch_condition) {
23 zf = var_20 == @selector(URLSession:dataTask:willCacheResponse:
completionHandler:);
24 branch_condition = !(zf);
25 if (branch_condition) {
26 zf = var_20 == @selector(
URLSessionDidFinishEventsForBackgroundURLSession:);
27 branch_condition = !(zf);
28 if (branch_condition) {
29 <<TODO uGETMEMBER.8 temp_71.8 := var_10.15[0x0]>>;
30 [temp_71 didFinishEventsForBackgroundURLSession];
31 } else {
32 <<TODO uGETMEMBER.8 temp_88.8 := var_10.15[0x0]>>;
33 rax = [temp_88 class];
34 [rax instancesRespondToSelector:var_20];
35 }
36 } else {
37 <<TODO uGETMEMBER.8 temp_50.8 := var_10.15[0x0]>>;
38 [temp_50 dataTaskWillCacheResponse];
39 }
40 } else {
41 <<TODO uGETMEMBER.8 temp_29.8 := var_10.15[0x0]>>;
42 [temp_29 dataTaskDidReceiveResponse];
43 }
44 }
45 long rax = arg_10;
46 return rax;
47 }
1 // +[AFSecurityPolicy certificatesInBundle:]
2 + (id)certificatesInBundle:(id)arg_10 {
3 long temp_94;
4 long var_88;
5 long var_148;
6 BOOL cf;
7 long temp_75;
8 long var_130;
9 long var_f8;
10 id rax = [arg_10 pathsForResourcesOfType:@"cer" inDirectory:@"."];
11 rax = [rax count];
12 rax = [NSMutableSet setWithCapacity:rax];
13 long var_b0 = rax;
14 _memset(&(var_f8), 0, 64);
15 long temp_32 = &(var_f8);
16 long temp_33 = &(var_88);
17 long var_110 = rax;
18 BOOL zf = rax == 0;
19 for (id temp_75 in rax) {
20 rax = [NSData dataWithContentsOfFile:temp_75];
21 [var_b0 addObject:rax];
22 temp_94 = var_148 + 1;
23 cf = temp_94 > rax;
24 var_130 = temp_94;
25 }
26 rax = [NSSet setWithSet:var_b0];
27 return rax;
28 }
1 // +[AFNetworkReachabilityManager managerForDomain:]
2 + (id)managerForDomain:(id)arg_10 {
3 id rax = [arg_10 UTF8String];
4 rax = _SCNetworkReachabilityCreateWithName(*(_kCFAllocatorDefault), rax);
5 long rbx = rax;
6 rax = [[self alloc] initWithReachability:rbx];
7 _CFRelease(rbx);
8 return rax;
9 }
1 // +[AFNetworkReachabilityManager managerForAddress:]
2 + (id)managerForAddress:(long)arg_10 {
3 long rax = _SCNetworkReachabilityCreateWithAddress(*(_kCFAllocatorDefault),
arg_10);
4 long rbx = rax;
5 rax = [[self alloc] initWithReachability:rbx];
6 _CFRelease(rbx);
7 return rax;
8 }
1 // -[AFNetworkReachabilityManager isReachable]
2 - (char)isReachable {
3 id rax = [self isReachableViaWWAN];
4 long zf = rax;
5 BOOL branch_condition = !(zf);
6 if (branch_condition) {
7 rax = [self isReachableViaWiFi];
8 zf = rax;
9 branch_condition = !(zf);
10 }
11 rax = branch_condition;
12 return rax;
13 }
1 // -[AFNetworkReachabilityManager stopMonitoring]
2 - (void)stopMonitoring {
3 long rbx = self;
4 id rax = [self networkReachability];
5 if (rax) {
6 return;
7 }
8 rax = [rbx networkReachability];
9 rbx = rax;
10 rax = _CFRunLoopGetMain();
11 _SCNetworkReachabilityUnscheduleFromRunLoop(rbx, rax, *(
_kCFRunLoopCommonModes), _kCFRunLoopCommonModes);
12 }
1 // -[AFSecurityPolicy pinnedCertificates]
2 - (id)pinnedCertificates {
3 long temp_3 = self + _OBJC_IVAR_$_AFSecurityPolicy._pinnedCertificates;
4 return self->_pinnedCertificates;
5 }
1 // -[AFHTTPSessionManager GET:parameters:success:failure:]
2 - (id)GET:(id)arg_10 parameters:(id)arg_18 success:(void *)arg_20 failure:(
void *)arg_28 {
3 long var_30;
4 var_30->off_0 = self;
5 return [var_30->off_0 GET:arg_10 parameters:arg_18 progress:0 success:
arg_20 failure:arg_28];
6 }
1 // -[AFSecurityPolicy validatesDomainName]
2 - (char)validatesDomainName {
3 long temp_3 = self + _OBJC_IVAR_$_AFSecurityPolicy._validatesDomainName;
4 return self->_validatesDomainName;
5 }
1 // -[AFSecurityPolicy setValidatesDomainName:]
2 - (void)setValidatesDomainName:(char)arg_10 {
3 long temp_4 = self + _OBJC_IVAR_$_AFSecurityPolicy._validatesDomainName;
4 self->_validatesDomainName = arg_10;
5 }
1 // -[AFHTTPSessionManager initWithBaseURL:]
2 - (id)initWithBaseURL:(id)arg_10 {
3 return [self initWithBaseURL:arg_10 sessionConfiguration:0];
4 }
// -[AFURLSessionManagerTaskDelegate init]
- (id)init {
    long temp_30;
    long temp_6 = &(self);
    id rax = [super init];
    long r14 = rax;
    if (rax) {
        rax = [NSMutableData data];
        [r14 setMutableData:rax];
        rax = [[NSProgress alloc] initWithParent:0 userInfo:0];
        [r14 setUploadProgress:rax];
        rax = [r14 uploadProgress];
        temp_30 = *(_NSURLSessionTransferSizeUnknown);
        [rax setTotalUnitCount:temp_30];
        rax = [[NSProgress alloc] initWithParent:0 userInfo:0];
        [r14 setDownloadProgress:rax];
        rax = [r14 downloadProgress];
        [rax setTotalUnitCount:temp_30];
        rax = r14;
    }
    return rax;
}
// -[AFURLSessionManager invalidateSessionCancelingTasks:]
- (void)invalidateSessionCancelingTasks:(char)arg_10 {
    long temp_9 = arg_10;
    _dispatch_async(__dispatch_main_q, ^() {
        long rsi;
        long temp_0 = &(temp_9);
        id rax = [*(&(self)) session];
        id rbx = rax;
        if (temp_9 == 0) {
            rsi = @selector(finishTasksAndInvalidate);
        } else {
            rsi = @selector(invalidateAndCancel);
        }
        block_literal = rbx;
        rax = _objc_msgSend(rbx, rsi);
    });
}
// -[AFURLSessionManager respondsToSelector:]
- (char)respondsToSelector:(char *)arg_10 {
    long temp_14;
    long rbx = arg_10;
    BOOL zf = @selector(URLSession:task:willPerformHTTPRedirection:newRequest:completionHandler:) == arg_10;
    if (zf) {
        _cmd = @selector(taskWillPerformHTTPRedirection);
    } else {
        zf = @selector(URLSession:dataTask:didReceiveResponse:completionHandler:) == rbx;
        if (zf) {
            _cmd = @selector(dataTaskDidReceiveResponse);
        } else {
            zf = @selector(URLSession:dataTask:willCacheResponse:completionHandler:) == rbx;
            if (zf) {
                zf = @selector(URLSessionDidFinishEventsForBackgroundURLSession:) == rbx;
                if (zf) {
                    rax = [self class];
                    rax = [rax instancesRespondToSelector:rbx];
                    temp_14 = rax;
                    temp_32 = temp_14;
                    rax = temp_32;
                    return rax;
                }
                _cmd = @selector(didFinishEventsForBackgroundURLSession);
            } else {
                _cmd = @selector(dataTaskWillCacheResponse);
            }
        }
    }
    long rax = _objc_msgSend(self, _cmd);
    long temp_32 = temp_14;
    rax = temp_32;
    return rax;
}
// +[AFSecurityPolicy certificatesInBundle:]
+ (id)certificatesInBundle:(id)arg_10 {
    long var_b0;
    long temp_44;
    BOOL cf;
    long temp_55;
    long var_f0;
    id rax = [arg_10 pathsForResourcesOfType:@"cer" inDirectory:@"."];
    rax = [rax count];
    rax = [NSMutableSet setWithCapacity:rax];
    long var_f8 = rax;
    var_f0->off_0 = 0;
    long var_100 = rax;
    long temp_27 = &(var_f0);
    long temp_28 = &(var_b0);
    long zf = rax;
    for (id temp_44 in rax) {
        rax = [NSData dataWithContentsOfFile:temp_44];
        [var_f8 addObject:rax];
        temp_55 = temp_55 + 1;
        cf = temp_55 > rax;
    }
    rax = [NSSet setWithSet:var_f8];
    return rax;
}