
2015 IEEE Symposium on Security and Privacy

Missing the Point(er): On the Effectiveness of Code Pointer Integrity¹
Isaac Evans∗ , Sam Fingeret† , Julián González† , Ulziibayar Otgonbaatar† , Tiffany Tang† ,
Howard Shrobe† , Stelios Sidiroglou-Douskos† , Martin Rinard† , Hamed Okhravi∗
† MIT CSAIL, Cambridge, MA

Email: {samfin, jugonz97, ulziibay, fable, hes, stelios, rinard}@csail.mit.edu


∗ MIT Lincoln Laboratory, Lexington, MA
Email: {isaac.evans, hamed.okhravi}@ll.mit.edu

Abstract—Memory corruption attacks continue to be a major vector of attack for compromising modern systems. Numerous defenses have been proposed against memory corruption attacks, but they all have their limitations and weaknesses. Stronger defenses such as complete memory safety for legacy languages (C/C++) incur a large overhead, while weaker ones such as practical control flow integrity have been shown to be ineffective. A recent technique called code pointer integrity (CPI) promises to balance security and performance by focusing memory safety on code pointers, thus preventing most control-hijacking attacks while maintaining low overhead. CPI protects access to code pointers by storing them in a safe region that is protected by instruction level isolation. On x86-32, this isolation is enforced by hardware; on x86-64 and ARM, isolation is enforced by information hiding. We show that, for architectures that do not support segmentation, in which CPI relies on information hiding, CPI's safe region can be leaked and then maliciously modified by using data pointer overwrites. We implement a proof-of-concept exploit against Nginx and successfully bypass CPI implementations that rely on information hiding in 6 seconds with 13 observed crashes. We also present an attack that generates no crashes and is able to bypass CPI in 98 hours. Our attack demonstrates the importance of adequately protecting secrets in security mechanisms and the dangers of relying on difficulty of guessing without guaranteeing the absence of memory leaks.

¹ This work is sponsored by the Assistant Secretary of Defense for Research & Engineering under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

I. INTRODUCTION

Despite considerable effort, memory corruption bugs and the subsequent security vulnerabilities that they enable remain a significant concern for unmanaged languages such as C/C++. They form the basis for attacks [14] on modern systems in the form of code injection [40] and code reuse [49, 14].

The power that unmanaged languages provide, such as low-level memory control, explicit memory management and direct access to the underlying hardware, makes them ideal for systems development. However, this level of control comes at a significant cost, namely lack of memory safety. Rewriting systems code with managed languages has had limited success [24] due to the perceived loss of control that mechanisms such as garbage collection may impose, and the fact that millions of lines of existing C/C++ code would need to be ported to provide similar functionality. Unfortunately, retrofitting memory safety into C/C++ applications can cause significant overhead (up to 4x slowdown) [36] or may require annotations [37, 28].

In response to these perceived shortcomings, research has focused on alternative techniques that can reduce the risk of code injection and code reuse attacks without significant performance overhead and usability constraints. One such technique is Data Execution Prevention (DEP). DEP enables a system to use memory protection to mark pages as non-executable, which can limit the introduction of new executable code during execution. Unfortunately, DEP can be defeated using code reuse attacks such as return-oriented programming [11, 17], jump-oriented programming [10] and return-into-libc attacks [56].

Randomization-based techniques, such as Address Space Layout Randomization (ASLR) [43] and its medium- [30] and fine-grained variants [57], randomize the location of code and data segments, thus providing probabilistic guarantees against code reuse attacks. Unfortunately, recent attacks demonstrate that even fine-grained memory randomization techniques may be vulnerable to memory disclosure attacks [52]. Memory disclosure may take the form of direct memory leakage [53] (i.e., as part of the system output), or it can take the form of indirect memory leakage, where fault or timing side-channel analysis attacks are used to leak the contents of memory [9, 47]. Other forms of randomization-based techniques include instruction set randomization (ISR) [8] or the multicompiler techniques [26]. Unfortunately, they are also vulnerable to information leakage attacks [53, 47].

Control flow integrity (CFI) is a widely researched runtime enforcement technique that can provide practical protection against code injection and code reuse attacks [3, 61, 62]. CFI provides runtime enforcement of the intended control flow transfers by disallowing transfers that are not present in the application's control flow graph (CFG). However, precise enforcement of CFI can have a large overhead [3]. This has motivated the development of more practical variants of CFI that have lower performance overhead but enforce weaker restrictions [61, 62]. For example, control transfer checks are relaxed to allow transfers to any valid jump target as opposed to the correct target. Unfortunately, these implementations have been shown to be ineffective because they allow enough

© 2015, Isaac Evans. Under license to IEEE.
DOI 10.1109/SP.2015.53
valid transfers to enable an attacker to build a malicious payload [21].

A recent survey of protection mechanisms [55] shows that most available solutions are either (a) incomplete, (b) bypassable using known attacks, (c) require source code modifications or (d) impose significant performance overhead.

Recently a new technique, code pointer integrity (CPI), promises to bridge the gap between security guarantees and performance/usability. CPI enforces selective memory safety on code pointers (i.e., it does not protect data pointers) without requiring any source code modifications. The key idea behind CPI is to isolate and protect code pointers in a separate safe region and provide runtime checks that verify the code pointer correctness on each control transfer. Since modification of a code pointer is necessary to implement a control hijacking attack, the authors of CPI argue that it is effective against the most malicious types of memory corruption attacks. As code pointers represent a small fraction of all pointers, CPI is significantly more efficient than established techniques for enforcing complete memory safety (average 2.9% for C, 8.4% for C/C++) [31].

In this paper, we present an attack on CPI that uses a data pointer vulnerability to launch a timing side-channel that leaks information about the protected safe region. Our attack takes advantage of two design weaknesses in CPI. First, on architectures that do not support segmentation protection, such as x86-64 and ARM, CPI uses information hiding to protect the safe region. Second, to achieve the low performance overhead, CPI focuses protection on code pointers. Since the safe region is kept in the same memory space as the code it is protecting, to avoid expensive context switches, it is also subject to leakage and overwrite attacks. We show that an attacker can disclose the location of the safe region using a timing side-channel attack. Once the location of a code pointer in the safe region is known, the metadata of the pointer is modified to allow the location of a ROP chain. Then the pointer is modified to point to a ROP chain that can successfully complete the hijacking attack.

In our evaluation of CPI's implementation, we discovered a number of implementation flaws that can facilitate an attack against CPI. In this paper, we focus on an attack that exploits a flaw in the use of information hiding to protect the safe region for architectures that do not provide hardware isolation (e.g., x86-64 and ARM). In other words, for the x86-64 and ARM architectures, we assume the weakest assumptions for the attacker. In fact, the only assumption necessary for an attacker to break CPI is control of the stack, which is consistent with other code reuse attacks and defenses in the literature [49, 23, 57]. For the remainder of the paper, when referring to CPI, we are referring to the information-hiding based implementations of CPI.

At a high level our attack works as follows. First, by controlling the stack, we use a data pointer overwrite to redirect a data pointer to a random location in the memory map (mmap), which is used by CPI. Using a timing side-channel attack, we leak large parts of the safe region. We then use a data-pointer overwrite attack to modify the safe region and tamper with base and bounds information for a code pointer that we need for the actual payload. This can be summarized in the following steps:

1) Launch Timing Side-channel Attack: A data-pointer overwrite vulnerability is used to control a data pointer that is subsequently used to affect control flow (e.g., the number of loop iterations), which in turn is used to reveal the contents of the pointer under control (i.e., byte values). The data pointer can be overwritten to point to a return address on the stack, revealing where code is located, or a location in the code segment, revealing what code is located there.

2) Data Collection: Using the data pointer vulnerability, we measure round-trip response times to our attack application in order to collect the timing samples. We create a mapping between the smallest cumulative delay slope and byte 0, and the largest slope and byte 255. We use these two mappings to interpolate cumulative delay slopes for all possible byte values (0-255). This enables us to read the contents of specific memory locations with high accuracy.

3) Locate Safe Region: Using information about the possible location of the safe region with respect to the randomized location of mmap, we launch a search that starts at a reliably mapped location within the safe region and traverses the safe region until we discover a sequence of bytes that indicates the location of a known library (e.g., the base of libc). Under the current implementation of CPI, discovering the base of libc allows us to trivially compute the base address of the safe region. Up to this point, the attack is completely transparent to CPI and may not cause any crash or detectable side effect.

4) Attack Safe Region: Using the safe region table address, we can compute the address of any code pointer stored in the safe region. At this point, we can change a code pointer and any associated metadata to enable a control hijacking attack (e.g., a ROP gadget). CPI does not detect the redirection as a violation because we have already modified its safe region to accept the new base and bound for the code pointer.

In the CPI paper, the authors argue that leaking large parts of memory or brute-forcing the safe region causes a large number of crashes that can be detected using other means [31]. We show that this assumption is incorrect and in fact leaking large parts of the safe region can happen without causing any crash in the target process. Another assumption in CPI is that if there is no pointer into the safe region, its location cannot be leaked. We show that this assumption is also incorrect. By jumping into a randomly selected location in mmap, the attack can start leaking the safe region without requiring any pointer to it.

To evaluate our attack, we construct a proof-of-concept attack on a CPI-protected version of Nginx [45]. Our evaluation shows that in Ubuntu Linux with ASLR, it takes 6 seconds to bypass CPI with 13 crashes. Our analysis also shows

that an attack can be completed without any crashes in ∼98 hours for the most performant and complete implementation of CPI. This implementation relies on ASLR support from the operating system.

A. Contributions

This paper makes the following contributions:

• Attack on CPI: We show that an attacker can defeat CPI, on x86-64 architectures, assuming only control of the stack. Specifically, we show how to reveal the location of the safe region using a data-pointer overwrite without causing any crashes, which was assumed to be impossible by the CPI authors.

• Proof of Concept Attack on Nginx: We implement a proof-of-concept attack on a CPI-protected version of the popular Nginx web server. We demonstrate that our attack is accurate and efficient (it takes 6 seconds to complete with only 13 crashes).

• Experimental Results: We present experimental results that demonstrate the ability of our attack to leak the safe region using a timing side-channel attack.

• Countermeasures: We present several possible improvements to CPI and analyze their susceptibility to different types of attacks.

Next, Section II describes our threat model, which is consistent with CPI's threat model. Section III provides a brief background on CPI and the side-channel attacks necessary for understanding the rest of the paper. Section IV describes our attack procedure and its details. Section V presents the results of our attack. Section VI describes a few of CPI's implementation flaws. Section VII provides some insights into the root cause of the problems in CPI and discusses possible patch fixes and their implications. Section VIII describes the possible countermeasures against our attack. Section IX reviews the related work and Section X concludes the paper.

II. THREAT MODEL

In this paper, we assume a realistic threat model that is both consistent with prior work and the threat model assumed by CPI [31]. For the attacker, we assume that there exists a vulnerability that provides control of the stack (i.e., the attacker can create and modify arbitrary values on the stack). We also assume that the attacker cannot modify code in memory (e.g., memory is protected by DEP [41]). We also assume the presence of ASLR [43]. As the above assumptions prevent code injection, the attacker would be required to construct a code reuse attack to be successful.

We also assume that CPI is properly configured and correctly implemented. As we will discuss later, CPI has other implementation flaws that make it more vulnerable to attack, but for this paper we focus on its design decision to use information hiding to protect the safe region.

III. BACKGROUND

This section presents the necessary background information required to understand our attack on CPI. Specifically, the section begins with an overview of CPI and continues with information about remote leakage attacks. For additional information, we refer the reader to the CPI paper [31] and a recent remote leakage attack paper [47].

A. CPI Overview

CPI consists of three major components: static analysis, instrumentation, and safe region isolation.

1) Static Analysis: CPI uses type-based static analysis to determine the set of sensitive pointers to be protected. CPI treats all pointers to functions, composite types (e.g., arrays or structs containing sensitive types), universal pointers (e.g., void* and char*), and pointers to sensitive types as sensitive types (note the recursive definition). CPI protects against the redirection of sensitive pointers that can result in control-hijacking attacks. The notion of sensitivity is dynamic in nature: at runtime, a pointer may point to a benign integer value (non-sensitive) and it may also point to a function pointer (sensitive) at some other part of the execution. Using the results of the static analysis, CPI stores the metadata for checking the validity of code pointers in its safe region. The metadata includes the value of the pointer and its lower and upper thresholds. An identifier is also stored to check for temporal safety, but this feature is not used in the current implementation of CPI. Note that static analysis has its own limitations and inaccuracies [33], the discussion of which is beyond the scope of this paper.

2) Instrumentation: CPI adds instrumentation that propagates metadata along pointer operations (e.g., pointer arithmetic or assignment). Instrumentation is also used to ensure that only CPI intrinsic instructions can manipulate the safe region and that no pointer in the code can directly reference the safe region. This is to prevent any code pointers from disclosing the location of the safe region using a memory disclosure attack (on code pointers).

3) Safe Region Isolation: On the x86-32 architecture, CPI relies on segmentation protection to isolate the safe region. On architectures that do not support segmentation protection, such as x86-64 and ARM, CPI uses information hiding to protect the safe region. There are two major weaknesses in CPI's approach to safe region isolation on x86. First, the x86-32 architecture is slowly being phased out as systems migrate to 64-bit architectures and mobile architectures. Second, as we show in our evaluation, weaknesses in the implementation of the segmentation protection in CPI make it bypassable. For protection on the x86-64 architecture, CPI relies on the size of the safe region (2^42 bytes), randomization and sparsity of its safe region, and the fact that there are no direct pointers to its safe region. We show that these are weak assumptions at best.

The CPI authors also present a weaker but more efficient version of CPI called Code Pointer Separation (CPS). CPS enforces safety for code pointers, but not pointers to code pointers. Because the CPI authors present CPI as providing the strongest security guarantees, we do not discuss CPS and the additional safe stack feature further. Interested readers can refer to the
original publication for a more in-depth description of these features.

B. Side Channels via Memory Corruption

Side channel attacks using memory corruption come in two broad flavors: fault and timing analysis. They typically use a memory corruption vulnerability (e.g., a buffer overflow) as the basis from which to leak information about the contents of memory. They are significantly more versatile than traditional memory disclosure attacks [54] as they can limit crashes, they can disclose information about a large section of memory, and they only require a single exploit to defeat code-reuse protection mechanisms.

Blind ROP (BROP) [9] is an example of a fault analysis attack that uses the fault output of the application to leak information about memory content (i.e., using application crashes or freezes). BROP intentionally uses crashes to leak information and can therefore be potentially detected by mechanisms that monitor for an abnormal number of program crashes.

Seibert et al. [47] describe a variety of timing- and fault-analysis attacks. In this paper, we focus on using timing channel attacks via data-pointer overwrites. This type of timing attack can prevent unwanted crashes by focusing timing analysis on allocated pages (e.g., the large memory region allocated as part of the safe region).

Consider the code sequence below. If ptr can be overwritten by an attacker to point to a location in memory, the execution time of the while loop will be correlated with the byte value to which ptr is pointing. For example, if ptr is stored on the stack, a simple buffer overflow can corrupt its value to point to an arbitrary location in memory. This delay is small (on the order of nanoseconds); however, by making numerous queries over the network and keeping the fastest samples (cumulative delay analysis), an attacker can get an accurate estimate of the byte values [47, 16]. In our attack, we show that this type of attack is a practical technique for disclosing CPI's safe region.

    i = 0;
    while (i < ptr->value)
        i++;

C. Memory Entropy

One of the arguments made by the authors of the CPI technique is that the enormous size of virtual memory makes guessing or brute force attacks difficult if not impossible. Specifically, they mention that the 48 bits of addressable space in x86-64 is very hard to brute force. We show that in practice this assumption is incorrect. First, the entropy faced by an attacker is not 48 bits but rather 28 bits: the entropy for the base address of mmap, where CPI's safe region is stored, is 28 bits [39]. Second, an attacker does not need to know the exact start address of mmap. The attacker only needs to redirect the data pointer to any valid location inside mmap. Since large parts of the mmap are used by libraries and the CPI safe region, an attacker can land inside an allocated mmap page with high probability. In our evaluation we show that this probability is as high as 1 for the average case. In other words, since the size of the mmap region is much larger than the entropy in its start address, an attacker can effectively land in a valid location inside mmap without causing crashes.

IV. ATTACK METHODOLOGY

This section presents a methodology for performing attacks on applications protected with CPI. As outlined in Section II, the attacks on CPI assume an attacker with identical capabilities as outlined in the CPI paper [31]. The section begins with a high-level description of the attack methodology and then proceeds to describe a detailed attack against Nginx [45] using the approach.

At a high level, our attack takes advantage of two design weaknesses in CPI. First, on architectures that do not support segmentation protection, such as x86-64 and ARM, CPI uses information hiding to protect the safe region. Second, to achieve low performance overhead, CPI focuses protection on code pointers (i.e., it does not protect data pointers). This section demonstrates that these design decisions can be exploited to bypass CPI.

Intuitively, our attack exploits the lack of data pointer protection in CPI to perform a timing side channel attack that can leak the location of the safe region. Once the location of a code pointer in the safe region is known, the code pointer, along with its metadata, is modified to point to a ROP chain that completes the hijacking attack. We note that using a data-pointer overwrite to launch a timing channel to leak the safe region location can be completely transparent to CPI and may avoid any detectable side-effects (i.e., it does not cause the application to crash).

The attack performs the following steps:

1) Find data pointer vulnerability
2) Gather data
   • Identify statistically unique memory sequences
   • Collect timing data on data pointer vulnerability
3) Locate safe region
4) Attack safe region

Next, we describe each of these steps in detail.

A. Vulnerability

The first requirement to launch an attack on CPI is to discover a data pointer overwrite vulnerability in the CPI-protected application. Data pointers are not protected by CPI; CPI only protects code pointers.

The data pointer overwrite vulnerability is used to launch a timing side-channel attack [47], which, in turn, can leak information about the safe region. In more detail, the data pointer overwrite vulnerability is used to control a data pointer that is subsequently used to affect control flow (in our case, the number of iterations of a loop) and can be used to reveal the contents of the pointer (i.e., byte values) via timing information. For example, if the data pointer is stored on the
stack, it can be overwritten using a stack overflow attack; if it is stored in the heap, it can be overwritten via a heap corruption attack.

In the absence of complete memory safety, we assume that such vulnerabilities will exist. This assumption is consistent with the related work in the area [50, 12]. In our proof-of-concept exploit, we use a stack buffer overflow vulnerability similar to previous vulnerabilities [1] to redirect a data pointer in Nginx.

B. Data Collection

Given a data-pointer vulnerability, the next step is to collect enough data to accurately launch a timing side-channel attack that will reveal the location of the safe region.

The first step is to generate a request that redirects the vulnerable data pointer to a carefully chosen address in memory (see Section IV-C). Next, we need to collect enough information to accurately estimate the byte value that is dereferenced by the selected address. To estimate the byte value, we use the cumulative delay analysis described in Equation 1.

    byte = c · Σ_{i=1}^{s} (d_i − baseline)        (1)

In the above equation, baseline represents the average round trip time (RTT) that the server takes to process requests for a byte value of zero, d_i represents the delay sample RTT for a nonzero byte value, and s represents the number of samples taken.

Once we set byte = 0, the above equation simplifies to:

    baseline = (Σ_{i=1}^{s} d_i) / s

Due to additional delays introduced by networking conditions, it is important to establish an accurate baseline. In a sense, the baseline acts as a band-pass filter. In other words, we subtract the baseline from d_i in Eq. 1 so that we are only measuring the cumulative differential delay caused by our chosen loop.

We then use the set of delay samples collected for byte 255 to calculate the constant c. Once we set byte = 255, the equation is as follows:

    c = 255 / (Σ_{i=1}^{s} d_i − s · baseline)

Once we obtain c, which provides the ratio between the byte value and the cumulative differential delay, we are able to estimate byte values.

Fig. 1. Safe Region Memory Layout. (The figure shows the address space from higher to lower addresses: the stack; a stack gap of at least 128MB; mmap_base, randomized between max_mmap_base and min_mmap_base = max_mmap_base − 2^28 · PAGE_SIZE; the linked libraries; the always-allocated 2^42-byte safe region; and, below it, dynamically loaded libraries and mmap-backed heap allocations down to the end of the mmap region.)

C. Locate Safe Region

Figure 1 illustrates the memory layout of a CPI-protected application on the x86-64 architecture. The stack is located at the top of the virtual address space and grows downwards (towards lower memory addresses), and it is followed by the stack gap. Following the stack gap is the base of the mmap region (mmap_base), where shared libraries (e.g., libc) and other regions created by the mmap() system call reside. In systems protected by ASLR, the location of mmap_base is randomly selected to be between max_mmap_base (located immediately after the stack gap) and min_mmap_base. min_mmap_base is computed as:

    min_mmap_base = max_mmap_base − aslr_entropy · page_size

where aslr_entropy is 2^28 in 64-bit systems, and the page_size is specified as an operating system parameter (typically 4KB). The safe region is allocated directly after any linked libraries are loaded at mmap_base and is 2^42 bytes. Immediately after the safe region lies the region in memory where any dynamically loaded libraries and any mmap-based heap allocations are made.

Given that the safe region is allocated directly after all linked libraries are loaded, and that the linked libraries are linked deterministically, the location of the safe region can
be computed by discovering a known location in the linked
libraries (e.g., the base of libc) and subtracting the size of
the safe region (242 ) from the address of the linked library.
A disclosure of any libc address or an address in another
linked library trivially reveals the location of the safe region
in the current CPI implementation. Our attack works even if
countermeasures are employed to allocate the safe region in a
randomized location as we discuss later.
To discover the location of a known library, such as the
base of libc, the attack needs to scan every address starting
at min mmap base, and using the timing channel attack
described above, search for a signature of bytes that uniquely
identify the location.
The space of possible locations to search may require
aslr entropy∗page size scans in the worst case. As the base Fig. 3. Tolerated Number of Crashes
address of mmap is page aligned, one obvious optimization is
to scan addresses that are multiples of page size, thus greatly
reducing the number of addresses that need to be scanned to: mmap base may crash the application. If the application
restarts after a crash without rerandomizing its address space,
(aslr entropy ∗ page size)/page size then we can use this information to perform a search with the
In fact, this attack can be made even more efficient. In the goal of finding an address x such that x can be dereferenced
x86-64 architecture, CPI protects the safe region by allocating safely but x + libc size causes a crash. This implies that x
a large region (242 bytes) that is very sparsely populated with lies inside the linked library region, thus if we subtract the
pointer metadata. As a result, the vast majority of bytes inside size of all linked libraries from x, we will obtain an address
the safe region are zero bytes. This enables us to determine in the safe region that is near libc and can reduce to the case
with high probability whether we are inside the safe region or a above. Note that it is not guaranteed that x is located at the
linked library by sampling bytes for zero/nonzero values (i.e., top of the linked library region: within this region there are
without requiring accurate byte estimation). Since we start in pages which are not allocated and there are also pages which
the safe region and libc is allocated before the safe region, do not have read permissions which would cause crashes if
if we go back in memory by the size of libc, we can avoid dereferenced.
crashing the application. This is because any location inside To find such an address x, the binary search proceeds
the safe region has at least the size of libc allocated memory on top of it. As a result, the improved attack procedure is as follows:

1) Redirect a data pointer into the always allocated part of the safe region (see Fig. 1).
2) Go back in memory by the size of libc.
3) Scan some bytes. If the bytes are all zero, goto step 2. Else, scan more bytes to decide where we are in libc.
4) Done.

Note that discovery of a page that resides in libc directly reveals the location of the safe region.
Using this procedure, the number of scans can be reduced to:

(aslr_entropy ∗ page_size) / libc_size

Here libc_size, in our experiments, is approximately 2^21. In other words, the estimated number of memory scans is: 2^28 ∗ 2^12 / 2^21 = 2^19. This non-crashing scan strategy is depicted on the left side of Fig. 2.
We can further reduce the number of memory scans if we are willing to tolerate crashes due to dereferencing an address not mapped to a readable page. Because the pages above mmap_base are not mapped, dereferencing an address above mmap_base causes a crash, which lets us binary search for mmap_base as follows: if we crash, our guessed address was too high, otherwise our guess was too low. Put another way, we maintain the invariant that the high address in our range will cause a crash while the lower address is safe, and we terminate when the difference reaches the threshold of libc size. This approach would only require at most log2(2^19) = 19 reads and will crash at most 19 times (9.5 times on average).
More generally, given that T crashes are allowed for our scanning, we would like to characterize the minimum number of page reads needed to locate a crashing boundary under the optimum scanning strategy. A reason for doing that is when T < 19, our binary search method is not guaranteed to find a crashing boundary in the worst case.
We use dynamic programming to find the optimum scanning strategy for a given T. Let f(i, j) be the maximum amount of memory an optimum scanning strategy can cover, incurring up to i crashes, and performing j page reads. Note that to cause a crash, you need to perform a read. Thus, we have the recursion

f(i, j) = f(i, j − 1) + f(i − 1, j − 1) + 1

This recursion holds because in the optimum strategy for f(i, j), the first page read will either cause a crash or not.
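The crash-driven binary search can be sketched as follows. This is a minimal model of the strategy, not the attack code: probe() stands in for the remote dereference-and-observe-crash primitive, and the names are ours.

```python
def find_boundary(probe, lo, hi):
    """Binary search for the lowest crashing address.
    probe(addr) abstracts a remote dereference: True means the
    dereference crashed (addr is unmapped, i.e. above mmap_base),
    False means the read succeeded.
    Invariant: hi always crashes, lo is always safe."""
    reads = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        reads += 1
        if probe(mid):
            hi = mid   # crashed: boundary is at or below mid
        else:
            lo = mid   # safe read: boundary is above mid
    return hi, reads
```

With a search space of 2^19 libc-sized steps this performs exactly 19 probes, matching the bound above.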
Fig. 2. Non-Crashing and Crashing Scan Strategies.
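Tabulating the recursion makes its bounds easy to check; a small illustrative sketch (our own code, with the base cases implied by the text):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(i, j):
    """Maximum number of pages an optimal strategy can cover using
    at most i crashes and j page reads (recursion from Section IV)."""
    if i < 0 or j <= 0:
        return 0
    return f(i, j - 1) + f(i - 1, j - 1) + 1

def min_reads(T, pages):
    """Fewest page reads guaranteed to locate a crashing boundary
    in `pages` pages while tolerating at most T crashes."""
    j = 0
    while f(T, j) < pages:
        j += 1
    return j
```

With an ample crash budget this degenerates to binary search, f(j, j) = 2^j − 1, while T = 0 forces a linear scan, f(0, j) = j.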

When a crash happens, it means that libc is below the first page we read, thus the amount of memory we have to search is reduced to a value that is at most f(i − 1, j − 1). As for the latter case, the amount we have to search is reduced to a value that is at most f(i, j − 1).
Having calculated a table of values from our recursion, we can use it to inform us about the scanning strategy that incurs at most T crashes. Fig. 3 shows the number of reads performed by this strategy for different T values.
Because we know the layout of the library region in advance, when we find a crash boundary we know that subtracting 8 ∗ libc_size from x will guarantee an address in the safe region because this amount is greater than the size of all linked libraries combined. Thus, at most 8 more reads will be needed to locate an address in libc. The crashing scan strategy is depicted on the right side of Fig. 2.
We can still obtain a significant improvement even if the application does rerandomize its address space when it restarts after a crash. Suppose that we can tolerate T crashes on average. Rather than begin our scan at address:

min_mmap_base = max_mmap_base − aslr_entropy ∗ page_size    (2)

we begin at:

max_mmap_base − (1/(T + 1)) ∗ (aslr_entropy ∗ page_size)

With probability 1/(T + 1), it will be the case that mmap_base is above this address and we will not crash, and the number of reads will be reduced by a factor of 1/(T + 1). With probability 1 − 1/(T + 1), this will crash the application immediately and we will have to try again. In expectation, this strategy will crash T times before succeeding.
Note that in the rerandomization case, any optimal strategy will choose a starting address based on how many crashes can be tolerated and if this first scan does not crash, then the difference between consecutive addresses scanned will be at most libc_size. If the difference is ever larger than this number, then it may be the case that libc is jumped over, causing a crash, and all knowledge about the safe region is lost due to the rerandomization. If the difference between consecutive addresses x, y satisfies y − x > libc_size, then replacing x by y − libc_size and shifting all addresses before x by y − libc_size − x yields a superior strategy since the risk of crashing is moved to the first scan while maintaining the same probability of success.
Once the base address of mmap is discovered using the timing side channel, the address of the safe region table can be computed as follows:

table_address = libc_base − 2^42

D. Attack Safe Region

Using the safe region table address, the address of a code pointer of interest in the CPI protected application, ptr_address, can be computed by masking with the cpi_addr_mask, which is 0x00fffffffff8, and then multiplying by the size of the table entry, which is 4.
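The lookup described above can be written out concretely. The mask and the scale factor of 4 come from the text; the function and constant names are ours, and this is a sketch of our reading of the simpletable layout rather than the implementation itself.

```python
CPI_ADDR_MASK = 0x00FFFFFFFFF8   # mask given in the text: keeps bits 3..47
ENTRY_SCALE = 4                  # "size of the table entry" multiplier from the text

def cpi_entry_address(table_address, ptr_address):
    """Address of the safe-region entry guarding the code pointer
    stored at ptr_address: mask the pointer's address, then scale."""
    return table_address + (ptr_address & CPI_ADDR_MASK) * ENTRY_SCALE
```

Because the mask drops the low three bits, two 8-byte-aligned pointers 8 bytes apart map to entries 32 bytes apart, consistent with a four-word (32-byte) entry per protected pointer.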
Armed with the exact address of a code pointer in the safe region, the value of that pointer can be hijacked to point to a library function or the start of a ROP chain to complete the attack.

E. Attack Optimizations

A stronger implementation of CPI might pick an arbitrary address for its safe region chosen randomly between the bottom of the linked libraries and the end of the mmap region. Our attack still works against such an implementation and can be further optimized.
We know that the safe region has a size of 2^42 bytes. Therefore, there are 2^48/2^42 = 2^6 = 64 possibilities for where we need to search. In fact, in a real world system like Ubuntu 14.04 there are only 2^46.5 addresses available to mmap on Ubuntu x86-64; thus there is a 1/2^5 chance of getting the right one, even with the most extreme randomization assumptions. Furthermore, heap and dynamic library address disclosures will increase this chance. We note that CPI has a unique signature of a pointer value followed by an empty slot, followed by the lower and upper bounds, which will make it simple for an attacker to verify that the address they have reached is indeed in the safe region. Once an address within the safe region has been identified, it is merely a matter of time before the attacker is able to identify the offset of the safe address relative to the table base. There are many options to dramatically decrease the number of reads needed to identify exactly where in the safe region we have landed. For instance, we might profile a local application’s safe region and find the most frequently populated addresses modulo the system’s page size (since the base of the safe region must be page-aligned), then search across the safe region in intervals of the page size at that offset. Additionally, we can immediately find the offset if we land on any value that is unique within the safe region by comparing it to our local reference copy.
We can now make some general observations about choosing the variable of interest to target during the search. We would be able to search the fastest if we could choose a pointer from the largest set of pointers in a program that has the same addresses modulo the page size. For instance, if there are 100 pointers in the program that have addresses that are 1 modulo the page size, we greatly increase our chances of finding one of them early during the scan of the safe region.
Additionally, the leakage of any locations of other libraries (making the strong randomization assumption) will help identify the location of the safe region. Note that leaking all other libraries is within the threat model of CPI.

V. MEASUREMENTS AND RESULTS

We next present experimental results for the attack described in Section IV on Nginx 1.6.2, a popular web server. We compile Nginx with clang/CPI 0.2 and the -flto -fcpi flags. Nginx is connected to the attacker via a 1Gbit wired LAN connection. We perform all tests on a server with a quad-core Intel i5 processor with 4 GB RAM.

A. Vulnerability

We patch Nginx to introduce a stack buffer overflow vulnerability allowing the user to gain control of a parameter used as the upper loop bound in the Nginx logging system. This is similar to the effect that an attacker can achieve with (CVE-2013-2028) seen in previous Nginx versions [1]. The vulnerability enables an attacker to place arbitrary values on the stack in line with the threat model assumed by CPI (see Section II). We launch the vulnerability over a wired LAN connection, but as shown in prior work, the attack is also possible over wireless networks [47].
Using the vulnerability, we modify a data pointer in the Nginx logging module to point to a carefully chosen address. The relevant loop can be found in the source code in nginx_http_parse.c.

for (i = 0; i < headers->nelts; i++)

The data pointer vulnerability enables control over the number of iterations executed in the loop. Using the timing analysis presented in Section IV, we can distinguish between zero pages and nonzero pages. This optimization enables the attack to efficiently identify the end of the safe region, where nonzero pages indicate the start of the linked library region.

B. Timing Attack

We begin the timing side channel attack by measuring the HTTP request round trip time (RTT) for a static web page (0.6 KB) using Nginx. We collect 10,000 samples to establish the average baseline delay. For our experiments, the average RTT is 3.2 ms. Figures 4 and 5 show the results of our byte estimation experiments. The figures show that byte estimation using cumulative differential delay is accurate to within 2% (±20).

Fig. 4. Timing Measurement for Nginx 1.6.2 over Wired LAN
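A simplified model of how such a byte estimator could work follows. This is our own illustration, not the experiment's code: it assumes the overwritten loop bound makes the logging loop run once per unit of the byte's value, so the extra delay over the baseline RTT grows linearly with that value, and it keeps only the fastest percentile of samples to suppress network jitter.

```python
def estimate_byte(samples, baseline_rtt, per_iteration_delay, keep=0.1):
    """Estimate a memory byte from request round-trip times by
    comparing the fastest `keep` fraction of samples against the
    baseline (cumulative differential delay, simplified)."""
    fastest = sorted(samples)[:max(1, int(len(samples) * keep))]
    mean = sum(fastest) / len(fastest)
    return max(0, round((mean - baseline_rtt) / per_iteration_delay))
```

The per-iteration delay would itself have to be calibrated in advance against bytes of known value.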

Fig. 5. Observed Byte Estimation

C. Locate Safe Region

After we determine the average baseline delay, we redirect the nelts pointer to the region between address 0x7bfff73b9000 and 0x7efff73b9000. As mentioned in the memory analysis, this is the range of the CPI safe region we know is guaranteed to be allocated, despite ASLR being enabled. We pick the top of this region as the first value of our pointer.
A key component of our attack is the ability to quickly determine whether a given page lies inside the safe region or inside the linked libraries by sampling the page for zero bytes. Even if we hit a nonzero address inside the safe region, which will trigger the search for a known signature within libc, the nearby bytes we scan will not yield a valid libc signature and we can identify the false positive. In our tests, every byte read from the high address space of the safe region yielded zero. In other words, we observed no false positives.
One problematic scenario occurs if we sample zero byte values while inside libc. In this case, if we mistakenly interpret this address as part of the safe region, we will skip over libc and the attack will fail. We can mitigate this probability by choosing the byte offset per page we scan intelligently. Because we know the memory layout of libc in advance, we can identify page offsets that have a large proportion of nonzero bytes, so if we choose a random page of libc and read the byte at that offset, we will likely read a nonzero value. In our experiments, page offset 4048 yielded the highest proportion of non-zero values, with 414 out of the 443 pages of libc having a nonzero byte at that offset. This would give our strategy an error rate of 1 − 414/443 = 6.5%. We note that we can reduce this number to 0 by scanning two bytes per page instead at offsets of our choice. In our experiments, if we scan the bytes at offsets 1272 and 1672 in any page of libc, one of these values is guaranteed to be nonzero. This reduces our false positive rate at the cost of a factor of 2 in speed.
In our experiments, we found that scanning 5 extra bytes in addition to the two signature bytes can yield 100% accuracy using 30 samples per byte and considering the error in byte estimation. Figure 6 illustrates the sum of the chosen offsets for our scan of zero pages leading up to libc. Note that we jump by the size of libc until we hit a non-zero page. The dot on the upper-right corner of the figure shows the first non-zero page.
In short, we scan 30 ∗ 7 = 210 bytes per size of libc to decide whether we are in libc or the safe region. Table I summarizes the number of false positives, i.e. the number of pages we estimate as nonzero, which are in fact 0. The number of data samples and estimation samples, and their respective fastest percentile used for calculation, all affect the accuracy. Scanning 5 extra bytes (in addition to the two signature bytes for a page) and sampling 30 times per byte yields an accuracy of 100% in our setup. As a result, the attack requires (2 + 5) ∗ 2^19 ∗ 30 = 7 ∗ 2^19 ∗ 30 = 110,100,480 scans on average, which takes about 97 hours with our attack setup.
Once we have a pointer to a nonzero page in libc, we send more requests to read additional bytes with high accuracy to determine which page of libc we have found. Figure 7 illustrates that we can achieve high accuracy by sending 10,000 samples per byte.
Despite the high accuracy, we have to account for errors in estimation. For this, we have developed a fuzzy n-gram matching algorithm that, given a sequence of noisy bytes, tells us the libc offset at which those bytes are located by comparing the estimated bytes with a local copy of libc. In determining zero and nonzero pages, we only collect 30 samples per byte as we do not need very accurate measurements. After landing in a nonzero page in libc, we do need more accurate measurements to identify our likely location. Our measurements show that 10,000 samples are necessary to estimate each byte to within 20. We also determine that reliably reading 70 bytes starting at page offset 3333 is enough for the fuzzy n-gram matching algorithm to determine where exactly we are in libc. This offset was computed by looking at all contiguous byte sequences for every page of libc and choosing the one which required the fewest bytes to guarantee a unique match. This orientation inside libc incurs an additional 70 ∗ 10,000 = 700,000 requests, which adds another hour to the total time of the attack for a total of 98 hours.
After identifying our exact location in libc, we know the exact base address of the safe region:

safe_region_address = libc_base − 2^42

D. Fast Attack with Crashes

We can make the above attack faster by tolerating 12 crashes on average. The improved attack uses binary search as opposed to linear search to find libc after landing in the safe region as described in Section IV-C. We also use an alternative strategy for discovering the base of libc. Instead of sampling individual pages, we continue the traversal until we observe a crash that

TABLE I
ERROR RATIO IN ESTIMATION OF 100 ZERO PAGES USING OFFSETS 1, 2, 3, 4, 5, 1272, 1672

# Data samples (%-tile used)   # Estimation samples (%-tile used)   False positive ratio
1,000 (10%)                    1,000 (10%)                          0%
10,000 (1%)                    1,000 (10%)                          0%
1,000 (10%)                    100 (10%)                            0%
10,000 (1%)                    100 (10%)                            0%
1,000 (10%)                    50 (20%)                             0%
10,000 (1%)                    50 (20%)                             3%
1,000 (10%)                    30 (33%)                             2%
10,000 (1%)                    30 (33%)                             0%
1,000 (10%)                    20 (50%)                             5%
10,000 (1%)                    20 (50%)                             13%
1,000 (10%)                    10 (100%)                            91%
10,000 (1%)                    10 (100%)                            92%
1,000 (10%)                    5 (100%)                             68%
10,000 (1%)                    5 (100%)                             86%
1,000 (10%)                    1 (100%)                             54%
10,000 (1%)                    1 (100%)                             52%

Fig. 7. Actual Bytes Estimation of a Nonzero Page in LIBC.
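The zero-page scan of Section V-C (jump by the size of libc, probe two byte offsets per page) can be sketched as follows. read_byte() stands in for the timing-channel byte read; the constants are the values measured above, and the function itself is our own model of the scan.

```python
LIBC_SIZE = 2 ** 21             # approximate libc size measured above (~2 MB)
PROBE_OFFSETS = (1272, 1672)    # offsets at which every libc page has a nonzero byte

def find_libc(read_byte, start, limit):
    """Scan upward from inside the safe region in jumps equal to the
    size of libc, so libc cannot be stepped over.  A nonzero byte at
    either probe offset means we left the (all-zero) safe region and
    entered the linked-library region."""
    addr = start
    while addr < limit:
        if any(read_byte(addr + off) for off in PROBE_OFFSETS):
            return addr         # first nonzero page
        addr += LIBC_SIZE
    return None
```

The same loop with a crash oracle instead of a byte read gives the crashing variant discussed in Section V-D.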

indicates the location of the non-readable section of libc. This reveals the exact address of libc. In our setup, the binary search caused 11 crashes; discovering the base of libc required an additional 2 crashes.

E. Attack Safe Region

After finding the safe region, we then use the same data pointer overwrite to change the read_handler entry of the safe region. We then modify the base and bound of the code pointer to hold the location of the system call (sysenter). Since we can control what system call sysenter invokes by setting the proper values in the registers, finding sysenter allows us to implement a variety of practical payloads. After this, the attack can proceed simply by redirecting the code pointer to the start of a ROP chain that uses the system call. CPI does not prevent the redirection because its entry for the code pointer is already maliciously modified to accept the ROP chain.
The entire crashing attack takes 6 seconds to complete.

F. Summary

In summary, we show a practical attack on a version of Nginx protected with CPI, ASLR and DEP. The attack uses a data pointer overwrite vulnerability to launch a timing side channel attack that can leak the safe region in 6 seconds with 13 observed crashes. Alternatively, this attack can be completed in 98 hours without any crashes.

Fig. 6. Estimation of Zero Pages in Safe Region.

VI. IMPLEMENTATION FLAWS OF CPI

The published implementation (simpletable) of CPI uses a fixed address for the table for all supported architectures, providing no protection in its default configuration. We assume this is due to the fact that the version of CPI we evaluated is still in “early preview.” We kept this in mind throughout our evaluation, and focused primarily on fundamental problems with the use of information hiding in CPI. Having said that, we found that as currently implemented there was almost no focus on protecting the location of the safe region.
The two alternate implementations left in the source, hashtable and lookuptable, use mmap directly without a fixed address, which is an improvement but is of course relying on mmap for randomization. This provides no protection against an ASLR disclosure, which is within the threat model of the CPI paper. We further note that the safe stack implementation also allocates pages using mmap without a fixed address, thus making it similarly vulnerable to an ASLR disclosure. This vulnerability makes the safe stack weaker than the protection offered by a stack canary, as any ASLR disclosure will allow the safe stack location to be determined, whereas a stack canary needs a more targeted disclosure (although it can be bypassed in other ways).
In the default implementation (simpletable), the location of the table is stored in a static variable

(__llvm__cpi_table) which is not zeroed after its value is moved into the segment register. Thus, it is trivially available to an attacker by reading a fixed offset in the data segment. In the two alternate implementations, the location of the table is not zeroed because it is never protected by storage in the segment registers at all. Instead it is stored as a local variable. Once again, this is trivially vulnerable to an attacker who can read process memory, and once disclosed will immediately compromise the CPI guarantees. Note that zeroing memory or registers is often difficult to perform correctly in C in the presence of optimizing compilers [44].
We note that CPI’s performance numbers rely on support for superpages (referred to as huge pages on Linux). In the configurations used for performance evaluation, ASLR was not enabled (FreeBSD does not currently have support for ASLR, and as of Linux kernel 3.13, the base for huge page allocations in mmap is not randomized, although a patch adding support has since been added). We note this to point out a difference between CPI performance tests and a real world environment, although we have no immediate reason to suspect a large performance penalty from ASLR being enabled.
It is unclear exactly how the published CPI implementation intends to use the segment registers on 32-bit systems. The simpletable implementation, which uses the %gs register, warns that it is not supported on x86, although it compiles. We note that using the segment registers may conflict in Linux with thread-local storage (TLS), which uses the %gs register on x86-32 and the %fs register on x86-64 [18]. As mentioned, the default implementation, simpletable, does not support 32-bit systems, and the other implementations do not use the segment registers at all, a flaw noted previously, so currently this flaw is not easily exposed. A quick search of 32-bit libc, however, found almost 3000 instructions using the %gs register. Presumably this could be fixed by using the %fs register on 32-bit systems; however, we note that this may cause compatibility issues with applications expecting the %fs register to be free, such as Wine (which is explicitly noted in the Linux kernel source) [2].
Additionally, the usage of the %gs and %fs segment registers might cause conflicts if CPI were applied to protect kernel-mode code, a stated goal of the CPI approach. The Linux and Windows kernels both have special usages for these registers.

VII. DISCUSSION

In this section we discuss some of the problematic CPI design assumptions and discuss possible fixes.

A. Design Assumptions

1) Enforcement Mechanisms: First, the authors of CPI focus on extraction and enforcement of safety checks, but they do not provide enough protection for their enforcement mechanisms. This is arguably a hard problem in security, but the effectiveness of defenses relies on such protections. In the published CPI implementation, protection of the safe region is very basic, relying on segmentation in the 32-bit architecture and the size of the safe region in the 64-bit one. However, since the safe region is stored in the same address space to avoid performance-expensive context switches, these protections are not enough and, as illustrated in our attacks, they are easy to bypass. Note that the motivation for techniques such as CPI is the fact that existing memory protection defenses such as ASLR are broken. Ironically, CPI itself relies on these defenses to protect its enforcement. For example, relying on randomization of locations to hide the safe region has many of the weaknesses of ASLR that we have illustrated.
2) Detecting Crashes: Second, it is assumed that leaking large parts of memory requires causing numerous crashes which can be detected using other mechanisms. This in fact is not correct. Although attacks such as Blind ROP [9] and brute force [51] do cause numerous crashes, it is also possible on current CPI implementations to avoid such crashes using side-channel attacks. The main reason for this is that in practice a large number of pages are allocated and, in fact, the entropy in the start address of a region is much smaller than its size. This allows an attacker to land correctly inside allocated space, which makes the attack non-crashing. In fact, CPI’s implementation exacerbates this problem by allocating a very large mmap region.
3) Memory Disclosure: Third, it is also implicitly assumed that large parts of memory cannot leak. Direct memory disclosure techniques may have some limitations. For example, they may be terminated by zero bytes or may be limited to areas adjacent to a buffer [54]. However, indirect leaks using dangling data pointers and timing or fault analysis attacks do not have these limitations and they can leak large parts of memory.
4) Memory Isolation: Fourth, the assumption that the safe region cannot leak because there is no pointer to it is incorrect. As we show in our attacks, random searching of the mmap region can be used to leak the safe region without requiring an explicit pointer into that region.
To summarize, the main weakness of CPI is its reliance on secrets which are kept in the same space as the process being protected. Arguably, this problem has contributed to the weaknesses of many other defenses as well [59, 51, 54, 47].

B. Patching CPI

Our attacks may immediately bring to mind a number of patch fixes to improve CPI. We consider several of these fixes here and discuss their effectiveness and limitations. Such fixes will increase the number of crashes necessary for successful attacks, but they cannot completely prevent attacks on architectures lacking segmentation (x86-64 and ARM).
1) Increase Safe Region Size: The first immediate idea is to randomize the location of the safe region base within an even larger mmap-allocated region. However, this provides no benefit: the safe region base address must be strictly greater than the beginning of the returned mmap region, effectively increasing the amount of wasted data in the large region but not preventing our side channel attack from simply continuing to scan until it finds the safe region. Moreover, an additional

register must be used to hide the offset, and then additional instructions must be used to load the value from that register, add it to the safe region segment register, and then add the actual table offset. This can negatively impact performance.
2) Randomize Safe Region Location: The second fix can be to specify a fixed random address for the mmap allocation using mmap_fixed. This has the advantage that there will be much larger portions of non-mapped memory, raising the probability that an attack might scan through one of these regions and trigger a crash. However, without changing the size of the safe region an attacker will only need a small number of crashes in order to discover the randomized location. Moreover, this approach may pose portability problems; as the mmap man page states, “the availability of a specific address range cannot be guaranteed, in general.” Platform-dependent ASLR techniques could exacerbate these problems. There are a number of other plausible attacks on this countermeasure:

• Unless the table spans a smaller range of virtual memory, attacks are still possible based on leaking the offsets and knowing the absolute minimum and maximum possible mmap_fixed addresses, which decrease the entropy of the safe region.
• Induce numerous heap allocations (at the threshold causing them to be backed by mmap) and leak their addresses. When the addresses jump by the size of the safe region, there is a high probability it has been found. This is similar to heap spraying techniques and would be particularly effective on systems employing strong heap randomization.
• Leak the addresses of any dynamically loaded libraries. If the new dynamically loaded library address increases over the previous dynamic library address by the size of the safe region, there is a high probability the region has been found.

3) Use Hash Function for Safe Region: The third fix can be to use the segment register as a key for a hash function into the safe region. This could introduce prohibitive performance penalties. It is also still vulnerable to attack, as a fast hash function will not be cryptographically secure. This idea is similar to using cryptography mechanisms to secure CFI [35].
4) Reduce Safe Region Size: The fourth fix can be to make the safe region smaller. This is plausible, but note that if mmap is still contiguous an attacker can start from a mapped library and scan until they find the safe region, so this fix must be combined with a non-contiguous mmap. Moreover, making the safe region compact will also result in additional performance overhead (for example, if a hashtable is being used, there will be more hashtable collisions). A smaller safe region also runs a higher risk of running out of space to store “sensitive” pointers more easily.
In order to evaluate the viability of this proposed fix, we compiled and ran the C and C++ SPECint and SPECfp 2006 benchmarks [22] with several sizes of CPI hashtables on an Ubuntu 14.04.1 machine with 4GB RAM. All C benchmarks were compiled using the -std=gnu89 flag (clang requires this flag for 400.perlbench to run). In our setup, no benchmark compiled with the CPI hashtable produced correct output on 400.perlbench, 403.gcc and 483.xalancbmk.
Table II lists the overhead results for SPECint. NT in the table denotes “Not terminated after 8 hours”. In this table, we have listed the performance of the default CPI hashtable size (2^33). Using a hashtable size of 2^26, CPI reports that it has run out of space in its hashtable (i.e. it has exceeded a linear probing maximum limit) for 471.omnetpp and 473.astar. Using a hashtable size of 2^20, CPI runs out of space in the safe region for those tests, as well as 445.gobmk and 464.h264ref. The other tests incurred an average overhead of 17% with the worst case overhead of 131% for 471.omnetpp. While in general decreasing the CPI hashtable size leads to a small performance increase, these performance overheads can still be impractically high for some real-world applications, particularly C++ applications like 471.omnetpp.
Table III lists the overhead results for SPECfp. IR in the table denotes “Incorrect results.” For SPECfp and a CPI hashtable size of 2^26, two benchmarks run out of space: 433.milc and 447.dealII. In addition, two other benchmarks return incorrect results: 450.soplex and 453.povray. The 453.povray benchmark also returns incorrect results with CPI’s default hashtable size.

TABLE II
SPECINT 2006 BENCHMARK PERFORMANCE BY CPI FLAVOR

Benchmark        No CPI    CPI simpletable    CPI hashtable
401.bzip2        848 sec   860 (1.42%)        845 (-0.35%)
429.mcf          519 sec   485 (-6.55%)       501 (-3.47%)
445.gobmk        712 sec   730 (2.53%)        722 (1.40%)
456.hmmer        673 sec   687 (2.08%)        680 (1.04%)
458.sjeng        808 sec   850 (5.20%)        811 (0.37%)
462.libquantum   636 sec   713 (12.11%)       706 (11.01%)
464.h264ref      830 sec   963 (16.02%)       950 (14.46%)
471.omnetpp      582 sec   1133 (94.67%)      1345 (131.10%)
473.astar        632 sec   685 (8.39%)        636 (0.63%)
400.perlbench    570 sec   NT                 NT
403.gcc          485 sec   830 (5.99%)        NT
483.xalancbmk    423 sec   709 (67.61%)       NT

TABLE III
SPECFP 2006 BENCHMARK PERFORMANCE BY CPI FLAVOR

Benchmark        No CPI    CPI simpletable    CPI hashtable
433.milc         696 sec   695 (-0.14%)       786 (12.9%)
444.namd         557 sec   571 (2.51%)        574 (3.05%)
447.dealII       435 sec   539 (23.9%)        540 (24.1%)
450.soplex       394 sec   403 (2.28%)        419 (6.34%)
453.povray       250 sec   IR                 IR
470.lbm          668 sec   708 (5.98%)        705 (5.53%)
482.sphinx3      863 sec   832 (-3.59%)       852 (-1.27%)

To evaluate the effectiveness of a scheme which might dynamically expand and reduce the hashtable size to reduce the attack surface at the cost of an unknown performance penalty and loss of some real-time guarantees, we also ran the SPEC benchmarks over an instrumented hashtable implementation to discover the maximum number of keys concurrently resident in the hashtable; our analysis showed this number to be 2^23 entries, consuming 2^28 bytes. However, some tests did not complete correctly unless the hashtable size was at least 2^28 entries, consuming 2^33 bytes. Without any other mmap allocations claiming address space, we expect 2^46/2^28 = 2^18 crashes with an expectation of 2^17, or 2^46/2^33 = 2^13 crashes with an expectation of 2^12. This seems to be a weak guarantee of the security of CPI on programs with large numbers of code pointers. For instance, a program with 2GB of memory in which only 10% of pointers are found to be sensitive using a CPI hashtable with a load factor of 25% would have a safe region of size (2 ∗ 10^9/8 ∗ 8% ∗ 4 ∗ 32 bytes). The expected number of crashes before identifying this region would be only slightly more than 2^14. This number means that the hashtable implementation of CPI is not effective for protecting against a local attacker and puts into question the guarantees it provides on any remote system that is not monitored by non-local logging. As a comparison, it is within an order of magnitude of the number of crashes incurred in the Blind ROP [9] attack.
5) Use Non-Contiguous Randomized mmap: Finally, the fifth fix can be to use a non-contiguous, per-allocation randomized mmap. Such non-contiguous allocations are currently only available using customized kernels such as PaX [43]. However, even with non-contiguous allocations, the use of super pages for virtual memory can still create weaknesses. An attacker can force heap allocation of large objects, which use mmap directly, to generate entries that reduce total entropy. Moreover, knowing the location of other libraries further reduces the entropy of the safe region because of its large size. As a result, such a technique must be combined with a reduction in safe region size to be viable. More accurate evaluation of the security and performance of such a fix would require an actual implementation, which we leave to future work.
The lookuptable implementation of CPI (which was non-functional at the time of our evaluation) could support this approach by a design which randomly allocates the address of each subtable at runtime. This would result in a randomized scattering of the much smaller subtables across memory. There are, however, only 2^46/(32 ∗ 2^22 entries) = 2^19 slots for the lookup table’s subtable locations. The expectation for finding one of these is 2^19/(2K) crashes, where K is the number of new code pointers introduced that cause a separate subtable to be allocated. If there are 2^5 such pointers (which would be the case for a 1GB process with at least one pointer across the address space), that number goes to 2^13 crashes in expectation, which as previously argued does not provide strong security

VIII. POSSIBLE COUNTERMEASURES

In this section we discuss possible countermeasures against control hijacking attacks that use timing side channels for memory disclosure.
a) Memory Safety: Complete memory safety can defend against all control hijacking attacks, including the attack outlined in this paper. SoftBound with the CETS extensions [36] enforces complete spatial and temporal pointer safety, albeit at a significant cost (up to 4x slowdown).
On the other hand, experience has shown that low overhead mechanisms that trade off security guarantees for performance (e.g., approximate [48] or partial [5] memory safety) eventually get bypassed [9, 52, 21, 11, 17].
Fortunately, hardware support can make complete memory safety practical. For instance, Intel memory protection extensions (MPX) [25] can facilitate better enforcement of memory safety checks. Secondly, the fat-pointer scheme shows that hardware-based approaches can enforce spatial memory safety at very low overhead [32]. Tagged architectures and capability-based systems can also provide a possible direction for mitigating such attacks [58].
b) Randomization: One possible defense against timing channel attacks, such as the one outlined in this paper, is to continuously rerandomize the safe region and ASLR, before an attacker can disclose enough information about the memory layout to make an attack practical. One simple strategy is to use a worker pool model that is periodically re-randomized (i.e., not just on crashes) by restarting worker processes. Another approach is to perform runtime rerandomization [20] by migrating running process state.
Randomization techniques provide probabilistic guarantees that are significantly weaker than complete memory safety at low overhead. We note that any security mechanism that trades security guarantees for performance may be vulnerable to future attacks. This short term optimization for the sake of practicality is one reason for the numerous attacks on security systems [9, 52, 21, 11, 17].
c) Timing Side Channel Defense: One way to defeat attacks that use side channels to disclose memory is to remove execution timing differences. For example, timing channels can be removed by causing every execution (or path) to take the same amount of time. The obvious disadvantage of this approach is that average-case execution time now becomes worst-case execution time. This change in expected latency might be too costly for many systems. We note here that adding random delays to program execution cannot effectively protect against side channel attacks [19].
IX. R ELATED W ORK
guarantees.
Memory corruption attacks have been used since the early
We argue that we can identify a subtable because of 70’s [6] and they still pose significant threats in modern
the recognizable CPI structure, and search it via direct/side- environments [14]. Memory unsafe languages such as C/C++
channel attacks. While we cannot modify any arbitrary code are vulnerable to such attacks.
pointer, we believe that it is only a matter of time until an Complete memory safety techniques such as the SoftBound
attacker discovers a code pointer that enables remote code technique with its CETS extension [36] can mitigate mem-
execution. ory corruption attacks, but they incur large overhead to the
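The subtable arithmetic above can be sanity-checked with a short script. The address-space size (2^46 bytes) and the subtable geometry (2^22 entries of 32 bytes each) are the figures used in the text, and the 2^19/(2K) expectation assumes uniformly scattered subtables:

```python
# Back-of-the-envelope check of the lookuptable estimates in the text:
# a 2^46-byte address space and subtables of 2^22 entries at 32 bytes
# per entry (both taken from the paper's own numbers).
ADDR_SPACE = 2**46        # usable user-space bytes on x86-64
ENTRY_SIZE = 32           # bytes per safe-region entry
SUBTABLE_ENTRIES = 2**22  # entries per subtable

subtable_bytes = ENTRY_SIZE * SUBTABLE_ENTRIES  # 2^27 bytes per subtable
slots = ADDR_SPACE // subtable_bytes            # 2^19 candidate slots

def expected_crashes(k):
    """Expected probes to hit one of k uniformly scattered subtables."""
    return slots // (2 * k)

print(slots == 2**19)                   # True
print(expected_crashes(2**5) == 2**13)  # True: ~8K crashes for 2^5 subtables
```

With a single subtable (K = 1) the expectation is 2^18 probes; each additional subtable halves the attacker's work, which is why large processes weaken this defense.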
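The worker-pool strategy sketched under b) Randomization can be prototyped in a few lines. This is a minimal illustration, not a hardened implementation: the worker command is a placeholder, and a real server would exec its own worker binary, since exec-ing a fresh process image is what yields a newly randomized layout under ASLR (a bare fork would inherit the parent's layout):

```python
import subprocess
import sys
import time

POOL_SIZE = 2          # hypothetical pool size
REFRESH_SECONDS = 0.2  # hypothetical re-randomization interval

def spawn_worker():
    # exec-ing a new process gives the worker a fresh ASLR layout;
    # the sleeping interpreter below is a stand-in for a worker binary.
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])

def run_pool(cycles):
    """Proactively restart every worker each cycle, not just on crashes,
    so disclosed addresses go stale before an attacker can use them."""
    workers = [spawn_worker() for _ in range(POOL_SIZE)]
    restarts = 0
    for _ in range(cycles):
        time.sleep(REFRESH_SECONDS)
        for i, w in enumerate(workers):
            w.terminate()
            w.wait()
            workers[i] = spawn_worker()
            restarts += 1
    for w in workers:
        w.terminate()
        w.wait()
    return restarts

print(run_pool(cycles=2))  # 4 restarts with POOL_SIZE == 2
```

The refresh interval is the security parameter here: it must be shorter than the time an attacker needs to disclose and exploit the layout, at the cost of churn in the pool.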
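The "every path takes the same time" idea from c) Timing Side Channel Defense can be illustrated by padding a handler to an assumed worst-case latency. The handler names and the 50 ms budget are hypothetical; note that the padding is deterministic, since (as the text observes, citing [19]) random delays can be filtered out:

```python
import time

WORST_CASE = 0.05  # assumed worst-case handler latency, in seconds

def padded(handler):
    """Run handler, then sleep off the rest of the worst-case budget
    so fast and slow paths look alike to a remote timer."""
    start = time.perf_counter()
    result = handler()
    remaining = WORST_CASE - (time.perf_counter() - start)
    if remaining > 0:
        time.sleep(remaining)  # deterministic padding, not a random delay
    return result

def fast_path():   # hypothetical cheap request path
    return "fast"

def slow_path():   # hypothetical expensive request path
    time.sleep(0.03)
    return "slow"

for handler in (fast_path, slow_path):
    start = time.perf_counter()
    result = padded(handler)
    elapsed = time.perf_counter() - start
    # Both paths now cost (at least) the worst-case budget.
    print(result, elapsed >= WORST_CASE)
```

This makes the trade-off concrete: every request now pays the worst-case price, which is exactly the latency cost the text warns may be too high for many systems.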
IX. RELATED WORK

Memory corruption attacks have been used since the early 70's [6] and they still pose significant threats in modern environments [14]. Memory-unsafe languages such as C/C++ are vulnerable to such attacks.

Complete memory safety techniques such as the SoftBound technique with its CETS extension [36] can mitigate memory corruption attacks, but they incur a large overhead to execution (up to 4x slowdown). "Fat-pointer" techniques such as CCured [37] and Cyclone [28] have also been proposed to provide spatial pointer safety, but they are not compatible with existing C codebases. Other efforts such as Cling [4], Memcheck [38], and AddressSanitizer [48] only provide temporal pointer safety to prevent dangling pointer bugs such as use-after-free. A number of hardware-enforced memory safety techniques have also been proposed, including the Low-Fat pointer technique [32] and CHERI [58], which minimize the overhead of memory safety checks.

The high overhead of software-based complete memory safety has motivated weaker memory defenses that can be categorized into enforcement-based and randomization-based defenses. In enforcement-based defenses, certain correct code behavior, usually extracted at compile time, is enforced at runtime to prevent memory corruption. In randomization-based defenses, different aspects of the code or the execution environment are randomized to make successful attacks more difficult.

The randomization-based category includes address space layout randomization (ASLR) [43] and its medium-grained [30] and fine-grained [57] variants. Different ASLR implementations randomize the location of a subset of the stack, heap, executable, and linked libraries at load time. Medium-grained ASLR techniques such as Address Space Layout Permutation [30] also permute the location of functions within libraries. Fine-grained forms of ASLR such as Binary Stirring [57] randomize the location of basic blocks within code. Other randomization-based defenses include in-place instruction rewriting such as ILR [23] and code diversification using a randomizing compiler, such as the multicompiler technique [27] or the Smashing the Gadgets technique [42]. Unfortunately, these defenses are vulnerable to information leakage (memory disclosure) attacks [54]. It has been shown that even one such vulnerability can be used repeatedly by an attacker to bypass even fine-grained forms of randomization [52]. Other randomization-based techniques, such as Genesis [60], Minestrone [29], and RISE [8], implement instruction set randomization using an emulation, instrumentation, or binary translation layer such as Valgrind [38], Strata [46], or Intel Pin [34], which in itself incurs a large overhead, sometimes as high as a several-fold slowdown of the application.

In the enforcement-based category, control flow integrity (CFI) [3] techniques are the most prominent. They enforce a compile-time-extracted control flow graph (CFG) at runtime to prevent control hijacking attacks. Weaker forms of CFI have been implemented in CCFIR [61] and bin-CFI [62], which allow control transfers to any valid target as opposed to the exact ones, but such defenses have been shown to be vulnerable to carefully crafted control hijacking attacks that use those targets to implement their malicious intent [21]. The technique proposed by Backes et al. [7] prevents memory disclosure attacks by marking executable pages as non-readable. A recent technique [15] combines aspects of enforcement (non-readable memory) and randomization (fine-grained code randomization) to prevent memory disclosure attacks.

On the attack side, direct memory disclosure attacks have been known for many years [54]. Indirect memory leakage, such as fault analysis attacks (using crash and non-crash signals) [9] or, more generally, other forms of fault and timing analysis attacks [47], has more recently been studied.

Non-control-data attacks [13], not prevented by CPI, can also be very strong in violating many security properties; however, since they are not within the threat model of CPI, we leave their evaluation to future work.

X. CONCLUSION

We present an attack on the recently proposed CPI technique. We show that the use of information hiding to protect the safe region is problematic and can be used to violate the security of CPI. Specifically, we show how a data pointer overwrite attack can be used to launch a timing side channel attack that discloses the location of the safe region on x86-64. We evaluate the attack using a proof-of-concept exploit on a version of the Nginx web server that is protected with CPI, ASLR, and DEP. We show that the most performant and complete implementation of CPI (simpletable) can be bypassed in 98 hours without crashes, and in 6 seconds if a small number of crashes (13) can be tolerated. We also evaluate the work factor required to bypass other implementations of CPI, including a number of possible fixes to the initial implementation. We show that information hiding is a weak paradigm that often leads to vulnerable defenses.

XI. ACKNOWLEDGMENT

This work is sponsored by the Office of Naval Research under Award #N00014-14-1-0006, entitled Defeating Code Reuse Attacks Using Minimal Hardware Modifications, and by DARPA (Grant FA8650-11-C-7192). The opinions, interpretations, conclusions, and recommendations are those of the authors and do not reflect the official policy or position of the Office of Naval Research or the United States Government. The authors would like to sincerely thank Dr. William Streilein, Fan Long, the CPI team, Prof. David Evans, and Prof. Greg Morrisett for their support and insightful comments and suggestions.

REFERENCES

[1] Vulnerability summary for CVE-2013-2028, 2013.
[2] Linux cross reference, 2014.
[3] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow integrity. In Proceedings of the 12th ACM conference on Computer and communications security, pages 340-353. ACM, 2005.
[4] P. Akritidis. Cling: A memory allocator to mitigate dangling pointers. In USENIX Security Symposium, pages 177-192, 2010.
[5] P. Akritidis, C. Cadar, C. Raiciu, M. Costa, and M. Castro. Preventing memory error exploits with WIT. In Security and Privacy, 2008. SP 2008. IEEE Symposium on, pages 263-277. IEEE, 2008.
[6] J. P. Anderson. Computer security technology planning study, volume 2. Technical report, DTIC Document, 1972.
[7] M. Backes, T. Holz, B. Kollenda, P. Koppe, S. Nürnberger, and J. Pewny. You can run but you can't read: Preventing disclosure exploits in executable code. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1342-1353. ACM, 2014.
[8] E. G. Barrantes, D. H. Ackley, T. S. Palmer, D. Stefanovic, and D. D. Zovi. Randomized instruction set emulation to disrupt binary code injection attacks. In Proceedings of the 10th ACM Conference on Computer and Communications Security, CCS '03, pages 281-289, New York, NY, USA, 2003. ACM.
[9] A. Bittau, A. Belay, A. Mashtizadeh, D. Mazieres, and D. Boneh. Hacking blind. In Proceedings of the 35th IEEE Symposium on Security and Privacy, 2014.
[10] T. Bletsch, X. Jiang, V. Freeh, and Z. Liang. Jump-oriented programming: A new class of code-reuse attack. In Proc. of the 6th ACM Symposium on Info., Computer and Comm. Security, pages 30-40, 2011.
[11] N. Carlini and D. Wagner. ROP is still dangerous: Breaking modern defenses. In USENIX Security Symposium, 2014.
[12] S. Checkoway, L. Davi, A. Dmitrienko, A. Sadeghi, H. Shacham, and M. Winandy. Return-oriented programming without returns. In Proc. of the 17th ACM CCS, pages 559-572, 2010.
[13] S. Chen, J. Xu, E. C. Sezer, P. Gauriar, and R. K. Iyer. Non-control-data attacks are realistic threats. In USENIX Security, volume 5, 2005.
[14] X. Chen, D. Caselden, and M. Scott. New zero-day exploit targeting Internet Explorer versions 9 through 11 identified in targeted attacks, 2014.
[15] S. Crane, C. Liebchen, A. Homescu, L. Davi, P. Larsen, A.-R. Sadeghi, S. Brunthaler, and M. Franz. Readactor: Practical code randomization resilient to memory disclosure. In IEEE Symposium on Security and Privacy, 2015.
[16] S. A. Crosby, D. S. Wallach, and R. H. Riedi. Opportunities and limits of remote timing attacks. ACM Transactions on Information and System Security (TISSEC), 12(3):17, 2009.
[17] L. Davi, D. Lehmann, A.-R. Sadeghi, and F. Monrose. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. In USENIX Security Symposium, 2014.
[18] U. Drepper. ELF handling for thread-local storage, 2013.
[19] F. Durvaux, M. Renauld, F.-X. Standaert, L. v. O. tot Oldenzeel, and N. Veyrat-Charvillon. Efficient removal of random delays from embedded software implementations using hidden Markov models. Springer, 2013.
[20] C. Giuffrida, A. Kuijsten, and A. S. Tanenbaum. Enhanced operating system security through efficient and fine-grained address space randomization. In USENIX Security Symposium, pages 475-490, 2012.
[21] E. Göktas, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out of control: Overcoming control-flow integrity. In IEEE S&P, 2014.
[22] J. L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34(4):1-17, Sept. 2006.
[23] J. Hiser, A. Nguyen, M. Co, M. Hall, and J. Davidson. ILR: Where'd my gadgets go. In IEEE Symposium on Security and Privacy, 2012.
[24] G. Hunt, J. Larus, M. Abadi, M. Aiken, P. Barham, M. Fähndrich, C. Hawblitzel, O. Hodson, S. Levi, N. Murphy, et al. An overview of the Singularity project. 2005.
[25] Intel. Introduction to Intel memory protection extensions, 2013.
[26] T. Jackson, A. Homescu, S. Crane, P. Larsen, S. Brunthaler, and M. Franz. Diversifying the software stack using randomized NOP insertion. In Moving Target Defense, pages 151-173. 2013.
[27] T. Jackson, B. Salamat, A. Homescu, K. Manivannan, G. Wagner, A. Gal, S. Brunthaler, C. Wimmer, and M. Franz. Compiler-generated software diversity. Moving Target Defense, pages 77-98, 2011.
[28] T. Jim, J. G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang. Cyclone: A safe dialect of C. In USENIX Annual Technical Conference, General Track, pages 275-288, 2002.
[29] A. D. Keromytis, S. J. Stolfo, J. Yang, A. Stavrou, A. Ghosh, D. Engler, M. Dacier, M. Elder, and D. Kienzle. The Minestrone architecture: combining static and dynamic analysis techniques for software security. In SysSec Workshop (SysSec), 2011 First, pages 53-56. IEEE, 2011.
[30] C. Kil, J. Jun, C. Bookholt, J. Xu, and P. Ning. Address space layout permutation (ASLP): Towards fine-grained randomization of commodity software. In Proc. of ACSAC '06, pages 339-348. IEEE, 2006.
[31] V. Kuznetsov, L. Szekeres, M. Payer, G. Candea, R. Sekar, and D. Song. Code-pointer integrity. 2014.
[32] A. Kwon, U. Dhawan, J. Smith, T. Knight, and A. Dehon. Low-fat pointers: compact encoding and efficient gate-level implementation of fat pointers for spatial safety and capability-based security. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 721-732. ACM, 2013.
[33] W. Landi. Undecidability of static analysis. ACM Letters on Programming Languages and Systems (LOPLAS), 1(4):323-337, 1992.
[34] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. ACM Sigplan Notices, 40(6):190-200, 2005.
[35] A. J. Mashtizadeh, A. Bittau, D. Mazieres, and D. Boneh. Cryptographically enforced control flow integrity. arXiv preprint arXiv:1408.1451, 2014.
[36] S. Nagarakatte, J. Zhao, M. M. Martin, and S. Zdancewic. CETS: compiler enforced temporal safety for C. In ACM Sigplan Notices, volume 45, pages 31-40. ACM, 2010.
[37] G. C. Necula, S. McPeak, and W. Weimer. CCured: Type-safe retrofitting of legacy code. ACM SIGPLAN Notices, 37(1):128-139, 2002.
[38] N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In ACM Sigplan Notices, volume 42, pages 89-100. ACM, 2007.
[39] H. Okhravi, T. Hobson, D. Bigelow, and W. Streilein. Finding focus in the blur of moving-target techniques. IEEE Security & Privacy, 12(2):16-26, Mar 2014.
[40] A. One. Smashing the stack for fun and profit. Phrack magazine, 7(49):14-16, 1996.
[41] OpenBSD. OpenBSD 3.3, 2003.
[42] V. Pappas, M. Polychronakis, and A. D. Keromytis. Smashing the gadgets: Hindering return-oriented programming using in-place code randomization. In IEEE Symposium on Security and Privacy, 2012.
[43] PaX. PaX address space layout randomization, 2003.
[44] C. Percival. How to zero a buffer, Sept. 2014.
[45] W. Reese. Nginx: the high-performance web server and reverse proxy. Linux Journal, 2008(173):2, 2008.
[46] K. Scott, N. Kumar, S. Velusamy, B. Childers, J. W. Davidson, and M. L. Soffa. Retargetable and reconfigurable software dynamic translation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, pages 36-47. IEEE Computer Society, 2003.
[47] J. Seibert, H. Okhravi, and E. Soderstrom. Information leaks without memory disclosures: Remote side channel attacks on diversified code. In Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS), Nov 2014.
[48] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov. AddressSanitizer: A fast address sanity checker. In USENIX Annual Technical Conference, pages 309-318, 2012.
[49] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proceedings of the 14th ACM conference on Computer and communications security, pages 552-561. ACM, 2007.
[50] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proc. of ACM CCS, pages 552-561, 2007.
[51] H. Shacham, M. Page, B. Pfaff, E.-J. Goh, N. Modadugu, and D. Boneh. On the effectiveness of address-space randomization. In Proc. of ACM CCS, pages 298-307, 2004.
[52] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and A.-R. Sadeghi. Just-in-time code reuse: On the effectiveness of fine-grained address space layout randomization. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 574-588. IEEE, 2013.
[53] R. Strackx, Y. Younan, P. Philippaerts, F. Piessens, S. Lachmund, and T. Walter. Breaking the memory secrecy assumption. In Proc. of EuroSec '09, pages 1-8, 2009.
[54] R. Strackx, Y. Younan, P. Philippaerts, F. Piessens, S. Lachmund, and T. Walter. Breaking the memory secrecy assumption. In Proceedings of EuroSec '09, 2009.
[55] L. Szekeres, M. Payer, T. Wei, and D. Song. SoK: Eternal war in memory. In Proc. of IEEE Symposium on Security and Privacy, 2013.
[56] M. Tran, M. Etheridge, T. Bletsch, X. Jiang, V. Freeh, and P. Ning. On the expressiveness of return-into-libc attacks. In Proc. of RAID '11, pages 121-141, 2011.
[57] R. Wartell, V. Mohan, K. W. Hamlen, and Z. Lin. Binary stirring: Self-randomizing instruction addresses of legacy x86 binary code. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 157-168. ACM, 2012.
[58] R. N. Watson, J. Woodruff, P. G. Neumann, S. W. Moore, J. Anderson, D. Chisnall, N. Dave, B. Davis, B. Laurie, S. J. Murdoch, R. Norton, M. Roe, S. Son, M. Vadera, and K. Gudka. CHERI: A hybrid capability-system architecture for scalable software compartmentalization. In IEEE Symposium on Security and Privacy, 2015.
[59] Y. Weiss and E. G. Barrantes. Known/chosen key attacks against software instruction set randomization. In Computer Security Applications Conference, 2006. ACSAC '06. 22nd Annual, pages 349-360. IEEE, 2006.
[60] D. Williams, W. Hu, J. W. Davidson, J. D. Hiser, J. C. Knight, and A. Nguyen-Tuong. Security through diversity: Leveraging virtual machine technology. Security & Privacy, IEEE, 7(1):26-33, 2009.
[61] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou. Practical control flow integrity and randomization for binary executables. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 559-573. IEEE, 2013.
[62] M. Zhang and R. Sekar. Control flow integrity for COTS binaries. In USENIX Security, pages 337-352, 2013.