http://www.diva-portal.org
Postprint
This is the accepted version of a paper presented at ACM International Conference on
Computing Frontiers 2019. April 30 - May 2, 2019, Alghero, Sardinia, Italy.
Citation for the original published paper:
Sakalis, C., Alipour, M., Ros, A., Jimborean, A., Kaxiras, S. et al. (2019)
Ghost Loads: What is the Cost of Invisible Speculation?
In: CF ’19 Proceedings of the 16th ACM International Conference on Computing
Frontiers (pp. 153-163). Association for Computing Machinery (ACM)
https://doi.org/10.1145/3310273.3321558
N.B. When citing this work, cite the original published paper.
Permanent link to this version:
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-383173
Ghost Loads:
What is the Cost of Invisible Speculation?
Christos Sakalis
Mehdi Alipour
Alberto Ros
Uppsala University
Uppsala, Sweden
christos.sakalis@it.uu.se
Uppsala University
Uppsala, Sweden
mehdi.alipour@it.uu.se
University of Murcia
Murcia, Spain
aros@ditec.um.es
Alexandra Jimborean
Stefanos Kaxiras
Magnus Själander
Uppsala University
Uppsala, Sweden
alexandra.jimborean@it.uu.se
Uppsala University
Uppsala, Sweden
stefanos.kaxiras@it.uu.se
Norwegian University of Science and
Technology
Trondheim, Norway
magnus.sjalander@ntnu.no
ABSTRACT
CCS CONCEPTS
Speculative execution is necessary for achieving high performance
on modern general-purpose CPUs but, starting with Spectre and
Meltdown, it has also been proven to cause severe security flaws.
In case of a misspeculation, the architectural state is restored to
assure functional correctness but a multitude of microarchitectural
changes (e.g., cache updates), caused by the speculatively executed
instructions, are commonly left in the system. These changes can be
used to leak sensitive information, which has led to a frantic search
for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative
side-effects in the cache hierarchy, making them visible only after
the speculation has been resolved. For this, we compare (for the
first time) two broad approaches: i) waiting for loads to become
non-speculative before issuing them to the memory system, and ii)
eliminating the side-effects of speculation, a solution consisting of
invisible loads (Ghost loads) and performance optimizations (Ghost
Buffer and Materialization). While previous work, InvisiSpec, has
proposed a similar solution to our latter approach, it has done so
with only a minimal evaluation and at a significant performance
cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than
the previously proposed InvisiSpec solution, albeit much simpler,
non-invasive in the memory system, and stronger security-wise;
ii) hiding speculation with Ghost loads (in the context of a relaxed
memory model) can be achieved at the cost of 12% performance
degradation and 9% energy increase, which is significantly better
that the previous state-of-the-art solution.
· Security and privacy → Side-channel analysis and countermeasures.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
CF ’19, April 30-May 2, 2019, Alghero, Italy
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
©ACM, 2019. This is the author’s version of the work. It is posted here by permission of
ACM for your personal use. Not for redistribution. The definitive version was published
in CF’19.
ACM ISBN 978-1-4503-6685-4/19/05. . . $15.00
https://doi.org/10.1145/3310273.3321558
ACM Reference Format:
Christos Sakalis, Mehdi Alipour, Alberto Ros, Alexandra Jimborean, Stefanos
Kaxiras, and Magnus Själander. 2019. Ghost Loads: What is the Cost of
Invisible Speculation?. In Proceedings of the 16th conference on Computing
Frontiers (CF ’19), April 30-May 2, 2019, Alghero, Italy. ACM, New York, NY,
USA, 11 pages. https://doi.org/10.1145/3310273.3321558
1
INTRODUCTION
Side-channel attacks rely on shared microarchitectural state and
behavior to leak information. Side-channel attacks on the cache
system have been practically demonstrated in many forms for the
L1 (when the attacker can share the same core as the target) [5], the
shared LLC cache (when the attacker can share the LLC) [40], and
the coherence protocol (when the attacker can simply be collocated
in the same system, under a single coherence domain, with the
target) [14]. While side-channel attacks have been known to the
architecture and the security communities for years, a new type
of speculative side-channel attacks has recently surfaced, with the
most well known ones being Spectre [19] and Meltdown [24].
As far as the target program is concerned, leaking information
across a covert side-channel is not illegal because it does not affect
the functional behavior of the program. The stealthy nature of a
speculative side-channel attack depends on the microarchitectural
state being changed by speculation even when the architectural
state remains unaffected.
In this paper, we are concerned with evaluating methods to
defend against these kinds of attacks and their performance impact.
We are not concerned with how the target program is coerced into
executing code that leaks information, as this is orthogonal to the
existence of speculative covert side-channels that leak information
to the attacker. Instead, the question we are answering is: What
is the cost of shutting down the speculative covert side-channels
existing in the cache hierarchy?
The main target is to guarantee that no microarchitectural state
throughout the system can be observed changing during speculative
execution. The obvious ways to achieve this are:
CF ’19, April 30-May 2, 2019, Alghero, Italy
(1) Do not speculate (e.g., wait until memory-access instructions
become non-speculative). This is not an attractive solution
for general-purpose computing, as speculative execution
offers substantial performance benefits.
(2) Speculate but obfuscate microarchitectural changes so that
an attacker cannot discern microarchitectural changes due
to speculation [35ś37].
(3) Speculate but do not change microarchitectural state until
the speculation can be resolved. The insight behind this idea
is that speculative execution by itself is not the problem,
rather the problem is speculative execution of instructions
that should not have been executed to begin with, i.e., transient instructions.
The first choice is, intuitively, detrimental for performance, as we
will have to wait for all memory-access instructions to become nonspeculative. Waiting for loads to become non-speculative is similar
to disabling speculation in general, as applications contain a large
number of loads and all computations depend on loaded values.
Still, we evaluate the cost of disabling speculation for loads and
compare it against other solutions. Our evaluation indicates that,
even though the cost is high (−50% to −74% performance, depending
on the implementation), competing solutions (e.g., InvisiSpec) might
come at a similar cost.
The second choice is akin to existing proposals for preventing side-channel attacks (for example partitioning or randomization [35]), but to the best of our knowledge, no such solution exists
for speculative attacks. Current obfuscation approaches can only
protect from side-channel attacks that take place with the attacker
on a different address space than the victim. As the authors of
Spectre show [19], it is also possible to perform devastating attacks
from within the same context, for example using the JavaScript
JIT compiler found on all modern web browsers. Furthermore, a
lot of the work around obfuscating the access patterns focuses on
protecting small regions of code that hold sensitive data such as
encryption keys. However, the encryption keys are not the only
sensitive data in the system. For example, on a web browser, the
user’s passwords are sensitive information, but so are a lot of the
rendered web pages. Because of these reasons we do not evaluate
this as a viable solution against speculative side-channel attacks.
The third choice is having speculative memory-access instructions that are untraceable. In our proposal, we call such accesses
Ghost loads. For example, a speculative load that hits in the cache is
untraceable if it does not modify the replacement state. If it misses
in the cache, it does not cause an eviction, and if it reaches the coherence domain it does not modify the coherence-protocol state. Any
prefetches generated because of that load are also made untraceable, preventing attackers from leaking information by training the
prefetcher. Recent works [16, 39], as well as industry have shown
interest in this type of solution. We propose our own variation
that we evaluate in detail to get insights into its performance and
energy cost. Since this is a new type of solution for a new type of
problems, we are interested in understanding the behavior of such
untraceable accesses in the memory system.
In this paper, we first explore the trade-off between delaying
an access (no speculation) and issuing it as a Ghost load. We then
explore the performance implications of Ghost loads, and how they
C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander
can be improved, achieving high security with low-performance
cost. We compare both the non-speculative solutions and the Ghost
loads with the current state-of-the-art solution, InvisiSpec [39]. We
show that, even though Ghost loads take a similar approach to
InvisiSpec, such a detailed performance evaluation is critical, as
the added complexity introduced by InvisiSpec is not supported
by the performance achieved. In fact, we will show that similar
performance can be achieved simply by delaying all speculative
accesses, without the need for any modifications to the memory
hierarchy and the cache coherence protocol.
In addition to loads, speculative stores must have similar properties with the additional requirement that the stored value remains
speculative. This is already accomplished by the use of a store queue,
so in this paper we are only concerned with loads. We focus on
presenting a detailed evaluation of single-threaded applications,
and as such, due to space constraints, coherence implications will
not be evaluated in detail.
In summary, we evaluate the following:
• InvisiSpec: We evaluate and compare InvisiSpec [39], the
current state-of-the-art solution, with our own proposals.
• Non-Speculative (Non-Spec): Speculative loads are not
issued and are instead delayed until they are no longer speculative. We evaluate two versions, one where all loads are
delayed until they reach the head of the reorder buffer (ROB)
and one where they are only delayed until they are guaranteed to not be squashed by another instruction. We call
these two versions Naive and Eager, respectively.
• Ghost loads: Speculative loads are executed as Ghost loads,
which are not allowed to modify any architecturally visible
state. In practice, this prevents caching for a large percentage
of the loads in the system. To mitigate the performance cost,
we also evaluate two additions to the Ghosts: The Ghost
Buffer (GhB), a small cache used exclusively by Ghost loads,
and Materialization (Mtz), which instantiates the side-effects
of Ghost loads after the speculation has been resolved.
• Ghost Prefetching: We propose and evaluate a method for
performing prefetching of Ghosts loads, something that is
missing from the current state-of-the-art solution.
Our results reveal that the Non-Spec solutions incur significant
costs, with 75% and 50% performance loss for the Naive and Eager
version, respectively. However, so does InvisiSpec, which shows
similar performance to the Eager Non-Spec solution, but with additional hardware complexity. Ghost loads, with the Ghost Buffer,
Materialization, and prefetching, show only 12% performance loss,
accompanied by a 9% increase in energy usage. Finally, without
prefetching of Ghost loads, the performance loss is increased to 22%.
2
SPECULATIVE SHADOWS
Speculative execution works by executing instructions and hiding
their architectural side-effects until it is certain that the speculation was correct. In case of a misspeculation, the misspeculated
instructions are squashed and execution is restarted from the initial
misspeculation point. The instructions that were executed but then
squashed are often referred to as transient instructions. In practice,
Ghost Loads: What is the Cost of Invisible Speculation?
1 Not
to be confused with InvisiSpec’s Speculation Buffer.
1.0
Ghost Loads
0.8
0.6
0.4
0.2
bz
ip
2
bw gcc
a
ga ve
m s
es
ms
cf
ze mil
u c
ca gro smp
ct m
us ac
A s
les DM
lie
na 3d
go md
b
so mk
hmplex
m
Ge
m sj er
lib sFDeng
qu T
an D
h2 tum
64
to ref
nt
om lb o
ne m
tp
as p
ta
r
sp wrf
G hin
M x
ea
n
0.0
1.0
Other
Load
Store
Control
0.8
0.6
0.4
0.2
0.0
ip
2
bw gcc
av
ga es
m
es
s
m
cf
ze milc
u
ca gro smp
ct ma
us c
A s
les DM
lie
na 3d
go md
b
so mk
hmplex
m
Ge
e
m sj r
lib sFDeng
qu T
an D
h2 tum
64
r
to ef
nt
o
om lb
ne m
tp
as p
ta
r
sp wrf
hi
nx
Normalized Causes of Speculation
Figure 1: The ratio of loads executed speculatively.
bz
almost all instructions are executed speculatively, with few exceptions. In modern out-of-order (OoO) processors, non-speculative
execution is achieved by waiting until an instruction is at the head
of the reorder buffer (ROB) before being executed. For our work,
we need to be more specific with which instructions are speculative
and which are not, so we define the concept of speculative shadows.
When an instruction that can cause the CPU to misspeculate is
inserted in the ROB, it casts a speculative shadow to all following
instructions. The shadow can be lifted either when the instruction
leaves the ROB, or if possible, when it can be determined that the
instruction can no longer cause a misspeculation. For example, an
unresolved branch causes a shadow to all instructions following it,
but after the branch is resolved and the branch target can be compared with the speculated branch target, the speculative shadow
can be lifted. Essentially, sources of speculation are all instructions
that can cause the wrong instructions to be executed speculatively,
which will then have to be squashed. We have categorized the
causes of speculation into four major classes:
Control: If the target of a branch is not known, then it may
be mispredicted, causing a misspeculation. This includes not only
branches but also all instructions that the branch predictor and the
branch target buffer (BTB) might identify as a branch.
Stores: Stores can cast a speculative shadow for three reasons.
Since they are memory operations, they might try to access either
memory that is i) unmapped or ii) memory that the current execution context does not have write permissions for. In that case, an
exception will be thrown and execution will have to be diverted.
Additionally, iii) if unknown addresses are involved, the store might
be conflicting with another store or a load on the same location.
Loads: Much like stores, they cast speculative shadows because
of exceptions or conflicts with other memory operations. Additionally, the coherence protocol can dictate that a speculatively loaded
value has to be invalidated, to enforce the CPU’s memory model.
Operations causing exceptions: This includes every floating
point operation, and integer division. For floating point operations,
exception throwing can usually be controlled by the programmer,
allowing the system to know in advance if floating point operations
can throw exceptions or not. Other instruction types that can cause
exceptions are rare in benchmark suites like SPEC so we do not
consider them for this work, but they can be handled the same way
we handle arithmetic operations.
We make the observation that, in order to keep track of whether
a load is under a speculative shadow or not, it is enough to know
if the oldest shadow casting instruction is older than the load in
question. We leverage this to track shadows in a structure similar to
a reorder buffer (ROB) but much smaller as only a small identifier
instead of the complete instruction is needed to be stored. We call
this structure the shadow buffer or SB for short1 . Shadow-casting
instructions are entered into the shadow buffer in the order they
are dispatched. Once an instruction no longer causes a shadow,
e.g., once a branch has been resolved, then the SB entry is updated.
Only when an instruction reaches the head of the SB and no longer
casts a shadow does it get removed from the SB. This assures that
the oldest shadow-casting instruction is always at the head of the
SB and that they exit the SB in program order, similar to how the
CF ’19, April 30-May 2, 2019, Alghero, Italy
Figure 2: A breakdown of the instructions casting speculative shadows on executed loads.
ROB assures that instructions are committed in order. To determine
whether a load is no longer covered by a shadow it is enough
to i) mark the load at the time it is dispatched with the current
youngest shadow casting instruction and ii) compare the load’s
shadow casting instruction with the head of the SB. If the load’s
shadow casting instructions is older than the head of the SB, then
the load is not under a speculative shadow. This simple mechanism
assures that there does not exist any shadow casting instruction
older than the load.
Figure 1 displays the ratio of loads executed speculatively in each
benchmark, based on the conditions discussed above. The ratio of
speculatively executed loads is high for all benchmarks, ranging
from 67% (h264ref) and up to 97% (GemsFDTD), with a mean of 87%.
This strongly indicates that any proposed solution will have to have
low overhead, as the majority of the load operations will be affected.
We will further discuss this subject in the evaluation, see Section 6.
Figure 2 presents a breakdown of the type of instructions casting
shadows over executed load instructions in the applications found
in the SPEC2006 [1] benchmark suite. The hardware parameters of
the evaluated system can be found in Table 1. Note that only the
oldest shadow is taken into consideration and eliminating one of the
shadow types will not necessarily lead to an equal decrease in the
number of speculatively executed loads. Previous work by Alipour
et al. [2] discusses the implications of eliminating different causes of
speculation in modern out-of-order processors. We observe that the
CF ’19, April 30-May 2, 2019, Alghero, Italy
majority of the speculation is caused by the first three categories:
control (branches), stores, and loads. For applications that have
frequent and irregular control flow, such as gcc, the branches cause
the majority of the speculation. On the other hand, in applications
that utilize more regular operations, such as the mathematically
heavy bwaves and cactusADM, the speculation is caused mostly
by loads or stores. Overall, for the majority of the applications,
we observe that not just one type of operation is responsible for
causing speculative execution of loads in each benchmark.
3
NON-SPECULATIVE LOADS
An intuitive solution for hiding speculative loads is to not perform
speculative loads in the first place. We evaluate two versions of this
solution, the Naive and the Eager Non-Speculative (Non-Spec). In
the Naive version, all loads in the application are delayed until they
reach the head of the ROB, ensuring that they cannot be squashed
by any other instructions. In the Eager version, loads are only
delayed until they are no longer covered by a speculative shadow.
With the eager approach, loads are delayed for a smaller period of
time and some loads are not delayed at all (Figure 1). With these
two versions, we provide both an upper and a lower bound for the
cost of disallowing the execution of speculative loads.
An interesting property of the solutions that simply delay speculative loads is that no additional steps are necessary in order to
support the TSO memory model. Out-of-order CPUs that provide
TSO already have all the necessary mechanisms to ensure that instructions being scheduled and executed out-of-order do not break
the guarantees of the memory model. Since the non-speculative solutions described only affect the scheduling of the load instructions,
TSO can be supported out-of-the-box. Expectedly, more relaxed
memory models, such as the popular Release Consistency (RC), are
also supported without any modifications.
Another benefit of the non-speculative solutions is that they
prevent visible side-effects not only in the caches but also in the
TLBs, the main memory, the coherence state, and any other part
of the system a data fetch operation might affect. Other solutions
have to either explicitly provide ways of hiding the side-effects
in the memory system or risk leaking information. For example,
Pessl et al. [32] have already developed a side-channel that exploits
the timing of the DRAM system instead of the cache hierarchy.
4
GHOST LOADS
The principle behind invisible speculation we focus on is performing speculative loads as uncacheable accesses that do not
alter the cache state. In our work, we call these uncacheable accesses łGhostsž. A Ghost load is a load operation that is undetectable
in the memory hierarchy, specifically in the cache hierarchy.
Ghost loads have the following characteristics:
(1) They are issued like any other memory request.
(2) They can hit on any level of the memory hierarchy including
private caches, shared caches, and main memory, in which
case the response data are returned directly to the core. The
replacement state in the cache remains unchanged.
(3) In case of a miss, no cache fills are performed with the response data, and no coherence states are modified.
C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander
(4) They use a separate set of miss status handling registers
(MSHRs) that are not accessible by regular loads. Coalescing
between Ghosts is allowed only if they belong to the same
context, and so is coalescing Ghosts into in-flight regular
loads. Coalescing regular loads into Ghosts is not allowed.
(5) Any prefetches caused by Ghost loads are also marked as
Ghosts. This assures that an attacker will not be able to train
the prefetcher and abuse it as a side-channel. How Ghost
prefetches are made possible is discussed in Section 4.2.
(6) Similarly to the data caches, the relevant translation lookaside buffers (TLBs) are also not updated during the lookups
performed by Ghost requests.
In practice, it is not possible to have memory operations that
are completely side-effect free. For example, even if we disregard
any updates to the state of the system, simply by performing a
memory request it is possible to introduce detectable contention in
the system. Ghost loads and similar techniques aim to balance the
exposure of the side-effects of speculation while also limiting the
performance and energy costs.
As Ghosts do not interact with the coherence mechanisms of
the memory system, only memory models that by default do not
enforce any memory ordering are supported, such as the popular RC
model. Under RC, memory ordering is enforced through explicitly
placed fences, while all other (non-synchronizing) instructions are
free to execute with any order. When a special instruction, such
as a memory fence or an atomic operation is detected, it acts a
speculation barrier, preventing loads that proceed the fence in
program order to be issued before it. This way, no Ghost loads
can be reordered with the fence and the memory order is enforced
through the underlying coherence mechanism. More restrictive
memory models, such as TSO, require additional mechanisms (e.g.
Validations in InvisiSpec) that can lead to additional performance
overheads. Evaluating such mechanisms is beyond the scope of
this work, so we will assume that the Ghost loads mechanism only
supports RC or other similarly relaxed memory models.
4.1
Materialization
Performing the majority of load accesses as Ghosts can lead to
a significant performance degradation, caused primarily by the
disruption of caching. To regain some of that lost performance, the
data used by a Ghost load can be installed in the cache after the
load is no longer speculative. Materialization (Mtz) is a mechanism
for achieving that, by performing all the microarchitectural sideeffects of the memory request after the load is no longer speculative.
When a load is ready to be committed, an Mtz request is sent to the
memory system. The request will act as a regular load request, with
the difference that it will not load any data into a CPU register. As
such, it will install the cache line into the appropriate caches and
update the replacement data. However, in order to limit the number
of Mtz requests sent into the memory system, when a cache receives
an Mtz request for data it already contains, it will not forward
that request to any other caches in the hierarchy. Finally, if an
excessively large number of requests is generated, the older requests
will be discarded. An additional, alternative form of Materialization
that does not act as a normal memory request will be discussed in
the next section.
Ghost Loads: What is the Cost of Invisible Speculation?
4.2
Ghost Buffer
The Ghost Buffer (GhB) is a very small (e.g., eight entries for the
L1), read-only cache that is only accessible by Ghost or Mtz requests. Multiple Ghost buffers exist in the system, each attached
to its respective cache. Any data returned by a Ghost request are
placed in the GhB instead of the cache. It is also possible to facilitate prefetching of Ghost requests, by modifying the prefetcher
to recognize Ghost requests and tag prefetches initiated by them
as Ghosts. The prefetched cache lines can later by installed by the
GhB into the cache when the speculation has been resolved.
While introducing the GhB by itself improves the performance
of the Ghosts, it is when combined with the Materialization mechanism that the GhB really excels. Specifically, when an Mtz request
misses in a cache, it then checks the GhB. If the data are found, then
they are installed in the cache, eliminating the need to fetch them
from somewhere else in the memory hierarchy. It is even possible
to not let the Mtz packets reach the main memory, which is what
the evaluation in Section 6 is assuming.
Since the GhB is itself a small cache, it can be susceptible to
the same side-channel attacks that regular caches are, in this case
referred to as łtransient speculative attacksž [16]. These are attacks
that specifically target the structures used to hold the data for transient instructions. To prevent this, the design and behavior of the
GhB needs to be adapted accordingly, both for attacks originating
from a different execution context and for attacks originating from
the same context.
4.2.1 Different Execution Context. For attacks involving a different
execution context, we need to make sure that the entries in the GhB
belonging to different contexts are isolated. Previous works [10, 11,
15, 18, 20, 21, 26, 27, 30, 36, 37] have already identified solutions to
achieve this in regular caches, but given the special characteristics
of the GhB, we propose the following:
For L1 caches. We suggest flushing the L1 GhB every time there
is a context switch. Additionally, to support simultaneous multithreading (SMT), the L1 GhB is statically partitioned between the
different threads. This assures that it is not possible for one execution context to access the L1 GhB of another context. Since the
GhBs are read-only, no write-backs are required during a flush
operation, which can be achieved simply by resetting all the valid
bits in the GhB metadata.
For other caches. We instead suggest using a solution that randomizes the cache placement based on the execution context. This
can be achieved by associating a random bit mask with each context
and then Xoring the address bits with that mask. By changing the
mask during flushing, we can prevent an attacker from deciphering
the access pattern of the application or the mask. Since the GhB
needs to be efficiently flushed only for a specific context, without
flushing the rest of the data, we propose associating each cache
line with its context ID and an epoch timestamp, as proposed by
Yan et al. [39]. Each time the pipeline is squashed due to a misspeculation, the epoch is increased. By only allowing Ghost requests
to access data from the GhB when the context ID and the current
epoch match, we are effectively flushing the cache without the need
to wait for the GhB to actually be flushed, which would introduce
delays for GhBs other than the L1.
CF ’19, April 30-May 2, 2019, Alghero, Italy
4.2.2 Same Execution Context. The solutions described above protect the GhB from attacks from a different execution context, but
it is still possible to orchestrate an attack from within the same
context. For example, a JavaScript JIT compiler running on the
same thread as the main browser process can potentially leak sensitive user information. These attacks are harder to defend against,
since we do not want to isolate the accesses from the same context
from one another. To solve this issue, we flush the GhB every time
a misspeculation is detected and the transient state needs to be
squashed. This prevents an attacker from first using speculative
execution to load data in the GhB and then initiating a separate
speculative region to extract the previously loaded data.
For example, an attacker could use a Meltdown variant to read
a secret value from privileged memory, and then use it to index
a probe array. After the misspeculation has been corrected and
the execution has been restored, the attacker can trigger a second
speculative region where the probe array is probed. By timing the
second region, the attacker can identify if the probe was a hit or
a miss in the GhB, and by extension extract the secret value. By
flushing the GhB between speculative regions, this is no longer
possible. Instead, the attacker has to incorporate everything in
one speculative region. This makes the attack very hard for two
reasons: First, to prevent the program from crashing, the speculative
region needs to misspeculate. This means that the only information
that can be extracted from the speculative region is how long the
execution took. Second, the execution time of the speculative region
depends only on the misspeculated instruction that initiates it. The
moment the instruction is determined to have been misspeculated,
execution is aborted and squashed, unaffected by the timing of any
other instructions in the region. The only possible way to change
the time the region takes to execute is to affect the timing of the
initial speculative instruction, which is not easy to do in a way
that depends on the loaded secret value, as the instructions that
read and use the secret value succeed (in program order) the initial
speculative instruction.
5
INVISISPEC
InvisiSpec [39], much like Ghost loads, blocks speculative sidechannel attacks in the cache hierarchy by hiding the side-effects
of speculative loads until the speculation has been resolved. This
is achieved by preventing speculative loads from disturbing the
cache state in any way and instead installing the data in a small,
temporary buffer in the core. After the speculative shadow has been
resolved, the data are then verified and installed in the L1 cache.
InvisiSpec takes a similar approach to our Ghost loads but with
two major differences. First, the buffer utilized by InvisiSpec to
hide the speculative data has a one-to-one correspondence with
the entries of the load queue (LQ). In contrast, the Ghost Buffer
functions as a read-only cache that might contain any random set
of cache lines. Because of this, Ghost can support prefetching that
is triggered by speculative loads, while InvisiSpec can only safely
prefetch non-speculatively.
The second difference is that, in order to support total store order
(TSO) coherence, InvisiSpec needs to validate the data from the
speculative buffer before installing them in the cache. This means
that load instructions need to wait for the validation to succeed
CF ’19, April 30-May 2, 2019, Alghero, Italy
Table 1: The simulation parameters used for the evaluation.
Parameter
Technology node
Processor type
Processor frequency
ROB/IQ/LQ/SQ entries
Decode/Issue/Commit width
Cache line size
L1 private cache size
L1 private cache access latency
L2 shared cache size
L2 shared cache access latency
Value
22nm
out-of-order x86 CPU
3.4GHz
192/64/32/32
8
64 bytes
32KiB, 8-way, 8 entries GhB
2 cycles
1MiB, 16-way, 256 entries GhB
20 cycles
Average # of Accesses
Regular
Ghost
2.5
2.0
1.5
bz
ip
2
bw gcc
a
ga ve
m s
es
ms
cf
ze mil
u c
ca gro smp
ct m
us ac
A s
les DM
lie
na 3d
go md
b
so mk
hmplex
m
Ge
m sj er
lib sFDeng
qu T
an D
h2 tum
64
to ref
nt
om lb o
ne m
tp
as p
ta
r
sp wrf
hi
nx
1.0
Figure 3: The average number of consecutive, exclusively
regular or Ghost accesses to a cache line in the L1. For each
cache line we maintain a counter that is incremented every
time consecutive accesses of the same type occur and is reset
for every access of the opposite type. The deviation, though
significant, is omitted for clarity.
before being committed, potentially increasing the pressure on the
ROB and the LQ. Additionally, only one validation at a time (per
execution context) can be in-flight in the system. This limits the
amount of memory level parallelism (MLP) that InvisiSpec can take
advantage of, even when optimizations to convert some of the validations to what they call exposures, which are not constraint by the
same limitations. Ghost loads only support release consistency (RC)
and are not constrained by any of these issues. We will see in the
evaluation (Section 6) how these differences affect the performance
of the two approaches.
6
EVALUATION
We start by discussing the characteristics of the Ghost loads and
then proceed to the effects that the various evaluated solutions
have on the memory behavior of the applications, as well as the
performance and energy implications.
6.1
with McPAT [22] and Cacti [23] for the performance and energy
evaluation. Each Ghost Buffer (GhB) is modelled as a small cache
in McPAT, on the same level of the hierarchy as the cache it is
attached to. For the DRAM, we use the power model built into
Gem5, as McPAT does not provide one. We perform the simulation by first skipping one billion instructions in atomic mode and
then simulating in detail for another three billion instructions. The
characteristics of the simulated system can be found in Table 1. We
simulate a system with a private L1 and a shared L2 cache. As the
baseline we use a large, unmodified OoO CPU. For InvisiSpec we
only simulate the TSO version because, according to its authors, the
performance is not improved significantly in the RC version [39].
6.2
3.5
3.0
C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander
Methodology
We evaluate the different solutions and the suggested improvements using the SPEC2006 benchmark suite [1], from which we
exclude five applications due to simulation issues encountered in
the baseline simulation. We use the Gem5 [4] simulator combined
Ghost Loads
Before discussing the performance and energy implications of the
proposed solutions, we need to first understand the behavior of the
Ghost loads, and how they interact with regular memory accesses.
Figure 3 presents the number of consecutive Ghost loads to a
cache line between two regular loads, and vice versa. To simplify
the figure, only the average is presented. We observe that for all
benchmarks, the average number of accesses is very small, around
two consecutive accesses for most, both for Ghost and for regular
loads. We also know from our data (not shown) that the number of
cycles between consecutive loads, either Ghosts or regular, is very
small, which is not surprising given how common loads are in the
instruction mix. These numbers indicate that the data stored in the
GhB will be short-lived, as when a regular access installs data in the
cache the GhB data becomes obsolete. In addition, Materialization
requests need to be fast, as regular accesses to the same cache line
follow closely after the Ghosts. These observations indicate that
large buffers holding all speculative data are unnecessary, as quite
often some other load instruction will install the data in the cache
before the speculative load has a chance to.
We have also observed that, with the exception of Non-Spec, all
solutions increase the number of loads that are executed as Ghosts,
because the introduced delays in the execution introduce, in turn,
more speculation in the pipeline.
6.3
Memory Behavior
With the exception of the Non-Spec solutions, the proposed methods alter how the cache hierarchy works. Hence, the behavior of
the cache is what primarily affects the performance and energy
characteristics of the system.
Figure 4 features the L1-data and L2 cache miss ratios for all
the different solutions. Both Naive and Eager Non-Spec, which we
present as alternatives to invisible speculation, reduce the number
of L1 and L2 misses, as well as the number of DRAM reads. This
is due to the memory accesses and the overall execution being
slowed down, which provides more time for the memory system to
respond to requests. Additionally, without speculative execution,
only the data that are actually needed by the applications are read
and brought into the caches. However, whether Non-Spec reduces
the amount of cache misses or not is irrelevant, as it does not
change how the cache system works. Instead, we will see in the
next section that the cost of the Non-Spec methods is observed in
Ghost Loads: What is the Cost of Invisible Speculation?
baseline
L1D Miss Ratio
1.00
invisispec
CF ’19, April 30-May 2, 2019, Alghero, Italy
nonspec-naive
nonspec-eager
ghosts
0.75
0.50
0.25
nonspec-naive
nonspec-eager
f
hi
nx
G
M
ea
n
wr
sp
p
ta
r
as
tp
om
ne
o
lb
m
to
nt
lie
3d
na
m
d
go
bm
k
so
pl
ex
hm
m
er
sje
Ge
ng
m
sF
DT
lib
D
qu
an
tu
m
h2
64
re
f
sm
p
gr
om
ac
ca
s
ct
us
AD
M
m
ze
u
les
invisispec
ilc
m
baseline
m
cf
gc
c
bw
av
es
ga
m
es
s
L2 Miss Ratio
1.00
gc
c
bw
av
es
ga
m
es
s
bz
ip
2
0.00
ghosts
0.75
0.50
0.25
n
ea
nx
M
hi
f
wr
wr
sp
ta
r
ta
r
as
tp
p
m
G
om
ne
lb
o
nt
f
to
h2
64
re
m
D
tu
lib
qu
an
ng
DT
er
Ge
m
sF
sje
ex
pl
hm
m
k
so
d
bm
m
na
go
3d
lie
ca
ct
us
om
gr
les
m
ac
s
AD
M
p
ilc
cf
ze
us
m
bz
ip
2
0.00
Figure 4: L1-data & L2 cache miss ratios.
invisispec
nonspec-naive
nonspec-eager
ghosts
Normalized DRAM Reads
3.5
3.0
2.5
2.0
1.5
1.0
0.5
n
ea
f
nx
M
G
hi
sp
as
tp
p
m
ne
om
lb
o
re
nt
f
to
64
h2
lib
qu
an
tu
m
D
DT
ng
er
Ge
m
sF
sje
m
k
ex
hm
pl
so
d
m
bm
go
na
3d
lie
les
s
AD
M
us
ct
ca
gr
om
m
ac
p
ilc
us
ze
m
cf
m
gc
c
bw
av
es
ga
m
es
s
bz
ip
2
0.0
Figure 5: Normalized DRAM reads. The number of DRAM writes is not affected by the different solutions.
the performance of the benchmarks. For this reason, we will only
focus on the Ghosts and InvisiSpec in this section.
Both Ghost loads and InvisiSpec leave the mean L1-data miss
ratio unaffected. If we examine each benchmark individually, we
will see that there are some benchmarks were the miss ratio is
increased, with the worst case being libquantum for InvisiSpec,
but overall there are no significant differences. The same is not
true for the L2 cache, where InvisiSpec shows increased miss ratios
in a number of benchmarks, with the most prominent ones being
zeusmp, leslie3d, and sphinx. Ghosts also see an increase in the
miss ratio, but not as significant, with the most problematic applications begin bwaves and leslie3d. Overall, we observe a bigger
variation in the L2 miss ratios than we do in the L1, with mean miss
ratio increases of 25% for InvisiSpec and 13% for the Ghosts.
We can observe these differences more prominently in the number of DRAM reads performed in the system, as seen in Figure 5.
We only focus on the reads because the number of writes are not
significantly affected by any of the evaluated solutions. InvisiSpec
features a mean increase of 31%, while Ghosts are at 27%. The worst
applications are zeusmp and libquantum for InvisiSpec and mcf
and gromacs for Ghosts, with all four featuring more than 2× reads
when compared to the baseline.
Overall, we can conclude that both InvisiSpec and Ghosts introduce memory system overheads, with the Ghosts outperforming
InvisiSpec by a few percentage points. On the other hand, the NonSpec solutions do not have such negative side-effects.
CF ’19, April 30-May 2, 2019, Alghero, Italy
invisispec
C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander
nonspec-naive
nonspec-eager
ghosts
Baseline: OoO CPU
Normalized IPC
1.0
0.8
0.6
0.4
0.2
in
x
ea
n
f
G
M
wr
wr
sp
h
as
ta
r
ta
r
tp
p
m
lb
ne
om
to
nt
o
m
d
go
bm
k
so
pl
ex
hm
m
er
sje
Ge
ng
m
sF
DT
lib
D
qu
an
tu
m
h2
64
re
f
3d
na
ca
les
lie
s
ct
us
AD
M
p
ac
gr
om
ilc
m
ze
us
m
cf
m
bw
av
es
ga
m
es
s
gc
c
bz
ip
2
0.0
Figure 6: Normalized performance (IPC).
baseline
Normalized Energy Usage
4
invisispec
nonspec-naive
nonspec-eager
ghosts
3
2
1
n
ea
f
nx
M
G
hi
sp
p
tp
as
m
lb
ne
om
o
nt
f
to
re
64
h2
an
tu
m
D
DT
qu
lib
ng
er
Ge
m
sF
sje
m
hm
ex
k
pl
so
d
bm
go
m
na
3d
lie
les
AD
M
us
om
ct
gr
ca
ze
us
m
ac
s
p
ilc
m
s
cf
m
es
m
es
ga
bw
av
c
gc
bz
ip
2
0
Figure 7: Normalized energy usage. The bottom (shaded) part represents the static (leakage) energy of the system.
6.4
Performance
Figure 6 presents the relative performance of the various simulated
solutions, in the form of instructions per cycle (IPC) normalized
to an unmodified out-of-order CPU. As anticipated, the Non-Spec
solutions, where loads are executed non-speculatively, suffer from
a steep performance loss. We observe a mean performance loss of
75% for the Naive version, and 50% for Eager. Load instructions are
very common in applications, and all computation depends on the
values loaded from memory. We also know that the large majority
of loads in the SPEC2006 benchmarks are speculative (Figure 1).
The combination of these two facts means that with the Non-Spec
solutions not only are most loads delayed, but that these loads also
constitute a large and latency-critical part of the applications. In
essence, by using the Non-Spec solutions, we force the large and
power hungry out-of-order CPU to execute with similar constraints
to that of a slower, strict in-order CPU but without the accompanying area and power reduction benefits. In the Naive Non-Spec in
particular, which makes it impossible to have more than one load
in flight in parallel, the CPU cannot take advantage of the memory
level parallelism (MLP) available in the applications.
However, we can also observe quite similar results with InvisiSpec, which reaches a mean performance loss of 46%, just 4%
points better than the Eager Non-Spec version. InvisiSpec is even
outperformed by the Eager Non-Spec in five benchmarks, namely
zeusmp, leslie3d, libquantum, lbm, and omnetpp. Given the reduced complexity, additional security, and reduced area overhead,
these performance results indicate that simply delaying speculative
instructions is in fact a better alternative to InvisiSpec.
However, neither Eager Non-Spec or InvisiSpec can compete
with the Ghost loads when it comes to performance, because the
latter is featuring a mean performance loss of only 12%. In addition,
Ghosts consistently offer good performance, outperforming InvisiSpec in every single benchmark, and with only two applications,
bwaves and leslie3d, dropping below the −25% mark. For both
of these applications, we observe in Figure 4 that they suffer from
an increase in the L2 miss ratio. However, that in itself does not
explain the performance loss, as other applications have the same
problem. Instead, by analyzing the detailed statistics made available
from Gem5, we observed that they also suffer from a large increase
in the number of MSHR misses, both in the L1 and the L2, and
particularly MSHR misses for Ghost accesses. Other applications
also suffer from an increase in MSHR misses, but without a similar
increase of the L2 miss ratio. This indicates that not only are bwaves
and leslie3d suffering from an increased miss ratio, but also that
Ghost Loads: What is the Cost of Invisible Speculation?
their available MLP is not fully harnessed. Since regular accesses
cannot be coalesced with in-flight Ghosts due to security, and we
know from Figure 3 that regular accesses and Ghosts are tightly
interleaved, not all of the available MLP in the application can be
taken advantage of.
6.5
Energy Efficiency
Figure 7 presents the results of the energy usage evaluation. Both
versions of Non-Spec affect the execution time of the benchmarks
negatively, but not the number of cache misses and accesses to
the main memory, thus affecting more the static energy usage
of the system. We observe a mean energy increase of 2.5× for
the Naive version, and 49% for Eager. The energy usage increases
are not directly proportional to the execution time because i) the
dynamic activity is not increased proportionally (the same number
of instructions is still executed) and ii) the static power increase is
reduced due to power gating, as modelled by McPAT. Given that
a much smaller percentage of loads can be in-flight at the same
time (Figure 1), the resources of the system (e.g., load queue) can be
scaled down to help reduce the energy usage, but we do not take
this into consideration in the evaluation.
As one would expect, for Non-Spec, there is a direct and clear
correlation between the benchmarks that perform badly in terms of
performance and the benchmarks with the highest energy increase.
On the contrary, the remaining solutions also negatively affect the
memory access patterns during execution, which leads to large
changes in the dynamic energy usage of the system as well. For
InvisiSpec, we see an mean energy usage increase of 46%, which
is very close to the energy usage in the Eager Non-Spec version,
further supporting our view that the latter is a better solution.
Finally, Ghosts outperform both InvisiSpec and the Eager NonSpec version significantly, with a mean energy increase of 9% over
the baseline. A large part of this low overhead is due to the small
execution time overhead, while also keeping the GhB sizes small.
6.6
Contribution of each Ghost Mechanism
The proposed Ghost loads solution consists of a combination of
different mechanisms, namely the Ghost Buffer, Materialization,
and Ghost prefetching. When discussing the Ghosts in the rest of
the paper we assume that all of these mechanisms are used, in order
to achieve the best possible performance. However, it is important
to understand how much each of these mechanisms contributes to
the final result, and if all of them are necessary.
Figure 8 contains the performance results for different Ghost
configurations. In addition to the baseline and the full Ghost load
solution, it contains results for four additional Ghost versions, one
with neither the GhB nor Mtz (ghosts-nothing), one without Mtz
(ghosts-nomtz), one without the GhB (ghosts-noghb), and finally one
without Ghost prefetching enabled (ghosts-nopref ).
With the Ghost version that uses neither the GhB nor Mtz (ghostsnothing), we observe a mean performance loss of 61% under the
baseline, which is worse than both InvisiSpec and the Eager NonSpec version. The benchmark that is hurt the most by this version
of the Ghosts is bwaves, a benchmark that is already sensitive to the
other solutions, reaching a performance loss of 93%. Introducing
CF ’19, April 30-May 2, 2019, Alghero, Italy
the GhB (ghosts-nomtz) leads to a significant performance improvement, with a mean performance loss of 33%, outperforming both
InvisiSpec and the Eager Non-Spec version. Since we have support for prefetching Ghost loads, this version benefits from it even
without Mtz support, as the latter is not necessary for training and
triggering the prefetcher. Similar results can be seen when Mtz is
introduced (but without a GhB ś ghosts-noghb), featuring a mean
performance loss of 37%. Note that without a GhB, Ghost prefetching is not possible, which additionally hurts the performance of
this version. We can easily conclude from these results that both
mechanisms are necessary in order to achieve good performance.
Finally, we have evaluated the performance of the Ghost loads
when Ghost prefetching is not available (ghosts-nopref ). The prefetcher is instead trained and triggered by the Materializations sent
once the speculation has been resolved, much like in InvisiSpec.
Note that both the GhB and Mtz are used in these results, only
the mechanism for prefetching based on Ghost loads has been disabled. With this version, we observe a performance loss of 22%
under the baseline, 10% points more than Ghosts with prefetching
(ghosts). This demonstrates the importance of considering prefetching when proposing such solutions, something that is overlooked
by InvisiSpec (and SafeSpec).
7
RELATED WORK
This work was inspired by the Meltdown [24] and Spectre [19]
attacks published in the early 2018. However, as we explain in the
introduction, our goal is not to solve just these attacks but to provide
and evaluate a solution that prevents information leakage from
cache memory accesses during speculative execution in general.
For Meltdown and Spectre, CPU vendors have promised specific
solutions in future microcode updates. Software solutions also exist,
both for operating systems [6] and for compilers [31, 34]. These
solutions can incur very high costs, especially for applications
that perform numerous system calls. Unfortunately, since these
solutions are based on the existing attacks, they might not work
for the new attacks and variants that have been released since the
initial Meltdown and Spectre attacks were discovered.
Non-speculative cache side-channel attacks have existed for
some time [3, 12ś14, 29, 35, 40]. These attacks focus on observing the difference in execution time caused by the cache behavior
in order to leak information from the target application. A lot of
these attacks focus specifically on attacking cryptographic functions that utilize S-boxes or S-box style encryption tables, such as
AES. By detecting the access pattern of the cryptographic algorithm
to the S-box, the secret encryption key can be identified. Since such
keys are extremely sensitive data, numerous solutions have been
proposed [7, 9ś11, 15, 17, 18, 21, 25ś27, 30, 33, 36, 37, 41], usually
utilizing either partitioning, cache locking, or obfuscation through
random noise introduced to the access patterns of the application.
These solutions focus on preventing the side-channel attacks by
either preventing or hiding the timing differences observed by the
cache accesses. In our work, we focus instead on preventing or
hiding the side-effects of speculative execution that, in combination
with the traditional cache attacks mentioned above, could otherwise be used to leak sensitive information. Additionally, many of
these methods focusing on AES and similar algorithms only protect
CF ’19, April 30-May 2, 2019, Alghero, Italy
ghosts
C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander
ghosts-nothing
ghosts-nomtz
ghosts-noghb
ghosts-nopref
Baseline: OoO CPU
Normalized IPC
1.0
0.8
0.6
0.4
0.2
x
ea
n
in
M
G
f
wr
sp
h
as
ta
r
tp
p
m
lb
ne
om
3d
to
nt
o
na
m
d
go
bm
k
so
pl
ex
hm
m
er
sje
Ge
ng
m
sF
DT
lib
D
qu
an
tu
m
h2
64
re
f
ca
les
lie
s
ct
us
AD
M
p
ac
gr
om
ilc
us
m
ze
m
cf
m
bw
av
es
ga
m
es
s
2
bz
ip
gc
c
0.0
Figure 8: The contribution of each Ghost mechanism to performance (IPC).
against attacks that try to determine the access pattern to the S-box.
This means that they only work for small amounts of explicitly
specified data, which is different from our solution, which secures
the whole address space.
When it comes to protecting against speculative attacks, in addition to InvisiSpec by Yan et al. [39], Khasawneh et al. [16] have
also been working on a similar solution, named SafeSpec, but their
approach differs from our in a number of different aspects. First of
all, they discuss instructions caches, something that both we and
Yan et al. have left as future work. However, their approach only
considers branches as the source of speculation. Instead, both Ghost
loads and InvisiSpec consider all instructions that might cause a
misspeculation and lead to squashing in the pipeline. Additionally,
prefetching is not discussed, except in the context of the prefetching
effect that previously squashed loads have in the system. As we
have shown in the evaluation disregarding prefetching can lead
to a significant performance degradation. Furthermore, similarly
to InvisiSpec, SafeSpec requires a buffer large enough to hold all
in-flight loads. The exact buffer size not specified in their work,
but it is implied that it is larger than the 8-entry L1 GhB the Ghost
loads utilize in our evaluation. Unfortunately, we are not aware of
the paper having been published to a peer-reviewed venue and as
we cannot ascertain the implementation details of their solution,
we cannot compare the designs and the performance differences.
Finally, attacks and defences for the rest of the memory system
also exist [8, 28, 32, 35, 38, 42]. These focus on different areas from
our proposal and should be considered as complementary solutions.
8
FUTURE WORK
Coherence is an integral part of the caches in modern CPUs, so
developing a solution and evaluating it in detail is important. Similarly, other parts of the cache hierarchy, such as the TLBs and the
instruction caches, need an equally in-depth evaluation.
In parallel to further reducing the side-effects of speculative execution that are exposed to the system, we need to also investigate
ways of further improving the performance and reducing the energy cost. In the current implementation, all Ghost loads that are
successfully committed issue a Materialization packet to the cache.
This is not efficient, as we have determined that a large number
of times this results to an L1 hit, which leads to the cache simply
discarding the Mtz packet. If we could know in advance (or predict)
if a Materialization is necessary, we could avoid the unnecessary
L1 lookups. Furthermore, not all speculative loads need to be executed as Ghosts. For example, not all data in a system are sensitive
and need to be secured. If these data constitute a large enough
part of the memory accessed during an applications execution, the
performance and energy costs of our proposed solutions can be
reduced. Finally, the number of speculative loads can be reduced if
speculative instructions are disambiguated in advance, using either
hardware or compiler techniques.
9
CONCLUSION
We have evaluated in detail the performance and energy costs of
different solutions for the problem of speculative execution leaking
information through microarchitectural side-effects in the cache
hierarchy. Namely, we have evaluated three different solutions, a
non-speculative approach, where speculative loads are delayed until they can be safely issued, Ghost loads, where loads are issued
but their side-effects are kept hidden, and InvisiSpec, the current
state-of-the-art solution. We have shown that while the cost of the
non-speculative solution is, expectedly, high, it is similar to that
of InvisiSpec. At the same time, the non-speculative solution is
simpler, as it requires no modifications to the cache hierarchy, has
lower area and energy overhead, and protects from a wider range
of speculative side-channel attacks. We have also shown that is
is possible to reduce the cost of hiding speculation even further
using our Ghost loads, a solution similar to InvisiSpec but with key
design differences that lead to significant performance improvements. Overall, we have not only provided more efficient solutions
than the current state of the art, but we have also shown, through
our detailed evaluation, that a more thorough understanding of the
problem and the performance implications is necessary in order to
formulate effective solutions.
ACKNOWLEDGMENTS
This work was funded by Vetenskapsrådet project 2015-05159. The
computations were performed on resources provided by SNIC
through UPPMAX and by UNINETT Sigma2.
Ghost Loads: What is the Cost of Invisible Speculation?
CF ’19, April 30-May 2, 2019, Alghero, Italy
REFERENCES
[1] 2006. SPEC CPU Benchmark Suite. http://www.specbench.org/osg/cpu2006/.
[2] M. Alipour, T. E. Carlson, and S. Kaxiras. 2017. Exploring the Performance Limits
of Out-of-order Commit. In Proceedings of the ACM International Conference on
Computing Frontiers. ACM, New York, NY, USA, 211ś220. https://doi.org/10.
1145/3075564.3075581
[3] D. J. Bernstein. 2005. Cache-timing attacks on AES. (2005).
[4] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness,
D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D.
Hill, and D. A. Wood. 2011. The gem5 Simulator. ACM SIGARCH Computer
Architecture News 39, 2 (Aug. 2011), 1ś7. Issue 2. https://doi.org/10.1145/2024716.
2024718
[5] J. Bonneau and I. Mironov. 2006. Cache-Collision Timing Attacks Against AES.
In Cryptographic Hardware and Embedded Systems. Springer Berlin Heidelberg,
201ś215.
[6] J. Corbet. 2017. KAISER: hiding the kernel from user space. https://lwn.net/
Articles/738975/.
[7] L. Domnitser, A. Jaleel, J. Loew, N. Abu-Ghazaleh, and D. Ponomarev. 2012.
Non-monopolizable Caches: Low-complexity Mitigation of Cache Side Channel
Attacks. ACM Transactions on Architecture and Code Optimization 8, 4 (Jan. 2012),
35:1ś35:21. https://doi.org/10.1145/2086696.2086714
[8] X. Dong, Z. Shen, J. Criswell, A. Cox, and S. Dwarkadas. 2018. Spectres, Virtual
Ghosts, and Hardware Support. In Proceedings of the International Workshop
on Hardware and Architectural Support for Security and Privacy. ACM, 5:1ś5:9.
https://doi.org/10.1145/3214292.3214297
[9] H. Fang, S. S. Dayapule, F. Yao, M. Doroslovac̆ki, and G. Venkataramani. 2018.
Prefetch-guard: Leveraging hardware prefetches to defend against cache timing channels. In Proceedings of the IEEE International Symposium on Hardware
Oriented Security and Trust. 187ś190. https://doi.org/10.1109/HST.2018.8383912
[10] A. Fuchs and R. B. Lee. 2015. Disruptive Prefetching: Impact on Side-channel Attacks and Cache Designs. In Proceedings of the 8th ACM International Systems and
Storage Conference. ACM, 14:1ś14:12. https://doi.org/10.1145/2757667.2757672
[11] D. Gruss, J. Lettner, F. Schuster, O. Ohrimenko, I. Haller, and M. Costa. 2017.
Strong and efficient cache side-channel protection using hardware transactional
memory. In Proceedings of the USENIX Security Symposium. USENIX Association,
217ś233.
[12] D. Gruss, R. Spreitzer, and S. Mangard. 2015. Cache Template Attacks: Automating
Attacks on Inclusive Last-Level Caches.. In Proceedings of the USENIX Security
Symposium. 897ś912.
[13] D. Gullasch, E. Bangerter, and S. Krenn. 2011. Cache Games ś Access-Based
Cache Attacks on AES to Practice. In Proceedings of the IEEE Symposium on
Security and Privacy. 490ś505. https://doi.org/10.1109/SP.2011.22
[14] G. Irazoqui, T. Eisenbarth, and B. Sunar. 2016. Cross Processor Cache Attacks.
In Proceedings of the ACM on Asia Conference on Computer and Communications
Security. ACM, 353ś364. https://doi.org/10.1145/2897845.2897867
[15] G. Keramidas, A. Antonopoulos, D. N. Serpanos, and S. Kaxiras. 2008. Non
deterministic caches: a simple and effective defense against side channel attacks.
Design Automation for Embedded Systems 12, 3 (Sept. 2008), 221ś230. https:
//doi.org/10.1007/s10617-008-9018-y
[16] K. N. Khasawneh, E. M. Koruyeh, C. Song, D. Evtyushkin, D. Ponomarev, and
N. Abu-Ghazaleh. 2018. SafeSpec: Banishing the Spectre of a Meltdown with
Leakage-Free Speculation. arXiv:1806.05179 [cs] (June 2018). arXiv:1806.05179
http://arxiv.org/abs/1806.05179
[17] T. Kim, M. Peinado, and G. Mainar-Ruiz. 2012. STEALTHMEM: System-level
Protection Against Cache-based Side Channel Attacks in the Cloud. In Proceedings
of the USENIX Security Symposium. USENIX Association, 11ś11. http://dl.acm.
org/citation.cfm?id=2362793.2362804
[18] V. Kiriansky, I. Lebedev, S. Amarasinghe, S. Devadas, and J. Emer. 2018. DAWG:
A Defense Against Cache Timing Attacks in Speculative Execution Processors.
[19] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T.
Prescher, M. Schwarz, and Y. Yarom. 2018. Spectre Attacks: Exploiting Speculative
Execution. arXiv:1801.01203 [cs] (Jan. 2018). arXiv:1801.01203 http://arxiv.org/
abs/1801.01203
[20] J. Kong, O. Aciicmez, J.-P. Seifert, and H. Zhou. 2008. Deconstructing New
Cache Designs for Thwarting Software Cache-based Side Channel Attacks. In
Proceedings of the ACM Workshop on Computer Security Architectures. ACM, 25ś34.
https://doi.org/10.1145/1456508.1456514
[21] J. Kong, O. Aciicmez, J. P. Seifert, and H. Zhou. 2009. Hardware-software integrated approaches to defend against software cache-based side channel attacks.
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
In Proceedings of the International Symposium High-Performance Computer Architecture. 393ś404. https://doi.org/10.1109/HPCA.2009.4798277
S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi.
2009. McPAT: An Integrated Power, Area, and Timing Modeling Framework
for Multicore and Manycore Architectures. In Proceedings of the ACM/IEEE International Symposium on Microarchitecture. 469ś480. https://doi.org/10.1145/
1669112.1669172
S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi. 2011. CACTI-P:
Architecture-Level Modeling for SRAM-based Structures with Advanced Leakage
Reduction Techniques. In Proceedings of the IEEE/ACM International Conference
on Computer-Aided Design. IEEE, 694ś701. http://dx.doi.org/10.1109/ICCAD.
2011.6105405
M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, S. Mangard, P. Kocher, D.
Genkin, Y. Yarom, and M. Hamburg. 2018. Meltdown. arXiv:1801.01207 [cs] (Jan.
2018). arXiv:1801.01207 http://arxiv.org/abs/1801.01207
F. Liu and R. B. Lee. 2013. Security Testing of a Secure Cache Design. In Proceedings
of the International Workshop on Hardware and Architectural Support for Security
and Privacy. ACM, 3:1ś3:8. https://doi.org/10.1145/2487726.2487729
F. Liu and R. B. Lee. 2014. Random Fill Cache Architecture. In Proceedings of
the ACM/IEEE International Symposium on Microarchitecture. 203ś215. https:
//doi.org/10.1109/MICRO.2014.28
F. Liu, H. Wu, K. Mai, and R. B. Lee. 2016. Newcache: Secure Cache Architecture
Thwarting Cache Side-Channel Attacks. IEEE Micro 36, 5 (Sept. 2016), 8ś16.
https://doi.org/10.1109/MM.2016.85
R. Martin, J. Demme, and S. Sethumadhavan. 2012. TimeWarp: Rethinking
Timekeeping and Performance Monitoring Mechanisms to Mitigate Side-channel
Attacks. In Proceedings of the International Symposium on Computer Architecture.
IEEE Computer Society, 118ś129. http://dl.acm.org/citation.cfm?id=2337159.
2337173
D. A. Osvik, A. Shamir, and E. Tromer. 2006. Cache attacks and countermeasures:
the case of AES. In Proceedings of the RSA Conference. Springer, 1ś20.
D. Page. 2005. Partitioned Cache Architecture as a Side-Channel Defence Mechanism. IACR Cryptology ePrint archive 2005, 280 (2005).
A. Pardoe. 2018. Spectre mitigations in MSVC. https://blogs.msdn.microsoft.
com/vcblog/2018/01/15/spectre-mitigations-in-msvc/.
P. Pessl, D. Gruss, C. Maurice, M. Schwarz, and S. Mangard. 2016. DRAMA:
Exploiting DRAM Addressing for Cross-CPU Attacks.. In Proceedings of the
USENIX Security Symposium. USENIX Association, 565ś581.
M. K. Qureshi. 2018. CEASER: Mitigating Conflict-Based Cache Attacks via
Encrypted-Address and Remapping. In Proceedings of the ACM/IEEE International
Symposium on Microarchitecture.
P. Turner. 2018. Retpoline: a software construct for preventing branch-targetinjection. https://support.google.com/faqs/answer/7625886.
Z. Wang and R. B. Lee. 2006. Covert and Side Channels Due to Processor Architecture. In Proceedings of the Annual Computer Security Applications Conference.
473ś482. https://doi.org/10.1109/ACSAC.2006.20
Z. Wang and R. B. Lee. 2007. New Cache Designs for Thwarting Software Cachebased Side Channel Attacks. In Proceedings of the International Symposium on
Computer Architecture. ACM, 494ś505. https://doi.org/10.1145/1250662.1250723
Z. Wang and R. B. Lee. 2008. A Novel Cache Architecture with Enhanced Performance and Security. In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO 41). IEEE Computer Society, 83ś93. https:
//doi.org/10.1109/MICRO.2008.4771781
Z. Wu, Z. Xu, and H. Wang. 2012. Whispers in the Hyper-space: High-speed
Covert Channel Attacks in the Cloud.. In Proceedings of the USENIX Security
Symposium. USENIX Association, 159ś173.
M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. W. Fletcher, and J. Torrellas. 2018.
InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy. In
Proceedings of the ACM/IEEE International Symposium on Microarchitecture.
Y. Yarom and K. Falkner. 2014. FLUSH+ RELOAD: A High Resolution, Low Noise,
L3 Cache Side-Channel Attack.. In Proceedings of the USENIX Security Symposium,
Vol. 1. 22ś25.
Y. Zhang and M. K. Reiter. 2013. Düppel: Retrofitting Commodity Operating
Systems to Mitigate Cache Side Channels in the Cloud. In Proceedings of the ACM
SIGSAC Conference on Computer & Communications Security. ACM, 827ś838.
https://doi.org/10.1145/2508859.2516741
X. Zhuang, T. Zhang, and S. Pande. 2004. HIDE: An Infrastructure for Efficiently
Protecting Information Leakage on the Address Bus. In Proceedings of the Architectural Support for Programming Languages and Operating Systems. ACM, 72ś84.
https://doi.org/10.1145/1024393.1024403