DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors
3) Reducing privacy leakage from caches: Since Spectre attacks are outside of the threat model anticipated by prior work,
most prior defenses are ineffective. LLC defenses against cross-
core attacks, such as SHARP [58] and RIC [28], do not stop cache line
[Plot: panels blackscholes, facesim, fluidanimate, freqmine, raytrace, x264; x-axis: Ways allocated.]
Fig. 8. Way partitioning performance at low associativity in all caches (8-way L1, 8-way L2, and 16-way L3).
[Plot: panels bc, pr, tc, bfs, cc, sssp; legend: 8/16, 9/16, 11/16, 13/16, 15/16, 16/16 ways.]
locality [5]. The power law structure, however, implies that there is diminishing return from each additional L3 way. As shown, at half cache capacity (8/16 L3, Section VI-B1), there is at most 15% slowdown (bc and tc benchmarks) at the largest simulated size (2^20 vertices). A characteristic eye is formed when the performance curves of different configurations cross over the working set boundary (e.g., graph size of 2^17). Performance with working sets smaller or larger than the effective cache capacity is unaffected: at the largest size, cc, pr, and sssp show 1–4% slowdown.
Reserving one way (6% of LLC capacity) for the OS (Section VI-B3) adds no performance overhead to most workloads. The only exception would be a workload caught in the eye, e.g., PageRank at 2^17 has 30% overhead (Fig. 9), while at 2^16 or 2^18 there is 0% difference.
D. CAT versus DAWG
We analyze and evaluate scenarios based on the degree of code and data sharing across domains.
1) No Sharing: There is virtually no performance difference between secure DAWG partitioning and insecure CAT partitioning in the absence of read-sharing across domains. DAWG reduces interference in replacement metadata updates and enforces the intended replacement strategy within a domain, while CAT may lose block history, effectively exhibiting random replacement, which is a minor, workload-dependent perturbation. In simulations (not shown), we replicate a known observation that random replacement occasionally performs better than LRU near cache capacity. We did not observe this effect with NRU replacement.
[Plot: x-axis: Graph Size (log N), 15 to 23.]
Fig. 10. Read-only sharing effects of two instances using Shared vs Private data of varying scale (1-thread instances). Actual Haswell 20-way 30 MB L3.
2) Read-only Sharing: CAT QoS guarantees a lower bound on a workload’s effective cache capacity, while DAWG isolation forces a tight upper bound. DAWG’s isolation reduces cache capacity compared to CAT when cache lines are read-only shared across mutually untrusting protection domains. CAT permits hits across partitions where code or read-only data are unsafely shared. We focus on read-only data in our evaluation, as benchmarks with low L1i MPKI like GAPBS, PARSEC, or SPECCPU are poorly suited to study code cache sensitivity.
We analyze real applications using one-line modifications
to GAPBS to fork (a single-thread process) either before or after creating in-memory graph representations. The former results in a private graph for each process, while the latter simulates mmap of a shared graph. The shared graphs access read-only data across domains in the baseline and CAT, while DAWG has to replicate data in domain-private ways. Since zsim does not simulate TLBs, we ensure different virtual addresses are used to avoid false sharing. We first verified in simulation that DAWG, with memory shared across protection domains, behaves identically to CAT and the baseline with private data. Next, we demonstrate (in Fig. 10) that these benchmarks show little performance difference on real hardware [15] for most data sizes; Shared baseline models Shared CAT, while Private baseline models Shared DAWG. The majority of cycles are spent on random accesses to read-write data, while read-only data is streamed sequentially. Although read-only data is much larger than read-write data (e.g., 16 times more edges than vertices), prefetching and scan- and thrash-resistant policies [26], [45] further reduce the need for cache-resident read-only data. Note that even at 2^23 vertices these effects are immaterial; real-world graphs have billions of people or pages.
E. Domain copy microbenchmark
We simulated a privilege level change at simulated system calls for user-mode TCP/IP. Since copy_from_user and copy_to_user permit hits in the producer’s ways, there is no performance difference against the baseline (not shown).
VII. CONCLUSION
DAWG protects against attacks that rely on a cache state-based channel, which are commonly referred to as cache-timing attacks, on speculative execution processors with reasonable overheads. The same policies can be applied to any set-associative structure, e.g., TLB or branch history tables. DAWG has its limitations, and additional techniques are required to block exfiltration channels different from the cache channel. We believe that techniques like DAWG are needed to restore our confidence in public cloud infrastructure, and hardware and software co-design will help minimize performance overheads.
A good proxy for the performance overheads of secure DAWG is Intel’s existing, though insecure, CAT hardware. Traditional QoS uses of CAT, however, differ from desired DAWG protection domains’ configurations. Research on software resource management strategies can therefore commence with evaluation of large-scale workloads on CAT. CPU vendors can similarly analyze the costs and benefits of increasing cache capacity and associativity to accommodate larger numbers of active protection domains.
VIII. ACKNOWLEDGMENTS
Funding for this research was partially provided by NSF grant CNS-1413920; DARPA contracts HR001118C0018, HR00111830007, and FA87501720126; Delta Electronics, DARPA & SPAWAR contract N66001-15-C-4066; DoE award DE-FOA0001059, and Toyota grant LP-C000765-SR. We are grateful to Carl Waldspurger for his valuable feedback on the initial design as well as the final presentation of this paper. We also thank our anonymous reviewers and Julian Shun for helpful questions and comments.
REFERENCES
[1] ARM, “ARM Cortex-A72 MPCore processor technical reference manual,” 2015.
[2] ARM, “ARM Software Speculation Barrier,” https://github.com/ARM-software/speculation-barrier, January 2018.
[3] S. Banescu, “Cache timing attacks,” 2011, [Online; accessed 26-January-2014].
[4] S. Beamer, K. Asanović, and D. A. Patterson, “The GAP benchmark suite,” CoRR, vol. abs/1508.03619, 2015. [Online]. Available: http://arxiv.org/abs/1508.03619
[5] S. Beamer, K. Asanović, and D. A. Patterson, “Locality exists in graph processing: Workload characterization on an Ivy Bridge server,” in 2015 IEEE International Symposium on Workload Characterization, IISWC 2015, Atlanta, GA, USA, October 4-6, 2015, 2015, pp. 56–65.
[6] C. Bienia, “Benchmarking modern multiprocessors,” Ph.D. dissertation, Princeton University, January 2011.
[7] J. Bonneau and I. Mironov, “Cache-collision timing attacks against AES,” in Cryptographic Hardware and Embedded Systems-CHES 2006. Springer, 2006, pp. 201–215.
[8] B. B. Brumley and N. Tuveri, “Remote timing attacks are still practical,” in Computer Security–ESORICS. Springer, 2011.
[9] D. Brumley and D. Boneh, “Remote timing attacks are practical,” Computer Networks, 2005.
[10] D. Brumley and D. Boneh, “Remote timing attacks are practical,” Computer Networks, 2005.
[11] C. Carruth, “Introduce the ‘retpoline’ x86 mitigation technique for variant #2 of the speculative execution vulnerabilities,” http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180101/513630.html, January 2018.
[12] D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: A recursive model for graph mining,” in Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22-24, 2004, 2004, pp. 442–446.
[13] J. Corbet, “KAISER: hiding the kernel from user space,” https://lwn.net/Articles/738975/, November 2017.
[14] L. Domnitser, A. Jaleel, J. Loew, N. Abu-Ghazaleh, and D. Ponomarev, “Non-monopolizable caches: Low-complexity mitigation of cache side channel attacks,” Transactions on Architecture and Code Optimization (TACO), 2012.
[15] E5v3, “Intel Xeon Processor E5-2680 v3 (30M Cache, 2.50 GHz),” http://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz.
[16] N. El-Sayed, A. Mukkara, P.-A. Tsai, H. Kasture, X. Ma, and D. Sanchez, “KPart: A hybrid cache partitioning-sharing technique for commodity multicores,” in Proceedings of the 24th international symposium on High Performance Computer Architecture (HPCA-24), February 2018.
[17] C. W. Fletcher, L. Ren, X. Yu, M. V. Dijk, O. Khan, and S. Devadas, “Suppressing the oblivious RAM timing channel while making information leakage and program efficiency trade-offs,” in 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), Feb 2014, pp. 213–224.
[18] Graph500, “Graph 500 benchmark,” http://www.graph500.org/specifications.
[19] D. Gruss, J. Lettner, F. Schuster, O. Ohrimenko, I. Haller, and M. Costa, “Strong and efficient cache side-channel protection using hardware transactional memory,” in 26th USENIX Security Symposium (USENIX Security 17). Vancouver, BC: USENIX Association, 2017, pp. 217–233. [Online]. Available: https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/gruss
[20] D. Gruss, C. Maurice, K. Wagner, and S. Mangard, “Flush+Flush: a fast and stealthy cache attack,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 279–299.
[21] A. Herdrich, E. Verplanke, P. Autee, R. Illikkal, C. Gianos, R. Singhal, and R. Iyer, “Cache QoS: From concept to reality in the Intel Xeon processor E5-2600 v3 product family,” in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), March 2016, pp. 657–668.
[22] J. Horn, “Reading privileged memory with a side-channel,” https://googleprojectzero.blogspot.com/2018/01/, January 2018.
[23] Intel Corp., “Improving real-time performance by utilizing Cache Allocation Technology,” April 2015.
[24] G. Irazoqui, M. S. Inci, T. Eisenbarth, and B. Sunar, “Wait a minute! a fast, cross-VM attack on AES,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2014, pp. 299–319.
[25] A. Jaleel, J. Nuzman, A. Moga, S. C. Steely, and J. Emer, “High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches,” in 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Feb 2015, pp. 343–353.
[26] A. Jaleel, K. B. Theobald, S. C. S. Jr., and J. S. Emer, “High performance cache replacement using re-reference interval prediction (RRIP),” in 37th International Symposium on Computer Architecture (ISCA 2010), June 19-23, 2010, Saint-Malo, France, 2010, pp. 60–71. [Online]. Available: http://doi.acm.org/10.1145/1815961.1815971
[27] S. Kanev, J. P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G. Y. Wei, and D. Brooks, “Profiling a warehouse-scale computer,” in 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), June 2015, pp. 158–169.
[28] M. Kayaalp, K. N. Khasawneh, H. A. Esfeden, J. Elwell, N. Abu-Ghazaleh, D. Ponomarev, and A. Jaleel, “RIC: Relaxed inclusion caches for mitigating LLC side-channel attacks,” in 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), June 2017, pp. 1–6.
[29] R. E. Kessler and M. D. Hill, “Page placement algorithms for large real-indexed caches,” Transactions on Computer Systems (TOCS), 1992.
[30] V. Kiriansky and C. Waldspurger, “Speculative buffer overflows: Attacks and defenses,” ArXiv e-prints, Jul. 2018.
[31] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, “Spectre attacks: Exploiting speculative execution,” ArXiv e-prints, Jan. 2018.
[32] P. C. Kocher, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems,” in Advances in Cryptology (CRYPTO). Springer, 1996.
[33] J. Kong, O. Aciicmez, J.-P. Seifert, and H. Zhou, “Deconstructing new cache designs for thwarting software cache-based side channel attacks,” in Workshop on Computer Security Architectures. ACM, 2008.
[34] J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan, “Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems,” in HPCA. IEEE, 2008.
[35] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, “Meltdown,” ArXiv e-prints, Jan. 2018.
[36] F. Liu, Q. Ge, Y. Yarom, F. Mckeen, C. Rozas, G. Heiser, and R. B. Lee, “CATalyst: Defeating last-level cache side channel attacks in cloud computing,” in HPCA, Mar 2016.
[37] F. Liu and R. B. Lee, “Random fill cache architecture,” in Microarchitecture (MICRO). IEEE, 2014.
[38] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, “Last-level cache side-channel attacks are practical,” in Security and Privacy. IEEE, 2015.
[39] Y. Oren, V. P. Kemerlis, S. Sethumadhavan, and A. D. Keromytis, “The spy in the sandbox – practical cache attacks in JavaScript,” arXiv preprint arXiv:1502.07373, 2015.
[40] D. A. Osvik, A. Shamir, and E. Tromer, “Cache attacks and countermeasures: the case of AES,” in Topics in Cryptology–CT-RSA 2006. Springer, 2006, pp. 1–20.
[41] G. Ottoni and B. Maher, “Optimizing function placement for large-scale data-center applications,” in 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Feb 2017, pp. 233–244.
[42] M. S. Papamarcos and J. H. Patel, “A low-overhead coherence solution for multiprocessors with private cache memories,” SIGARCH Comput. Archit. News, vol. 12, no. 3, pp. 348–354, Jan. 1984.
[43] A. Pardoe, “Spectre mitigations in MSVC,” https://blogs.msdn.microsoft.com/vcblog/2018/01/15/spectre-mitigations-in-msvc/, January 2018.
[44] B. Pham, J. Veselý, G. H. Loh, and A. Bhattacharjee, “Large pages and lightweight memory management in virtualized environments: Can you have it both ways?” in Proceedings of the 48th International Symposium on Microarchitecture, ser. MICRO-48. New York, NY, USA: ACM, 2015, pp. 1–12. [Online]. Available: http://doi.acm.org/10.1145/2830772.2830773
[45] M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. S. Emer, “Adaptive insertion policies for high performance caching,” in Proceedings of the 34th Annual International Symposium on Computer Architecture, ser. ISCA ’07. New York, NY, USA: ACM, 2007, pp. 381–391. [Online]. Available: http://doi.acm.org/10.1145/1250662.1250709
[46] R. Grisenthwaite, “Cache Speculation Side-channels,” January 2018.
[47] D. Sanchez and C. Kozyrakis, “Vantage: Scalable and efficient fine-grain cache partitioning,” in 38th Annual International Symposium on Computer Architecture (ISCA), June 2011, pp. 57–68.
[48] D. Sanchez and C. Kozyrakis, “ZSim: Fast and accurate microarchitectural simulation of thousand-core systems,” in Proceedings of the 40th Annual International Symposium on Computer Architecture-ISCA, vol. 13. Association for Computing Machinery, 2013, pp. 23–27.
[49] R. Sprabery, K. Evchenko, A. Raj, R. B. Bobba, S. Mohan, and R. H. Campbell, “A novel scheduling framework leveraging hardware cache partitioning for cache-side-channel elimination in clouds,” CoRR, vol. abs/1708.09538, 2017. [Online]. Available: http://arxiv.org/abs/1708.09538
[50] G. Taylor, P. Davies, and M. Farmwald, “The TLB slice - a low-cost high-speed address translation mechanism,” SIGARCH Computer Architecture News, 1990.
[51] L. Torvalds, “Re: Page colouring,” 2003. [Online]. Available: http://yarchive.net/comp/linux/cache_coloring.html
[52] C. Trippel, D. Lustig, and M. Martonosi, “MeltdownPrime and SpectrePrime: Automatically-synthesized attacks exploiting invalidation-based coherence protocols,” arXiv preprint arXiv:1802.03802, 2018.
[53] P. Turner, “Retpoline: a software construct for preventing branch-target-injection,” https://support.google.com/faqs/answer/7625886, January 2018.
[54] C. A. Waldspurger, “Memory resource management in VMware ESX server,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation, ser. OSDI ’02. Berkeley, CA, USA: USENIX Association, 2002, pp. 181–194. [Online]. Available: http://dl.acm.org/citation.cfm?id=1060289.1060307
[55] Y. Wang, A. Ferraiuolo, D. Zhang, A. C. Myers, and G. E. Suh, “SecDCP: Secure dynamic cache partitioning for efficient timing channel protection,” in Proceedings of the 53rd Annual Design Automation Conference, ser. DAC ’16. New York, NY, USA: ACM, 2016, pp. 74:1–74:6. [Online]. Available: http://doi.acm.org/10.1145/2897937.2898086
[56] Z. Wang and R. B. Lee, “New cache designs for thwarting software cache-based side channel attacks,” in International Symposium on Computer Architecture (ISCA), 2007.
[57] Y. Xu, W. Cui, and M. Peinado, “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” in 2015 IEEE Symposium on Security and Privacy, May 2015, pp. 640–656.
[58] M. Yan, B. Gopireddy, T. Shull, and J. Torrellas, “Secure hierarchy-aware cache replacement policy (SHARP): Defending against cache-based side channel attacks,” in Proceedings of the 44th Annual International Symposium on Computer Architecture, ser. ISCA ’17. New York, NY, USA: ACM, 2017, pp. 347–360. [Online]. Available: http://doi.acm.org/10.1145/3079856.3080222
[59] F. Yao, M. Doroslovacki, and G. Venkataramani, “Are coherence protocol states vulnerable to information leakage?” in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb 2018, pp. 168–179.
[60] Y. Yarom and K. Falkner, “FLUSH+RELOAD: A high resolution, low noise, L3 cache side-channel attack,” in USENIX Security Symposium, 2014.
[61] X. Zhang, S. Dwarkadas, and K. Shen, “Towards practical page coloring-based multicore cache management,” in Proceedings of the 4th ACM European Conference on Computer Systems, ser. EuroSys ’09. New York, NY, USA: ACM, 2009, pp. 89–102. [Online]. Available: http://doi.acm.org/10.1145/1519065.1519076