research-article

Open access

A Software Cache Partitioning System for Hash-Based Caches

Authors:

Alberto Scolari,

Davide Basilio Bartolini,

Marco Domenico Santambrogio

ACM Transactions on Architecture and Code Optimization (TACO), Volume 13, Issue 4

Article No.: 57, Pages 1 - 24

https://doi.org/10.1145/3018113

Published: 16 December 2016

Abstract

Contention on the shared Last-Level Cache (LLC) can have a fundamental negative impact on the performance of applications executed on modern multicores. One software approach to LLC contention is page coloring, a technique that attempts to achieve performance isolation by partitioning a shared cache through careful memory management. The key assumption of traditional page coloring is that the cache is physically addressed. However, recent multicore architectures (e.g., Intel Sandy Bridge and later) switched from a physical addressing scheme to a more complex scheme that involves a hash function, and traditional page coloring is ineffective on these architectures.
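As a rough illustration of the traditional scheme described above, the following sketch computes the color of a page on a physically indexed cache. The cache geometry (2 MiB, 16 ways, 64-byte lines, 4 KiB pages) and all function names are assumptions chosen for the example; they are not taken from the article.

```python
# Toy model of traditional page coloring on a physically indexed cache.
# All sizes below are illustrative assumptions, not values from the article.
PAGE_SIZE = 4096               # bytes per page
LINE_SIZE = 64                 # bytes per cache line
CACHE_SIZE = 2 * 1024 * 1024   # total cache capacity in bytes
WAYS = 16                      # associativity

NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)   # number of cache sets
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE        # consecutive sets touched by one page
NUM_COLORS = NUM_SETS // SETS_PER_PAGE        # distinct page colors


def set_index(paddr: int) -> int:
    """Cache set selected by a physical address."""
    return (paddr // LINE_SIZE) % NUM_SETS


def page_color(paddr: int) -> int:
    """Color of the page containing a physical address."""
    return (paddr // PAGE_SIZE) % NUM_COLORS


def sets_of_color(color: int) -> range:
    """Cache sets that pages of the given color can occupy."""
    start = color * SETS_PER_PAGE
    return range(start, start + SETS_PER_PAGE)
```

Under these assumptions, pages of different colors map to disjoint groups of sets, so an allocator that hands each application pages from its own subset of colors keeps their cache footprints apart.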

In this article, we extend page coloring to these recent architectures by proposing a mechanism that handles their hash-based LLC addressing scheme. As with traditional page coloring, the goal of this new mechanism is to deliver performance isolation by avoiding contention on the LLC, thus enabling predictable performance. We implement this mechanism in the Linux kernel and evaluate it using several benchmarks from the SPEC CPU2006 and PARSEC 3.0 suites. Our results show that our solution delivers performance isolation to concurrently running applications by enforcing partitioning of a Sandy Bridge LLC, which traditional page coloring techniques are not able to handle.
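To see why a hash-based scheme defeats set-index-only coloring, consider the toy model below. The slice hash here is a made-up XOR-parity function over a few hypothetical address bits for a two-slice cache; it is not the Sandy Bridge hash and not the mechanism proposed in the article, only a sketch of the general problem.

```python
# Toy model of a sliced LLC whose slice is chosen by a hash of address bits.
# The mask and geometry are hypothetical; this is NOT Intel's real hash.
LINE_SIZE = 64
SETS_PER_SLICE = 2048
TOY_SLICE_MASK = (1 << 17) | (1 << 18) | (1 << 20) | (1 << 22)  # assumed bits


def set_index(paddr: int) -> int:
    """Set within a slice, taken from the low address bits as in plain coloring."""
    return (paddr // LINE_SIZE) % SETS_PER_SLICE


def toy_slice(paddr: int, mask: int = TOY_SLICE_MASK) -> int:
    """Slice number: parity (XOR) of the address bits selected by the mask."""
    return bin(paddr & mask).count("1") % 2


def placement(paddr: int) -> tuple:
    """Where a line actually lands: (slice, set within that slice)."""
    return (toy_slice(paddr), set_index(paddr))


a = 0
b = 1 << 17
c = (1 << 17) | (1 << 18)
# a, b and c share the same set index, so plain coloring treats them alike,
# yet b lands in a different slice and does not contend with a or c.
same_index = set_index(a) == set_index(b) == set_index(c)
a_conflicts_with_b = placement(a) == placement(b)
a_conflicts_with_c = placement(a) == placement(c)
```

In this toy setting, a partitioning scheme has to account for the pair (slice, set) rather than the set index alone, which is the kind of information a hash-aware coloring mechanism would need.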



Index Terms

A Software Cache Partitioning System for Hash-Based Caches
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Main memory


Published In

ACM Transactions on Architecture and Code Optimization Volume 13, Issue 4

December 2016

648 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3012405

Editor:
Koen De Bosschere
Ghent University

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2016

Accepted: 01 November 2016

Revised: 01 November 2016

Received: 01 December 2015

Qualifiers

Research-article
Research
Refereed
