DOI: 10.1145/2184751.2184803
Research article

Code-based cache partitioning for improving hardware cache performance

Published: 20 February 2012

Abstract

Improving hardware cache performance has become increasingly important because the performance gap between processor and memory has led to the "memory wall" problem. Most cache designs are based on the LRU replacement policy, which is effective for high-locality workloads. However, LRU is ineffective for workloads whose working set is larger than the available cache or that exhibit weak-memory access patterns. To compensate for this weakness of the LRU policy, we introduce a novel code-based cache partitioning mechanism that requires no hardware support. The mechanism first collects profile data using binary instrumentation and then classifies the characteristics of each code region from the collected code profiles. Finally, while the application is running, page coloring is used to perform the code-based cache partitioning. To demonstrate its effectiveness, we implemented the mechanism in the Linux kernel. Experiments on workloads that include weak-memory access patterns show that the proposed mechanism improves performance by up to 7.3% and reduces last-level cache misses by up to 37.8%.
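
The page coloring step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' kernel implementation; it only shows the standard arithmetic behind page coloring for a physically indexed cache. The cache parameters (2 MiB, 16-way, 4 KiB pages) and the helper names `num_colors`, `page_color`, and `pick_frame` are assumptions chosen for illustration.

```python
def num_colors(cache_size: int, associativity: int, page_size: int = 4096) -> int:
    """Number of page colors for a physically indexed, set-associative cache.

    One cache way spans cache_size / associativity bytes; each page-sized
    slice of that span is one color.
    """
    way_size = cache_size // associativity
    if way_size == 0 or way_size % page_size != 0:
        raise ValueError("way size must be a positive multiple of the page size")
    return way_size // page_size


def page_color(phys_addr: int, colors: int, page_size: int = 4096) -> int:
    """Color of the physical page that contains phys_addr."""
    return (phys_addr // page_size) % colors


def pick_frame(free_frames, allowed_colors, colors, page_size: int = 4096):
    """Return the first free frame number whose color is allowed, else None.

    Hypothetical allocator hook: restricting a region to a subset of colors
    confines its data to the matching slice of the cache.
    """
    for frame in free_frames:
        if page_color(frame * page_size, colors, page_size) in allowed_colors:
            return frame
    return None


if __name__ == "__main__":
    colors = num_colors(2 * 1024 * 1024, 16)  # assumed 2 MiB, 16-way cache
    print("colors:", colors)
    print("color of 0x21000:", page_color(0x21000, colors))
    print("frame for color 5:", pick_frame([0, 32, 5, 37], {5}, colors))
```

Under these assumed parameters the cache has 32 colors, so a region restricted to a single color can occupy at most 1/32 of the cache. A partitioning scheme would presumably give cache-polluting regions few colors and cache-friendly regions the rest.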

      Published In

      ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
      February 2012
      852 pages
      ISBN: 9781450311724
      DOI: 10.1145/2184751
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cache partitioning
      2. cache performance
      3. page coloring
      4. shared cache management

      Qualifiers

      • Research-article

      Conference

      ICUIMC '12
      Acceptance Rates

      Overall Acceptance Rate 251 of 941 submissions, 27%

      Cited By

      • (2022) Edge-RT: OS Support for Controlled Latency in the Multi-Tenant, Real-Time Edge. 2022 IEEE Real-Time Systems Symposium (RTSS), pages 1-13. DOI: 10.1109/RTSS55097.2022.00011. Online publication date: Dec-2022
      • (2019) Resource Centric Characterization and Classification of Applications Using KMeans for Multicores. 2019 International Conference on Information Networking (ICOIN), pages 25-30. DOI: 10.1109/ICOIN.2019.8717981. Online publication date: Jan-2019
      • (2019) Coordination and Synchronization in Multiagent System Based on Tilman Model of Resource Sharing. 2019 International Conference on Advances in Computing, Communication and Control (ICAC3), pages 1-6. DOI: 10.1109/ICAC347590.2019.9036776. Online publication date: Dec-2019
      • (2014) A Survey on Recent Hardware and Software-Level Cache Management Techniques. Proceedings of the 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications, pages 242-247. DOI: 10.1109/ISPA.2014.41. Online publication date: 26-Aug-2014
      • (2014) Cache design for mixed criticality real-time systems. 2014 IEEE 32nd International Conference on Computer Design (ICCD), pages 513-516. DOI: 10.1109/ICCD.2014.6974730. Online publication date: Oct-2014
      • (2013) Real-time cache management framework for multi-core architectures. Proceedings of the 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 45-54. DOI: 10.1109/RTAS.2013.6531078. Online publication date: 9-Apr-2013
