Research article (ASPLOS conference proceedings)
DOI: 10.1145/3037697.3037701

Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy

Published: 04 April 2017

Abstract

Data prefetching and cache replacement algorithms have been intensively studied in the design of high-performance microprocessors. Typically, the data prefetcher operates in the private caches and does not interact with the replacement policy in the shared Last-Level Cache (LLC). Similarly, most replacement policies do not distinguish between demand and prefetch requests. In particular, program counter (PC)-based replacement policies cannot learn from prefetch requests, since the data prefetcher does not generate a PC value. PC-based policies can also be negatively affected by compiler optimizations. In this paper, we propose a holistic cache management technique called Kill-the-PC (KPC) that overcomes the weaknesses of traditional prefetching and replacement policy algorithms. KPC cache management makes three novel contributions. First, it introduces a prefetcher that approximates the future use distance of prefetch requests based on its prediction confidence. Second, it provides a simple replacement policy that uses global hysteresis to deliver performance similar to or better than current state-of-the-art PC-based prediction. Third, KPC integrates prefetching and replacement policy into a whole system that is greater than the sum of its parts: information from the prefetcher is used to improve the performance of the replacement policy and vice versa. Finally, KPC removes the need to propagate the PC through the entire on-chip cache hierarchy while providing a holistic cache management approach with better performance than state-of-the-art PC-based and non-PC-based schemes. Our evaluation shows that KPC provides 8% better performance than the best combination of existing prefetcher and replacement policy for multi-core workloads.
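
To make the first contribution more concrete, the following is a minimal sketch of how a prefetcher's confidence could steer where a prefetched line is inserted in a cache set, with no PC involved. It is not the KPC mechanism itself: the RRIP-style 2-bit counters, the 0.5 confidence threshold, and the names `insertion_rrpv` and `CacheSet` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MAX_RRPV = 3  # 2-bit re-reference prediction value; 3 means "reuse predicted far away"


def insertion_rrpv(is_prefetch: bool, confidence: float = 1.0) -> int:
    """Choose an insertion position without consulting a program counter.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    if is_prefetch and confidence < 0.5:
        return MAX_RRPV          # low-confidence prefetch: first in line for eviction
    return MAX_RRPV - 1          # demand fill or confident prefetch: keep a bit longer


@dataclass
class CacheSet:
    ways: int
    lines: dict = field(default_factory=dict)  # tag -> RRPV, in insertion order

    def access(self, tag: str, is_prefetch: bool = False, confidence: float = 1.0) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            if not is_prefetch:
                self.lines[tag] = 0  # demand hit: predict near-term reuse
            return True
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines[tag] = insertion_rrpv(is_prefetch, confidence)
        return False

    def _evict(self) -> None:
        """Evict the first line at MAX_RRPV, aging every line until one qualifies."""
        while True:
            victim = next((t for t, r in self.lines.items() if r == MAX_RRPV), None)
            if victim is not None:
                del self.lines[victim]
                return
            for t in list(self.lines):
                self.lines[t] += 1
```

In a 2-way set that receives a demand fill `A`, then a prefetch `P`, then a demand fill `B`, a low-confidence `P` is the line evicted to make room for `B`, whereas a high-confidence `P` ages at the same rate as `A` and survives. The point of the sketch is only that the insertion decision depends on request type and prefetch confidence, both of which are available at the cache without propagating a PC.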


    Information

    Published In

    ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
    April 2017
    856 pages
    ISBN: 9781450344654
    DOI: 10.1145/3037697
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache replacement policy
    2. data prefetching
    3. memory hierarchy

    Qualifiers

    • Research-article

    Conference

    ASPLOS '17

    Acceptance Rates

    ASPLOS '17 Paper Acceptance Rate 53 of 320 submissions, 17%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 418
    • Downloads (last 6 weeks): 28
    Reflects downloads up to 25 Dec 2024

    Cited By

    • (2024) Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces. ACM Transactions on Architecture and Code Optimization 21:2 (1-23). DOI: 10.1145/3650110. Online publication date: 21-May-2024
    • (2024) Hercules: Enabling Atomic Durability for Persistent Memory with Transient Persistence Domain. ACM Transactions on Embedded Computing Systems 23:6 (1-34). DOI: 10.1145/3607473. Online publication date: 11-Sep-2024
    • (2023) Last-Level Cache Insertion and Promotion Policy in the Presence of Aggressive Prefetching. IEEE Computer Architecture Letters 22:1 (17-20). DOI: 10.1109/LCA.2023.3242178. Online publication date: Jan-2023
    • (2022) Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory. ACM Transactions on Architecture and Code Optimization 19:2 (1-26). DOI: 10.1145/3511706. Online publication date: 24-Mar-2022
    • (2022) Dynamic Set Stealing to Improve Cache Performance. 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (60-70). DOI: 10.1109/SBAC-PAD55451.2022.00017. Online publication date: Nov-2022
    • (2022) Berti: an Accurate Local-Delta Data Prefetcher. 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO) (975-991). DOI: 10.1109/MICRO56248.2022.00072. Online publication date: Oct-2022
    • (2020) Boosting Store Buffer Efficiency with Store-Prefetch Bursts. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (568-580). DOI: 10.1109/MICRO50266.2020.00054. Online publication date: Oct-2020
    • (2020) Bouquet of instruction pointers. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture (118-131). DOI: 10.1109/ISCA45697.2020.00021. Online publication date: 30-May-2020
    • (2019) Cache Replacement Policies. Synthesis Lectures on Computer Architecture 14:1 (1-87). DOI: 10.2200/S00922ED1V01Y201905CAC047. Online publication date: 17-Jun-2019
    • (2019) The impact of cache inclusion policies on cache management techniques. Proceedings of the International Symposium on Memory Systems (428-438). DOI: 10.1145/3357526.3357547. Online publication date: 30-Sep-2019
