DOI: 10.1145/2749469.2749473
research-article

Semantic locality and context-based prefetching using reinforcement learning

Published: 13 June 2015

Abstract

Most modern memory prefetchers rely on spatio-temporal locality to predict the memory addresses likely to be accessed by a program in the near future. Emerging workloads, however, make increasing use of irregular data structures, and thus exhibit a lower degree of spatial locality. This makes them less amenable to spatio-temporal prefetchers.
In this paper, we introduce the concept of Semantic Locality, which uses inherent program semantics to characterize access relations. We show how, in principle, semantic locality can capture the relationship between data elements in a manner agnostic to the actual data layout, and we argue that semantic locality transcends spatio-temporal concerns.
We further introduce the context-based memory prefetcher, which approximates semantic locality using reinforcement learning. The prefetcher identifies access patterns by applying reinforcement learning methods over machine and code attributes, which provide hints on memory access semantics.
We test our prefetcher on a variety of benchmarks that employ both regular and irregular patterns. For the SPEC 2006 suite, it delivers speedups as high as 2.8X (20% on average) over a baseline with no prefetching, and outperforms leading spatio-temporal prefetchers. Finally, we show that the context-based prefetcher makes it possible for naive, pointer-based implementations of irregular algorithms to achieve performance comparable to that of spatially optimized code.
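
The idea of learning prefetch targets from context rather than from address layout can be illustrated with a toy model. The sketch below is not the paper's design: it assumes a hypothetical context tuple (an instruction label plus one program attribute), treats address deltas as the candidate actions, and uses a simple epsilon-greedy score update as a stand-in for the reinforcement learning scheme the authors describe. The class name `ContextPrefetcher`, its parameters, and the reward values are illustrative assumptions.

```python
import random
from collections import defaultdict, deque


class ContextPrefetcher:
    """Toy context-based prefetcher (illustrative only, not the paper's design)."""

    def __init__(self, epsilon=0.1, alpha=0.5, window=4, seed=0):
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.alpha = alpha      # learning rate (assumed value)
        self.window = window    # accesses a prediction may wait before expiring
        self.rng = random.Random(seed)
        # score[context][delta] -> estimated usefulness of prefetching addr + delta
        self.score = defaultdict(dict)
        # outstanding predictions: (context, delta, predicted_addr, age)
        self.pending = deque()
        # recent (context, addr) pairs used to propose candidate deltas
        self.history = deque(maxlen=window)

    def access(self, context, addr):
        """Observe one demand access; return a prefetch address or None."""
        # 1. Reward predictions that hit this address, penalize expired ones.
        still_pending = deque()
        for ctx, delta, target, age in self.pending:
            if target == addr:
                self._update(ctx, delta, reward=1.0)
            elif age + 1 >= self.window:
                self._update(ctx, delta, reward=-1.0)
            else:
                still_pending.append((ctx, delta, target, age + 1))
        self.pending = still_pending

        # 2. Register this address as a candidate target for recent contexts.
        for past_ctx, past_addr in self.history:
            delta = addr - past_addr
            if delta != 0:
                self.score[past_ctx].setdefault(delta, 0.0)
        self.history.append((context, addr))

        # 3. Pick a delta for the current context (epsilon-greedy).
        candidates = self.score[context]
        if not candidates:
            return None
        if self.rng.random() < self.epsilon:
            delta = self.rng.choice(list(candidates))
        else:
            delta = max(candidates, key=candidates.get)
        self.pending.append((context, delta, addr + delta, 0))
        return addr + delta

    def _update(self, ctx, delta, reward):
        # Move the score a fraction alpha toward the observed reward.
        old = self.score[ctx][delta]
        self.score[ctx][delta] = old + self.alpha * (reward - old)


if __name__ == "__main__":
    # Hypothetical trace: a 4-node circular linked list at scattered addresses.
    nodes = [0x1000, 0x9040, 0x2480, 0x7700]
    prefetcher = ContextPrefetcher(epsilon=0.0)
    for lap in range(3):
        for i, addr in enumerate(nodes):
            guess = prefetcher.access(("load_next", i), addr)
            print(lap, hex(addr), "->", None if guess is None else hex(guess))
```

In this toy trace the nodes are scattered in memory, so consecutive address deltas are irregular, yet each context is reliably followed by the same relative jump. With exploration disabled (`epsilon=0.0`), the sketch should start predicting the next node from the second traversal onward.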


Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469

Publisher

Association for Computing Machinery, New York, NY, United States

Funding Sources

• Israel Science Foundation (ISF)
• Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)
• European Commission

Acceptance Rates

Overall acceptance rate: 543 of 3,203 submissions (17%)
