research-article

Public Access

A hierarchical neural model of data prefetching

Authors:

Parthasarathy Ranganathan,

Calvin LinAuthors Info & Claims

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 861 - 873

https://doi.org/10.1145/3445814.3446752

Published: 17 April 2021 Publication History

Abstract

This paper presents Voyager, a novel neural network for data prefetching. Unlike previous neural models for prefetching, which are limited to learning delta correlations, our model can also learn address correlations, which are important for prefetching irregular sequences of memory accesses. The key to our solution is its hierarchical structure that separates addresses into pages and offsets and that introduces a mechanism for learning important relations among pages and offsets. Voyager provides significant prediction benefits over current data prefetchers. For a set of irregular programs from the SPEC 2006 and GAP benchmark suites, Voyager sees an average IPC improvement of 41.6% over a system with no prefetcher, compared with 21.7% and 28.2%, respectively, for idealized Domino and ISB prefetchers. We also find that for two commercial workloads for which current data prefetchers see very little benefit, Voyager dramatically improves both accuracy and coverage. At present, slow training and prediction preclude neural models from being practically used in hardware, but Voyager’s overheads are significantly lower—in every dimension—than those of previous neural models. For example, computation cost is reduced by 15- 20×, and storage overhead is reduced by 110-200×. Thus, Voyager represents a significant step towards a practical neural prefetcher.

References

[1]

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jefrey Dean, Matthieu Devin, Sanjay Ghemawat, Geofrey Irving, Michael Isard, et al. Tensorflow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265-283, 2016.

[2]

Jean-Loup Baer and Tien-Fu Chen. Efective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers, 44 ( 5 ): 609-623, May 1995.

Digital Library

[3]

Mohammad Bakhshalipour, Pejman Lotfi-Kamran, and Hamid Sarbazi-Azad. Domino temporal data prefetcher. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 131-142, 2018.

[4]

Mohammad Bakhshalipour, Mehran Shakerinava, Pejman Lotfi-Kamran, and Hamid Sarbazi-Azad. Bingo spatial data prefetcher. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 399-411, 2019.

[5]

Scott Beamer, Krste Asanovi?, and David Patterson. The GAP benchmark suite. arXiv preprint arXiv:1508.03619, 2015.

[6]

Derek Bruening, Timothy Garnett, and Saman Amarasinghe. An infrastructure for adaptive dynamic optimization. In International Symposium on Code Generation and Optimization, 2003. CGO 2003., pages 265-275. IEEE, 2003.

[7]

Doug Burger, Thomas R. Puzak, Wei-Fen Lin, and Steven K. Reinhardt. Filtering superfluous prefetches using density vectors. In Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors (ICCD), pages 124-133, 2001.

[8]

Chi F. Chen, Se-Hyun Yang, Babak Falsafi, and Andreas Moshovos. Accurate and complexity-efective spatial pattern prediction. In Proceedings of the 10th International Symposium on High Performance Computer Architecture (HPCA), pages 276-288, 2004.

Digital Library

[9]

Trishul M. Chilimbi. Eficient representations and abstractions for quantifying and exploiting data reference locality. In SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 191-202, 2001.

[10]

Yuan Chou. Low-cost epoch-based correlation prefetching for commercial applications. In Proceedings of the 40th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO), pages 301-313, 2007.

Digital Library

[11]

Keith I. Farkas, Paul Chow, Norman P. Jouppi, and Zvonko Vranesic. Memorysystem design considerations for dynamically-scheduled processors. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA), pages 133-143, 1997.

Digital Library

[12]

Greg Hamerly, Erez Perelman, Jeremy Lau, and Brad Calder. Simpoint 3. 0: Faster and more flexible program phase analysis. Journal of Instruction Level Parallelism, 7 ( 4 ): 1-28, 2005.

[13]

Milad Hashemi, Kevin Swersky, Jamie A Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, and Parthasarathy Ranganathan. Learning memory access patterns. arXiv preprint arXiv:1803.02329, 2018.

[14]

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9 ( 8 ): 1735-1780, 1997.

[15]

Zhigang Hu, Margaret Martonosi, and Stefanos Kaxiras. TCP: tag correlating prefetchers. In International Symposium on, High Performance Computer Architecture (HPCA), pages 317-326, 2003.

[16]

Ibrahim Hur and Calvin Lin. Memory prefetching using adaptive stream detection. In Proceedings of the 39th International Symposium on Microarchitecture (MICRO), pages 397-408, 2006.

Digital Library

[17]

Yasuo Ishii, Mary Inaba, and Kei Hiraki. Access map pattern matching for high performance data cache prefetch. Journal of Instruction-Level Parallelism, 13 : 1-24, 2011.

[18]

Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geofrey E. Hinton. Adaptive mixtures of local experts. Neural computation, 3 ( 1 ): 79-87, 1991.

[19]

Akanksha Jain and Calvin Lin. Linearizing irregular memory accesses for improved correlated prefetching. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 247-259, 2013.

Digital Library

[20]

Akanksha Jain and Calvin Lin. Back to the future: Leveraging belady's algorithm for improved cache replacement. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2016.

Digital Library

[21]

Aamer Jaleel, Robert S Cohn, Chi-Keung Luk, and Bruce Jacob. Cmp$im: A Pin-based on-the-fly multi-core cache simulator. In Proceedings of the Fourth Annual Workshop on Modeling, Benchmarking and Simulation (MoBS), co-located with ISCA, pages 28-36, 2008.

[22]

Daniel A Jiménez. Multiperspective perceptron predictor. In The Journal of Instruction-Level Parallelism 5th JILP Workshop on Computer Architecture Competitions (JWAC-5), Championship Branch Prediction, (co-located with ISCA 2016 ), 2016.

[23]

Daniel A Jiménez and Calvin Lin. Dynamic branch prediction with perceptrons. In Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA), pages 197-206, 2001.

[24]

Daniel A Jiménez and Elvira Teran. Multiperspective reuse prediction. In 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 436-448. IEEE, 2017.

Digital Library

[25]

Teresa L. Johnson, Matthew C. Merten, and Wen-Mei W. Hwu. Run-time spatial locality detection and optimization. In Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO), pages 57-64, 1997.

Digital Library

[26]

Doug Joseph and Dirk Grunwald. Prefetching using markov predictors. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA), pages 252-263, 1997.

Digital Library

[27]

Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch bufers. In International Symposium on Computer Architecture (ISCA), pages 364-373, 1990.

[28]

Samira Khan, Yingying Tian, and Daniel A Jiménez. Sampling dead block prediction for last-level caches. In 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 175-186, 2010.

Digital Library

[29]

Jinchun Kim, Seth H Pugsley, Paul V Gratz, AL Reddy, Chris Wilkerson, and Zeshan Chishti. Path confidence based lookahead prefetching. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture, page 60. IEEE Press, 2016.

[30]

Tim Kraska, Alex Beutel, Ed H. Chi, Jef Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD), 2018.

Digital Library

[31]

Sanjeev Kumar and Christopher Wilkerson. Exploiting spatial locality in data caches using spatial footprints. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 357-368, 1998.

[32]

Pierre Michaud. Best-ofset hardware prefetching. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 469-480, 2016.

[33]

Pierre Michaud. Best-ofset hardware prefetching. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 469-480, 2016.

[34]

Jinseok Nam, Jungi Kim, Eneldo Loza Mencía, Iryna Gurevych, and Johannes Fürnkranz. Large-scale multi-label text classification-revisiting neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 437-452, 2014.

Digital Library

[35]

Kyle J. Nesbit, Ashutosh S. Dhodapkar, and James E. Smith. AC/DC: an adaptive data cache prefetcher. In 13th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 135-145, 2004.

[36]

Kyle J. Nesbit and James E. Smith. Data cache prefetching using a global history bufer. IEEE Micro, 25 ( 1 ): 90-97, 2005.

[37]

Subbarao Palacharla and Richard E. Kessler. Evaluating stream bufers as a secondary cache replacement. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 24-33, April 1994.

[38]

Leeor Peled, Shie Mannor, Uri Weiser, and Yoav Etsion. Semantic locality and context-based prefetching using reinforcement learning. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pages 285-297, 2015.

Digital Library

[39]

Leeor Peled, Uri Weiser, and Yoav Etsion. A neural network prefetcher for arbitrary memory access patterns. ACM Transactions on Architecture and Code Optimization (TACO), page 37, 2019.

[40]

Seth H Pugsley, Zeshan Chishti, Chris Wilkerson, Peng-fei Chuang, Robert L Scott, Aamer Jaleel, Shih-Lien Lu, Kingsum Chow, and Rajeev Balasubramonian. Sandbox prefetching: Safe run-time evaluation of aggressive prefetchers. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014.

[41]

Suleyman Sair, Timothy Sherwood, and Brad Calder. A decoupled predictordirected stream prefetching architecture. IEEE Transactions on Computers, 52 ( 3 ): 260-276, March 2003.

Digital Library

[42]

Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian, Chris Wilkerson, Seth H. Pugsley, and Zeshan Chisthi. Eficiently prefetching complex address patterns. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO), pages 141-152, 2015.

Digital Library

[43]

Zhan Shi, Xiangru Huang, Akanksha Jain, and Calvin Lin. Applying deep learning to the cache replacement problem. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 413-425, 2019.

Digital Library

[44]

A.J. Smith. Sequential program prefetching in memory hierarchies. IEEE Transactions on Computers, 11 ( 12 ): 7-12, December 1978.

Digital Library

[45]

Yan Solihin, Jaejin Lee, and Josep Torrellas. Using a user-level memory thread for correlation prefetching. In Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA), pages 171-182, 2002.

[46]

Stephen Somogyi, Thomas F. Wenisch, Anastasia Ailamaki, and Babak Falsafi. Spatio-temporal memory streaming. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 69-80, 2009.

Digital Library

[47]

Stephen Somogyi, Thomas F. Wenisch, Anastassia Ailamaki, Babak Falsafi, and Andreas Moshovos. Spatial memory streaming. In Proceedings of the 33th Annual International Symposium on Computer Architecture (ISCA), pages 252-263, 2006.

Digital Library

[48]

Ajitesh Srivastava, Angelos Lazaris, Benjamin Brooks, Rajgopal Kannan, and Viktor K. Prasanna. Predicting memory accesses: The road to compact ml-driven prefetcher. In Proceedings of the International Symposium on Memory Systems (MEMSYS), pages 461-470, 2019.

Digital Library

[49]

Stephen J Tarsa, Chit-Kwan Lin, Gokce Keskin, Gautham Chinya, and Hong Wang. Improving branch prediction by modeling global history with convolutional neural networks. arXiv preprint arXiv:1906.09889, 2019.

[50]

Elvira Teran, Zhe Wang, and Daniel A Jiménez. Perceptron learning for reuse prediction. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1-12, 2016.

[51]

Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM), 3 ( 3 ): 1-13, 2007.

[52]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, ?ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998-6008, 2017.

Digital Library

[53]

Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos. Temporal streams in commercial server applications. In IEEE International Symposium on Workload Characterization, pages 99-108, 2008.

[54]

Thomas F Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos. Practical of-chip meta-data for temporal memory streaming. In 2009 IEEE 15th International Symposium on High Performance Computer Architecture (HPCA), pages 79-90, 2009.

[55]

Carole-Jean Wu, Aamer Jaleel, Will Hasenplaugh, Margaret Martonosi, Simon C. Steely, Jr., and Joel Emer. SHiP: Signature-based hit predictor for high performance caching. In 44th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 430-441, 2011.

Digital Library

[56]

Hao Wu, Krishnendra Nathella, Joseph Pusdesris, Dam Sunwoo, Akanksha Jain, and Calvin Lin. Temporal prefetching without the of-chip metadata. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 996-1008, 2019.

Digital Library

[57]

Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, and Calvin Lin. Eficient metadata management for irregular data prefetching. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA), pages 449-461, 2019.

Digital Library

[58]

Siavash Zangeneh, Stephen Pruett, Sangkug Lym, and Yale N Patt. Branchnet : A convolutional neural network to predict hard-to-predict branches. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 118-130, 2020.

Cited By

Cui YChen WCheng XYi J(2024)Hyperion: A Highly Effective Page and PC Based Delta PrefetcherACM Transactions on Architecture and Code Optimization10.1145/3675398Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.1145/3675398
Panda B(2024)The Game of Latency, Bandwidth, and Hardware PrefetchingComputer10.1109/MC.2024.338485157:6(122-126)Online publication date: Jun-2024
https://doi.org/10.1109/MC.2024.3384851
Gohil VDev SUpasani GLo DRanganathan PDelimitrou C(2024)The Importance of Generalizability in Machine Learning for SystemsIEEE Computer Architecture Letters10.1109/LCA.2024.338444923:1(95-98)Online publication date: Jan-2024
https://doi.org/10.1109/LCA.2024.3384449
Show More Cited By

Index Terms

A hierarchical neural model of data prefetching
1. Computer systems organization
  1. Architectures

Recommendations

Stealth prefetching
Proceedings of the 2006 ASPLOS Conference

Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency and interconnect traffic increase. While aggressive prefetching ...
Stealth prefetching
ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency and interconnect traffic increase. While aggressive prefetching ...
Increasing hardware data prefetching performance using the second-level cache

Techniques to reduce or tolerate large memory latencies are critical for achieving high processor performance. Hardware data prefetching is one of the most heavily studied solutions, but it is essentially applied to first-level caches where it can ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 2021

1090 pages

ISBN:9781450383172

DOI:10.1145/3445814

General Chair:
Tim Sherwood
University of California at Santa Barbara, USA
,
Program Chairs:
Emery Berger
University of Massachusetts at Amherst, USA
,
Christos Kozyrakis
Stanford University, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 April 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

ASPLOS '21

Sponsor:

SIGPLAN

ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 19 - 23, 2021

Virtual, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

27
Total Citations
View Citations
2,178
Total Downloads

Downloads (Last 12 months)740
Downloads (Last 6 weeks)95

Reflects downloads up to 12 Aug 2024

Other Metrics

View Author Metrics

Citations

Cited By

Cui YChen WCheng XYi J(2024)Hyperion: A Highly Effective Page and PC Based Delta PrefetcherACM Transactions on Architecture and Code Optimization10.1145/3675398Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.1145/3675398
Panda B(2024)The Game of Latency, Bandwidth, and Hardware PrefetchingComputer10.1109/MC.2024.338485157:6(122-126)Online publication date: Jun-2024
https://doi.org/10.1109/MC.2024.3384851
Gohil VDev SUpasani GLo DRanganathan PDelimitrou C(2024)The Importance of Generalizability in Machine Learning for SystemsIEEE Computer Architecture Letters10.1109/LCA.2024.338444923:1(95-98)Online publication date: Jan-2024
https://doi.org/10.1109/LCA.2024.3384449
Ainsworth SMukhanov L(2024)Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00090(1202-1216)Online publication date: 29-Jun-2024
https://doi.org/10.1109/ISCA59077.2024.00090
Duong QJain ALin C(2024)A New Formulation of Neural Data Prefetching2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00088(1173-1187)Online publication date: 29-Jun-2024
https://doi.org/10.1109/ISCA59077.2024.00088
Bera RRanganathan ARakshit JMahto SNori AGaur JOlgun AKanellopoulos KSadrosadati MSubramoney SMutlu O(2024)Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00017(88-102)Online publication date: 29-Jun-2024
https://doi.org/10.1109/ISCA59077.2024.00017
Ren JXu DYang SZhao JLi ZNavasca CWang CXu HLi D(2024)Enabling Large Dynamic Neural Network Training with Learning-based Memory Management2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00066(788-802)Online publication date: 2-Mar-2024
https://doi.org/10.1109/HPCA57654.2024.00066
Zhen YChen WGao WRen JChen KChen Y(2024)PatternS: An intelligent hybrid memory scheduler driven by page pattern recognitionJournal of Systems Architecture10.1016/j.sysarc.2024.103178153(103178)Online publication date: Aug-2024
https://doi.org/10.1016/j.sysarc.2024.103178
Zhang PKannan RNori APrasanna V(2024)Accelerating Graph Analytics Using Attention-Based Data PrefetcherSN Computer Science10.1007/s42979-024-02989-w5:5Online publication date: 13-Jun-2024
https://dl.acm.org/doi/10.1007/s42979-024-02989-w
Zhen YZhang HDeng YChen WGao WRen JChen Y(2024)A smart hybrid memory scheduling approach using neural modelsScience China Information Sciences10.1007/s11432-023-3925-267:4Online publication date: 27-Mar-2024
https://doi.org/10.1007/s11432-023-3925-2
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents