DOI: 10.1145/3528416.3530236
research-article
Open access

Fine-grained address segmentation for attention-based variable-degree prefetching

Published: 17 May 2022

    Abstract

    Machine learning algorithms have shown potential to improve prefetching performance by accurately predicting future memory accesses. Existing approaches are modeled on text prediction and treat prefetching as a classification problem for sequence prediction. However, the vast and sparse memory address space leads to a large vocabulary, which makes this modeling impractical. The number and order of outputs for multiple cache line prefetching are also fundamentally different from text prediction.
    We propose TransFetch, a novel way to model prefetching. To reduce vocabulary size, we use fine-grained address segmentation as input. To predict unordered sets of future addresses, we use delta bitmaps for multiple outputs. We apply an attention-based network to learn the mapping between input and output. Prediction experiments demonstrate that address segmentation achieves 26% to 36% higher F1-score than delta inputs and 15% to 24% higher F1-score than page & offset inputs for SPEC 2006, SPEC 2017, and GAP benchmarks. Simulation results show that TransFetch achieves 38.75% IPC improvement compared with no prefetching, outperforming the best-performing rule-based prefetcher BOP by 10.44% and ML-based prefetcher Voyager by 6.64%.
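
    The two encodings named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 64-byte cache line (6 offset bits), the 4-bit segment width, the number of segments, the delta range, and the function names `segment_address` and `delta_bitmap` are all assumptions chosen for illustration. The point is that each segment takes only 2^seg_bits distinct values, so the input vocabulary stays small, and that a bitmap over nearby block deltas can mark several future cache lines at once without imposing an order on them.

```python
from typing import List

BLOCK_BITS = 6  # assumption: 64-byte cache lines, so the low 6 bits are the line offset


def segment_address(addr: int, seg_bits: int = 4, n_segs: int = 12) -> List[int]:
    """Split the block address into n_segs fixed-width segments, most significant first.

    Each segment is a small integer in [0, 2**seg_bits), so a model only needs
    a vocabulary of 2**seg_bits tokens per position instead of one per address.
    """
    block = addr >> BLOCK_BITS
    mask = (1 << seg_bits) - 1
    return [(block >> (seg_bits * i)) & mask for i in reversed(range(n_segs))]


def delta_bitmap(current: int, future: List[int], delta_range: int = 8) -> List[int]:
    """Encode future accesses as an unordered set of block deltas.

    The bitmap has 2 * delta_range entries: indices 0..delta_range-1 cover
    deltas -delta_range..-1, and the remaining indices cover deltas 1..delta_range.
    Deltas of zero or outside the range are ignored.
    """
    cur_block = current >> BLOCK_BITS
    bitmap = [0] * (2 * delta_range)
    for addr in future:
        delta = (addr >> BLOCK_BITS) - cur_block
        if delta == 0 or abs(delta) > delta_range:
            continue
        idx = delta + delta_range if delta < 0 else delta + delta_range - 1
        bitmap[idx] = 1
    return bitmap


if __name__ == "__main__":
    # Block address 0xABCD split into four 4-bit segments.
    print(segment_address(0xABCD << BLOCK_BITS, seg_bits=4, n_segs=4))
    # Future blocks at deltas +1, -2 and +100 (the last is out of range).
    print(delta_bitmap(100 << BLOCK_BITS, [101 << BLOCK_BITS, 98 << BLOCK_BITS, 200 << BLOCK_BITS], delta_range=4))
```

    In this sketch the segment sequence would be the model input and the bitmap the multi-label target; a variable prefetch degree then follows from how many bitmap entries the model predicts as set.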


    Published In

    CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers
    May 2022
    321 pages
    ISBN: 9781450393386
    DOI: 10.1145/3528416
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. address segmentation
    2. attention
    3. machine learning
    4. prefetching

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 273 of 785 submissions, 35%
