DOI: 10.1145/3466752.3480050
research-article

Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential

Published: 17 October 2021
    Abstract

    Value Prediction (VP) is a microarchitectural technique that speculatively breaks data dependencies to increase the available Instruction-Level Parallelism (ILP) in general-purpose processors. Despite recent proposals, VP remains expensive and has intricate interactions with several stages of the classical superscalar pipeline. In this paper, we revisit and simplify VP by leveraging the irregular distribution of the values produced during the execution of common programs.
    First, we demonstrate that a reasonable fraction of the performance uplift brought by a full VP infrastructure can be obtained by predicting only a few “usual suspects” values. Furthermore, we show that doing so greatly simplifies VP operation and reduces the value predictor footprint. Lastly, we show that these Minimal and Targeted VP infrastructures conceptually enable Speculative Strength Reduction (SpSR), a rename-time optimization whereby instructions can disappear at rename in the presence of specific operand values.
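
    The two ideas in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's design: the target value set (0 and 1), the table size, the confidence threshold, and the names TargetedPredictor and strength_reduce are all hypothetical choices made for illustration. It shows a predictor that only ever predicts a few frequent values, and a rename-time rewrite that turns an instruction into a register move or a constant when an operand is predicted to hold such a value.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical "usual suspects": the only values this toy predictor may predict.
TARGET_VALUES = (0, 1)
CONF_MAX = 3  # assumed ceiling of the saturating confidence counter


@dataclass
class Entry:
    value: int = 0
    conf: int = 0


class TargetedPredictor:
    """Toy PC-indexed predictor restricted to TARGET_VALUES (illustrative only)."""

    def __init__(self, size: int = 64):
        self.size = size
        self.table = [Entry() for _ in range(size)]

    def predict(self, pc: int) -> Optional[int]:
        # Predict only when the counter is saturated; otherwise stay silent.
        e = self.table[pc % self.size]
        return e.value if e.conf == CONF_MAX else None

    def train(self, pc: int, actual: int) -> None:
        e = self.table[pc % self.size]
        if actual not in TARGET_VALUES:
            e.conf = 0                      # untargeted value: lose confidence
        elif actual == e.value:
            e.conf = min(CONF_MAX, e.conf + 1)
        else:
            e.value, e.conf = actual, 0     # switch to the other target value


def strength_reduce(op: str, src_a: str, src_b: str, known: dict) -> tuple:
    """Rewrite `dst = src_a op src_b` given predicted operand values in `known`."""
    a, b = known.get(src_a), known.get(src_b)
    if op == "add":
        if b == 0:
            return ("move", src_a)          # x + 0 -> x
        if a == 0:
            return ("move", src_b)          # 0 + y -> y
    if op == "mul":
        if a == 0 or b == 0:
            return ("const", 0)             # x * 0 -> 0
        if b == 1:
            return ("move", src_a)          # x * 1 -> x
        if a == 1:
            return ("move", src_b)          # 1 * y -> y
    if op == "and" and (a == 0 or b == 0):
        return ("const", 0)                 # x & 0 -> 0
    return ("execute", op, src_a, src_b)    # no reduction applies
```

    In this model, an instruction rewritten to ("move", ...) or ("const", ...) would not need an execution unit, which is the sense in which it can "disappear at rename". A real design would also have to validate the prediction and recover on a misprediction, which this sketch omits.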

    Cited By

    • (2024) Cost-Effective Value Predictor for ILP processors through Design Space Exploration. Proceedings of the Great Lakes Symposium on VLSI 2024, 301–304. DOI: 10.1145/3649476.3658804. Online publication date: 12-Jun-2024
    • (2024) Improving the Representativeness of Simulation Intervals for the Cache Memory System. IEEE Access 12, 5973–5985. DOI: 10.1109/ACCESS.2024.3350646. Online publication date: 2024

    Published In

    MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2021
    1322 pages
    ISBN: 9781450385572
    DOI: 10.1145/3466752
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Microarchitecture
    2. performance
    3. speculation
    4. value prediction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MICRO '21
    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%
