Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential

Author:

Arthur Perais

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

Pages 792 - 803

https://doi.org/10.1145/3466752.3480050

Published: 17 October 2021

Abstract

Value Prediction (VP) is a microarchitectural technique that speculatively breaks data dependencies to increase the available Instruction-Level Parallelism (ILP) in general-purpose processors. Despite recent proposals, VP remains expensive and interacts intricately with several stages of the classical superscalar pipeline. In this paper, we revisit and simplify VP by leveraging the irregular distribution of the values produced during the execution of common programs.

First, we demonstrate that a reasonable fraction of the performance uplift brought by a full VP infrastructure can be obtained by predicting only a few "usual suspects" values. Furthermore, we show that doing so greatly simplifies VP operation and reduces the value predictor footprint. Lastly, we show that these Minimal and Targeted VP infrastructures conceptually enable Speculative Strength Reduction (SpSR), a rename-time optimization whereby instructions can disappear at rename in the presence of specific operand values.
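To make the idea of an instruction "disappearing at rename" concrete, here is a minimal software sketch. It is an illustration only, not the paper's design: the opcode names, the outcome tags, and the choice of 0 and 1 as the predicted values are all assumptions, and the rules are ordinary algebraic identities rather than the set of reductions the paper evaluates. In hardware, any such reduction would rest on a speculative prediction that still has to be validated.

```python
from typing import Optional, Tuple

# Hypothetical outcome tags used by this sketch (not taken from the paper):
#   ("zero",)     the destination can be mapped to a constant zero
#   ("move", i)   the destination can alias source operand i (0 or 1)
#   ("execute",)  no reduction applies; the instruction executes normally

def reduce_at_rename(opcode: str, a: Optional[int], b: Optional[int]) -> Tuple:
    """Decide whether `opcode a, b` can be reduced at rename.

    a and b are the predicted values of the two source operands,
    or None when no prediction is available for that operand.
    """
    # Absorbing element: x & 0 == 0 and x * 0 == 0, whichever side is 0.
    if opcode in ("and", "mul") and (a == 0 or b == 0):
        return ("zero",)
    # Identity element on the second operand: the result equals the first.
    if opcode in ("add", "sub", "or", "xor", "shl", "shr") and b == 0:
        return ("move", 0)
    if opcode == "mul" and b == 1:
        return ("move", 0)
    # Identity element on the first operand, valid for commutative ops only.
    if opcode in ("add", "or", "xor") and a == 0:
        return ("move", 1)
    if opcode == "mul" and a == 1:
        return ("move", 1)
    return ("execute",)


print(reduce_at_rename("add", None, 0))   # second operand predicted to be 0
print(reduce_at_rename("sub", 0, None))   # 0 - x is not x, so no reduction
```

A "move" outcome corresponds to the case where the instruction needs no execution at all, since the destination register can simply be renamed to the surviving source. Note that `sub` is reduced only when the second operand is 0, because subtraction is not commutative.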





Published In


MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN: 9781450385572

DOI: 10.1145/3466752

Copyright © 2021 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
