Load and store reuse using register file contents

Published: 17 June 2001
DOI: 10.1145/377792.377850

Abstract

The detection of opportunities for value reuse optimizations in memory operations requires both the addresses and the values associated with these operations to be available. Although the values are typically available in the physical register file, their presence cannot be exploited because no correspondence between the values and addresses is maintained. In this paper we propose the explicit management of the physical register file contents as a level in the memory hierarchy by supporting the Value Address Association Structure (VAAS). The entries in VAAS have a one-to-one correspondence with entries in the physical register file. For each value in the register file that is involved in a load or store operation, the associated information, including the memory address, is stored in the corresponding VAAS entry. Several optimization tasks can be performed using the combination of physical registers and VAAS.
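To make the one-to-one pairing concrete, here is a minimal Python sketch of a register file with a VAAS-like side table. It is an illustration under stated assumptions, not the paper's design: the class and method names (`RegFileWithVaas`, `on_load`, `on_store`, `on_write`, `on_free`, `lookup`) are hypothetical, every access is assumed to be an aligned word of a single size, each entry records at most one address, and the associative search is modeled as a linear scan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VaasEntry:
    valid: bool = False  # True if the paired register's value mirrors memory
    address: int = 0     # memory address associated with that value


class RegFileWithVaas:
    """Toy model (hypothetical names): physical register i is paired with VAAS entry i."""

    def __init__(self, num_pregs: int) -> None:
        self.values: List[int] = [0] * num_pregs
        self.vaas: List[VaasEntry] = [VaasEntry() for _ in range(num_pregs)]

    def lookup(self, address: int) -> Optional[int]:
        """Return a physical register whose value is the current content of address."""
        # A linear scan stands in for an associative (CAM-style) search.
        for preg, entry in enumerate(self.vaas):
            if entry.valid and entry.address == address:
                return preg
        return None

    def _invalidate(self, address: int) -> None:
        # Drop every association with this address.
        for entry in self.vaas:
            if entry.valid and entry.address == address:
                entry.valid = False

    def on_load(self, dest_preg: int, address: int, value: int) -> None:
        # A completed load writes its value and tags the register with the address.
        self.values[dest_preg] = value
        self.vaas[dest_preg] = VaasEntry(True, address)

    def on_store(self, src_preg: int, address: int) -> None:
        # The stored value already sits in src_preg, so no copy is made.
        # Older associations for this address are now stale; any earlier
        # address held by src_preg's entry is simply replaced (one address per entry).
        self._invalidate(address)
        self.vaas[src_preg] = VaasEntry(True, address)

    def on_write(self, preg: int, value: int) -> None:
        # An ordinary ALU result: the register no longer mirrors any memory location.
        self.values[preg] = value
        self.vaas[preg] = VaasEntry()

    def on_free(self, preg: int) -> None:
        # Freeing a physical register also frees its paired VAAS entry.
        self.vaas[preg] = VaasEntry()
```

Because a store invalidates every older association for its address, and any overwrite or release of a register clears its entry, a lookup in this sketch never returns a register holding a stale value. A real design would also have to handle partial-word accesses and stores that have not yet committed, which are omitted here.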
Specifically, VAAS enables a unified implementation of the following optimization tasks: (i) store-to-load forwarding is performed without explicitly saving the stored values; (ii) load-to-load forwarding is performed without saving loaded values in a reuse buffer; (iii) silent stores are eliminated without saving or loading the prior value stored to the same addresses; (iv) bit switching in the L1 cache is minimized without saving additional history; and (v) false memory access order violations are avoided without holding speculatively loaded values in the speculated loads table.
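The reuse checks behind (i), (ii), (iii) and (v) can be illustrated with a small Python model. This is a hedged sketch, not the authors' implementation: it simplifies VAAS to a dictionary from address to physical register, treats the dictionary `memory` as the L1 cache with stores writing it immediately (no speculation, no store queue), and uses hypothetical names (`ReuseModel`, `alu_write`, `is_true_violation`). Bit-switching reduction (iv) is not modeled.

```python
from typing import Dict, List, Optional


class ReuseModel:
    """Toy model of VAAS-driven reuse. VAAS is simplified to a dict: address -> physical register."""

    def __init__(self, num_pregs: int) -> None:
        self.regs: List[int] = [0] * num_pregs
        self.vaas: Dict[int, int] = {}    # address -> register whose value mirrors that address
        self.memory: Dict[int, int] = {}  # stands in for the L1 data cache
        self.cache_accesses = 0

    def _write_reg(self, preg: int, value: int) -> None:
        # Overwriting a register breaks every address association that pointed to it.
        for addr in [a for a, p in self.vaas.items() if p == preg]:
            del self.vaas[addr]
        self.regs[preg] = value

    def alu_write(self, preg: int, value: int) -> None:
        """A non-memory instruction writes its result to preg."""
        self._write_reg(preg, value)

    def load(self, dest: int, addr: int) -> int:
        src: Optional[int] = self.vaas.get(addr)
        if src is not None:
            # (i)/(ii): the value is already in a register, left there by an earlier
            # store or load, so it is copied without a cache access.
            value = self.regs[src]
        else:
            self.cache_accesses += 1
            value = self.memory.get(addr, 0)
        self._write_reg(dest, value)
        self.vaas[addr] = dest
        return value

    def store(self, src: int, addr: int) -> bool:
        """Perform a store; return True if it was detected as silent and skipped."""
        prior: Optional[int] = self.vaas.get(addr)
        if prior is not None and self.regs[prior] == self.regs[src]:
            # (iii): memory already holds this value; the prior value was read
            # from a register rather than saved or reloaded from the cache.
            return True
        self.cache_accesses += 1
        self.memory[addr] = self.regs[src]
        self.vaas[addr] = src  # the source register now mirrors addr
        return False

    def is_true_violation(self, load_dest: int, store_src: int) -> bool:
        """(v): a store that aliased an earlier-issued load only matters if it changed the value."""
        # Assumes load_dest still holds the speculatively loaded value,
        # so no separate copy is kept in a speculated-loads table.
        return self.regs[load_dest] != self.regs[store_src]
```

In this model, forwarding from a store and forwarding from an earlier load are the same dictionary lookup, which mirrors the unified treatment described above.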
Our experiments demonstrate that our implementation of the non-speculative optimizations is highly effective: it eliminates the memory references of 60% (58%) of loads and 25% (22.6%) of stores in SPECint95 (SPECfp95). On average, over 45% of cache references are eliminated due to non-speculative reuse, and L1 switching activity is reduced by 7.75%.

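Cited By

  • (2021) Early Address Prediction. ACM Transactions on Architecture and Code Optimization 18(3), pp. 1-22. DOI: 10.1145/3458883. Online publication date: 8 June 2021.
  • (2016) Value Reuse Potential in ARM Architectures. 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 174-181. DOI: 10.1109/SBAC-PAD.2016.30. Online publication date: October 2016.
  • (2016) Cost effective physical register sharing. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 694-706. DOI: 10.1109/HPCA.2016.7446105. Online publication date: March 2016.
  • (2015) LaZy superscalar. ACM SIGARCH Computer Architecture News 43(3S), pp. 260-271. DOI: 10.1145/2872887.2750409. Online publication date: 13 June 2015.
  • (2015) LaZy superscalar. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 260-271. DOI: 10.1145/2749469.2750409. Online publication date: 13 June 2015.
  • (2014) An implementation of Auto-Memoization mechanism on ARM-based superscalar processor. 2014 International Symposium on System-on-Chip (SoC), pp. 1-8. DOI: 10.1109/ISSOC.2014.6972435. Online publication date: October 2014.
  • (2011) A unified approach to eliminate memory accesses early. Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems, pp. 55-64. DOI: 10.1145/2038698.2038710. Online publication date: 9 October 2011.
  • (2011) Macro Data Load. IEEE Transactions on Computers 60(4), pp. 526-537. DOI: 10.1109/TC.2010.131. Online publication date: 1 April 2011.
  • (2008) Zero loads. Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture, pp. 16-23. DOI: 10.1145/1509084.1509087. Online publication date: 26 October 2008.
  • (2006) Reducing cache traffic and energy with macro data load. Proceedings of the 2006 international symposium on Low power electronics and design, pp. 147-150. DOI: 10.1145/1165573.1165608. Online publication date: 4 October 2006.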

Published In

ICS '01: Proceedings of the 15th international conference on Supercomputing
June 2001
510 pages
ISBN: 158113410X
DOI: 10.1145/377792
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Conference

ICS '01

Acceptance Rates

ICS '01 paper acceptance rate: 45 of 133 submissions (34%)
Overall acceptance rate: 629 of 2,180 submissions (29%)

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 1
Reflects downloads up to 17 Oct 2024.

