Value locality and load value prediction

Authors: Mikko H. Lipasti, Christopher B. Wilkerson, and John Paul Shen

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

September 1996

Pages 138 - 147

https://doi.org/10.1145/237090.237173

Published: 01 September 1996

Abstract

Since the introduction of virtual memory demand-paging and cache memories, computer systems have been exploiting spatial and temporal locality to reduce the average latency of a memory reference. In this paper, we introduce the notion of value locality, a third facet of locality that is frequently present in real-world programs, and describe how to effectively capture and exploit it in order to perform load value prediction. Temporal and spatial locality are attributes of storage locations, and describe the future likelihood of references to those locations or their close neighbors. In a similar vein, value locality describes the likelihood of the recurrence of a previously-seen value within a storage location. Modern processors already exploit value locality in a very restricted sense through the use of control speculation (i.e. branch prediction), which seeks to predict the future value of a single condition bit based on previously-seen values. Our work extends this to predict entire 32- and 64-bit register values based on previously-seen values. We find that, just as condition bits are fairly predictable on a per-static-branch basis, full register values being loaded from memory are frequently predictable as well. Furthermore, we show that simple microarchitectural enhancements to two modern microprocessor implementations (based on the PowerPC 620 and Alpha 21164) that enable load value prediction can effectively exploit value locality to collapse true dependencies, reduce average memory latency and bandwidth requirements, and provide measurable performance gains.
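
The idea of predicting a loaded value from previously seen values can be illustrated with a small trace-driven sketch. The abstract does not describe the hardware organization, so everything below is an assumption made for illustration: a direct-mapped table indexed by the load's program counter, a single remembered value per entry, and a 2-bit saturating confidence counter borrowed from branch prediction. The names `LastValuePredictor`, `value_locality`, and `simulate` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: int = 0       # last value loaded by this static load
    confidence: int = 0  # 2-bit saturating counter, range 0..3


class LastValuePredictor:
    """Direct-mapped table indexed by low bits of the load's PC (assumed design)."""

    def __init__(self, num_entries=1024, threshold=2):
        self.num_entries = num_entries
        self.threshold = threshold
        self.table = [Entry() for _ in range(num_entries)]

    def _entry(self, pc):
        # Assumes 4-byte instructions, so the two low PC bits are dropped.
        return self.table[(pc >> 2) % self.num_entries]

    def predict(self, pc):
        """Return the predicted value, or None when confidence is too low."""
        e = self._entry(pc)
        return e.value if e.confidence >= self.threshold else None

    def update(self, pc, actual):
        """Train the entry with the value the load actually returned."""
        e = self._entry(pc)
        if e.value == actual:
            e.confidence = min(e.confidence + 1, 3)
        else:
            e.confidence = 0
            e.value = actual


def value_locality(trace):
    """Fraction of dynamic loads whose value equals the previous value
    loaded by the same static load (history depth of one)."""
    last = {}
    hits = 0
    for pc, value in trace:
        if last.get(pc) == value:
            hits += 1
        last[pc] = value
    return hits / len(trace) if trace else 0.0


def simulate(trace, predictor):
    """Replay (pc, value) pairs; return (correct, wrong, not_predicted)."""
    correct = wrong = skipped = 0
    for pc, value in trace:
        guess = predictor.predict(pc)
        if guess is None:
            skipped += 1
        elif guess == value:
            correct += 1
        else:
            wrong += 1
        predictor.update(pc, value)
    return correct, wrong, skipped
```

A trace here is a list of `(pc, value)` pairs, one per dynamic load. In this sketch the confidence threshold trades coverage for accuracy: a load that keeps returning the same value is predicted only after its counter warms up, while a load whose value changes on every execution is never predicted and so never causes a misprediction.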

Published In

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

October 1996

290 pages

ISBN:0897917677

DOI:10.1145/237090

Chairmen: Bill Dally (Massachusetts Institute of Technology), Susan Eggers (Univ. of Washington, Seattle)

ACM SIGPLAN Notices Volume 31, Issue 9
Sept. 1996
273 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/248209
ACM SIGOPS Operating Systems Review Volume 30, Issue 5
Dec. 1996
273 pages
ISSN:0163-5980
DOI:10.1145/248208

Copyright © 1996 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASPLOS96: 7th Conference on Architectural Support of Programming Languages & Operating Systems

October 1 - 4, 1996

Cambridge, Massachusetts, USA

Acceptance Rates

ASPLOS VII Paper Acceptance Rate 25 of 109 submissions, 23%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%
