DOI: 10.1145/237090.237173

Value locality and load value prediction

Published: 01 September 1996

    Abstract

    Since the introduction of virtual memory demand-paging and cache memories, computer systems have been exploiting spatial and temporal locality to reduce the average latency of a memory reference. In this paper, we introduce the notion of value locality, a third facet of locality that is frequently present in real-world programs, and describe how to effectively capture and exploit it in order to perform load value prediction. Temporal and spatial locality are attributes of storage locations, and describe the future likelihood of references to those locations or their close neighbors. In a similar vein, value locality describes the likelihood of the recurrence of a previously-seen value within a storage location. Modern processors already exploit value locality in a very restricted sense through the use of control speculation (i.e. branch prediction), which seeks to predict the future value of a single condition bit based on previously-seen values. Our work extends this to predict entire 32- and 64-bit register values based on previously-seen values. We find that, just as condition bits are fairly predictable on a per-static-branch basis, full register values being loaded from memory are frequently predictable as well. Furthermore, we show that simple microarchitectural enhancements to two modern microprocessor implementations (based on the PowerPC 620 and Alpha 21164) that enable load value prediction can effectively exploit value locality to collapse true dependencies, reduce average memory latency and bandwidth requirements, and provide measurable performance gains.
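The idea in the abstract can be illustrated with a minimal sketch, assuming a last-value predictor: a table indexed by the address of the static load instruction that remembers the value the load last returned, plus a small saturating counter that decides whether a prediction is trusted. The table size, the 2-bit counter, the threshold, and every name below (`LastValuePredictor`, `value_locality`, the `(pc, value)` trace format) are assumptions made for illustration and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: int = 0       # last value returned by the load mapped to this entry
    confidence: int = 0  # 2-bit saturating counter in the range 0..3 (assumed)


class LastValuePredictor:
    """Direct-mapped, untagged table indexed by the load's PC (hypothetical layout)."""

    def __init__(self, entries: int = 1024, threshold: int = 2):
        self.table = [Entry() for _ in range(entries)]
        self.threshold = threshold

    def _index(self, pc: int) -> int:
        # Drop the low 2 bits (assumes 4-byte instructions), then wrap to table size.
        return (pc >> 2) % len(self.table)

    def predict(self, pc: int):
        """Return the predicted value, or None when confidence is below the threshold."""
        entry = self.table[self._index(pc)]
        return entry.value if entry.confidence >= self.threshold else None

    def update(self, pc: int, actual: int) -> None:
        """Train the entry with the value the load actually returned."""
        entry = self.table[self._index(pc)]
        if entry.value == actual:
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            entry.confidence = max(entry.confidence - 1, 0)
            entry.value = actual


def value_locality(trace) -> float:
    """Fraction of dynamic loads whose value equals the previous value
    returned by the same static load (history depth of one)."""
    last = {}
    hits = 0
    total = 0
    for pc, value in trace:
        total += 1
        if last.get(pc) == value:
            hits += 1
        last[pc] = value
    return hits / total if total else 0.0
```

In this sketch a load that keeps returning the same value gains confidence and starts being predicted, while a single mismatch lowers confidence and replaces the stored value. A real design would also need to verify each prediction and recover from wrong ones, which the sketch leaves out.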

    Published In

    ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
    October 1996
    290 pages
    ISBN:0897917677
    DOI:10.1145/237090
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Conference

    ASPLOS96
    Acceptance Rates

    ASPLOS VII Paper Acceptance Rate 25 of 109 submissions, 23%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%
