Speculative precomputation: long-range prefetching of delinquent loads

Published: 01 May 2001
  • Abstract

    This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve the performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts and prefetching the data. The technique is evaluated by simulating the performance of a research processor based on the Itanium™ ISA supporting Simultaneous Multithreading. Two primary forms of Speculative Precomputation are evaluated. If only the non-speculative thread spawns speculative threads, performance gains of up to 30% are achieved when assuming ideal hardware. However, this speedup drops considerably with more realistic hardware assumptions. Permitting speculative threads to directly spawn additional speculative threads reduces the overhead associated with spawning threads and enables significantly more aggressive speculation, overcoming this limitation. Even with realistic costs for spawning threads, speedups as high as 169% are achieved, with an average speedup of 76%.
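
The idea in the abstract can be illustrated with a toy timing model. The sketch below is not the paper's simulator: the function name `traverse_cycles`, the cycle counts, and the linked-list workload are all assumptions chosen for illustration. It models a main thread that walks a linked list and does some work per node, and an optional helper thread that executes only the pointer-chasing slice so that each node's cache line is already on its way when the main thread needs it.

```python
def traverse_cycles(num_nodes, work_cycles, miss_cycles,
                    hit_cycles=1, helper=False, spawn_cycles=0):
    """Toy model: cycles for a main thread to walk a linked list.

    All parameters are illustrative assumptions, not values from the paper.
    Every node access misses in the cache unless a helper thread has
    already fetched the line.
    """
    ready = None
    if helper:
        # The helper runs only the address-computing slice (follow `next`),
        # so it pays one dependent miss per node and none of the main
        # thread's per-node work. ready[i] is when node i's line arrives.
        ready = [spawn_cycles + (i + 1) * miss_cycles
                 for i in range(num_nodes)]

    t = 0
    for i in range(num_nodes):
        own_miss_done = t + miss_cycles
        if helper:
            # The line is either already cached (hit) or still in flight
            # (wait for it). The main thread never does worse than
            # servicing the miss itself.
            prefetched_done = max(t + hit_cycles, ready[i])
            load_done = min(own_miss_done, prefetched_done)
        else:
            load_done = own_miss_done
        t = load_done + work_cycles
    return t


baseline = traverse_cycles(100, work_cycles=200, miss_cycles=100)
helped = traverse_cycles(100, work_cycles=200, miss_cycles=100, helper=True)
print(baseline, helped, baseline / helped)
```

In this model the helper only pulls ahead because the main thread spends time on per-node work that the helper skips; raising `spawn_cycles` or shrinking `work_cycles` reduces the benefit, which loosely mirrors the abstract's point that spawn overhead limits the achievable speedup.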

    Published In

    ACM SIGARCH Computer Architecture News  Volume 29, Issue 2
    Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
    May 2001
    262 pages
    ISSN:0163-5964
    DOI:10.1145/384285
    • ACM Conferences
      ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture
      June 2001
      289 pages
      ISBN:0769511627
      DOI:10.1145/379240

    Publisher

    Association for Computing Machinery

    New York, NY, United States
