Research article
DOI: 10.1145/1944862.1944875

Extended histories: improving regularity and performance in correlation prefetchers

Published: 24 January 2011
  • Abstract

    Data prefetchers identify and exploit regularity in the history (training) stream to predict future references and prefetch them into the cache. The training information is typically the stream of primary misses seen at a particular cache level, which is a filtered version of the accesses seen by that cache. In this work we demonstrate that extending the training information to include secondary misses and hits along with primary misses improves prefetcher performance. In addition to empirical evaluation, we use the information-theoretic metric of entropy to quantify the regularity present in extended histories. Entropy measurements indicate that extended histories are more regular than the default primary-miss-only training stream, and they corroborate our empirical findings.
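
As a rough illustration of how entropy can quantify regularity, the sketch below computes the empirical Shannon entropy of the address deltas in a stream. The abstract does not give the paper's exact entropy definition, so the zeroth-order entropy over deltas, the function names, and the two toy address streams are all assumptions made for this example.

```python
import math
from collections import Counter


def delta_stream(addresses):
    """Differences between consecutive block addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def shannon_entropy(symbols):
    """Empirical zeroth-order Shannon entropy, in bits per symbol."""
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    # Sum of -p * log2(p) over the observed symbol frequencies.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical block-address streams (not data from the paper).
# The full access stream walks memory with a constant stride.
all_accesses = [0, 1, 2, 3, 4, 5, 6, 7, 8]
# The same stream after some accesses were filtered out as hits.
primary_misses_only = [0, 1, 4, 5, 7, 8]

entropy_all = shannon_entropy(delta_stream(all_accesses))
entropy_filtered = shannon_entropy(delta_stream(primary_misses_only))
```

In this toy case the unfiltered stream has a single repeating delta, so its entropy is zero, while the filtered stream mixes several deltas and scores higher. Lower entropy means a more predictable training stream.
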
    With extended histories, further benefits can be achieved by also triggering prefetches on secondary misses. In this paper we explore the design space of extended prefetch histories and alternative prefetch trigger points for delta correlation prefetchers. We observe that different prefetch schemes benefit to different extents from extended histories and alternative trigger points, and that the best-performing design point varies per benchmark. To accommodate this variation, we propose a simple adaptive scheme that identifies the best-performing design point for a benchmark-prefetcher combination at runtime.
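
For readers unfamiliar with delta correlation, the following is a minimal sketch of the general idea: look up the most recent earlier occurrence of the last two address deltas and replay the deltas that followed it. It is not the paper's implementation. The function name `predict_addresses`, the `degree` parameter, and the sample history are hypothetical, and the history list stands in for whichever training stream (primary misses only, or an extended history) feeds the prefetcher.

```python
def predict_addresses(history, degree=2):
    """Predict up to `degree` future block addresses by delta correlation.

    `history` is a list of block addresses in arrival order.
    Returns an empty list when no earlier matching delta pair exists.
    """
    deltas = [b - a for a, b in zip(history, history[1:])]
    if len(deltas) < 3:
        return []
    key = (deltas[-2], deltas[-1])
    # Scan backwards for an earlier delta pair equal to the latest pair.
    for i in range(len(deltas) - 2, 0, -1):
        if (deltas[i - 1], deltas[i]) == key:
            # Replay the deltas that followed the earlier occurrence.
            predictions = []
            address = history[-1]
            for d in deltas[i + 1 : i + 1 + degree]:
                address += d
                predictions.append(address)
            return predictions
    return []


# Hypothetical history with a repeating (+1, +2) delta pattern.
example = predict_addresses([100, 101, 103, 104, 106, 107])
```

Under this sketch, a trigger point is simply the event on which `predict_addresses` is called, for example a primary miss only, or a secondary miss as well.
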
    In SPEC2000 benchmarks, using all L2 accesses as the prefetcher's history improves performance, in terms of both IPC and miss reduction, over techniques that use only primary misses as history. The adaptive scheme improves the performance of the CZone prefetcher over Baseline by 4.6% on average. These performance gains are accompanied by a moderate reduction in memory traffic requirements.

      Published In

      HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
      January 2011
      226 pages
      ISBN: 9781450302418
      DOI: 10.1145/1944862

      Sponsors

      • HiPEAC: HiPEAC Network of Excellence

      Publisher

      Association for Computing Machinery

      New York, NY, United States

