Article

Data prefetching in a cache hierarchy with high bandwidth and capacity

Published: 16 September 2006

Abstract

In this paper we evaluate four hardware data prefetchers in the context of a high-performance three-level on-chip cache hierarchy with high bandwidth and capacity. We consider two classic prefetchers (Sequential Tagged and Stride) and two correlating prefetchers: PC/DC, a recent method with a superior score and small tables, and P-DFCM, a new method. Like PC/DC, P-DFCM focuses on local delta sequences, but it is based on the DFCM value predictor. We explore different prefetch degrees and distances. Running SPEC2000, Olden and IAbench applications, we find that this kind of cache hierarchy turns prefetching aggressiveness into success for all four prefetchers. Sequential Tagged performs best and deserves further attention to cut its losses in some applications. P-DFCM matches or even improves on the PC/DC results, using far fewer table accesses while keeping table sizes low.
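
To make the terms "prefetch degree" and "prefetch distance" concrete, here is a minimal sketch of a PC-indexed stride prefetcher. It is an illustration only, not the configuration evaluated in the paper. It assumes the usual meanings of the two terms: degree is the number of prefetches issued per trigger, and distance is how many strides ahead of the current access the first prefetch lands. The class and field names (`StridePrefetcher`, `StrideEntry`), the unbounded table, and the one-repeat confidence rule are all hypothetical simplifications. The per-PC difference between consecutive addresses computed below is what "local delta" refers to.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    """Per-load state (hypothetical layout): last address and last local delta."""
    last_addr: int
    stride: int = 0


class StridePrefetcher:
    """Toy stride prefetcher indexed by the load's program counter (PC).

    Assumptions for illustration: the table is an unbounded dict, and a
    stride is trusted as soon as the same non-zero delta is seen twice in a row.
    """

    def __init__(self, degree: int = 1, distance: int = 1):
        self.degree = degree      # prefetches issued per trigger
        self.distance = distance  # strides ahead of the current access
        self.table: dict[int, StrideEntry] = {}

    def access(self, pc: int, addr: int) -> list[int]:
        """Record a demand access and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: nothing to predict yet.
            self.table[pc] = StrideEntry(last_addr=addr)
            return []

        delta = addr - entry.last_addr          # local delta for this PC
        confirmed = delta != 0 and delta == entry.stride
        entry.stride = delta
        entry.last_addr = addr
        if not confirmed:
            return []

        # Issue `degree` prefetches, starting `distance` strides ahead.
        start = self.distance
        return [addr + k * delta for k in range(start, start + self.degree)]


if __name__ == "__main__":
    pf = StridePrefetcher(degree=2, distance=3)
    for a in (1000, 1064, 1128):
        print(a, pf.access(pc=0x400, addr=a))
```

With `degree=2` and `distance=3`, the third access of a steady 64-byte stride triggers prefetches for the addresses three and four strides ahead. Raising either parameter makes the prefetcher more aggressive, which is the dimension the abstract says the study explores.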

Published In

MEDEA '06: Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
September 2006
49 pages
ISBN: 1595935681
DOI: 10.1145/1166133
Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

MEDEA '06 Paper Acceptance Rate: 6 of 9 submissions, 67%
