DOI: 10.1145/1375527.1375576
research-article

Focused prefetching: performance oriented prefetching based on commit stalls

Published: 07 June 2008

Abstract

Loads that miss in the L1 or L2 cache and wait for their data at the head of the reorder buffer (ROB) cause significant slowdown in the form of commit stalls. We find that most of these commit stalls are caused by a small set of loads, which we call LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track the commit stalls suffered by loads in order to identify this small set.
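
As a rough illustration of the idea of a history-based classifier, the sketch below accumulates commit-stall cycles per load PC and then picks the smallest set of loads that accounts for a chosen share of all stall cycles. The abstract does not describe the actual classifier hardware, so the class name `CommitStallClassifier`, the `coverage_pct` parameter, and the unbounded software table are all assumptions made for this example.

```python
from collections import defaultdict


class CommitStallClassifier:
    """Hypothetical per-PC history table (not the paper's hardware design)."""

    def __init__(self, coverage_pct=90):
        # Share of total stall cycles the selected loads must cover (assumed knob).
        self.coverage_pct = coverage_pct
        self.stall_cycles = defaultdict(int)

    def record_stall(self, pc, cycles):
        """Charge `cycles` of commit stall to the load at `pc`.

        Intended to be called when a missing load blocks commit at the ROB head.
        """
        self.stall_cycles[pc] += cycles

    def limcos(self):
        """Return the smallest set of PCs covering coverage_pct of stall cycles."""
        total = sum(self.stall_cycles.values())
        selected, covered = set(), 0
        ranked = sorted(self.stall_cycles.items(), key=lambda kv: kv[1], reverse=True)
        for pc, cycles in ranked:
            # Integer comparison avoids floating-point rounding at the threshold.
            if total == 0 or covered * 100 >= self.coverage_pct * total:
                break
            selected.add(pc)
            covered += cycles
        return selected


# Usage with made-up stall counts: one load dominates the stall cycles.
clf = CommitStallClassifier(coverage_pct=90)
clf.record_stall(0x400, 900)
clf.record_stall(0x404, 60)
clf.record_stall(0x408, 40)
print(sorted(hex(pc) for pc in clf.limcos()))
```

A real design would need a bounded, tagged table and some aging of old history; the sketch only shows the bookkeeping that "tracking commit stalls suffered by loads" implies.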
We study an application of these classifiers to prefetching. The classifiers are used to train the prefetcher to focus on the misses suffered by LIMCOS. This approach, referred to as focused prefetching, yields a 9.8% gain in IPC over a naive GHB-based delta correlation prefetcher, along with a 20.3% reduction in memory traffic, for a set of 17 memory-intensive SPEC2000 benchmarks. Focused prefetching also improves prefetch accuracy by 61%. We demonstrate that the proposed classification criterion performs better than existing criteria such as criticality and delinquent loads. We also show that the criterion of focusing on commit stalls is robust across cache levels and can be applied to any prefetcher without modifying the prefetcher itself.
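
The filtering step behind focused prefetching can be sketched as a wrapper that forwards only the misses of selected loads to an unmodified prefetcher. This is a minimal sketch under stated assumptions: a simple last-delta predictor stands in for the GHB-based delta correlation prefetcher, and the names `LastDeltaPrefetcher`, `focused_prefetches`, and the miss stream are hypothetical.

```python
class LastDeltaPrefetcher:
    """Stand-in prefetcher: predicts addr + delta when the last two deltas match.

    This is NOT the GHB delta correlation prefetcher from the paper; it is a
    much simpler predictor used only to show where the filter sits.
    """

    def __init__(self):
        self.last_addr = None
        self.last_delta = None

    def train(self, addr):
        """Observe one miss address; return a prefetch address or None."""
        prediction = None
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if delta != 0 and delta == self.last_delta:
                prediction = addr + delta
            self.last_delta = delta
        self.last_addr = addr
        return prediction


def focused_prefetches(miss_stream, selected_pcs, prefetcher):
    """Train `prefetcher` only on misses whose load PC is in `selected_pcs`."""
    issued = []
    for pc, addr in miss_stream:
        if pc not in selected_pcs:
            continue  # filtered before training; the prefetcher is unchanged
        target = prefetcher.train(addr)
        if target is not None:
            issued.append(target)
    return issued


# Made-up miss stream: PC 0x400 strides by 64 bytes, PC 0x500 is irregular.
misses = [(0x400, 1000), (0x500, 7777), (0x400, 1064),
          (0x500, 123), (0x400, 1128), (0x400, 1192)]

focused = focused_prefetches(misses, {0x400}, LastDeltaPrefetcher())
unfocused = focused_prefetches(misses, {0x400, 0x500}, LastDeltaPrefetcher())
print(focused)    # [1192, 1256]
print(unfocused)  # []
```

In this toy stream the irregular load breaks the delta pattern when every miss is used for training, while the filtered stream stays regular. The example says nothing about the size of the effect on real workloads.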

Published In

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
June 2008
390 pages
ISBN:9781605581583
DOI:10.1145/1375527

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. commit stalls
  2. prefetch

Qualifiers

  • Research-article

Conference

ICS08: International Conference on Supercomputing
June 7 - 12, 2008
Island of Kos, Greece

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2023) CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core Systems. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 714-727. DOI: 10.1145/3613424.3614245. Online publication date: 28-Oct-2023
  • (2017) Using Criticality of GPU Accesses in Memory Management for CPU-GPU Heterogeneous Multi-Core Processors. ACM Transactions on Embedded Computing Systems 16:5s, pp. 1-23. DOI: 10.1145/3126540. Online publication date: 27-Sep-2017
  • (2016) A Survey of Recent Prefetching Techniques for Processor Caches. ACM Computing Surveys 49:2, pp. 1-35. DOI: 10.1145/2907071. Online publication date: 2-Aug-2016
  • (2011) Multi-Core Cache Hierarchies. Synthesis Lectures on Computer Architecture 6:3, pp. 1-153. DOI: 10.2200/S00365ED1V01Y201105CAC017. Online publication date: 22-May-2011
