Article

An analysis of the effects of miss clustering on the cost of a cache miss

Authors:

Thomas R. Puzak,

Jim Mitchell

CF '07: Proceedings of the 4th international conference on Computing frontiers

Pages 3 - 12

https://doi.org/10.1145/1242531.1242536

Published: 07 May 2007

Abstract

In this paper we describe a new technique, called pipeline spectroscopy, and use it to measure the cost of each cache miss. The cost of each miss is plotted as a histogram, which gives a precise, detailed view of miss costs throughout all levels of the memory hierarchy. We call the graphs 'spectrograms' because they reveal signature features of the processor's memory hierarchy, the pipeline, and the miss pattern itself. We then provide two examples that use spectroscopy to optimize the processor's hardware or an application's software. The first example demonstrates how a miss spectrogram can aid software designers in analyzing the performance of an application. The second example uses a miss spectrogram to analyze bus queueing. Our experiments show that performance gains of up to 8% are possible. Detailed analysis of a spectrogram leads to much greater insight into pipeline dynamics, including effects due to miss clustering, miss overlap, prefetching, and miss queueing delays.
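The histogram idea in the abstract can be illustrated with a minimal sketch. It assumes a per-miss cost in cycles is already available for each miss; how pipeline spectroscopy derives those costs is not specified here. The function names, the bin width, and the sample costs are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def miss_spectrogram(costs, bin_width=10):
    """Bucket per-miss costs (in cycles) into a histogram keyed by bin start."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter((int(c) // bin_width) * bin_width for c in costs)
    return dict(sorted(counts.items()))


def render(hist, bin_width=10):
    """Draw one text row per occupied bin, one '#' per miss."""
    lines = []
    for start, n in hist.items():
        lines.append(f"{start:4d}-{start + bin_width - 1:<4d} {'#' * n}")
    return "\n".join(lines)


# Hypothetical per-miss costs (assumed data, not taken from the paper).
# Costs grouped near a few values would appear as separate peaks.
sample_costs = [12, 14, 15, 95, 198, 201, 203, 205, 100, 13]
hist = miss_spectrogram(sample_costs)
print(render(hist))
```

In a plot like this, a miss whose latency is partly hidden by an overlapping miss would land in a lower-cost bin than an isolated miss to the same level of the hierarchy, which is one way clustering could show up in the shape of the histogram.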

Cited By

Hu W, Wu B, Xie B, Chen T, Miao L (2010). A Bypass Optimization Method for Network on Chip. Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology, pp. 1788-1795. DOI: 10.1109/CIT.2010.310. Online publication date: 29-Jun-2010.
https://dl.acm.org/doi/10.1109/CIT.2010.310

Index Terms

An analysis of the effects of miss clustering on the cost of a cache miss
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. General and reference
  1. Cross-computing tools and techniques

Published In

CF '07: Proceedings of the 4th international conference on Computing frontiers

May 2007

300 pages

ISBN:9781595936837

DOI:10.1145/1242531

General Chairs: Utpal Banerjee (Intel, USA), José Moreira (IBM, USA)

Program Chairs: Michel Dubois (University of Southern California, USA), Per Stenström (Chalmers University of Technology, SE)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CF07

Sponsor:

CF07: Computing Frontiers Conference

May 7 - 9, 2007

Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%
