Using Interaction Costs for Microarchitectural Bottleneck Analysis

Authors:

Brian A. Fields,

Rastislav Bodík,

Chris J. Newburn

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

Page 228

Published: 03 December 2003

Abstract

Attacking bottlenecks in modern processors is difficult because many microarchitectural events overlap with each other. This parallelism makes it difficult to both (a) assign a cost to an event (e.g., to one of two overlapping cache misses) and (b) assign blame for each cycle (e.g., for a cycle where many, overlapping resources are active). This paper introduces a new model for understanding event costs to facilitate processor design and optimization. First, we observe that everything in a machine (instructions, hardware structures, events) can interact in only one of two ways (in parallel or serially). We quantify these interactions by defining interaction cost, which can be zero (independent, no interaction), positive (parallel), or negative (serial). Second, we illustrate the value of using interaction costs in processor design and optimization. Finally, we propose performance-monitoring hardware for measuring interaction costs that is suitable for modern processors.
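The abstract names the three signs of interaction cost but does not state a formula. The sketch below illustrates them under two assumptions that are not taken from the abstract: that the cost of an event is the execution time saved when that event is idealized (given zero latency), and that the interaction cost of two events is cost({a, b}) - cost(a) - cost(b). The toy machine model (run time equals the longest dependence path), the helper names `exec_time`, `cost`, and `icost`, and the 100-cycle latencies are all hypothetical.

```python
def exec_time(paths, idealized=frozenset()):
    """Toy model (assumption): run time is the longest dependence path.

    Each path is a list of (event_name, latency) pairs; an idealized
    event contributes 0 cycles.
    """
    return max(
        sum(0 if name in idealized else latency for name, latency in path)
        for path in paths
    )


def cost(paths, events):
    """Cycles saved when every event in `events` is idealized."""
    return exec_time(paths) - exec_time(paths, frozenset(events))


def icost(paths, a, b):
    """Assumed definition: cost of both together minus each individual cost."""
    return cost(paths, {a, b}) - cost(paths, {a}) - cost(paths, {b})


# Two cache misses on separate, equally long paths: they overlap (parallel).
parallel = [[("miss_a", 100)], [("miss_b", 100)]]

# Two misses back to back on one path, next to an unrelated 100-cycle path (serial).
serial = [[("miss_a", 100), ("miss_b", 100)], [("other", 100)]]

# Two misses on the only path: removing either saves exactly its own latency.
independent = [[("miss_a", 100), ("miss_b", 100)]]

for label, paths in [("parallel", parallel), ("serial", serial), ("independent", independent)]:
    print(label, icost(paths, "miss_a", "miss_b"))
```

In the parallel case neither miss has any cost alone, because the other still hides it, yet removing both saves 100 cycles, so the interaction cost is positive. In the serial case each miss alone saves 100 cycles but removing both saves only 100 in total, because the unrelated path becomes the limit, so the interaction cost is negative.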

Published In

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

December 2003

412 pages

ISBN: 076952043X

Copyright © 2003 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

MICRO-36

Sponsor:

SIGMICRO

MICRO-36: The 36th Annual International Symposium on Microarchitecture

December 3 - 5, 2003

Acceptance Rates

MICRO 36 Paper Acceptance Rate 35 of 134 submissions, 26%;

Overall Acceptance Rate 484 of 2,242 submissions, 22%
