Using Interaction Costs for Microarchitectural Bottleneck Analysis

Published: 03 December 2003

Abstract

Attacking bottlenecks in modern processors is difficult because many microarchitectural events overlap with each other. This parallelism makes it difficult to both (a) assign a cost to an event (e.g., to one of two overlapping cache misses) and (b) assign blame for each cycle (e.g., for a cycle where many overlapping resources are active). This paper introduces a new model for understanding event costs to facilitate processor design and optimization.

First, we observe that everything in a machine (instructions, hardware structures, events) can interact in only one of two ways (in parallel or serially). We quantify these interactions by defining interaction cost, which can be zero (independent, no interaction), positive (parallel), or negative (serial).

Second, we illustrate the value of using interaction costs in processor design and optimization.

Finally, we propose performance-monitoring hardware for measuring interaction costs that is suitable for modern processors.
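
The sign convention in the abstract can be illustrated with a small sketch. The abstract does not state a formula, so the code below assumes that the interaction cost of two events is the cost of idealizing both together minus the sum of their individual costs, where "cost" means cycles saved. The critical-path toy model (`exec_time`, the `paths` lists, and the 100-cycle latencies) is hypothetical and only serves to produce the three cases named above.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical toy model: each path is a chain of dependent events,
# and separate paths execute in parallel.
Paths = List[List[str]]


def exec_time(paths: Paths, latency: Dict[str, int],
              idealized: FrozenSet[str] = frozenset()) -> int:
    """Execution time is the longest path; idealized events take 0 cycles."""
    return max(
        sum(0 if e in idealized else latency[e] for e in path)
        for path in paths
    )


def cost(paths: Paths, latency: Dict[str, int], events: Iterable[str]) -> int:
    """Cycles saved by idealizing every event in `events` at once."""
    return exec_time(paths, latency) - exec_time(paths, latency, frozenset(events))


def icost(paths: Paths, latency: Dict[str, int], a: str, b: str) -> int:
    """Assumed definition: cost({a, b}) - cost({a}) - cost({b})."""
    return (cost(paths, latency, [a, b])
            - cost(paths, latency, [a])
            - cost(paths, latency, [b]))


# Assumed latencies, in cycles.
latency = {"m1": 100, "m2": 100, "other": 100}

parallel = [["m1"], ["m2"]]            # two fully overlapping misses
serial = [["m1", "m2"], ["other"]]     # misses in series, beside another path
independent = [["m1", "m2"]]           # misses in series, nothing else

for name, paths in [("parallel", parallel),
                    ("serial", serial),
                    ("independent", independent)]:
    print(name, icost(paths, latency, "m1", "m2"))
```

In the parallel case neither miss costs anything on its own, because the other still takes the full 100 cycles, yet removing both saves 100 cycles, so the interaction cost is positive. In the serial case each miss alone saves 100 cycles, but removing both saves no more than that because the other path then limits execution time, so the interaction cost is negative.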

Published In

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
December 2003
412 pages
ISBN: 076952043X

Publisher

IEEE Computer Society

United States

Acceptance Rates

MICRO 36 Paper Acceptance Rate: 35 of 134 submissions, 26%
Overall Acceptance Rate: 484 of 2,242 submissions, 22%
