DOI: 10.5555/998680.1006729

A First-Order Superscalar Processor Model

Published: 02 March 2004

Abstract

A proposed performance model for superscalar processors consists of 1) a component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions, and 2) methods for calculating transient performance penalties due to branch mispredictions, instruction cache misses, and data cache misses. Using trace-derived data dependence information, data and instruction cache miss rates, and branch miss-prediction rates as inputs, the model can arrive at performance estimates for a typical superscalar processor that are within 5.8% of detailed simulation on average and within 13% in the worst case. The model also provides insights into the workings of superscalar processors and long-term microarchitecture trends such as pipeline depths and issue widths.
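
The two-part structure described in the abstract can be illustrated with a minimal sketch. It assumes that the miss-event penalties simply add to the ideal cycles per instruction, which is a simplification made for illustration and not a reproduction of the paper's penalty calculations. The names `MissEvent` and `first_order_cpi`, and all numeric inputs, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MissEvent:
    """One class of miss event; every value here is an illustrative assumption."""
    name: str
    events_per_instruction: float  # e.g. branch mispredictions per instruction
    penalty_cycles: float          # assumed average transient penalty per event


def first_order_cpi(ideal_ipc: float, events: Iterable[MissEvent]) -> float:
    """Estimate CPI as ideal CPI plus additive miss-event penalties.

    ideal_ipc stands in for the instructions issued per cycle that the
    window-size component would give under ideal conditions.
    """
    if ideal_ipc <= 0:
        raise ValueError("ideal_ipc must be positive")
    steady_state_cpi = 1.0 / ideal_ipc
    # Each event class contributes (events per instruction) x (cycles per event).
    penalty_cpi = sum(e.events_per_instruction * e.penalty_cycles for e in events)
    return steady_state_cpi + penalty_cpi


# Hypothetical inputs, not taken from the paper.
events = [
    MissEvent("branch misprediction", 0.005, 10.0),
    MissEvent("instruction cache miss", 0.002, 8.0),
    MissEvent("data cache miss", 0.001, 200.0),
]
cpi = first_order_cpi(ideal_ipc=4.0, events=events)
print(f"estimated CPI = {cpi:.3f}, estimated IPC = {1.0 / cpi:.3f}")
```

Keeping each miss-event class as a separate term makes it easy to see which one dominates the estimate when a rate or penalty is changed.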

Published In

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004
373 pages
ISBN: 0769521436
  • ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
    ISCA 2004
    March 2004
    373 pages
    ISSN: 0163-5964
    DOI: 10.1145/1028176

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ISCA04
Acceptance Rates

ISCA '04 Paper Acceptance Rate: 31 of 217 submissions, 14%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 69
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 27 Jan 2025

Cited By

  • (2019) PROFET. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:2 (1-33). DOI: 10.1145/3341617.3326149. Online publication date: 19-Jun-2019
  • (2019) Predicting New Workload or CPU Performance by Analyzing Public Datasets. ACM Transactions on Architecture and Code Optimization 15:4 (1-21). DOI: 10.1145/3284127. Online publication date: 8-Jan-2019
  • (2019) Sampled Simulation of Task-Based Programs. IEEE Transactions on Computers 68:2 (255-269). DOI: 10.1109/TC.2018.2860012. Online publication date: 1-Feb-2019
  • (2018) An Analytical Cache Performance Evaluation Framework for Embedded Out-of-Order Processors Using Software Characteristics. ACM Transactions on Embedded Computing Systems 17:4 (1-25). DOI: 10.1145/3233182. Online publication date: 9-Aug-2018
  • (2018) Fast and Accurate Performance Analysis of Synchronization. Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores (31-40). DOI: 10.1145/3178442.3178446. Online publication date: 24-Feb-2018
  • (2018) TEMProf. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (881-893). DOI: 10.1109/MICRO.2018.00076. Online publication date: 20-Oct-2018
  • (2018) RpStacks-MT. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (586-599). DOI: 10.1109/MICRO.2018.00054. Online publication date: 20-Oct-2018
  • (2018) Modeling Superscalar Processor Memory-Level Parallelism. IEEE Computer Architecture Letters 17:1 (9-12). DOI: 10.1109/LCA.2017.2701370. Online publication date: 1-Jan-2018
  • (2018) Charm. Proceedings of the 45th Annual International Symposium on Computer Architecture (152-165). DOI: 10.1109/ISCA.2018.00023. Online publication date: 2-Jun-2018
  • (2017) CHARSTAR. ACM SIGARCH Computer Architecture News 45:2 (147-160). DOI: 10.1145/3140659.3080212. Online publication date: 24-Jun-2017
