DOI: 10.1145/279358.279408

Analytic evaluation of shared-memory systems with ILP processors

Published: 16 April 1998

Abstract

This paper develops and validates an analytical model for evaluating various types of architectural alternatives for shared-memory systems with processors that aggressively exploit instruction-level parallelism. Compared to simulation, the analytical model is many orders of magnitude faster to solve, yielding highly accurate system performance estimates in seconds. The model input parameters characterize the ability of an application to exploit instruction-level parallelism as well as the interaction between the application and the memory system architecture. A trace-driven simulation methodology is developed that allows these parameters to be generated over 100 times faster than with a detailed execution-driven simulator. Finally, this paper shows that the analytical model can be used to gain insights into application performance and to evaluate architectural design trade-offs.

      Published In

      ISCA '98: Proceedings of the 25th annual international symposium on Computer architecture
      April 1998
      402 pages
      ISBN: 0818684917
      • ACM SIGARCH Computer Architecture News, Volume 26, Issue 3
        Special Issue: Proceedings of the 25th annual international symposium on Computer architecture (ISCA '98)
        June 1998
        379 pages
        ISSN: 0163-5964
        DOI: 10.1145/279361

      Publisher

      IEEE Computer Society

      United States

      Conference

      ISCA98
      ISCA98: International Symposium on Computer Architecture
      June 27 - July 2, 1998
      Barcelona, Spain

      Acceptance Rates

      Overall Acceptance Rate 543 of 3,203 submissions, 17%
