FLASH vs. (Simulated) FLASH: closing the simulation loop

Published: 12 November 2000

Abstract

Simulation is the primary method for evaluating computer systems during all phases of the design process. One significant problem with simulation is that it rarely models the system exactly, and quantifying the resulting simulator error can be difficult. More importantly, architects often assume without proof that although their simulator may make inaccurate absolute performance predictions, it will still accurately predict architectural trends.

This paper studies the source and magnitude of error in a range of architectural simulators by comparing the simulated execution time of several applications and microbenchmarks to their execution time on the actual hardware being modeled. The existence of a hardware gold standard allows us to find, quantify, and fix simulator inaccuracies. We then use the simulators to predict architectural trends and analyze the sensitivity of the results to the simulator configuration. We find that most of our simulators predict trends accurately, as long as they model all of the important performance effects for the application in question. Unfortunately, it is difficult to know what these effects are without having a hardware reference, as they can be quite subtle. This calls into question the value, for architectural studies, of highly detailed simulators whose characteristics are not carefully validated against a real hardware design.
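The distinction the abstract draws between absolute error and trend accuracy can be made concrete with a minimal sketch. The function names and all timing values below are hypothetical illustrations, not data or methods from the paper; the sketch assumes that "trend" means speedup relative to a baseline configuration.

```python
def relative_error(simulated, measured):
    """Signed relative error of a simulated time against the hardware time."""
    return (simulated - measured) / measured


def speedups(times):
    """Speedup of each configuration relative to the first (baseline) one."""
    base = times[0]
    return [base / t for t in times]


# Hypothetical execution times in seconds for 1, 2, and 4 processors.
# These numbers are invented for illustration and are NOT from the paper.
hardware = [10.0, 5.5, 3.0]
simulated = [8.0, 4.4, 2.4]

# Absolute accuracy: how far each simulated time is from the hardware time.
errors = [relative_error(s, h) for s, h in zip(simulated, hardware)]

# Trend accuracy: how closely the simulated speedup curve tracks hardware.
hw_trend = speedups(hardware)
sim_trend = speedups(simulated)
trend_gap = max(abs(a - b) for a, b in zip(sim_trend, hw_trend))

print("relative errors:", errors)
print("largest speedup mismatch:", trend_gap)
```

In this invented case the simulator underestimates every run by about 20 percent, yet its speedup curve matches the hardware curve, which is the situation architects commonly assume. The abstract's caution is that this only holds when the simulator models every performance effect that matters for the application, and checking it requires hardware measurements like the `hardware` list here.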

Publisher

Association for Computing Machinery

New York, NY, United States

Published in SIGARCH Volume 28, Issue 5
