Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Authors:

Trevor E. Carlson,

Lieven Eeckhout

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 52, Pages 1 - 12

https://doi.org/10.1145/2063384.2063454

Published: 12 November 2011

Abstract

Two major trends in high-performance computing, namely, larger numbers of cores and the growing size of on-chip cache memory, are creating significant challenges for evaluating the design space of future processor architectures. Fast and scalable simulations are therefore needed to allow for sufficient exploration of large multi-core systems within a limited simulation time budget. By bringing together accurate high-abstraction analytical models with fast parallel simulation, architects can trade off accuracy with simulation speed to allow for longer application runs, covering a larger portion of the hardware design space. Interval simulation provides this balance between detailed cycle-accurate simulation and one-IPC simulation, allowing long-running simulations to be modeled much faster than with detailed cycle-accurate simulation, while still providing the detail necessary to observe core-uncore interactions across the entire system. Validations against real hardware show average absolute errors within 25% for a variety of multi-threaded workloads; more than twice as accurate on average as one-IPC simulation. Further, we demonstrate scalable simulation speed of up to 2.0 MIPS when simulating a 16-core system on an 8-core SMP machine.
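To make the contrast in the abstract concrete, the following is a minimal sketch, not Sniper's implementation. It assumes a simplified interval-style estimate in which cycles are the instruction count divided by the dispatch width, plus a fixed penalty per miss event, and compares it with a one-IPC estimate that charges one cycle per instruction. The `IntervalStats` fields, the penalty values, and the helper names are all hypothetical and chosen only for illustration; the error helper shows one common way to compute an average absolute error against measured values.

```python
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Hypothetical per-core counters; names and units are assumptions."""
    instructions: int
    dispatch_width: int
    branch_mispredictions: int
    branch_penalty: int        # cycles lost per misprediction (assumed fixed)
    long_latency_misses: int
    miss_penalty: int          # cycles lost per long-latency miss (assumed fixed)


def one_ipc_cycles(stats: IntervalStats) -> float:
    """One-IPC estimate: every instruction takes exactly one cycle."""
    return float(stats.instructions)


def interval_cycles(stats: IntervalStats) -> float:
    """Simplified interval-style estimate: steady-state dispatch plus
    a penalty for each miss event that interrupts it."""
    base = stats.instructions / stats.dispatch_width
    branch = stats.branch_mispredictions * stats.branch_penalty
    memory = stats.long_latency_misses * stats.miss_penalty
    return base + branch + memory


def average_absolute_error(predicted, measured) -> float:
    """Mean of |predicted - measured| / measured over all workloads."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    stats = IntervalStats(
        instructions=1000, dispatch_width=4,
        branch_mispredictions=10, branch_penalty=15,
        long_latency_misses=5, miss_penalty=200,
    )
    print("one-IPC cycles: ", one_ipc_cycles(stats))
    print("interval cycles:", interval_cycles(stats))
```

In this toy model the two estimates diverge whenever miss events dominate execution time, which is the kind of core-uncore interaction the abstract says a one-IPC model cannot capture.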

Index Terms

Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies

Published In

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

November 2011

866 pages

ISBN:9781450307710

DOI:10.1145/2063384

Conference Chair: Scott Lathrop (University of Chicago)

Program Chairs: Jim Costa (Sandia National Laboratories), William Kramer (National Center for Supercomputing Applications)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Seventh Framework Programme
The Research Foundation - Flanders projects
Universiteit Gent
European Research Council

Conference

SC '11

SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 18, 2011

Seattle, Washington

Acceptance Rates

SC '11 Paper Acceptance Rate 74 of 352 submissions, 21%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
