On the simulation of large-scale architectures using multiple application abstraction levels

Authors:

Alejandro Rico,

Felipe Cabarcas,

Carlos Villavieja,

Milan Pavlovic,

Mateo Valero

ACM Transactions on Architecture and Code Optimization (TACO), Volume 8, Issue 4

Article No.: 36, Pages 1 - 20

https://doi.org/10.1145/2086696.2086715

Published: 26 January 2012

Abstract

Simulation is a key tool for computer architecture research. Cycle-accurate simulators in particular are essential for microarchitecture exploration and detailed design decisions, but they are slow and therefore unsuitable for simulating large-scale architectures, a task they were never meant for. Moreover, microarchitecture design decisions are irrelevant, or even misleading, in early processor design stages and high-level explorations. This makes it possible to raise the abstraction level of the simulated architecture, and with it the abstraction level of the application, which does not necessarily have to be represented as an instruction stream.

In this paper we define different application abstraction levels and show how they are employed in TaskSim, a multi-core architecture simulator, to provide several architecture modeling abstractions and to simulate large-scale architectures with hundreds of cores. We compare the simulation speed of these abstraction levels with that of existing simulation tools, and we evaluate their utility and accuracy. Our simulations show that a very high-level abstraction, which may be even faster than native execution, is useful for scalability studies of parallel applications, and that by simulating only explicit memory transfers we achieve accurate simulations of architectures using non-coherent scratchpad memories, with just a 25x slowdown compared to native execution. Furthermore, we revisit trace memory simulation techniques, which are more abstract than instruction-by-instruction simulation and provide an 18x simulation speedup.
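To make the idea of a very high-level application abstraction more concrete, the following is a minimal sketch of a scalability study in which the application is represented only as a list of task durations rather than as an instruction stream. It is not TaskSim's implementation: the function names `simulate_makespan` and `speedup_curve`, the assumption that tasks are independent, and the greedy "first free core" scheduling policy are all illustrative assumptions.

```python
import heapq


def simulate_makespan(task_durations, num_cores):
    """Estimate total run time of independent tasks on num_cores cores.

    Assumption (not from the paper): tasks have no dependencies and are
    dispatched in order to whichever simulated core becomes idle first.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    # Min-heap holding the time at which each simulated core becomes idle.
    core_free_at = [0.0] * num_cores
    heapq.heapify(core_free_at)
    for duration in task_durations:
        start = heapq.heappop(core_free_at)  # earliest available core
        heapq.heappush(core_free_at, start + duration)
    # The application finishes when the last core goes idle.
    return max(core_free_at)


def speedup_curve(task_durations, core_counts):
    """Return {core count: speedup over the one-core simulated run}."""
    serial = simulate_makespan(task_durations, 1)
    return {n: serial / simulate_makespan(task_durations, n)
            for n in core_counts}


if __name__ == "__main__":
    durations = [4, 2, 2, 1, 1]  # hypothetical task durations, arbitrary units
    print(speedup_curve(durations, [1, 2, 4]))
```

Because each task costs only a heap operation, a model of this kind can process a run far faster than the application itself would execute, which is the general property the abstract attributes to the highest abstraction level. Timing fidelity within each task is given up in exchange.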

Index Terms

1. Computing methodologies
  1. Modeling and simulation

Published In

ACM Transactions on Architecture and Code Optimization Volume 8, Issue 4

Special Issue on High-Performance Embedded Architectures and Compilers

January 2012

765 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2086696

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 January 2012

Accepted: 01 November 2011

Revised: 01 October 2011

Received: 01 July 2011

Published in TACO Volume 8, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

European Union Program of High Level Scholarships for Latin America
Ministerio de Ciencia e Innovación
Seventh Framework Programme
