research-article
Open access

On the simulation of large-scale architectures using multiple application abstraction levels

Published: 26 January 2012

Abstract

Simulation is a key tool for computer architecture research. In particular, cycle-accurate simulators are extremely important for microarchitecture exploration and detailed design decisions, but they are slow and therefore unsuitable for simulating large-scale architectures, a task for which they were never intended. Moreover, microarchitecture design decisions are irrelevant, or even misleading, in early processor design stages and high-level explorations. This makes it possible to raise the abstraction level of the simulated architecture and also that of the application, which does not necessarily have to be represented as an instruction stream.
In this paper we define different application abstraction levels and describe how they are employed in TaskSim, a multi-core architecture simulator, to provide several architecture modeling abstractions and to simulate large-scale architectures with hundreds of cores. We compare the simulation speed of these abstraction levels to that of existing simulation tools, and also evaluate their utility and accuracy. Our simulations show that a very high-level abstraction, which may be even faster than native execution, is useful for scalability studies on parallel applications, and that by simulating only explicit memory transfers we achieve accurate simulations for architectures using non-coherent scratchpad memories, with just a 25x slowdown compared to native execution. Furthermore, we revisit trace memory simulation techniques, which are more abstract than instruction-by-instruction simulation and provide an 18x simulation speedup.
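
To give a feel for what simulating at a very high application abstraction level can mean, the following is a minimal sketch, not TaskSim's actual implementation. It assumes a hypothetical trace format in which each task is recorded only as a duration plus the indices of the earlier tasks it depends on, and it replays that trace on a configurable number of identical cores using simple in-order scheduling. The function name `simulate_makespan`, the trace layout, and the example workload are all assumptions made for illustration.

```python
import heapq


def simulate_makespan(tasks, num_cores):
    """Replay a task-level trace on num_cores identical cores.

    tasks: list of (duration, deps) tuples in trace order, where deps holds
    the indices of earlier tasks that must finish first (hypothetical format).
    Returns the simulated finish time of the last task to complete.
    """
    # Min-heap of the times at which each core next becomes idle.
    core_free = [0.0] * num_cores
    heapq.heapify(core_free)
    finish = []  # finish[i] is the simulated finish time of task i

    for duration, deps in tasks:
        # A task may start once all of its dependencies have finished...
        ready = max((finish[d] for d in deps), default=0.0)
        # ...and once the earliest-available core is idle.
        start = max(ready, heapq.heappop(core_free))
        end = start + duration
        finish.append(end)
        heapq.heappush(core_free, end)

    return max(finish, default=0.0)


# Illustrative workload (made up): 8 independent tasks of 1 time unit each,
# followed by one 2-unit task that depends on all of them.
example_trace = [(1.0, [])] * 8 + [(2.0, list(range(8)))]

for cores in (1, 2, 4, 8, 16):
    makespan = simulate_makespan(example_trace, cores)
    print(f"{cores:2d} cores: makespan = {makespan}")
```

Because no instructions or memory accesses are modeled, the replay cost depends only on the number of tasks, which is why an abstraction of this kind can run faster than the application itself. In this toy workload the makespan stops improving beyond 8 cores, since the final task serializes on all the others; that is the sort of scalability limit such a replay can expose.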


Published In

ACM Transactions on Architecture and Code Optimization, Volume 8, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2012
765 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2086696
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 January 2012
Accepted: 01 November 2011
Revised: 01 October 2011
Received: 01 July 2011
Published in TACO Volume 8, Issue 4

Author Tags

  1. Multi-core
  2. abstraction levels
  3. simulation

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 78
  • Downloads (Last 6 weeks): 15
Reflects downloads up to 12 Nov 2024

Cited By

  • (2021) PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy. Euro-Par 2021: Parallel Processing, pp. 599-615. DOI: 10.1007/978-3-030-85665-6_37. Online publication date: 25-Aug-2021
  • (2020) Runtime-guided ECC protection using online estimation of memory vulnerability. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.5555/3433701.3433802. Online publication date: 9-Nov-2020
  • (2020) RICH. Proceedings of the 34th ACM International Conference on Supercomputing, pp. 1-13. DOI: 10.1145/3392717.3392736. Online publication date: 29-Jun-2020
  • (2020) Runtime-Guided ECC Protection using Online Estimation of Memory Vulnerability. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1109/SC41405.2020.00080. Online publication date: Nov-2020
  • (2019) Sampled Simulation of Task-Based Programs. IEEE Transactions on Computers 68:2, pp. 255-269. DOI: 10.1109/TC.2018.2860012. Online publication date: 1-Feb-2019
  • (2019) Design Space Exploration of Next-Generation HPC Machines. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 54-65. DOI: 10.1109/IPDPS.2019.00017. Online publication date: May-2019
  • (2019) A Vulnerability Factor for ECC-protected Memory. 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS), pp. 176-181. DOI: 10.1109/IOLTS.2019.8854397. Online publication date: Jul-2019
  • (2018) Exhaustive evaluation of memory-latency sensitivity on manycore processors with large cache. Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications, pp. 27-34. DOI: 10.1145/3195612.3195616. Online publication date: 15-Mar-2018
  • (2018) Fast and Accurate Performance Analysis of Synchronization. Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 31-40. DOI: 10.1145/3178442.3178446. Online publication date: 24-Feb-2018
  • (2018) A Survey of Augmented, Virtual, and Mixed Reality for Cultural Heritage. Journal on Computing and Cultural Heritage 11:2, pp. 1-36. DOI: 10.1145/3145534. Online publication date: 22-Mar-2018
