DOI:10.1145/2601381.2601392
research-article

Exploring many-core architecture design space for parallel discrete event simulation

Published: 18 May 2014

Abstract

As multicore and manycore processor architectures emerge and core counts per chip continue to increase, it is important to evaluate and understand the performance and scalability of Parallel Discrete Event Simulation (PDES) on these platforms. Most existing architectures are still limited to a modest number of cores, feature simple designs, and do not exhibit heterogeneity, which makes a comprehensive analysis and evaluation of PDES on them impossible. In this paper we instead evaluate PDES using a full-system, cycle-accurate simulator of a multicore processor and memory subsystem. This approach lets us configure the simulator flexibly and explore how architecture design choices affect PDES performance. In particular, we answer the following four questions with respect to PDES performance and scalability: (1) For the same total chip area, what is the best design point in terms of the number of cores and the size of the on-chip cache? (2) What is the impact of using in-order vs. out-of-order cores? (3) What is the impact of a heterogeneous system with a mix of in-order and out-of-order cores? (4) What is the impact of object partitioning on PDES performance in heterogeneous systems? To answer these questions, we use the MARSSx86 simulator to evaluate performance and rely on the Cacti and McPAT tools to derive area and latency estimates for cores and caches.
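
Question (1) above trades cores against cache within a fixed area budget. As a rough illustration of what enumerating such design points might look like, the following Python sketch spends whatever area the cores leave over on cache. It is not the paper's methodology: the function and class names are hypothetical, and the area figures are made-up placeholders, not values derived from Cacti or McPAT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    cores: int
    cache_mb: float


def enumerate_design_points(total_area_mm2, core_area_mm2,
                            cache_area_mm2_per_mb, core_counts):
    """Return the design points that fit in a fixed chip-area budget.

    For each candidate core count, the area left over after placing
    the cores is converted into on-chip cache capacity.
    """
    points = []
    for n in core_counts:
        remaining = total_area_mm2 - n * core_area_mm2
        if remaining <= 0:
            continue  # the cores alone exhaust the budget; no room for cache
        points.append(DesignPoint(n, remaining / cache_area_mm2_per_mb))
    return points


# Hypothetical numbers for illustration only (not from the paper).
points = enumerate_design_points(
    total_area_mm2=100.0,
    core_area_mm2=4.0,
    cache_area_mm2_per_mb=2.0,
    core_counts=[4, 8, 16, 32],
)
for p in points:
    print(f"{p.cores:2d} cores -> {p.cache_mb:.1f} MB cache")
```

Each resulting point would then have to be evaluated in a simulator to see which one actually performs best for a PDES workload; the sketch only produces the candidates.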

Cited By

  • (2015) Can MIC find its place in the field of PDES? Proceedings of the 19th International Symposium on Distributed Simulation and Real Time Applications, pages 41-49. DOI:10.1109/DS-RT.2015.23. Online publication date: 14-Oct-2015

Published In

SIGSIM PADS '14: Proceedings of the 2nd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
May 2014
222 pages
ISBN:9781450327947
DOI:10.1145/2601381
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. PDES
  2. full-system simulation
  3. multi-cores

Qualifiers

  • Research-article

Conference

SIGSIM-PADS '14

Acceptance Rates

SIGSIM PADS '14 Paper Acceptance Rate 19 of 33 submissions, 58%;
Overall Acceptance Rate 398 of 779 submissions, 51%
