DOI: 10.1145/2486092.2486098
research-article

Can PDES scale in environments with heterogeneous delays?

Published: 19 May 2013

Abstract

The performance and scalability of Parallel Discrete Event Simulation (PDES) are often limited by communication latencies and overheads. The emergence of multi-core processors and their expected evolution into many-cores offers the promise of low latency communication and tight memory integration between cores; these properties should significantly improve the performance of PDES in such environments. However, on clusters of multi-cores (CMs), the latency and processing overheads incurred when communicating between different machines (nodes) far outweigh those between cores on the same chip, especially when commodity networking fabrics and communication software are used. It is unclear whether low latency among cores on the same node offers any benefit, given that communication links across nodes are significantly slower. In this study, we examine the performance of a multi-threaded implementation of PDES on CMs. We demonstrate that inter-node communication costs impose a substantial bottleneck on PDES and that, without optimizations addressing these long latencies, multi-threaded PDES does not significantly outperform the multiprocess version, despite direct communication through shared memory on the individual nodes. We then propose three optimizations: message consolidation and routing, infrequent polling, and latency-sensitive model partitioning. We show that with these optimizations in place, a threaded implementation of PDES significantly outperforms a process-based implementation even on CMs.
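The abstract names message consolidation without describing its mechanism, so the following is only an illustrative sketch of the general idea, not the authors' implementation: events bound for the same remote node are buffered and handed to the network in one batch, so the per-message inter-node overhead is paid once per batch. The names `Event`, `ConsolidatingSender`, `send_fn`, and `batch_size` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # simulation time at which the event should fire
    dest_node: int    # hypothetical id of the destination machine
    payload: str


class ConsolidatingSender:
    """Buffers remote events per destination node and sends them in batches."""

    def __init__(self, send_fn, batch_size=4):
        self._send_fn = send_fn        # called as send_fn(dest_node, list_of_events)
        self._batch_size = batch_size  # assumed flush threshold, not from the paper
        self._buffers = defaultdict(list)

    def enqueue(self, event):
        # Append to the buffer for this destination; flush once it is full.
        buf = self._buffers[event.dest_node]
        buf.append(event)
        if len(buf) >= self._batch_size:
            self.flush(event.dest_node)

    def flush(self, dest_node):
        # Send everything pending for one node as a single message.
        batch = self._buffers.pop(dest_node, [])
        if batch:
            self._send_fn(dest_node, batch)

    def flush_all(self):
        # Drain every buffer, e.g. before a synchronization point.
        for dest_node in list(self._buffers):
            self.flush(dest_node)


def demo():
    sent = []  # records (destination node, batch length) per network send
    sender = ConsolidatingSender(
        lambda node, batch: sent.append((node, len(batch))), batch_size=3
    )
    for i in range(7):
        sender.enqueue(Event(timestamp=float(i), dest_node=i % 2, payload=f"e{i}"))
    sender.flush_all()
    return sent
```

In this toy run, seven events reach two nodes in three sends instead of seven. A real optimistic simulator would also have to bound how long an event may wait in a buffer, since delayed delivery can increase rollbacks; that trade-off is outside this sketch.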

Published In

SIGSIM PADS '13: Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
May 2013
426 pages
ISBN: 9781450319201
DOI: 10.1145/2486092
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cluster of multi-cores
  2. multi-thread
  3. pdes

Qualifiers

  • Research-article

Conference

SIGSIM-PADS '13
Acceptance Rates

SIGSIM PADS '13 Paper Acceptance Rate 29 of 75 submissions, 39%;
Overall Acceptance Rate 398 of 779 submissions, 51%

Cited By

  • (2021) High-Performance PDES on Manycore Clusters. Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pages 153-164. DOI: 10.1145/3437959.3459252. Online publication date: 21 May 2021.
  • (2019) Controlled Asynchronous GVT. Proceedings of the 48th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3337821.3337927. Online publication date: 5 Aug 2019.
  • (2018) Leveraging shared memory in the ROSS time warp simulator for complex network simulations. Proceedings of the 2018 Winter Simulation Conference, pages 3837-3848. DOI: 10.5555/3320516.3320974. Online publication date: 9 Dec 2018.
  • (2015) AIR. ACM Transactions on Modeling and Computer Simulation, 25(3), pages 1-25. DOI: 10.1145/2701420. Online publication date: 16 Apr 2015.
  • (2014) Parallel Discrete Event Simulation for Multi-Core Systems. IEEE Transactions on Parallel and Distributed Systems, 25(6), pages 1574-1584. DOI: 10.1109/TPDS.2013.193. Online publication date: 1 Jun 2014.
