DOI: 10.1145/3337821.3337927

Controlled Asynchronous GVT: Accelerating Parallel Discrete Event Simulation on Many-Core Clusters

Published: 05 August 2019

Abstract

In this paper, we investigate the performance of Parallel Discrete Event Simulation (PDES) on a cluster of many-core Intel Knights Landing (KNL) processors. Specifically, we analyze the impact of different Global Virtual Time (GVT) algorithms in this environment and contribute three significant results. First, we show that it is essential to isolate the thread performing MPI communications from the task of processing simulation events; otherwise, the simulation is significantly imbalanced and performs poorly. This applies to both synchronous and asynchronous GVT algorithms. Second, we demonstrate that a synchronous GVT algorithm based on barrier synchronization is a better choice for communication-dominated models, while an asynchronous GVT algorithm based on Mattern's algorithm performs better for computation-dominated scenarios. Third, we propose the Controlled Asynchronous GVT (CA-GVT) algorithm, which selectively adds synchronization to Mattern-style GVT based on simulation conditions. We demonstrate that CA-GVT outperforms both barrier and Mattern's GVT and achieves about 8% performance improvement on mixed computation-communication models. This is a reasonable improvement for a simple modification to a GVT algorithm.
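
For readers unfamiliar with GVT, the following minimal Python sketch may help. It relies on the standard definition of GVT as the minimum over all local virtual times and the timestamps of all messages still in transit. The `Worker` class and both function names are hypothetical, introduced only for illustration. The `should_synchronize` trigger in particular is an assumption: the abstract says only that CA-GVT adds synchronization "based on simulation conditions", so the efficiency metric and the 0.8 threshold here are placeholders, not the paper's criterion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Worker:
    """One simulation thread or process (hypothetical model, not from the paper)."""
    local_virtual_time: float
    # Timestamps of messages this worker has sent that are not yet received.
    in_transit: List[float] = field(default_factory=list)


def compute_gvt(workers: List[Worker]) -> float:
    """Return the minimum over all local clocks and in-transit message timestamps."""
    candidates: List[float] = []
    for w in workers:
        candidates.append(w.local_virtual_time)
        candidates.extend(w.in_transit)
    return min(candidates)


def should_synchronize(committed: int, processed: int, threshold: float = 0.8) -> bool:
    """Assumed trigger: request a barrier when the committed/processed ratio is low.

    The metric and threshold are illustrative assumptions only.
    """
    if processed == 0:
        return False
    return committed / processed < threshold


workers = [Worker(12.0, [9.5]), Worker(15.0), Worker(11.0, [14.0])]
gvt = compute_gvt(workers)  # the in-transit message at 9.5 bounds GVT
```

In a barrier-based scheme, all workers would stop at a common point so that no messages remain in transit when the minimum is taken. A Mattern-style scheme instead accounts for in-transit messages without stopping the workers, which is why the sketch tracks them explicitly.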

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019
1107 pages
ISBN:9781450362955
DOI:10.1145/3337821
In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Global Virtual Time
  2. Intel Xeon Phi
  3. Knights Landing
  4. Manycore Architectures
  5. Parallel Discrete Event Simulation
  6. Performance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2019

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Cited By

  • (2024) Devastator: A Scalable Parallel Discrete Event Simulation Framework for Modern C++. Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 35-46. DOI: 10.1145/3615979.3656061. Online publication date: 24-Jun-2024
  • (2023) A Receiver-Driven Transport Protocol With High Link Utilization Using Anti-ECN Marking in Data Center Networks. IEEE Transactions on Network and Service Management 20:2, 1898-1912. DOI: 10.1109/TNSM.2022.3218343. Online publication date: Jul-2023
  • (2022) NetGVT. Proceedings of the Symposium on SDN Research, 16-24. DOI: 10.1145/3563647.3563648. Online publication date: 19-Oct-2022
  • (2021) GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event Simulation. Proceedings of the 50th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3472456.3472470. Online publication date: 9-Aug-2021
  • (2021) High-Performance PDES on Manycore Clusters. Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 153-164. DOI: 10.1145/3437959.3459252. Online publication date: 21-May-2021
  • (2021) Load-Aware Dynamic Time Synchronization in Parallel Discrete Event Simulation. Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 95-105. DOI: 10.1145/3437959.3459249. Online publication date: 21-May-2021
  • (2020) AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center. Proceedings of the 49th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3404397.3404412. Online publication date: 17-Aug-2020
  • (2020) Demand-Driven PDES. Proceedings of the 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 39-48. DOI: 10.1145/3384441.3395976. Online publication date: 15-Jun-2020
  • (2019) Locality-aware process placement for parallel and distributed simulation in cloud data centers. The Journal of Supercomputing 75:11, 7723-7745. DOI: 10.1007/s11227-019-02973-9. Online publication date: 28-Aug-2019
