Controlled Asynchronous GVT: Accelerating Parallel Discrete Event Simulation on Many-Core Clusters

Authors:

Barry Williams,

Dmitry Ponomarev

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

Article No.: 64, Pages 1 - 10

https://doi.org/10.1145/3337821.3337927

Published: 05 August 2019

Abstract

In this paper, we investigate the performance of Parallel Discrete Event Simulation (PDES) on a cluster of many-core Intel Knights Landing (KNL) processors. Specifically, we analyze the impact of different Global Virtual Time (GVT) algorithms in this environment and contribute three significant results. First, we show that it is essential to isolate the thread performing MPI communications from the task of processing simulation events; otherwise, the simulation is significantly imbalanced and performs poorly. This applies to both synchronous and asynchronous GVT algorithms. Second, we demonstrate that a synchronous GVT algorithm based on barrier synchronization is a better choice for communication-dominated models, while an asynchronous GVT algorithm based on Mattern's algorithm performs better for computation-dominated scenarios. Third, we propose the Controlled Asynchronous GVT (CA-GVT) algorithm, which selectively adds synchronization to Mattern-style GVT based on simulation conditions. We demonstrate that CA-GVT outperforms both barrier and Mattern's GVT and achieves about 8% performance improvement on mixed computation-communication models. This is a reasonable improvement for a simple modification to a GVT algorithm.
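As a rough illustration of the quantities the abstract refers to, the sketch below computes GVT in the standard sense: the minimum over every thread's local virtual time and the timestamps of messages still in transit. It also shows one way a controller might decide to add a synchronization step. The `LogicalProcess` class, the `should_synchronize` function, and the 0.8 threshold are hypothetical names and values chosen for this example; the abstract does not state which simulation conditions CA-GVT actually monitors, so the policy here is an assumption rather than the authors' rule.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LogicalProcess:
    """Hypothetical per-thread state: local clock plus timestamps of sent,
    not-yet-received messages."""
    local_virtual_time: float
    in_transit_timestamps: List[float]


def compute_gvt(lps: Iterable[LogicalProcess]) -> float:
    """Return the minimum over all local clocks and in-transit timestamps.

    No event with a timestamp below this value can still be rolled back.
    Raises ValueError if `lps` is empty.
    """
    candidates: List[float] = []
    for lp in lps:
        candidates.append(lp.local_virtual_time)
        candidates.extend(lp.in_transit_timestamps)
    return min(candidates)


def should_synchronize(committed: int, processed: int,
                       threshold: float = 0.8) -> bool:
    """Toy policy (an assumption, not the paper's criterion): request a
    barrier when the fraction of processed events that were eventually
    committed falls below `threshold`."""
    if processed == 0:
        return False
    return committed / processed < threshold


lps = [
    LogicalProcess(12.0, [9.5]),   # has one message in flight at time 9.5
    LogicalProcess(15.0, []),
    LogicalProcess(11.0, [13.0]),
]
print(compute_gvt(lps))              # 9.5
print(should_synchronize(60, 100))   # True: only 60% of events committed
print(should_synchronize(95, 100))   # False: mostly useful work
```

In this example the in-transit message at time 9.5 bounds GVT even though every local clock is already past it, which is why a GVT algorithm has to account for messages in flight and not only for local clocks.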

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

August 2019

1107 pages

ISBN:9781450362955

DOI:10.1145/3337821

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP 2019: 48th International Conference on Parallel Processing

August 5 - 8, 2019

Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Cited By

Bachan J, Ye J, Jiang X, Nguyen T, Natarajan M, Bremer M, Chan C. (2024). Devastator: A Scalable Parallel Discrete Event Simulation Framework for Modern C++. Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 35-46. DOI: 10.1145/3615979.3656061. Online publication date: 24-Jun-2024.
https://dl.acm.org/doi/10.1145/3615979.3656061

Hu J, Huang J, Li Z, Wang J, He T. (2023). A Receiver-Driven Transport Protocol With High Link Utilization Using Anti-ECN Marking in Data Center Networks. IEEE Transactions on Network and Service Management 20:2, 1898-1912. DOI: 10.1109/TNSM.2022.3218343. Online publication date: Jul-2023.
https://doi.org/10.1109/TNSM.2022.3218343

Parizotto R, Mello B, Haque I, Schaeffer-Filho A. (2022). NetGVT. Proceedings of the Symposium on SDN Research, 16-24. DOI: 10.1145/3563647.3563648. Online publication date: 19-Oct-2022.
https://dl.acm.org/doi/10.1145/3563647.3563648

Eker A, Timmerman D, Williams B, Chiu K, Ponomarev D. (2021). GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event Simulation. Proceedings of the 50th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3472456.3472470. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3472456.3472470

Williams B, Eker A, Chiu K, Ponomarev D, Diallo S, Tolk A, Giabbanelli P. (2021). High-Performance PDES on Manycore Clusters. Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 153-164. DOI: 10.1145/3437959.3459252. Online publication date: 21-May-2021.
https://dl.acm.org/doi/10.1145/3437959.3459252

Eker A, Arafa Y, Badawy A, Santhi N, Eidenbenz S, Ponomarev D, Diallo S, Tolk A, Giabbanelli P. (2021). Load-Aware Dynamic Time Synchronization in Parallel Discrete Event Simulation. Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 95-105. DOI: 10.1145/3437959.3459249. Online publication date: 21-May-2021.
https://dl.acm.org/doi/10.1145/3437959.3459249

Hu J, Huang J, Li Z, Wang J, He T. (2020). AMRT: Anti-ECN Marking to Improve Utilization of Receiver-driven Transmission in Data Center. Proceedings of the 49th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3404397.3404412. Online publication date: 17-Aug-2020.
https://dl.acm.org/doi/10.1145/3404397.3404412

Eker A, Williams B, Chiu K, Ponomarev D, Liu J, Giabbanelli P, Carothers C. (2020). Demand-Driven PDES. Proceedings of the 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 39-48. DOI: 10.1145/3384441.3395976. Online publication date: 15-Jun-2020.
https://dl.acm.org/doi/10.1145/3384441.3395976

Zaheer S, Malik A, Rahman A, Khan S. (2019). Locality-aware process placement for parallel and distributed simulation in cloud data centers. The Journal of Supercomputing 75:11, 7723-7745. DOI: 10.1007/s11227-019-02973-9. Online publication date: 28-Aug-2019.
https://doi.org/10.1007/s11227-019-02973-9
